
Commit 559e5d7

#2630: Implement PEP 3138.
The repr() of a string now contains printable Unicode characters unescaped. The new ascii() builtin can be used to get a repr() with only ASCII characters in it. PEP and patch were written by Atsuo Ishimoto.
1 parent ea6d58d commit 559e5d7

25 files changed: 1329 additions and 1032 deletions

Doc/c-api/object.rst

Lines changed: 12 additions & 2 deletions
@@ -116,8 +116,18 @@ Object Protocol
 
    Compute a string representation of object *o*. Returns the string
    representation on success, *NULL* on failure. This is the equivalent of the
-   Python expression ``repr(o)``. Called by the :func:`repr` built-in function and
-   by reverse quotes.
+   Python expression ``repr(o)``. Called by the :func:`repr` built-in function.
+
+
+.. cfunction:: PyObject* PyObject_ASCII(PyObject *o)
+
+   .. index:: builtin: ascii
+
+   As :cfunc:`PyObject_Repr`, compute a string representation of object *o*, but
+   escape the non-ASCII characters in the string returned by
+   :cfunc:`PyObject_Repr` with ``\x``, ``\u`` or ``\U`` escapes. This generates
+   a string similar to that returned by :cfunc:`PyObject_Repr` in Python 2.
+   Called by the :func:`ascii` built-in function.
 
 
 .. cfunction:: PyObject* PyObject_Str(PyObject *o)

Doc/c-api/unicode.rst

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -144,6 +144,18 @@ the Python configuration.
144144

145145
Return 1 or 0 depending on whether *ch* is an alphanumeric character.
146146

147+
148+
.. cfunction:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
149+
150+
Return 1 or 0 depending on whether *ch* is a printable character.
151+
Nonprintable characters are those characters defined in the Unicode character
152+
database as "Other" or "Separator", excepting the ASCII space (0x20) which is
153+
considered printable. (Note that printable characters in this context are
154+
those which should not be escaped when :func:`repr` is invoked on a string.
155+
It has no bearing on the handling of strings written to :data:`sys.stdout` or
156+
:data:`sys.stderr`.)
157+
158+
147159
These APIs can be used for fast direct character conversions:
148160

149161

@@ -266,6 +278,9 @@ APIs:
266278
| | | of what the platform's |
267279
| | | ``printf`` yields. |
268280
+-------------------+---------------------+--------------------------------+
281+
| :attr:`%A` | PyObject\* | The result of calling |
282+
| | | :func:`ascii`. |
283+
+-------------------+---------------------+--------------------------------+
269284
| :attr:`%U` | PyObject\* | A unicode object. |
270285
+-------------------+---------------------+--------------------------------+
271286
| :attr:`%V` | PyObject\*, char \* | A unicode object (which may be |

Doc/library/functions.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,14 @@ are always available. They are listed here in alphabetical order.
9191
return False
9292

9393

94+
.. function:: ascii(object)
95+
96+
As :func:`repr`, return a string containing a printable representation of an
97+
object, but escape the non-ASCII characters in the string returned by
98+
:func:`repr` using ``\x``, ``\u`` or ``\U`` escapes. This generates a string
99+
similar to that returned by :func:`repr` in Python 2.
100+
101+
94102
.. function:: bin(x)
95103

96104
Convert an integer number to a binary string. The result is a valid Python
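
The difference between `repr()` and the new `ascii()` builtin documented in the hunk above can be seen in a short Python 3 session. This is only an illustrative sketch: the sample string is an assumption and does not come from the patch.

```python
# Illustrative sample text (not taken from the patch): Latin-1, BMP and astral characters.
sample = "caf\u00e9 \u3042 \U0001f600"

shown = repr(sample)     # printable non-ASCII characters are kept unescaped
escaped = ascii(sample)  # non-ASCII characters become \x, \u or \U escapes

# Only the ASCII-safe form is printed, so this runs on any console encoding.
print(escaped)
print(shown == "'" + sample + "'")
```

The escape width follows the code point: `\x` for values below 0x100, `\u` for the rest of the Basic Multilingual Plane, and `\U` beyond it.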

Doc/library/stdtypes.rst

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -774,6 +774,17 @@ functions based on regular expressions.
774774
least one cased character, false otherwise.
775775

776776

777+
.. method:: str.isprintable()
778+
779+
Return true if all characters in the string are printable or the string is
780+
empty, false otherwise. Nonprintable characters are those characters defined
781+
in the Unicode character database as "Other" or "Separator", excepting the
782+
ASCII space (0x20) which is considered printable. (Note that printable
783+
characters in this context are those which should not be escaped when
784+
:func:`repr` is invoked on a string. It has no bearing on the handling of
785+
strings written to :data:`sys.stdout` or :data:`sys.stderr`.)
786+
787+
777788
.. method:: str.isspace()
778789

779790
Return true if there are only whitespace characters in the string and there is
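
The rule stated for `str.isprintable()` in the hunk above can be approximated in pure Python with `unicodedata`. This is a sketch under the assumption that "Other" and "Separator" correspond to the general categories starting with `C` and `Z`. The helper `is_printable_char` is hypothetical and is not part of the commit.

```python
import unicodedata


def is_printable_char(ch):
    """Hypothetical helper approximating the documented rule for one character."""
    if ch == " ":
        # The ASCII space is the documented exception among separators.
        return True
    # Major classes "C" (Other) and "Z" (Separator) count as nonprintable.
    return unicodedata.category(ch)[0] not in ("C", "Z")


samples = ["a", " ", "\n", "\u00a0", "\u3042", "\u2028"]
# Map an ASCII-safe label to (sketch result, built-in result) for comparison.
results = {ascii(ch): (is_printable_char(ch), ch.isprintable()) for ch in samples}
for label, pair in results.items():
    print(label, pair)
```

Note that the no-break space U+00A0 is a separator and is therefore reported as nonprintable, unlike the ASCII space.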

Doc/library/string.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -229,8 +229,9 @@ as a string, overriding its own definition of formatting. By converting the
229229
value to a string before calling :meth:`__format__`, the normal formatting logic
230230
is bypassed.
231231

232-
Two conversion flags are currently supported: ``'!s'`` which calls :func:`str`
233-
on the value, and ``'!r'`` which calls :func:`repr`.
232+
Three conversion flags are currently supported: ``'!s'`` which calls :func:`str`
233+
on the value, ``'!r'`` which calls :func:`repr` and ``'!a'`` which calls
234+
:func:`ascii`.
234235

235236
Some examples::
236237
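
A small sketch of the three conversion flags named in the hunk above, using `str.format`. The sample value is an assumption chosen for illustration.

```python
value = "na\u00efve"  # illustrative value containing one non-ASCII character

as_str = "{!s}".format(value)    # calls str() on the value
as_repr = "{!r}".format(value)   # calls repr() on the value
as_ascii = "{!a}".format(value)  # calls ascii() on the value

# Print ASCII-safe forms only, so the output does not depend on the console encoding.
print(ascii(as_str), ascii(as_repr), as_ascii)
```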

Doc/using/cmdline.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -425,6 +425,9 @@ These environment variables influence Python's behavior.
425425
``encodingname:errorhandler``. The ``:errorhandler`` part is optional and
426426
has the same meaning as in :func:`str.encode`.
427427

428+
For stderr, the ``:errorhandler`` part is ignored; the handler will always be
429+
``'backslashreplace'``.
430+
428431

429432
.. envvar:: PYTHONNOUSERSITE
430433

Include/object.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -425,6 +425,7 @@ PyAPI_FUNC(void) _Py_BreakPoint(void);
425425
PyAPI_FUNC(void) _PyObject_Dump(PyObject *);
426426
PyAPI_FUNC(PyObject *) PyObject_Repr(PyObject *);
427427
PyAPI_FUNC(PyObject *) PyObject_Str(PyObject *);
428+
PyAPI_FUNC(PyObject *) PyObject_ASCII(PyObject *);
428429
PyAPI_FUNC(int) PyObject_Compare(PyObject *, PyObject *);
429430
PyAPI_FUNC(PyObject *) PyObject_RichCompare(PyObject *, PyObject *, int);
430431
PyAPI_FUNC(int) PyObject_RichCompareBool(PyObject *, PyObject *, int);

Include/unicodeobject.h

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -220,6 +220,7 @@ typedef PY_UNICODE_TYPE Py_UNICODE;
220220
# define _PyUnicode_IsLinebreak _PyUnicodeUCS2_IsLinebreak
221221
# define _PyUnicode_IsLowercase _PyUnicodeUCS2_IsLowercase
222222
# define _PyUnicode_IsNumeric _PyUnicodeUCS2_IsNumeric
223+
# define _PyUnicode_IsPrintable _PyUnicodeUCS2_IsPrintable
223224
# define _PyUnicode_IsTitlecase _PyUnicodeUCS2_IsTitlecase
224225
# define _PyUnicode_IsXidStart _PyUnicodeUCS2_IsXidStart
225226
# define _PyUnicode_IsXidContinue _PyUnicodeUCS2_IsXidContinue
@@ -317,6 +318,7 @@ typedef PY_UNICODE_TYPE Py_UNICODE;
317318
# define _PyUnicode_IsLinebreak _PyUnicodeUCS4_IsLinebreak
318319
# define _PyUnicode_IsLowercase _PyUnicodeUCS4_IsLowercase
319320
# define _PyUnicode_IsNumeric _PyUnicodeUCS4_IsNumeric
321+
# define _PyUnicode_IsPrintable _PyUnicodeUCS4_IsPrintable
320322
# define _PyUnicode_IsTitlecase _PyUnicodeUCS4_IsTitlecase
321323
# define _PyUnicode_IsXidStart _PyUnicodeUCS4_IsXidStart
322324
# define _PyUnicode_IsXidContinue _PyUnicodeUCS4_IsXidContinue
@@ -357,6 +359,7 @@ typedef PY_UNICODE_TYPE Py_UNICODE;
357359
#define Py_UNICODE_ISDECIMAL(ch) _PyUnicode_IsDecimalDigit(ch)
358360
#define Py_UNICODE_ISDIGIT(ch) _PyUnicode_IsDigit(ch)
359361
#define Py_UNICODE_ISNUMERIC(ch) _PyUnicode_IsNumeric(ch)
362+
#define Py_UNICODE_ISPRINTABLE(ch) _PyUnicode_IsPrintable(ch)
360363

361364
#define Py_UNICODE_TODECIMAL(ch) _PyUnicode_ToDecimalDigit(ch)
362365
#define Py_UNICODE_TODIGIT(ch) _PyUnicode_ToDigit(ch)
@@ -387,6 +390,7 @@ extern const unsigned char _Py_ascii_whitespace[];
387390
#define Py_UNICODE_ISDECIMAL(ch) _PyUnicode_IsDecimalDigit(ch)
388391
#define Py_UNICODE_ISDIGIT(ch) _PyUnicode_IsDigit(ch)
389392
#define Py_UNICODE_ISNUMERIC(ch) _PyUnicode_IsNumeric(ch)
393+
#define Py_UNICODE_ISPRINTABLE(ch) _PyUnicode_IsPrintable(ch)
390394

391395
#define Py_UNICODE_TODECIMAL(ch) _PyUnicode_ToDecimalDigit(ch)
392396
#define Py_UNICODE_TODIGIT(ch) _PyUnicode_ToDigit(ch)
@@ -1533,6 +1537,10 @@ PyAPI_FUNC(int) _PyUnicode_IsNumeric(
15331537
Py_UNICODE ch /* Unicode character */
15341538
);
15351539

1540+
PyAPI_FUNC(int) _PyUnicode_IsPrintable(
1541+
Py_UNICODE ch /* Unicode character */
1542+
);
1543+
15361544
PyAPI_FUNC(int) _PyUnicode_IsAlpha(
15371545
Py_UNICODE ch /* Unicode character */
15381546
);

Lib/doctest.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1440,6 +1440,12 @@ class OutputChecker:
14401440
and returns true if they match; and `output_difference`, which
14411441
returns a string describing the differences between two outputs.
14421442
"""
1443+
def _toAscii(self, s):
1444+
"""
1445+
Convert string to hex-escaped ASCII string.
1446+
"""
1447+
return str(s.encode('ASCII', 'backslashreplace'), "ASCII")
1448+
14431449
def check_output(self, want, got, optionflags):
14441450
"""
14451451
Return True iff the actual output from an example (`got`)
@@ -1450,6 +1456,15 @@ def check_output(self, want, got, optionflags):
14501456
documentation for `TestRunner` for more information about
14511457
option flags.
14521458
"""
1459+
1460+
# If `want` contains hex-escaped character such as "\u1234",
1461+
# then `want` is a string of six characters(e.g. [\,u,1,2,3,4]).
1462+
# On the other hand, `got` could be an another sequence of
1463+
# characters such as [\u1234], so `want` and `got` should
1464+
# be folded to hex-escaped ASCII string to compare.
1465+
got = self._toAscii(got)
1466+
want = self._toAscii(want)
1467+
14531468
# Handle the common case first, for efficiency:
14541469
# if they're string-identical, always return true.
14551470
if got == want:
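
The folding step added to `check_output` above can be tried in isolation. The sketch below reuses the expression from `_toAscii` as a free function. The name `to_ascii` and the sample strings are assumptions for illustration and are not part of the patch.

```python
def to_ascii(s):
    # Same expression as the patch's _toAscii, written as a free function.
    return str(s.encode("ASCII", "backslashreplace"), "ASCII")


want = r"'\u1234'"  # expected output as typed in a docstring: backslash, u, 1, 2, 3, 4
got = "'\u1234'"    # actual output: the single character U+1234 between quotes

print(want == got)                      # False: the raw strings differ
print(to_ascii(want) == to_ascii(got))  # True: both fold to the same escaped text
```

Because both sides are folded before comparison, an expected output that spells the escape and an actual output that contains the character itself compare as equal.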

Lib/test/test_array.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -768,7 +768,7 @@ def test_unicode(self):
768768
a = array.array('u', s)
769769
self.assertEqual(
770770
repr(a),
771-
"array('u', '\\x00=\"\\'a\\\\b\\x80\\xff\\x00\\x01\\u1234')")
771+
"array('u', '\\x00=\"\\'a\\\\b\\x80\xff\\x00\\x01\u1234')")
772772

773773
self.assertRaises(TypeError, a.fromunicode)
774774
