Commit c7b6c50

Describe 'surrogateescape' in the documentation.
Also, improve some docstring descriptions of the 'errors' parameter. Closes python#14015.
1 parent 893f2ff commit c7b6c50

5 files changed, 41 additions and 15 deletions

Doc/library/codecs.rst

Lines changed: 5 additions & 1 deletion
@@ -78,7 +78,11 @@ It defines the following functions:
   reference (for encoding only)
 * ``'backslashreplace'``: replace with backslashed escape sequences (for
   encoding only)
-* ``'surrogateescape'``: replace with surrogate U+DCxx, see :pep:`383`
+* ``'surrogateescape'``: on decoding, replace with code points in the Unicode
+  Private Use Area ranging from U+DC80 to U+DCFF. These private code
+  points will then be turned back into the same bytes when the
+  ``surrogateescape`` error handler is used when encoding the data.
+  (See :pep:`383` for more.)
 
 as well as any other error handling name defined via :func:`register_error`.
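
The round trip described in the new ``'surrogateescape'`` entry can be sketched as follows. This is a minimal illustration rather than part of the commit; the sample bytes are made up. Note that Unicode assigns U+DC80 to U+DCFF to the low-surrogate block, which is where the handler's name comes from, even though the commit text calls the range "Private Use Area".

```python
# Sample input (hypothetical): 0xE9 is "é" in Latin-1, but it is not
# valid UTF-8 when followed by a plain ASCII byte.
raw = b"caf\xe9 au lait"

# Decoding: the undecodable byte 0xE9 is mapped to the code point U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))

# Encoding with the same handler turns U+DCE9 back into the byte 0xE9.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)

# Without the handler, the escaped code point cannot be encoded at all.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print("strict encode failed:", strict_failed)
```

The escaped string is only safe to pass back to an encoder that uses the same handler; a strict encoder rejects it, as the last step shows.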

Doc/library/functions.rst

Lines changed: 30 additions & 10 deletions
@@ -895,16 +895,36 @@ are always available. They are listed here in alphabetical order.
 the list of supported encodings.
 
 *errors* is an optional string that specifies how encoding and decoding
-errors are to be handled--this cannot be used in binary mode. Pass
-``'strict'`` to raise a :exc:`ValueError` exception if there is an encoding
-error (the default of ``None`` has the same effect), or pass ``'ignore'`` to
-ignore errors. (Note that ignoring encoding errors can lead to data loss.)
-``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
-where there is malformed data. When writing, ``'xmlcharrefreplace'``
-(replace with the appropriate XML character reference) or
-``'backslashreplace'`` (replace with backslashed escape sequences) can be
-used. Any other error handling name that has been registered with
-:func:`codecs.register_error` is also valid.
+errors are to be handled--this cannot be used in binary mode.
+A variety of standard error handlers are available, though any
+error handling name that has been registered with
+:func:`codecs.register_error` is also valid. The standard names
+are:
+
+* ``'strict'`` to raise a :exc:`ValueError` exception if there is
+  an encoding error. The default value of ``None`` has the same
+  effect.
+
+* ``'ignore'`` ignores errors. Note that ignoring encoding errors
+  can lead to data loss.
+
+* ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
+  where there is malformed data.
+
+* ``'surrogateescape'`` will represent any incorrect bytes as code
+  points in the Unicode Private Use Area ranging from U+DC80 to
+  U+DCFF. These private code points will then be turned back into
+  the same bytes when the ``surrogateescape`` error handler is used
+  when writing data. This is useful for processing files in an
+  unknown encoding.
+
+* ``'xmlcharrefreplace'`` is only supported when writing to a file.
+  Characters not supported by the encoding are replaced with the
+  appropriate XML character reference ``&#nnn;``.
+
+* ``'backslashreplace'`` (also only supported when writing)
+  replaces unsupported characters with Python's backslashed escape
+  sequences.
 
 .. index::
    single: universal newlines; open() built-in function
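
The "files in an unknown encoding" use case from the new ``open()`` text might look like the sketch below. The helper name `roundtrip_unknown_encoding`, the file names, and the choice of ASCII as the nominal encoding are all assumptions made for illustration; none of them come from the commit.

```python
import os
import tempfile


def roundtrip_unknown_encoding(data: bytes) -> bytes:
    """Read bytes as text with 'surrogateescape', write them back, return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")    # hypothetical file names
        dst = os.path.join(tmp, "out.txt")

        with open(src, "wb") as f:
            f.write(data)

        # newline="" disables newline translation so the bytes stay comparable.
        with open(src, encoding="ascii", errors="surrogateescape", newline="") as f:
            text = f.read()

        # Writing with the same handler restores the undecodable bytes.
        with open(dst, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
            f.write(text)

        with open(dst, "rb") as f:
            return f.read()


sample = b"name=\xff\xfe\n"
print(roundtrip_unknown_encoding(sample) == sample)
```

The handler has to be passed on both the reading and the writing side. Writing the same text through a strict text stream would raise an error instead of restoring the bytes.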

Lib/codecs.py

Lines changed: 1 addition & 0 deletions
@@ -105,6 +105,7 @@ class Codec:
 Python will use the official U+FFFD REPLACEMENT
 CHARACTER for the builtin Unicode codecs on
 decoding and '?' on encoding.
+'surrogateescape' - replace with private codepoints U+DCnn.
 'xmlcharrefreplace' - Replace with the appropriate XML
 character reference (only for encoding).
 'backslashreplace' - Replace with backslashed escape sequences
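
For comparison, the other handlers listed in this docstring can be tried directly on `str.encode` and `bytes.decode`. This is an illustrative sketch only; the sample string is an assumption and is written with escapes so the snippet does not depend on the source file's encoding.

```python
# "naïve ☃" written with escapes: U+00EF and U+2603 are not ASCII.
text = "na\u00efve \u2603"

# Encode with each lenient handler and collect the results.
results = {
    handler: text.encode("ascii", errors=handler)
    for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}
for handler, encoded in results.items():
    print(f"{handler:>18}: {encoded!r}")

# 'strict' (the default) raises; UnicodeEncodeError is a subclass of ValueError.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except ValueError:
    strict_raised = True

# On decoding, 'replace' substitutes U+FFFD REPLACEMENT CHARACTER.
decoded = b"abc\xff".decode("utf-8", errors="replace")
print(ascii(decoded))
```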

Modules/_io/_iomodule.c

Lines changed: 2 additions & 2 deletions
@@ -168,8 +168,8 @@ PyDoc_STRVAR(open_doc,
 "'strict' to raise a ValueError exception if there is an encoding error\n"
 "(the default of None has the same effect), or pass 'ignore' to ignore\n"
 "errors. (Note that ignoring encoding errors can lead to data loss.)\n"
-"See the documentation for codecs.register for a list of the permitted\n"
-"encoding error strings.\n"
+"See the documentation for codecs.register or run 'help(codecs.Codec)'\n"
+"for a list of the permitted encoding error strings.\n"
 "\n"
 "newline controls how universal newlines works (it only applies to text\n"
 "mode). It can be None, '', '\\n', '\\r', and '\\r\\n'. It works as\n"

Modules/_io/textio.c

Lines changed: 3 additions & 2 deletions
@@ -642,8 +642,9 @@ PyDoc_STRVAR(textiowrapper_doc,
 "encoding gives the name of the encoding that the stream will be\n"
 "decoded or encoded with. It defaults to locale.getpreferredencoding(False).\n"
 "\n"
-"errors determines the strictness of encoding and decoding (see the\n"
-"codecs.register) and defaults to \"strict\".\n"
+"errors determines the strictness of encoding and decoding (see\n"
+"help(codecs.Codec) or the documentation for codecs.register) and\n"
+"defaults to \"strict\".\n"
 "\n"
 "newline controls how line endings are handled. It can be None, '',\n"
 "'\\n', '\\r', and '\\r\\n'. It works as follows:\n"
