Commit f6a271a

Issue python#18395: Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename
``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document these functions.
1 parent c6f8c0a commit f6a271a

13 files changed: 138 additions and 68 deletions

Doc/c-api/sys.rst

Lines changed: 54 additions & 0 deletions
@@ -47,6 +47,60 @@ Operating System Utilities
    not call those functions directly! :c:type:`PyOS_sighandler_t` is a typedef
    alias for :c:type:`void (\*)(int)`.
 
+.. c:function:: wchar_t* Py_DecodeLocale(const char* arg, size_t *size)
+
+   Decode a byte string from the locale encoding with the :ref:`surrogateescape
+   error handler <surrogateescape>`: undecodable bytes are decoded as
+   characters in the range U+DC80..U+DCFF. If a byte sequence can be decoded
+   as a surrogate character, escape the bytes using the surrogateescape error
+   handler instead of decoding them.
+
+   Return a pointer to a newly allocated wide character string; use
+   :c:func:`PyMem_RawFree` to free the memory. If *size* is not ``NULL``, write
+   the number of wide characters excluding the null character into ``*size``.
+
+   Return ``NULL`` on decoding error or memory allocation error. If *size* is
+   not ``NULL``, ``*size`` is set to ``(size_t)-1`` on memory error or set to
+   ``(size_t)-2`` on decoding error.
+
+   Decoding errors should never happen, unless there is a bug in the C
+   library.
+
+   Use the :c:func:`Py_EncodeLocale` function to encode the character string
+   back to a byte string.
+
+   .. seealso::
+
+      The :c:func:`PyUnicode_DecodeFSDefaultAndSize` and
+      :c:func:`PyUnicode_DecodeLocaleAndSize` functions.
+
+   .. versionadded:: 3.5
+
+
+.. c:function:: char* Py_EncodeLocale(const wchar_t *text, size_t *error_pos)
+
+   Encode a wide character string to the locale encoding with the
+   :ref:`surrogateescape error handler <surrogateescape>`: surrogate characters
+   in the range U+DC80..U+DCFF are converted to bytes 0x80..0xFF.
+
+   Return a pointer to a newly allocated byte string; use :c:func:`PyMem_Free`
+   to free the memory. Return ``NULL`` on encoding error or memory allocation
+   error.
+
+   If *error_pos* is not ``NULL``, ``*error_pos`` is set to the index of the
+   invalid character on encoding error, or set to ``(size_t)-1`` otherwise.
+
+   Use the :c:func:`Py_DecodeLocale` function to decode the byte string back
+   to a wide character string.
+
+   .. seealso::
+
+      The :c:func:`PyUnicode_EncodeFSDefault` and
+      :c:func:`PyUnicode_EncodeLocale` functions.
+
+   .. versionadded:: 3.5
+
+
 .. _systemfunctions:
 
 System Functions
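
The surrogateescape behaviour documented in this hunk can also be observed from Python code. The following is a minimal sketch, not the C implementation: it assumes UTF-8 as a stand-in for the locale encoding, and the names `raw`, `text`, `escaped`, and `restored` are purely illustrative.

```python
# Sketch of the surrogateescape round trip. UTF-8 is an assumed stand-in
# for the locale encoding; the C functions use the real locale instead.
raw = b"abc\xff\x80"  # 0xFF and 0x80 are not valid UTF-8 in this position

# Undecodable bytes 0x80..0xFF are decoded as lone surrogates U+DC80..U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [ch for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler converts the surrogates back to the bytes.
restored = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))      # 'abc\udcff\udc80'
print(restored == raw)  # True
```

The round trip is lossless because each escaped character carries the original byte value in its low eight bits.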

Doc/c-api/unicode.rst

Lines changed: 21 additions & 14 deletions
@@ -758,11 +758,13 @@ system.
    *errors* is ``NULL``. *str* must end with a null character but
    cannot contain embedded null characters.
 
+   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
+   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
+   Python startup).
+
    .. seealso::
 
-      Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
-      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
-      Python startup).
+      The :c:func:`Py_DecodeLocale` function.
 
    .. versionadded:: 3.3
 
@@ -783,11 +785,13 @@ system.
    *errors* is ``NULL``. Return a :class:`bytes` object. *str* cannot
    contain embedded null characters.
 
+   Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
+   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
+   Python startup).
+
    .. seealso::
 
-      Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
-      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
-      Python startup).
+      The :c:func:`Py_EncodeLocale` function.
 
    .. versionadded:: 3.3
 
@@ -832,12 +836,14 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
    If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
    locale encoding.
 
+   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
+   locale encoding and cannot be modified later. If you need to decode a string
+   from the current locale encoding, use
+   :c:func:`PyUnicode_DecodeLocaleAndSize`.
+
    .. seealso::
 
-      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
-      locale encoding and cannot be modified later. If you need to decode a
-      string from the current locale encoding, use
-      :c:func:`PyUnicode_DecodeLocaleAndSize`.
+      The :c:func:`Py_DecodeLocale` function.
 
    .. versionchanged:: 3.2
       Use ``"strict"`` error handler on Windows.
@@ -867,12 +873,13 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
    If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
    locale encoding.
 
+   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
+   locale encoding and cannot be modified later. If you need to encode a string
+   to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
+
    .. seealso::
 
-      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
-      locale encoding and cannot be modified later. If you need to encode a
-      string to the current locale encoding, use
-      :c:func:`PyUnicode_EncodeLocale`.
+      The :c:func:`Py_EncodeLocale` function.
 
    .. versionadded:: 3.2
 
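
The paragraphs moved out of the `seealso` blocks distinguish the encoding fixed at interpreter startup from the current locale encoding. As a rough Python-level illustration (a sketch only, assuming that `sys.getfilesystemencoding()` reflects the startup value and `locale.getpreferredencoding(False)` the current locale; the variable names are illustrative), the two can be inspected side by side:

```python
import locale
import sys

# Encoding chosen when the interpreter started; it does not follow later
# locale changes (compare Py_FileSystemDefaultEncoding in the C API).
fs_encoding = sys.getfilesystemencoding()

# Encoding of the locale as currently configured. Passing False avoids
# calling setlocale() as a side effect of the query.
current_locale_encoding = locale.getpreferredencoding(False)

print("file system encoding:   ", fs_encoding)
print("current locale encoding:", current_locale_encoding)
```

The two values often agree, but nothing guarantees it once a program changes its locale after startup.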

Doc/library/codecs.rst

Lines changed: 1 addition & 0 deletions
@@ -318,6 +318,7 @@ and writing to platform dependent files:
    encodings.
 
 
+.. _surrogateescape:
 .. _codec-base-classes:
 
 Codec Base Classes

Doc/library/os.rst

Lines changed: 4 additions & 3 deletions
@@ -78,9 +78,10 @@ uses the file system encoding to perform this conversion (see
 
 .. versionchanged:: 3.1
    On some systems, conversion using the file system encoding may fail. In this
-   case, Python uses the ``surrogateescape`` encoding error handler, which means
-   that undecodable bytes are replaced by a Unicode character U+DCxx on
-   decoding, and these are again translated to the original byte on encoding.
+   case, Python uses the :ref:`surrogateescape encoding error handler
+   <surrogateescape>`, which means that undecodable bytes are replaced by a
+   Unicode character U+DCxx on decoding, and these are again translated to the
+   original byte on encoding.
 
 
 The file system encoding must guarantee to successfully decode all bytes
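
The file-name round trip described in this note can be sketched with `os.fsdecode()` and `os.fsencode()`. This is a hedged illustration that assumes a POSIX platform, where those functions use the surrogateescape handler; the file name and variable names are made up for the example.

```python
import os

# A hypothetical file name as raw bytes; a lone 0xE9 is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Assumption: POSIX, where os.fsdecode() applies the file system encoding
# with the surrogateescape error handler, so decoding does not fail.
name = os.fsdecode(raw_name)

# Encoding the str back yields the original bytes unchanged.
round_tripped = os.fsencode(name)

print(round_tripped == raw_name)
```

On such a platform the comparison holds whichever file system encoding is in use, which is what lets Python handle file names that are not valid in that encoding.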

Include/fileutils.h

Lines changed: 2 additions & 2 deletions
@@ -7,11 +7,11 @@ extern "C" {
 
 PyAPI_FUNC(PyObject *) _Py_device_encoding(int);
 
-PyAPI_FUNC(wchar_t *) _Py_char2wchar(
+PyAPI_FUNC(wchar_t *) Py_DecodeLocale(
     const char *arg,
     size_t *size);
 
-PyAPI_FUNC(char*) _Py_wchar2char(
+PyAPI_FUNC(char*) Py_EncodeLocale(
     const wchar_t *text,
     size_t *error_pos);
 

Misc/NEWS

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@ Release date: TBA
 Core and Builtins
 -----------------
 
+- Issue #18395: Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`,
+  rename ``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document
+  these functions.
+
 - Issue #20179: Apply Argument Clinic to bytes and bytearray.
   Patch by Tal Einat.
 

Misc/coverity_model.c

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ PyObject *PyErr_SetFromErrnoWithFilename(PyObject *exc, const char *filename)
 }
 
 /* Python/fileutils.c */
-wchar_t *_Py_char2wchar(const char* arg, size_t *size)
+wchar_t *Py_DecodeLocale(const char* arg, size_t *size)
 {
     wchar_t *w;
     __coverity_tainted_data_sink__(arg);

Modules/getpath.c

Lines changed: 8 additions & 8 deletions
@@ -336,7 +336,7 @@ search_for_prefix(wchar_t *argv0_path, wchar_t *home, wchar_t *_prefix,
     joinpath(prefix, L"Modules/Setup");
     if (isfile(prefix)) {
         /* Check VPATH to see if argv0_path is in the build directory. */
-        vpath = _Py_char2wchar(VPATH, NULL);
+        vpath = Py_DecodeLocale(VPATH, NULL);
         if (vpath != NULL) {
             wcsncpy(prefix, argv0_path, MAXPATHLEN);
             prefix[MAXPATHLEN] = L'\0';
@@ -491,10 +491,10 @@ calculate_path(void)
     wchar_t *_pythonpath, *_prefix, *_exec_prefix;
     wchar_t *lib_python;
 
-    _pythonpath = _Py_char2wchar(PYTHONPATH, NULL);
-    _prefix = _Py_char2wchar(PREFIX, NULL);
-    _exec_prefix = _Py_char2wchar(EXEC_PREFIX, NULL);
-    lib_python = _Py_char2wchar("lib/python" VERSION, NULL);
+    _pythonpath = Py_DecodeLocale(PYTHONPATH, NULL);
+    _prefix = Py_DecodeLocale(PREFIX, NULL);
+    _exec_prefix = Py_DecodeLocale(EXEC_PREFIX, NULL);
+    lib_python = Py_DecodeLocale("lib/python" VERSION, NULL);
 
     if (!_pythonpath || !_prefix || !_exec_prefix || !lib_python) {
         Py_FatalError(
@@ -503,7 +503,7 @@ calculate_path(void)
     }
 
     if (_path) {
-        path_buffer = _Py_char2wchar(_path, NULL);
+        path_buffer = Py_DecodeLocale(_path, NULL);
         path = path_buffer;
     }
 
@@ -584,7 +584,7 @@ calculate_path(void)
         ** be running the interpreter in the build directory, so we use the
         ** build-directory-specific logic to find Lib and such.
         */
-        wchar_t* wbuf = _Py_char2wchar(modPath, NULL);
+        wchar_t* wbuf = Py_DecodeLocale(modPath, NULL);
         if (wbuf == NULL) {
             Py_FatalError("Cannot decode framework location");
         }
@@ -709,7 +709,7 @@ calculate_path(void)
 
     if (_rtpypath && _rtpypath[0] != '\0') {
         size_t rtpypath_len;
-        rtpypath = _Py_char2wchar(_rtpypath, &rtpypath_len);
+        rtpypath = Py_DecodeLocale(_rtpypath, &rtpypath_len);
         if (rtpypath != NULL)
             bufsz += rtpypath_len + 1;
     }

Modules/main.c

Lines changed: 2 additions & 2 deletions
@@ -647,7 +647,7 @@ Py_Main(int argc, wchar_t **argv)
             /* Used by Mac/Tools/pythonw.c to forward
              * the argv0 of the stub executable
              */
-            wchar_t* wbuf = _Py_char2wchar(pyvenv_launcher, NULL);
+            wchar_t* wbuf = Py_DecodeLocale(pyvenv_launcher, NULL);
 
             if (wbuf == NULL) {
                 Py_FatalError("Cannot decode __PYVENV_LAUNCHER__");
@@ -730,7 +730,7 @@ Py_Main(int argc, wchar_t **argv)
                     char *cfilename_buffer;
                     const char *cfilename;
                     int err = errno;
-                    cfilename_buffer = _Py_wchar2char(filename, NULL);
+                    cfilename_buffer = Py_EncodeLocale(filename, NULL);
                     if (cfilename_buffer != NULL)
                         cfilename = cfilename_buffer;
                     else

Objects/unicodeobject.c

Lines changed: 4 additions & 4 deletions
@@ -3255,7 +3255,7 @@ PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
         /* "surrogateescape" error handler */
         char *str;
 
-        str = _Py_wchar2char(wstr, &error_pos);
+        str = Py_EncodeLocale(wstr, &error_pos);
         if (str == NULL) {
             if (error_pos == (size_t)-1) {
                 PyErr_NoMemory();
@@ -3308,7 +3308,7 @@ PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
 
     if (errmsg != NULL) {
         size_t errlen;
-        wstr = _Py_char2wchar(errmsg, &errlen);
+        wstr = Py_DecodeLocale(errmsg, &errlen);
         if (wstr != NULL) {
             reason = PyUnicode_FromWideChar(wstr, errlen);
             PyMem_RawFree(wstr);
@@ -3526,7 +3526,7 @@ PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len,
 
     if (surrogateescape) {
         /* "surrogateescape" error handler */
-        wstr = _Py_char2wchar(str, &wlen);
+        wstr = Py_DecodeLocale(str, &wlen);
         if (wstr == NULL) {
             if (wlen == (size_t)-1)
                 PyErr_NoMemory();
@@ -3581,7 +3581,7 @@ PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len,
     error_pos = mbstowcs_errorpos(str, len);
     if (errmsg != NULL) {
         size_t errlen;
-        wstr = _Py_char2wchar(errmsg, &errlen);
+        wstr = Py_DecodeLocale(errmsg, &errlen);
         if (wstr != NULL) {
             reason = PyUnicode_FromWideChar(wstr, errlen);
             PyMem_RawFree(wstr);
