 .. _text-bytes:
+.. _bytes_mode:

 Bytes/text management
 =====================

-Python 3 introduces a hard distinction between *text* (``str``) – sequences of
-characters (formally, *Unicode codepoints*) – and ``bytes`` – sequences of
-8-bit values used to encode *any* kind of data for storage or transmission.
-
-Python 2 has the same distinction between ``str`` (bytes) and
-``unicode`` (text).
-However, values can be implicitly converted between these types as needed,
-e.g. when comparing or writing to disk or the network.
-The implicit encoding and decoding can be a source of subtle bugs when not
-designed and tested adequately.
-
-In python-ldap 2.x (for Python 2), bytes were used for all fields,
-including those guaranteed to be text.
-
-From version 3.0, python-ldap uses text where appropriate.
-On Python 2, the :ref:`bytes mode <bytes_mode>` setting influences how text is
-handled.
-
-
-What's text, and what's bytes
------------------------------
-
 The LDAP protocol states that some fields (distinguished names, relative
 distinguished names, attribute names, queries) be encoded in UTF-8.
-In python-ldap, these are represented as text (``str`` on Python 3,
-``unicode`` on Python 2).
+In python-ldap, these are represented as text (``str`` on Python 3).

 Attribute *values*, on the other hand, **MAY**
 contain any type of data, including text.
@@ -38,102 +16,26 @@ Thus, attribute values are *always* treated as ``bytes``.
 Encoding/decoding to other formats – text, images, etc. – is left to the caller.


-.. _bytes_mode:
-
-The bytes mode
----------------
-
-In Python 3, text values are represented as ``str``, the Unicode text type.
-
-In Python 2, the behavior of python-ldap 3.0 is influenced by a ``bytes_mode``
-argument to :func:`ldap.initialize`:
-
-``bytes_mode=True`` (backwards compatible):
-  Text values are represented as bytes (``str``) encoded using UTF-8.
-
-``bytes_mode=False`` (future compatible):
-  Text values are represented as ``unicode``.
-
-If not given explicitly, python-ldap will default to ``bytes_mode=True``,
-but if a ``unicode`` value is supplied to it, it will warn and use that value.
-
-Backwards-compatible behavior is not scheduled for removal until Python 2
-itself reaches end of life.
-
-
-Errors, warnings, and automatic encoding
-----------------------------------------
-
-While the type of values *returned* from python-ldap is always given by
-``bytes_mode``, for Python 2 the behavior for “wrong-type” values *passed in*
-can be controlled by the ``bytes_strictness`` argument to
-:func:`ldap.initialize`:
+Historical note
+---------------

-``bytes_strictness='error'`` (default if ``bytes_mode`` is specified):
-  A ``TypeError`` is raised.
-
-``bytes_strictness='warn'`` (default when ``bytes_mode`` is not given explicitly):
-  A warning is raised, and the value is encoded/decoded
-  using the UTF-8 encoding.
-
-  The warnings are of type :class:`~ldap.LDAPBytesWarning`, which
-  is a subclass of :class:`BytesWarning` designed to be easily
-  :ref:`filtered out <filter-bytes-warning>` if needed.
-
-``bytes_strictness='silent'``:
-  The value is automatically encoded/decoded using the UTF-8 encoding.
-
-On Python 3, ``bytes_strictness`` is ignored and a ``TypeError`` is always
-raised.
-
-When setting ``bytes_strictness``, an explicit value for ``bytes_mode`` needs
-to be given as well.
-
-
-Porting recommendations
-------------------------
-
-Since end of life of Python 2 is coming in a few years, projects are strongly
-urged to make their code compatible with Python 3. General instructions for
-this are provided :ref:`in Python documentation <pyporting-howto>` and in the
-`Conservative porting guide`_.
-
-.. _Conservative porting guide: https://portingguide.readthedocs.io/en/latest/
-
-
-When porting from python-ldap 2.x, users are advised to update their code
-to set ``bytes_mode=False``, and fix any resulting failures.
-
-The typical usage is as follows.
-Note that only the result's *values* are of the ``bytes`` type:
-
-.. code-block:: pycon
-
-    >>> import ldap
-    >>> con = ldap.initialize('ldap://localhost:389', bytes_mode=False)
-    >>> con.simple_bind_s(u'login', u'secret_password')
-    >>> results = con.search_s(u'ou=people,dc=example,dc=org', ldap.SCOPE_SUBTREE, u"(cn=Raphaël)")
-    >>> results
-    [
-        ("cn=Raphaël,ou=people,dc=example,dc=org", {
-            'cn': [b'Rapha\xc3\xabl'],
-            'sn': [b'Barrois'],
-        }),
-    ]
-
-
-.. _filter-bytes-warning:
-
-Filtering warnings
--------------------
+Python 3 introduced a hard distinction between *text* (``str``) – sequences of
+characters (formally, *Unicode codepoints*) – and ``bytes`` – sequences of
+8-bit values used to encode *any* kind of data for storage or transmission.

-The bytes mode warnings can be filtered out and ignored with a
-simple filter.
+Python 2 had the same distinction between ``str`` (bytes) and
+``unicode`` (text).
+However, values could be implicitly converted between these types as needed,
+e.g. when comparing or writing to disk or the network.
+The implicit encoding and decoding can be a source of subtle bugs when not
+designed and tested adequately.

-.. code-block:: python
+In python-ldap 2.x (for Python 2), bytes were used for all fields,
+including those guaranteed to be text.

-    import warnings
-    import ldap
+From version 3.0 to 3.3, python-ldap uses text where appropriate.
+On Python 2, special ``bytes_mode`` and ``bytes_strictness`` settings
+influenced how text was handled.

-    if hasattr(ldap, 'LDAPBytesWarning'):
-        warnings.simplefilter('ignore', ldap.LDAPBytesWarning)
+From version 3.3 on, only Python 3 is supported. The “bytes mode” settings
+are deprecated and do nothing.
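After this change the page still states that attribute values are always ``bytes`` and that decoding them is left to the caller. The snippet below is a minimal Python sketch of that caller-side step, not part of python-ldap. It does not import python-ldap; it works on data shaped like the ``search_s`` result in the removed porting example. The helper ``decode_text_attributes`` and the ``TEXT_ATTRIBUTES`` set are hypothetical names introduced here, and the sketch assumes the listed attributes hold UTF-8 text.

```python
from typing import Dict, Iterable, List, Tuple

# A search result entry as python-ldap 3.x returns it: a (dn, attributes)
# pair where the DN and attribute names are str and every value is bytes.
Entry = Tuple[str, Dict[str, List[bytes]]]

# Hypothetical: attributes this application treats as UTF-8 text.
# Adjust to your own directory schema.
TEXT_ATTRIBUTES = frozenset({"cn", "sn", "mail"})


def decode_text_attributes(
    entries: Iterable[Entry],
    text_attrs: frozenset = TEXT_ATTRIBUTES,
    encoding: str = "utf-8",
) -> List[Tuple[str, Dict[str, list]]]:
    """Decode values of known text attributes; leave all others as bytes."""
    decoded = []
    for dn, attrs in entries:
        new_attrs = {}
        for name, values in attrs.items():
            # LDAP attribute names are case-insensitive, so compare lowercased.
            if name.lower() in text_attrs:
                new_attrs[name] = [value.decode(encoding) for value in values]
            else:
                # Binary data (for example a jpegPhoto) is kept as bytes.
                new_attrs[name] = list(values)
        decoded.append((dn, new_attrs))
    return decoded


# Sample data shaped like the result in the removed porting example,
# plus a hypothetical binary attribute.
results = [
    ("cn=Raphaël,ou=people,dc=example,dc=org", {
        "cn": [b"Rapha\xc3\xabl"],
        "sn": [b"Barrois"],
        "jpegPhoto": [b"\xff\xd8\xff"],
    }),
]

decoded = decode_text_attributes(results)
```

Keeping the set of text attributes explicit fits the rule described above: the library does not guess which values are text, so the caller decides which attributes to decode and with which encoding.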