Commit c2d9a02

Merge doc fix with 3.2.
2 parents 7194efe + 222b208 commit c2d9a02

1 file changed

Lines changed: 2 additions & 7 deletions

Doc/library/codecs.rst

@@ -840,7 +840,7 @@ There's another encoding that is able to encoding the full range of Unicode
 characters: UTF-8. UTF-8 is an 8-bit encoding, which means there are no issues
 with byte order in UTF-8. Each byte in a UTF-8 byte sequence consists of two
 parts: Marker bits (the most significant bits) and payload bits. The marker bits
-are a sequence of zero to six 1 bits followed by a 0 bit. Unicode characters are
+are a sequence of zero to four ``1`` bits followed by a ``0`` bit. Unicode characters are
 encoded like this (with x being payload bits, which when concatenated give the
 Unicode character):

@@ -853,12 +853,7 @@ Unicode character):
 +-----------------------------------+----------------------------------------------+
 | ``U-00000800`` ... ``U-0000FFFF`` | 1110xxxx 10xxxxxx 10xxxxxx                   |
 +-----------------------------------+----------------------------------------------+
-| ``U-00010000`` ... ``U-001FFFFF`` | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx          |
-+-----------------------------------+----------------------------------------------+
-| ``U-00200000`` ... ``U-03FFFFFF`` | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
-+-----------------------------------+----------------------------------------------+
-| ``U-04000000`` ... ``U-7FFFFFFF`` | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
-|                                   | 10xxxxxx                                     |
+| ``U-00010000`` ... ``U-0010FFFF`` | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx          |
 +-----------------------------------+----------------------------------------------+
 
 The least significant bit of the Unicode character is the rightmost x bit.
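
One way to sanity-check the corrected table is to build the byte sequences by hand and compare them with Python's built-in encoder. The following is only a sketch: the helper name `utf8_encode_manual` is hypothetical and not part of the `codecs` module, the one- and two-byte forms (`0xxxxxxx` and `110xxxxx 10xxxxxx`) are the standard UTF-8 forms for the rows not shown in this diff, and surrogate code points are deliberately not handled.

```python
def utf8_encode_manual(code_point):
    """Encode one code point by hand (hypothetical helper, for illustration)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside U-00000000 ... U-0010FFFF")

    # One-byte form: a single 0 marker bit followed by 7 payload bits.
    if code_point < 0x80:
        return bytes([code_point])

    # Pick the lead-byte marker bits and the number of 10xxxxxx bytes.
    if code_point < 0x800:
        trailing, lead_marker = 1, 0xC0   # 110xxxxx
    elif code_point < 0x10000:
        trailing, lead_marker = 2, 0xE0   # 1110xxxx
    else:
        trailing, lead_marker = 3, 0xF0   # 11110xxx

    # Peel off 6 payload bits at a time, starting from the rightmost x bit.
    encoded = []
    for _ in range(trailing):
        encoded.append(0x80 | (code_point & 0x3F))  # 10xxxxxx
        code_point >>= 6
    encoded.append(lead_marker | code_point)
    return bytes(reversed(encoded))


# Compare against the built-in encoder for one sample per row of the table.
for sample in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    assert utf8_encode_manual(ord(sample)) == sample.encode("utf-8")
```

The longest form uses a lead byte with four ``1`` marker bits, which matches the "zero to four" wording introduced by this change; the removed five- and six-byte rows could only encode values above ``U-0010FFFF``.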
