Commit c8cc931
Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
1 parent cab5c5c commit c8cc931
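The behavior described in items 1, 2 and 4 can be observed from Python code. The following is a minimal sketch, not taken from the commit's own tests: it assumes a Python 3 interpreter whose UTF-8 decoder already includes this change, and the helper name `decode_reason` is hypothetical. The sample byte strings are illustrative choices.

```python
def decode_reason(data):
    """Return the UnicodeDecodeError reason for strict UTF-8 decoding,
    or None if the bytes decode cleanly. (Hypothetical helper.)"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# Item 1: 0xF1 announces a 4-byte sequence and 0x80 is a valid continuation
# byte, but 0x41 ('A') is not. Only the start byte and the valid continuation
# byte are replaced by a single U+FFFD; 'A', 'B' and 'C' are preserved.
replaced = b"\xf1\x80\x41\x42\x43".decode("utf-8", "replace")
print(repr(replaced))

# Item 4: the reworded error messages.
print(decode_reason(b"\xff"))      # a byte that can never start a sequence
print(decode_reason(b"\xc3\x28"))  # 2-byte start byte followed by '('

# Item 2: 0xF8 would start a 5-byte sequence under RFC 2279; it is rejected
# as a start byte.
print(decode_reason(b"\xf8\x88\x80\x80\x80"))
```

Before this change, the first example would have consumed as many bytes as the start byte announced, so part of the trailing ASCII text would have been swallowed by the replacement. Item 3 (surrogates) is deliberately left out of the sketch because the commit keeps that check commented out, so the result depends on the interpreter version.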
3 files changed
Lines changed: 226 additions & 73 deletions