Update encodings.aliases #149891
Open
Labels: 3.15 (pre-release feature fixes, bugs and security fixes), 3.16 (new features, bugs and security fixes), stdlib (Standard Library Python modules in the Lib/ directory), topic-unicode, type-bug (An unexpected behavior, bug, or error), type-feature (A feature request or enhancement)
Bug report
I compared encodings.aliases with the IANA registry (https://www.iana.org/assignments/character-sets/character-sets.xhtml; I used the CSV format) and found more missing aliases. Most of them have the "cs" prefix, but there are also CCSID01140, iso-ir-149, KS_C_5601-1989 (we only have KS_C_5601-1987), GB_2312-80, windows-936, etc. One alias, csHPRoman8, was not normalized, so it did not work.

There are some errors in the IANA registry. ISO-8859-11 is listed as an alias of TIS-620, while in Python the two codecs differ by one character (euro). MS_Kanji is listed as an alias of Shift_JIS, while in Python it is an alias of cp932 (not IANA-registered). I suppose Python is more correct here.
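A minimal sketch of how a comparison like this could be scripted. It assumes the IANA names have already been extracted from the CSV into a plain list (CSV parsing is omitted). The helpers unnormalized_alias_keys, unresolvable_names and differing_bytes are hypothetical names for illustration; only codecs.lookup, encodings.normalize_encoding and encodings.aliases.aliases are real stdlib objects. The normalization check also assumes that codec lookup lowercases the requested name before normalizing it, which is why a mixed-case key such as csHPRoman8 can never match.

```python
import codecs
import encodings
import encodings.aliases


def unnormalized_alias_keys(aliases):
    """Return alias keys that a lookup could never hit.

    Assumption: the lookup machinery lowercases the requested name and then
    applies encodings.normalize_encoding(), so a usable key must already be
    equal to that normalized form.
    """
    return sorted(
        key for key in aliases
        if key != encodings.normalize_encoding(key.lower())
    )


def unresolvable_names(names):
    """Return the names (e.g. taken from the IANA CSV) that codecs.lookup() rejects."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    return missing


def differing_bytes(enc_a, enc_b):
    """Return byte values 0x00-0xFF that two single-byte codecs decode differently.

    A byte that is undecodable in one codec is represented as None, so
    "defined in one codec, undefined in the other" also counts as a difference.
    """
    def decode_one(enc, value):
        try:
            return bytes([value]).decode(enc)
        except UnicodeDecodeError:
            return None

    return [b for b in range(256) if decode_one(enc_a, b) != decode_one(enc_b, b)]


if __name__ == "__main__":
    # Keys in the running interpreter's alias table that are not normalized.
    print(unnormalized_alias_keys(encodings.aliases.aliases))
    # A few of the IANA names mentioned above; results depend on the Python version.
    print(unresolvable_names(["CCSID01140", "iso-ir-149", "KS_C_5601-1989", "GB_2312-80", "windows-936"]))
    # Byte positions where Python's ISO-8859-11 and TIS-620 codecs disagree.
    print([hex(b) for b in differing_bytes("iso8859_11", "tis_620")])
```

The byte-by-byte comparison only makes sense for single-byte codecs such as ISO-8859-11 and TIS-620; multi-byte codecs like cp932 would need a different approach.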
cc @malemburg