
Commit 6775231

Unicode 9.0.0
Not completely mechanical since support for East Asian Width changes—emoji codepoints became Wide—had to be added to unicodedata.
1 parent 7ec6456 commit 6775231
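The East Asian Width change described in the commit message can be observed from Python. The following is a minimal sketch that assumes an interpreter built with Unicode 9.0.0 or later. It uses U+231A WATCH, the same code point the new test in Lib/test/test_unicodedata.py checks, and compares the current database with the frozen Unicode 3.2.0 view exposed as `unicodedata.ucd_3_2_0`.

```python
import unicodedata

# U+231A WATCH: an emoji whose East Asian Width is Neutral ('N') in the
# 3.2.0 data but Wide ('W') from Unicode 9.0.0 onward.
watch = "\u231a"

# ucd_3_2_0 has the same methods as the module but answers from UCD 3.2.0.
old_width = unicodedata.ucd_3_2_0.east_asian_width(watch)
new_width = unicodedata.east_asian_width(watch)

# A character whose width did not change, for comparison ('Na' in both).
control = (unicodedata.ucd_3_2_0.east_asian_width("a"),
           unicodedata.east_asian_width("a"))

print(f"U+231A: 3.2.0 -> {old_width!r}, {unicodedata.unidata_version} -> {new_width!r}")
print(f"'a': {control}")
```

This is the behaviour that required the new `east_asian_width_changed` field in Modules/unicodedata.c. Without that field, the 3.2.0 view would report the new 'W' width for U+231A.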

10 files changed: +26006 / -24348 lines

Doc/library/unicodedata.rst

Lines changed: 4 additions & 4 deletions
@@ -17,8 +17,8 @@
 
 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 8.0.0
-<http://www.unicode.org/Public/8.0.0/ucd>`_.
+this database is compiled from the `UCD version 9.0.0
+<http://www.unicode.org/Public/9.0.0/ucd>`_.
 
 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -168,6 +168,6 @@ Examples:
 
 .. rubric:: Footnotes
 
-.. [#] http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt
+.. [#] http://www.unicode.org/Public/9.0.0/ucd/NameAliases.txt
 
-.. [#] http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt
+.. [#] http://www.unicode.org/Public/9.0.0/ucd/NamedSequences.txt

Doc/whatsnew/3.6.rst

Lines changed: 7 additions & 0 deletions
@@ -966,6 +966,13 @@ representing :class:`contextlib.AbstractContextManager`.
 (Contributed by Brett Cannon in :issue:`25609`.)
 
 
+unicodedata
+-----------
+
+The internal database has been upgraded to use Unicode 9.0.0. (Contributed by
+Benjamin Peterson.)
+
+
 unittest.mock
 -------------

Lib/test/test_unicodedata.py

Lines changed: 6 additions & 2 deletions
@@ -20,7 +20,7 @@
 class UnicodeMethodsTest(unittest.TestCase):
 
     # update this, if the database changes
-    expectedchecksum = '5971760872b2f98bb9c701e6c0db3273d756b3ec'
+    expectedchecksum = 'c1fa98674a683aa8a8d8dee0c84494f8d36346e6'
 
     def test_method_checksum(self):
         h = hashlib.sha1()
@@ -80,7 +80,7 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):
 
     # Update this if the database changes. Make sure to do a full rebuild
     # (e.g. 'make distclean && make') to get the correct checksum.
-    expectedchecksum = '5e74827cd07f9e546a30f34b7bcf6cc2eac38c8c'
+    expectedchecksum = 'f891b1e6430c712531b9bc935a38e22d78ba1bf3'
     def test_function_checksum(self):
         data = []
         h = hashlib.sha1()
@@ -222,6 +222,10 @@ def test_east_asian_width(self):
         self.assertEqual(eaw('\u2010'), 'A')
         self.assertEqual(eaw('\U00020000'), 'W')
 
+    def test_east_asian_width_9_0_changes(self):
+        self.assertEqual(self.db.ucd_3_2_0.east_asian_width('\u231a'), 'N')
+        self.assertEqual(self.db.east_asian_width('\u231a'), 'W')
+
 class UnicodeMiscTest(UnicodeDatabaseTest):
 
     def test_failed_import_during_compiling(self):
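Both `expectedchecksum` values change because these tests hash property values computed over the whole database, so updating any single property changes the digest. The sketch below illustrates that idea with a hypothetical helper, `eaw_digest`, which hashes only the East Asian Width of every code point. It does not reproduce the data that test_unicodedata.py actually hashes, and its digests are not comparable to the checksums above.

```python
import hashlib
import sys
import unicodedata


def eaw_digest(db) -> str:
    """Hypothetical helper: SHA-1 of the East Asian Width of every code point.

    `db` is either the unicodedata module or unicodedata.ucd_3_2_0; both
    provide east_asian_width(). Illustration only, not the test's algorithm.
    """
    widths = ";".join(db.east_asian_width(chr(cp))
                      for cp in range(sys.maxunicode + 1))
    return hashlib.sha1(widths.encode("ascii")).hexdigest()


current_digest = eaw_digest(unicodedata)
legacy_digest = eaw_digest(unicodedata.ucd_3_2_0)

# A single changed width (e.g. U+231A becoming 'W') is enough to change the digest.
print("current:", current_digest)
print("3.2.0:  ", legacy_digest)
print("differ: ", current_digest != legacy_digest)
```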

Misc/NEWS

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ What's New in Python 3.6.0 beta 2
 Core and Builtins
 -----------------
 
+- Upgrade internal unicode databases to Unicode version 9.0.0.
+
 - Issue #28131: Fix a regression in zipimport's compile_source(). zipimport
   should use the same optimization level as the interpreter.
 

Modules/unicodedata.c

Lines changed: 3 additions & 0 deletions
@@ -45,6 +45,7 @@ typedef struct change_record {
     const unsigned char category_changed;
     const unsigned char decimal_changed;
     const unsigned char mirrored_changed;
+    const unsigned char east_asian_width_changed;
     const double numeric_changed;
 } change_record;
 
@@ -375,6 +376,8 @@ unicodedata_UCD_east_asian_width_impl(PyObject *self, int chr)
         const change_record *old = get_old_record(self, c);
         if (old->category_changed == 0)
             index = 0; /* unassigned */
+        else if (old->east_asian_width_changed != 0xFF)
+            index = old->east_asian_width_changed;
     }
     return PyUnicode_FromString(_PyUnicode_EastAsianWidthNames[index]);
 }
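The C change adds an `east_asian_width_changed` field to `change_record`, with 0xFF meaning "unchanged". A legacy lookup therefore keeps the current width unless the old record supplies an override. The following Python model sketches that control flow. `ChangeRecord`, `EAW_NAMES`, and `east_asian_width` here are hypothetical stand-ins rather than CPython objects. The table order is assumed because the diff does not show `_PyUnicode_EastAsianWidthNames`, and the model relies only on "unassigned maps to index 0".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for _PyUnicode_EastAsianWidthNames (order assumed).
EAW_NAMES = ["F", "H", "W", "Na", "A", "N"]
UNCHANGED = 0xFF  # sentinel meaning "this property did not change"


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical Python mirror of the relevant change_record fields."""
    category_changed: int          # 0 means unassigned in the old version
    east_asian_width_changed: int  # index into EAW_NAMES, or UNCHANGED


def east_asian_width(current_index: int, old: Optional[ChangeRecord]) -> str:
    """Mirror of the lookup logic in unicodedata_UCD_east_asian_width_impl."""
    index = current_index
    if old is not None:  # querying a legacy database object such as ucd_3_2_0
        if old.category_changed == 0:
            index = 0  # unassigned in the old version
        elif old.east_asian_width_changed != UNCHANGED:
            index = old.east_asian_width_changed  # old width overrides
    return EAW_NAMES[index]


# U+231A WATCH: 'W' now, recorded as 'N' for the 3.2.0 view.
watch_old = ChangeRecord(category_changed=UNCHANGED,
                         east_asian_width_changed=EAW_NAMES.index("N"))
print(east_asian_width(EAW_NAMES.index("W"), None))       # W
print(east_asian_width(EAW_NAMES.index("W"), watch_old))  # N
```

Using a sentinel keeps the record compact: only code points whose width actually changed need a real index, and every other record stores 0xFF.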

Modules/unicodedata_db.h

Lines changed: 1684 additions & 1557 deletions
Large diff not shown.

Modules/unicodename_db.h

Lines changed: 23433 additions & 22065 deletions
Large diff not shown.
