Merged
Changes from 1 commit
70 commits
42d55b4
add UUIDv7 implementation
picnixz Jun 28, 2024
6826fa1
add tests
picnixz Jun 28, 2024
edc2cab
blurb
picnixz Jun 28, 2024
c6d26b6
update CHANGELOG
picnixz Jun 28, 2024
2ddb4b8
update RFC number
picnixz Jun 28, 2024
bcd1417
add TODO in the docs
picnixz Jun 28, 2024
4630c8f
Merge branch 'main' into uuid-v7-method-1
picnixz Jul 22, 2024
cd80afb
Merge branch 'main' into uuid-v7-89083
picnixz Aug 21, 2024
c3d4745
add UUIDv8 implementation
picnixz Aug 22, 2024
392d289
add tests
picnixz Aug 22, 2024
26889ea
blurb
picnixz Aug 22, 2024
44b66e6
add What's New entry
picnixz Aug 22, 2024
7be6dc4
add docs
picnixz Aug 22, 2024
8ba3d8b
Improve hexadecimal masks reading
picnixz Sep 25, 2024
a14ae9b
add uniqueness test
picnixz Sep 25, 2024
7a169c9
Update mentions to RFC 4122 to RFC 4122/9562 when possible.
picnixz Sep 25, 2024
b082c90
Update docs
picnixz Sep 25, 2024
94c70e9
Merge branch 'main' into uuid-v8-89083
picnixz Sep 25, 2024
05b7a2b
Merge branch 'main' into uuid-v7-method-1
hugovk Nov 2, 2024
275deb7
Merge branch 'main' into uuid-v8-89083
hugovk Nov 2, 2024
5e97cc3
Apply suggestions from code review
picnixz Nov 11, 2024
051f34e
Update Lib/test/test_uuid.py
picnixz Nov 11, 2024
bdf9a77
Apply suggestions from code review
picnixz Nov 11, 2024
00661fc
Merge remote-tracking branch 'origin/uuid-v8-89083'
picnixz Nov 13, 2024
0474de4
Merge remote-tracking branch 'origin/uuid-v8-89083' into uuid-v7-89083
picnixz Nov 14, 2024
a446d53
Merge remote-tracking branch 'upstream/main' into uuid-v7-89083
picnixz Nov 14, 2024
2e39072
update CLI
picnixz Nov 14, 2024
ebc1a07
Merge branch 'main' into uuid-v7-89083
picnixz Nov 14, 2024
694e07f
post-merge
picnixz Nov 14, 2024
965dbc8
Merge remote-tracking branch 'origin/uuid-v7-method-1' into uuid-v7-8…
picnixz Nov 14, 2024
7ff4368
improve readability
picnixz Nov 14, 2024
7c3cab6
post-merge
picnixz Nov 14, 2024
e758741
uniqueness test
picnixz Nov 14, 2024
c18d0c4
improve test comments
picnixz Nov 14, 2024
2df6f41
Merge remote-tracking branch 'upstream/main'
picnixz Nov 15, 2024
6fcb6a1
fix lint
picnixz Nov 15, 2024
f6048c9
Merge branch 'main' into uuid-v7-89083
picnixz Nov 15, 2024
be3f024
post-merge
picnixz Nov 15, 2024
99c6761
Merge branch 'main' into uuid-v7-89083
picnixz Nov 15, 2024
06befca
use versionchanged instead of versionadded
picnixz Nov 15, 2024
2aacadf
Merge branch 'main' into uuid-v7-method-1
picnixz Nov 16, 2024
f7f536e
Merge branch 'main' into uuid-v7-method-1
picnixz Dec 5, 2024
aee2898
improve UUIDv7 tests readability
picnixz Dec 19, 2024
1a5ac19
improve UUIDv7 uniqueness tests
picnixz Dec 19, 2024
8764b28
Merge branch 'main' into uuid-v7-method-1
picnixz Dec 21, 2024
af0baef
Merge branch 'main' into uuid-v7-method-1
picnixz Jan 11, 2025
939b5a8
Merge branch 'main' into feat/uuid/v7-89083
picnixz Jan 20, 2025
ef85b20
use `UUID._from_int` for UUIDv7 and remove `divmod` usage
picnixz Jan 20, 2025
2d08821
Merge branch 'main' into uuid-v7-method-1
picnixz Jan 20, 2025
eaa9ad4
Merge branch 'main' into uuid-v7-method-1
picnixz Feb 17, 2025
571d2fe
backport Victor's review on UUIDv6
picnixz Feb 23, 2025
f9ac658
address Victor's review
picnixz Feb 25, 2025
a756b9d
remove mention of UNIX_EPOCH + 10k years as the proof is long
picnixz Feb 25, 2025
4406796
import `time` globally as UUIDv7 is likely to be used now
picnixz Feb 25, 2025
d4eeded
run half-black
picnixz Feb 25, 2025
0e54a72
update docs
picnixz Feb 25, 2025
40ab2fa
Revert "run half-black"
picnixz Feb 25, 2025
5ee85ad
Merge branch 'main' into uuid-v7-method-1
picnixz Feb 25, 2025
3ce8943
add blank line for readability
picnixz Feb 25, 2025
59e6d7e
update comment
picnixz Feb 25, 2025
437d8cf
Update Lib/uuid.py
picnixz Feb 25, 2025
2d917b0
Merge remote-tracking branch 'upstream/main' into feat/uuid/v7-89083
picnixz Mar 2, 2025
73ab656
improve online docs
picnixz Mar 3, 2025
54d07ae
`constructor` -> `factory` in labels
picnixz Mar 3, 2025
6d76389
reword prolog
picnixz Mar 3, 2025
bd4ab55
'is outside the scope' -> 'exceeds the scope'
picnixz Mar 3, 2025
e9ddb74
Apply suggestions from code review
picnixz Mar 3, 2025
8755de0
apply PEP-8 only for UUID6, UUID7 and UUID8
picnixz Mar 3, 2025
12d7ad4
small fix
merwok Mar 3, 2025
560d87c
avoid complex language :)
picnixz Mar 3, 2025
Merge remote-tracking branch 'upstream/main' into feat/uuid/v7-89083
picnixz committed Mar 2, 2025
commit 2d917b0edeb18c0dbac7927dcf8c262e32bce84d
26 changes: 21 additions & 5 deletions Doc/library/uuid.rst
@@ -12,8 +12,8 @@

 This module provides immutable :class:`UUID` objects (the :class:`UUID` class)
 and the functions :func:`uuid1`, :func:`uuid3`, :func:`uuid4`, :func:`uuid5`,
-:func:`uuid7`, and :func:`uuid8` for generating version 1, 3, 4, 5, 7, and 8
-UUIDs as specified in :rfc:`9562` (which supersedes :rfc:`4122`).
+:func:`uuid6`, :func:`uuid7`, and :func:`uuid8` for generating UUIDs version 1,
+3, 4, 5, 6, 7, and 8 as specified in :rfc:`9562` (which supersedes :rfc:`4122`).

 If all you want is a unique ID, you should probably call :func:`uuid1` or
 :func:`uuid4`. Note that :func:`uuid1` may compromise privacy since it creates
@@ -154,7 +154,7 @@ which relays any information about the UUID's safety, using this enumeration:
    :const:`RFC_4122`).

    .. versionchanged:: next
-      Added UUID versions 7 and 8.
+      Added UUID versions 6, 7 and 8.


 .. attribute:: UUID.is_safe
@@ -212,6 +212,22 @@ The :mod:`uuid` module defines the following functions:
    that will be encoded using UTF-8).


+.. function:: uuid6(node=None, clock_seq=None)
+
+   Generate a UUID from a sequence number and the current time according to
+   :rfc:`9562`.
+   This is an alternative to :func:`uuid1` to improve database locality.
+
+   When *node* is not specified, :func:`getnode` is used to obtain the hardware
+   address as a 48-bit positive integer. When a sequence number *clock_seq* is
+   not specified, a pseudo-random 14-bit positive integer is generated.
+
+   If *node* or *clock_seq* exceed their expected bit count, only their least
+   significant bits are kept.
+
+   .. versionadded:: next
+
+
 .. function:: uuid7()

    Generate a time-based UUID according to
@@ -326,7 +342,7 @@ The :mod:`uuid` module can be executed as a script from the command line.

 .. code-block:: sh

-   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5,uuid7,uuid8}] [-n NAMESPACE] [-N NAME]
+   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5,uuid6,uuid7,uuid8}] [-n NAMESPACE] [-N NAME]

 The following options are accepted:

@@ -343,7 +359,7 @@ The following options are accepted:
    is used.

    .. versionchanged:: next
-      Allow generating UUID versions 7 and 8.
+      Allow generating UUID versions 6, 7 and 8.

 .. option:: -n <namespace>
             --namespace <namespace>
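The documentation above describes `uuid6()` as an alternative to `uuid1()` with a field order that improves database locality. As a rough illustration (not part of the commit), the sketch below rearranges a version 1 UUID into the version 6 layout. The helper name `v1_to_v6` is hypothetical and does not exist in the `uuid` module; the input is a version 1 value built from the same timestamp, clock sequence and node as the UUIDv6 test vector used in `Lib/test/test_uuid.py`.

```python
import uuid


def v1_to_v6(u1: uuid.UUID) -> uuid.UUID:
    """Hypothetical helper: rearrange a version 1 UUID into the v6 layout."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    timestamp = u1.time                  # 60-bit count of 100-ns intervals
    time_hi_and_mid = timestamp >> 12    # 48 most significant timestamp bits
    time_lo = timestamp & 0x0fff         # 12 least significant timestamp bits
    value = time_hi_and_mid << 80
    value |= 6 << 76                     # version nibble
    value |= time_lo << 64
    # The low 64 bits (variant, clock sequence, node) are left unchanged.
    value |= u1.int & 0xffff_ffff_ffff_ffff
    return uuid.UUID(int=value)


u1 = uuid.UUID("C232AB00-9414-11EC-B3C8-9F6BDECED846")
u6 = v1_to_v6(u1)
print(u6)  # 1ec9414c-232a-6b00-b3c8-9f6bdeced846
```

Because the most significant timestamp bits now come first, values generated later tend to sort after earlier ones, which is the locality benefit the documentation refers to.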
5 changes: 3 additions & 2 deletions Doc/whatsnew/3.14.rst
@@ -919,8 +919,9 @@ urllib
 uuid
 ----

-* Add support for UUID versions 7 and 8 via :func:`uuid.uuid7` and
-  :func:`uuid.uuid8` respectively, as specified in :rfc:`9562`.
+* Add support for UUID versions 6, 7, and 8 via :func:`uuid.uuid6`,
Member

Not to be pedantic, but is "via" correct? It seems to suggest that the functions are the only thing that support these versions, but the support is added in the UUID class and the functions are there too as a convenience.

Member Author

Yes, but in general, we don't really want people to directly use the UUID class. Strictly speaking, we're only adding the support for the version value but we don't check how it's been generated.

I prefer users to actually use the factories. Otherwise, I can say that the UUID class now accepts version to be 6, 7, or 8.

Member

I slightly disagree but won’t argue 🙂

+  :func:`uuid.uuid7`, and :func:`uuid.uuid8` respectively, as specified
+  in :rfc:`9562`.
   (Contributed by Bénédikt Tran in :gh:`89083`.)

 * :const:`uuid.NIL` and :const:`uuid.MAX` are now available to represent the
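The review thread above turns on the point that the `UUID` class only reads the version field and does not check how a value was generated. A minimal sketch of that behaviour (an illustration, not part of the commit) parses the UUIDv6 example value that the tests take from RFC 9562 and reads its fields back; no factory function is involved.

```python
import uuid

# UUIDv6 example value from RFC 9562, as used in Lib/test/test_uuid.py.
u = uuid.UUID("1EC9414C-232A-6B00-B3C8-9F6BDECED846")

# The variant and version are decoded from the bits of the value itself.
print(u.variant == uuid.RFC_4122)
print(u.version)
print(hex(u.node))
print(hex(u.clock_seq))
```

The same attributes would be returned for any 128-bit value carrying these version and variant bits, whether or not it came from `uuid6()`.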
146 changes: 146 additions & 0 deletions Lib/test/test_uuid.py
@@ -725,6 +725,152 @@ def test_uuid5(self):
             equal(u, self.uuid.UUID(v))
             equal(str(u), v)

+    def test_uuid6(self):
+        equal = self.assertEqual
+        u = self.uuid.uuid6()
+        equal(u.variant, self.uuid.RFC_4122)
+        equal(u.version, 6)
+
+        fake_nanoseconds = 0x1571_20a1_de1a_c533
+        fake_node_value = 0x54e1_acf6_da7f
+        fake_clock_seq = 0x14c5
+        with (
+            mock.patch.object(self.uuid, '_last_timestamp_v6', None),
+            mock.patch.object(self.uuid, 'getnode', return_value=fake_node_value),
+            mock.patch('time.time_ns', return_value=fake_nanoseconds),
+            mock.patch('random.getrandbits', return_value=fake_clock_seq)
+        ):
+            u = self.uuid.uuid6()
+            equal(u.variant, self.uuid.RFC_4122)
+            equal(u.version, 6)
+
+            # 32 (top) | 16 (mid) | 12 (low) == 60 (timestamp)
+            equal(u.time, 0x1e901fca_7a55_b92)
+            equal(u.fields[0], 0x1e901fca)  # 32 top bits of time
+            equal(u.fields[1], 0x7a55)  # 16 mid bits of time
+            # 4 bits of version + 12 low bits of time
+            equal((u.fields[2] >> 12) & 0xf, 6)
+            equal((u.fields[2] & 0xfff), 0xb92)
+            # 2 bits of variant + 6 high bits of clock_seq
+            equal((u.fields[3] >> 6) & 0xf, 2)
+            equal(u.fields[3] & 0x3f, fake_clock_seq >> 8)
+            # 8 low bits of clock_seq
+            equal(u.fields[4], fake_clock_seq & 0xff)
+            equal(u.fields[5], fake_node_value)
+
+    def test_uuid6_uniqueness(self):
+        # Test that UUIDv6-generated values are unique.
+
+        # Unlike UUIDv8, only 62 bits can be randomized for UUIDv6.
+        # In practice, however, it remains unlikely to generate two
+        # identical UUIDs for the same 60-bit timestamp if neither
+        # the node ID nor the clock sequence is specified.
+        uuids = {self.uuid.uuid6() for _ in range(1000)}
+        self.assertEqual(len(uuids), 1000)
+        versions = {u.version for u in uuids}
+        self.assertSetEqual(versions, {6})
+
+        timestamp = 0x1ec9414c_232a_b00
+        fake_nanoseconds = (timestamp - 0x1b21dd21_3814_000) * 100
+
+        with mock.patch('time.time_ns', return_value=fake_nanoseconds):
+            def gen():
+                with mock.patch.object(self.uuid, '_last_timestamp_v6', None):
+                    return self.uuid.uuid6(node=0, clock_seq=None)
+
+            # By the birthday paradox, sampling N = 1024 UUIDs with identical
+            # node IDs and timestamps results in duplicates with probability
+            # close to 1 (not having a duplicate happens with probability of
+            # order 1E-15) since only the 14-bit clock sequence is randomized.
+            N = 1024
+            uuids = {gen() for _ in range(N)}
+            self.assertSetEqual({u.node for u in uuids}, {0})
+            self.assertSetEqual({u.time for u in uuids}, {timestamp})
+            self.assertLess(len(uuids), N, 'collision property does not hold')
+
+    def test_uuid6_node(self):
+        # Make sure the given node ID appears in the UUID.
+        #
+        # Note: when no node ID is specified, the same logic as for UUIDv1
+        # is applied to UUIDv6. In particular, there is no need to test that
+        # getnode() correctly returns positive integers of exactly 48 bits
+        # since this is done in test_uuid1_eui64().
+        self.assertLessEqual(self.uuid.uuid6().node.bit_length(), 48)
+
+        self.assertEqual(self.uuid.uuid6(0).node, 0)
+
+        # tests with explicit values
+        max_node = 0xffff_ffff_ffff
+        self.assertEqual(self.uuid.uuid6(max_node).node, max_node)
+        big_node = 0xE_1234_5678_ABCD  # 52-bit node
+        res_node = 0x0_1234_5678_ABCD  # truncated to 48 bits
+        self.assertEqual(self.uuid.uuid6(big_node).node, res_node)
+
+        # randomized tests
+        for _ in range(10):
+            # node with > 48 bits is truncated
+            for b in [24, 48, 72]:
+                node = (1 << (b - 1)) | random.getrandbits(b)
+                with self.subTest(node=node, bitlen=b):
+                    self.assertEqual(node.bit_length(), b)
+                    u = self.uuid.uuid6(node=node)
+                    self.assertEqual(u.node, node & 0xffff_ffff_ffff)
+
+    def test_uuid6_clock_seq(self):
+        # Make sure the supplied clock sequence appears in the UUID.
+        #
+        # For UUIDv6, clock sequence bits are stored from bit 48 to bit 62,
+        # with the convention that the least significant bit is bit 0 and
+        # the most significant bit is bit 127.
+        get_clock_seq = lambda u: (u.int >> 48) & 0x3fff
+
+        u = self.uuid.uuid6()
+        self.assertLessEqual(get_clock_seq(u).bit_length(), 14)
+
+        # tests with explicit values
+        big_clock_seq = 0xffff  # 16-bit clock sequence
+        res_clock_seq = 0x3fff  # truncated to 14 bits
+        u = self.uuid.uuid6(clock_seq=big_clock_seq)
+        self.assertEqual(get_clock_seq(u), res_clock_seq)
+
+        # some randomized tests
+        for _ in range(10):
+            # clock_seq with > 14 bits is truncated
+            for b in [7, 14, 28]:
+                node = random.getrandbits(48)
+                clock_seq = (1 << (b - 1)) | random.getrandbits(b)
+                with self.subTest(node=node, clock_seq=clock_seq, bitlen=b):
+                    self.assertEqual(clock_seq.bit_length(), b)
+                    u = self.uuid.uuid6(node=node, clock_seq=clock_seq)
+                    self.assertEqual(get_clock_seq(u), clock_seq & 0x3fff)
+
+    def test_uuid6_test_vectors(self):
+        equal = self.assertEqual
+        # https://www.rfc-editor.org/rfc/rfc9562#name-test-vectors
+        # (separators are put at the 12th and 28th bits)
+        timestamp = 0x1ec9414c_232a_b00
+        fake_nanoseconds = (timestamp - 0x1b21dd21_3814_000) * 100
+        # https://www.rfc-editor.org/rfc/rfc9562#name-example-of-a-uuidv6-value
+        node = 0x9f6bdeced846
+        clock_seq = (3 << 12) | 0x3c8
+
+        with (
+            mock.patch.object(self.uuid, '_last_timestamp_v6', None),
+            mock.patch('time.time_ns', return_value=fake_nanoseconds)
+        ):
+            u = self.uuid.uuid6(node=node, clock_seq=clock_seq)
+            equal(str(u).upper(), '1EC9414C-232A-6B00-B3C8-9F6BDECED846')
+            # 32        16         4     12        2     14          48
+            # time_hi | time_mid | ver | time_lo | var | clock_seq | node
+            equal(u.time, timestamp)
+            equal(u.int & 0xffff_ffff_ffff, node)
+            equal((u.int >> 48) & 0x3fff, clock_seq)
+            equal((u.int >> 62) & 0x3, 0b10)
+            equal((u.int >> 64) & 0xfff, 0xb00)
+            equal((u.int >> 76) & 0xf, 0x6)
+            equal((u.int >> 80) & 0xffff, 0x232a)
+            equal((u.int >> 96) & 0xffff_ffff, 0x1ec9_414c)
+
     def test_uuid7(self):
         equal = self.assertEqual
         u = self.uuid.uuid7()
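The comment in `test_uuid6_uniqueness` states that drawing N = 1024 clock sequences from a 14-bit space avoids every collision with probability of order 1E-15. The following sketch (an illustration, not part of the commit) evaluates the exact birthday-problem product to check that order of magnitude. The function name `prob_all_distinct` is hypothetical.

```python
import math


def prob_all_distinct(samples: int, space: int) -> float:
    """Probability that `samples` uniform draws from `space` values are all distinct."""
    # P = prod_{i=0}^{samples-1} (1 - i/space); summing logarithms keeps
    # the very small result numerically accurate.
    log_p = sum(math.log1p(-i / space) for i in range(samples))
    return math.exp(log_p)


p = prob_all_distinct(1024, 2 ** 14)
print(f"{p:.2e}")
```

The result lies between 1E-15 and 1E-14, which is why the test can assert that at least one duplicate occurs.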
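The tests derive `fake_nanoseconds` from a UUID timestamp by subtracting the constant `0x1b21dd21_3814_000` and multiplying by 100. The sketch below (an illustration, not part of the commit) recomputes that constant as the number of 100-ns intervals between the UUID epoch and the Unix epoch, then checks that the conversion round-trips for the test-vector timestamp. The name `UUID_EPOCH_OFFSET` is chosen here for readability and does not appear in the module.

```python
from datetime import datetime, timezone

UUID_EPOCH_OFFSET = 0x01b21dd213814000  # constant used in Lib/uuid.py

# Number of 100-ns intervals between 1582-10-15 and 1970-01-01.
delta = datetime(1970, 1, 1) - datetime(1582, 10, 15)
intervals = delta.days * 86_400 * 10_000_000

# Same conversion as in the tests, and its inverse as done in uuid6().
timestamp = 0x1ec9414c_232a_b00
fake_nanoseconds = (timestamp - UUID_EPOCH_OFFSET) * 100
roundtrip = fake_nanoseconds // 100 + UUID_EPOCH_OFFSET

when = datetime.fromtimestamp(fake_nanoseconds // 10**9, tz=timezone.utc)
print(intervals == UUID_EPOCH_OFFSET, roundtrip == timestamp, when.isoformat())
```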
44 changes: 42 additions & 2 deletions Lib/uuid.py
@@ -1,8 +1,8 @@
 r"""UUID objects (universally unique identifiers) according to RFC 4122/9562.

 This module provides immutable UUID objects (class UUID) and the functions
-uuid1(), uuid3(), uuid4(), uuid5(), uuid7(), and uuid8() for generating
-version 1, 3, 4, 5, 7, and 8 UUIDs as specified in RFC 4122/9562.
+uuid{N}() for generating UUIDs version N as specified in RFC 4122/9562 for
+N = 1, 3, 4, 5, 6, 7, and 8.

 If all you want is a unique ID, you should probably call uuid1() or uuid4().
 Note that uuid1() may compromise privacy since it creates a UUID containing
@@ -102,6 +102,7 @@ class SafeUUID:
 _RFC_4122_VERSION_3_FLAGS = ((3 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_4_FLAGS = ((4 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_5_FLAGS = ((5 << 76) | (0x8000 << 48))
+_RFC_4122_VERSION_6_FLAGS = ((6 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_7_FLAGS = ((7 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_8_FLAGS = ((8 << 76) | (0x8000 << 48))

@@ -770,6 +771,44 @@ def uuid5(namespace, name):
     int_uuid_5 |= _RFC_4122_VERSION_5_FLAGS
     return UUID._from_int(int_uuid_5)

+_last_timestamp_v6 = None
+
+def uuid6(node=None, clock_seq=None):
+    """Similar to :func:`uuid1` but where fields are ordered differently
+    for improved DB locality.
+
+    More precisely, given a 60-bit timestamp value as specified for UUIDv1,
+    for UUIDv6 the first 48 most significant bits are stored first, followed
+    by the 4-bit version (same position), followed by the remaining 12 bits
+    of the original 60-bit timestamp.
+    """
+    global _last_timestamp_v6
+    import time
+    nanoseconds = time.time_ns()
+    # 0x01b21dd213814000 is the number of 100-ns intervals between the
+    # UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
+    timestamp = nanoseconds // 100 + 0x01b21dd213814000
+    if _last_timestamp_v6 is not None and timestamp <= _last_timestamp_v6:
+        timestamp = _last_timestamp_v6 + 1
+    _last_timestamp_v6 = timestamp
+    if clock_seq is None:
+        import random
+        clock_seq = random.getrandbits(14) # instead of stable storage
+    time_hi_and_mid = (timestamp >> 12) & 0xffff_ffff_ffff
+    time_lo = timestamp & 0x0fff # keep 12 bits and clear version bits
+    clock_s = clock_seq & 0x3fff # keep 14 bits and clear variant bits
+    if node is None:
+        node = getnode()
+    # --- 32 + 16 ---   -- 4 --   -- 12 --  -- 2 --   -- 14 ---    48
+    # time_hi_and_mid | version | time_lo | variant | clock_seq | node
+    int_uuid_6 = time_hi_and_mid << 80
+    int_uuid_6 |= time_lo << 64
+    int_uuid_6 |= clock_s << 48
+    int_uuid_6 |= node & 0xffff_ffff_ffff
+    # by construction, the variant and version bits are already cleared
+    int_uuid_6 |= _RFC_4122_VERSION_6_FLAGS
+    return UUID._from_int(int_uuid_6)
+
 _last_timestamp_v7 = None
Member

Suggested change
_last_timestamp_v7 = None
_last_timestamp_v7 = None

Member Author

I wanted to apply a PEP-8 change in a separate PR because the module has inconsistencies. It seems a bit weird to only PEP-8ify this part of the code while the rest is not really PEP-8ified. See #121119 (comment).

Member

merwok Mar 3, 2025

python-dev doesn’t have a practice of doing reformatting-only PRs.
Remember that consistency for its own sake is not a goal (see PEP 20)

Instead, follow good conventions in code that is added or already changed.

Member Author

Well... if a core dev endorses the change, I think it's fine. I don't mind endorsing it. I didn't do it for uuid6() or uuid8() when I wrote those functions, as the module had more single-blank-line separations than two-blank-line ones. But if you insist on adding 2 blank lines, I'll also add them around the other functions because I prefer being consistent in this case (honestly, having 2 blank lines around only UUIDv7 makes it harder to read IMO).

Member Author

I would say PEP-8 tells me that we can also ignore the PEP if the surrounding code already breaks it. But I will make a commit to just add blank lines around the functions I've added (uuid6 to uuid8).

Member

I don't think that it's worth it to reformat the whole uuid.py file to PEP 8, but respecting PEP 8 for new code (or code near changed code) is a good practice.

Member

merwok Mar 3, 2025

Also, adding a few blank lines is innocuous (it does not change git blame, or risk changing the meaning of code), so it’s fine to do in existing code in this PR.

Generally, people saying they want to «apply PEP 8» have much bigger changes in mind.

[note: marking this convo as unresolved just to help Victor or Hugo see it, not because there’s something left to do for the PR author]

Member

> I would say PEP-8 tells me that we can also ignore the PEP if the surrounding code already breaks it.

This is about, for example, methods using camelCase in unittest or logging, not spaces!

 _last_counter_v7 = 0 # 42-bit counter

@@ -876,6 +915,7 @@ def main():
         "uuid3": uuid3,
         "uuid4": uuid4,
         "uuid5": uuid5,
+        "uuid6": uuid6,
         "uuid7": uuid7,
         "uuid8": uuid8,
     }
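The bit layout assembled in `uuid6()` can be exercised without the clock by factoring the packing step into a pure function. The sketch below is an illustration under that assumption, not the module's code: `pack_uuid6` and `VERSION_6_FLAGS` are hypothetical names, and `uuid.UUID(int=...)` is used in place of the private `UUID._from_int`. It reproduces the RFC 9562 example value and the truncation behaviour checked in the tests.

```python
import uuid

# Version nibble (6) and RFC variant bits, mirroring _RFC_4122_VERSION_6_FLAGS.
VERSION_6_FLAGS = (6 << 76) | (0x8000 << 48)


def pack_uuid6(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Hypothetical helper: pack a 60-bit timestamp, clock sequence and node."""
    time_hi_and_mid = (timestamp >> 12) & 0xffff_ffff_ffff  # 48 high bits
    time_lo = timestamp & 0x0fff                            # 12 low bits
    value = time_hi_and_mid << 80
    value |= time_lo << 64
    value |= (clock_seq & 0x3fff) << 48       # keep the 14 low bits
    value |= node & 0xffff_ffff_ffff          # keep the 48 low bits
    value |= VERSION_6_FLAGS
    return uuid.UUID(int=value)


u = pack_uuid6(0x1ec9414c_232a_b00, 0x33c8, 0x9f6bdeced846)
print(u)  # 1ec9414c-232a-6b00-b3c8-9f6bdeced846

# Oversized inputs are truncated to their least significant bits.
v = pack_uuid6(0x1ec9414c_232a_b00, 0xffff, 0xE_1234_5678_ABCD)
print(hex(v.node), hex((v.int >> 48) & 0x3fff))
```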
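`uuid6()` keeps the last timestamp in the module-level `_last_timestamp_v6` and bumps the new one by one tick whenever the clock has not advanced. A minimal sketch of that rule in isolation follows; the class `MonotonicTimestamp` is hypothetical and only stands in for the global variable so the behaviour can be observed with fixed inputs.

```python
class MonotonicTimestamp:
    """Hypothetical stand-in for the _last_timestamp_v6 bookkeeping."""

    def __init__(self):
        self.last = None

    def next(self, now: int) -> int:
        # If the clock did not move forward, use the previous value plus one
        # so that successive timestamps are strictly increasing.
        if self.last is not None and now <= self.last:
            now = self.last + 1
        self.last = now
        return now


clock = MonotonicTimestamp()
ticks = [clock.next(t) for t in (100, 100, 99, 250)]
print(ticks)  # [100, 101, 102, 250]
```

This is also why the tests patch `_last_timestamp_v6` to `None` before calling `uuid6()` with a mocked clock: otherwise a previously stored timestamp could shift the value under test.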
You are viewing a condensed version of this merge commit. You can view the full changes here.