CPython has an internal API Py_hash_t _Py_HashBytes(const void*, Py_ssize_t) that implements hashing of a buffer of bytes, consistently with the hash output of the bytes object. It was added (by me) in python/cpython@ce4a9da.
It is currently used internally for hashing bytes objects (of course), but also str objects, memoryview objects, some datetime objects, and in a couple of other places:
>>> hash(b"xxx")
-7933225254679225263
>>> hash(memoryview(b"xxx"))
-7933225254679225263
>>> import io
>>> bio = io.BytesIO()
>>> bio.write(b"xxx")
3
>>> bio.getbuffer()
<memory at 0x7f288ed29a80>
>>> hash(bio.getbuffer().toreadonly())
-7933225254679225263
Third-party libraries may want to define buffer-like objects and ensure that they are hashable in a way that's compatible with built-in bytes objects. Currently this would mean relying on the aforementioned internal API. An example I'm familiar with is the Buffer object in PyArrow:
>>> import pyarrow as pa
>>> buf = pa.py_buffer(b"xxx")
>>> buf
<pyarrow.Buffer address=0x7f28e05bdd00 size=3 is_cpu=True is_mutable=False>
>>> buf == b"xxx"
True
>>> hash(buf)
Traceback (most recent call last):
...
TypeError: unhashable type: 'pyarrow.lib.Buffer'
I simply propose making the API public and renaming it to Py_HashBytes, so that third-party libraries have access to the same facility.
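For comparison, here is a minimal sketch of what a library can do today from pure Python without touching the internal API. The `HashableBuffer` class is hypothetical and only for illustration; it is not PyArrow's `Buffer`. It gets a bytes-compatible hash by delegating to the built-in `bytes` hash, at the cost of copying the data once, which is the overhead a public C-level hashing function would avoid.

```python
class HashableBuffer:
    """Hypothetical buffer-like wrapper (illustration only)."""

    def __init__(self, data):
        # Copy into an immutable bytes object so the hash cannot change later.
        self._data = bytes(data)

    def __eq__(self, other):
        if isinstance(other, HashableBuffer):
            return self._data == other._data
        if isinstance(other, (bytes, bytearray)):
            return self._data == other
        return NotImplemented

    def __hash__(self):
        # Delegate to the built-in bytes hash so that equal contents
        # hash identically to a bytes object.
        return hash(self._data)


buf = HashableBuffer(b"xxx")
same_hash = hash(buf) == hash(b"xxx")
# A dict keyed by bytes can be queried with the wrapper, since hash and
# equality agree with bytes.
lookup = {b"xxx": "found"}.get(buf)
```

With a public Py_HashBytes, a C or Cython extension could presumably compute the same value directly from its own memory without the intermediate copy.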