Skip to content

Add Py_hash_t Py_HashBuffer(const void*, Py_ssize_t) #13

@pitrou

Description

@pitrou

CPython has an internal API Py_hash_t _Py_HashBytes(const void*, Py_ssize_t) that implements hashing of a buffer of bytes, consistently with the hash output of the bytes object. It was added (by me) in python/cpython@ce4a9da

It is currently used internally for hashing bytes objects (of course), but also str objects, memoryview objects, some datetime objects, and a couple other duties:

>>> hash(b"xxx")
-7933225254679225263
>>> hash(memoryview(b"xxx"))
-7933225254679225263
>>> bio = io.BytesIO()
>>> bio.write(b"xxx")
3
>>> bio.getbuffer()
<memory at 0x7f288ed29a80>
>>> hash(bio.getbuffer().toreadonly())
-7933225254679225263

Third-party libraries may want to define buffer-like objects and ensure that they are hashable in a way that's compatible with built-in bytes objects. Currently this would mean relying on the aforementioned internal API. An example I'm familiar with is the Buffer object in PyArrow.

>>> import pyarrow as pa
>>> buf = pa.py_buffer(b"xxx")
>>> buf
<pyarrow.Buffer address=0x7f28e05bdd00 size=3 is_cpu=True is_mutable=False>
>>> buf == b"xxx"
True
>>> hash(buf)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'pyarrow.lib.Buffer'

I simply propose making the API public and renaming it to Py_HashBytes, such that third-party libraries have access to the same facility.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions