Commit 2daf6ae

Issue python#13703: add a way to randomize the hash values of basic types (str, bytes, datetime) in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated. The environment variable PYTHONHASHSEED and the new command line flag -R control this behavior.
1 parent ec1712a commit 2daf6ae

32 files changed, +660 -152 lines changed

Doc/library/sys.rst

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -220,8 +220,12 @@ always available.
220220
:const:`ignore_environment` :option:`-E`
221221
:const:`verbose` :option:`-v`
222222
:const:`bytes_warning` :option:`-b`
223+
:const:`hash_randomization` :option:`-R`
223224
============================= =============================
224225

226+
.. versionadded:: 3.1.5
227+
The ``hash_randomization`` attribute.
228+
225229

226230
.. data:: float_info
227231

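As an illustration of the ``hash_randomization`` flag documented in the hunk above, the following minimal sketch reads it at runtime. The helper name `hash_randomization_enabled` is hypothetical and not part of the commit. The result depends on the interpreter: in this commit randomization is off unless `-R` or `PYTHONHASHSEED` is given, whereas later Python releases enable it by default.

```python
import sys


def hash_randomization_enabled():
    """Return True if this interpreter was started with hash randomization.

    Hypothetical helper for illustration; it only wraps sys.flags.
    """
    return bool(sys.flags.hash_randomization)


print("hash randomization enabled:", hash_randomization_enabled())
```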
Doc/reference/datamodel.rst

Lines changed: 2 additions & 0 deletions
@@ -1265,6 +1265,8 @@ Basic customization
12651265
inheritance of :meth:`__hash__` will be blocked, just as if :attr:`__hash__`
12661266
had been explicitly set to :const:`None`.
12671267

1268+
See also the :option:`-R` command-line option.
1269+
12681270

12691271
.. method:: object.__bool__(self)
12701272

Doc/using/cmdline.rst

Lines changed: 47 additions & 1 deletion
@@ -21,7 +21,7 @@ Command line
2121

2222
When invoking Python, you may specify any of these options::
2323

24-
python [-bBdEhiOsSuvVWx?] [-c command | -m module-name | script | - ] [args]
24+
python [-bBdEhiORsSuvVWx?] [-c command | -m module-name | script | - ] [args]
2525

2626
The most common use case is, of course, a simple invocation of a script::
2727

@@ -215,6 +215,29 @@ Miscellaneous options
215215
Discard docstrings in addition to the :option:`-O` optimizations.
216216

217217

218+
.. cmdoption:: -R
219+
220+
Turn on hash randomization, so that the :meth:`__hash__` values of str, bytes
221+
and datetime objects are "salted" with an unpredictable random value.
222+
Although they remain constant within an individual Python process, they are
223+
not predictable between repeated invocations of Python.
224+
225+
This is intended to provide protection against a denial-of-service caused by
226+
carefully-chosen inputs that exploit the worst case performance of a dict
227+
insertion, O(n^2) complexity. See
228+
http://www.ocert.org/advisories/ocert-2011-003.html for details.
229+
230+
Changing hash values affects the order in which keys are retrieved from a
231+
dict. Although Python has never made guarantees about this ordering (and it
232+
typically varies between 32-bit and 64-bit builds), enough real-world code
233+
implicitly relies on this non-guaranteed behavior that the randomization is
234+
disabled by default.
235+
236+
See also :envvar:`PYTHONHASHSEED`.
237+
238+
.. versionadded:: 3.1.5
239+
240+
218241
.. cmdoption:: -s
219242

220243
Don't add user site directory to sys.path
@@ -314,6 +337,7 @@ Miscellaneous options
314337

315338
.. note:: The line numbers in error messages will be off by one.
316339

340+
317341
Options you shouldn't use
318342
~~~~~~~~~~~~~~~~~~~~~~~~~
319343

@@ -328,6 +352,7 @@ Options you shouldn't use
328352
Reserved for alternative implementations of Python to use for their own
329353
purposes.
330354

355+
331356
.. _using-on-envvars:
332357

333358
Environment variables
@@ -435,6 +460,27 @@ These environment variables influence Python's behavior.
435460
import of source modules.
436461

437462

463+
.. envvar:: PYTHONHASHSEED
464+
465+
If this variable is set to ``random``, the effect is the same as specifying
466+
the :option:`-R` option: a random value is used to seed the hashes of str,
467+
bytes and datetime objects.
468+
469+
If :envvar:`PYTHONHASHSEED` is set to an integer value, it is used as a fixed
470+
seed for generating the hash() of the types covered by the hash
471+
randomization.
472+
473+
Its purpose is to allow repeatable hashing, such as for selftests for the
474+
interpreter itself, or to allow a cluster of python processes to share hash
475+
values.
476+
477+
The integer must be a decimal number in the range [0,4294967295]. Specifying
478+
the value 0 will lead to the same hash values as when hash randomization is
479+
disabled.
480+
481+
.. versionadded:: 3.1.5
482+
483+
438484
.. envvar:: PYTHONIOENCODING
439485

440486
Overrides the encoding used for stdin/stdout/stderr, in the syntax
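The repeatable-hashing behaviour described for :envvar:`PYTHONHASHSEED` in this file can be checked with a short sketch like the one below. It assumes a Python 3 interpreter that honours `PYTHONHASHSEED`. The helper name `hash_in_child` and the seed value 12345 are illustrative choices, not part of the commit. Because the seed is read at interpreter startup, each measurement runs in a fresh child process.

```python
import os
import subprocess
import sys


def hash_in_child(seed, text="abc"):
    """Return hash(text) as computed by a fresh interpreter.

    The child process is started with PYTHONHASHSEED set to `seed`
    (an integer, or the string "random").
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % text],
        env=env,
    )
    return int(out.decode().strip())


# Two separate processes started with the same fixed seed agree on the hash.
first = hash_in_child(12345)
second = hash_in_child(12345)
print("same hash with fixed seed:", first == second)
```

With `seed="random"` the two values will usually differ between runs, which is why the sketch makes no claim about that case.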

Include/object.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -473,6 +473,12 @@ PyAPI_FUNC(void) Py_ReprLeave(PyObject *);
473473
PyAPI_FUNC(long) _Py_HashDouble(double);
474474
PyAPI_FUNC(long) _Py_HashPointer(void*);
475475

476+
typedef struct {
477+
long prefix;
478+
long suffix;
479+
} _Py_HashSecret_t;
480+
PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
481+
476482
/* Helper for passing objects to printf and the like */
477483
#define PyObject_REPR(obj) _PyUnicode_AsString(PyObject_Repr(obj))
478484

Include/pydebug.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ PyAPI_DATA(int) Py_DivisionWarningFlag;
1919
PyAPI_DATA(int) Py_DontWriteBytecodeFlag;
2020
PyAPI_DATA(int) Py_NoUserSiteDirectory;
2121
PyAPI_DATA(int) Py_UnbufferedStdioFlag;
22+
PyAPI_DATA(int) Py_HashRandomizationFlag;
2223

2324
/* this is a wrapper around getenv() that pays attention to
2425
Py_IgnoreEnvironmentFlag. It should be used for getting variables like

Include/pythonrun.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -174,6 +174,8 @@ typedef void (*PyOS_sighandler_t)(int);
174174
PyAPI_FUNC(PyOS_sighandler_t) PyOS_getsig(int);
175175
PyAPI_FUNC(PyOS_sighandler_t) PyOS_setsig(int, PyOS_sighandler_t);
176176

177+
/* Random */
178+
PyAPI_FUNC(int) _PyOS_URandom (void *buffer, Py_ssize_t size);
177179

178180
#ifdef __cplusplus
179181
}

Lib/json/__init__.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,9 @@
3131
Compact encoding::
3232
3333
>>> import json
34-
>>> json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',', ':'))
34+
>>> from collections import OrderedDict
35+
>>> mydict = OrderedDict([('4', 5), ('6', 7)])
36+
>>> json.dumps([1,2,3,mydict], separators=(',', ':'))
3537
'[1,2,3,{"4":5,"6":7}]'
3638
3739
Pretty printing::
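The docstring change above switches to an `OrderedDict` so that the example output no longer depends on dict iteration order, which hash randomization can alter. A different way to get stable output, shown here only as an alternative sketch and not as what the commit does, is the `sort_keys` parameter of `json.dumps`:

```python
import json

# The inner dict is written with its keys deliberately out of order.
data = [1, 2, 3, {'6': 7, '4': 5}]

# sort_keys=True makes the serialized key order independent of hash values.
compact = json.dumps(data, separators=(',', ':'), sort_keys=True)
print(compact)  # [1,2,3,{"4":5,"6":7}]
```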

Lib/os.py

Lines changed: 0 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -611,23 +611,6 @@ def _pickle_statvfs_result(sr):
611611
except NameError: # statvfs_result may not exist
612612
pass
613613

614-
if not _exists("urandom"):
615-
def urandom(n):
616-
"""urandom(n) -> str
617-
618-
Return a string of n random bytes suitable for cryptographic use.
619-
620-
"""
621-
try:
622-
_urandomfd = open("/dev/urandom", O_RDONLY)
623-
except (OSError, IOError):
624-
raise NotImplementedError("/dev/urandom (or equivalent) not found")
625-
bs = b""
626-
while len(bs) < n:
627-
bs += read(_urandomfd, n - len(bs))
628-
close(_urandomfd)
629-
return bs
630-
631614
# Supply os.popen()
632615
def popen(cmd, mode="r", buffering=-1):
633616
if not isinstance(cmd, str):

Lib/test/mapping_tests.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ class BasicTestMappingProtocol(unittest.TestCase):
1414
def _reference(self):
1515
"""Return a dictionary of values which are invariant by storage
1616
in the object under test."""
17-
return {1:2, "key1":"value1", "key2":(1,2,3)}
17+
return {"1": "2", "key1":"value1", "key2":(1,2,3)}
1818
def _empty_mapping(self):
1919
"""Return an empty mapping object"""
2020
return self.type2test()

Lib/test/regrtest.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -428,6 +428,11 @@ def main(tests=None, testdir=None, verbose=0, quiet=False, generate=False,
428428
except ValueError:
429429
print("Couldn't find starting test (%s), using all tests" % start)
430430
if randomize:
431+
hashseed = os.getenv('PYTHONHASHSEED')
432+
if not hashseed:
433+
os.environ['PYTHONHASHSEED'] = str(random_seed)
434+
os.execv(sys.executable, [sys.executable] + sys.argv)
435+
return
431436
random.seed(random_seed)
432437
print("Using random seed", random_seed)
433438
random.shuffle(tests)
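The regrtest hunk above re-executes the interpreter because the hash seed is read at startup: setting `os.environ['PYTHONHASHSEED']` has no effect on the process that is already running. The pattern can be sketched roughly as follows. The function names `needs_reexec` and `reexec_with_hashseed` are hypothetical and not part of the commit.

```python
import os
import sys


def needs_reexec(environ):
    """Return True when PYTHONHASHSEED is unset or empty in `environ`."""
    return not environ.get("PYTHONHASHSEED")


def reexec_with_hashseed(seed, environ=os.environ):
    """Restart the current script with PYTHONHASHSEED set to `seed`.

    Does nothing if a seed is already present. Otherwise os.execv
    replaces the running process, so this call does not return.
    """
    if needs_reexec(environ):
        environ["PYTHONHASHSEED"] = str(seed)
        os.execv(sys.executable, [sys.executable] + sys.argv)
```

Checking for an existing value first is what keeps the restarted process from looping: after the re-exec the variable is set, so the second pass falls through.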
