@@ -24,11 +24,11 @@ function, in the sense of simulating randomness. Python doesn't: its most
2424important hash functions (for strings and ints) are very regular in common
2525cases:
2626
27- >>> map(hash, (0, 1, 2, 3))
28- [0, 1, 2, 3]
29- >>> map(hash, ("namea", "nameb", "namec", "named"))
30- [-1658398457, -1658398460, -1658398459, -1658398462]
31- >>>
27+ >>> map(hash, (0, 1, 2, 3))
28+ [0, 1, 2, 3]
29+ >>> map(hash, ("namea", "nameb", "namec", "named"))
30+ [-1658398457, -1658398460, -1658398459, -1658398462]
31+ >>>
3232
3333This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
3434the low-order i bits as the initial table index is extremely fast, and there
@@ -39,9 +39,9 @@ gives better-than-random behavior in common cases, and that's very desirable.
3939OTOH, when collisions occur, the tendency to fill contiguous slices of the
4040hash table makes a good collision resolution strategy crucial. Taking only
4141the last i bits of the hash code is also vulnerable: for example, consider
42- [i << 16 for i in range(20000)] as a set of keys. Since ints are their own
43- hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
44- hash code are all 0: they *all* map to the same table index.
42+ the list [i << 16 for i in range(20000)] as a set of keys. Since ints are
43+ their own hash codes, and this fits in a dict of size 2**15, the last 15 bits
44+ of every hash code are all 0: they *all* map to the same table index.
4545
4646But catering to unusual cases should not slow the usual ones, so we just take
4747the last i bits anyway. It's up to collision resolution to do the rest. If
@@ -97,19 +97,19 @@ the high-order hash bits have an effect on early iterations. 5 was "the
9797best" in minimizing total collisions across experiments Tim Peters ran (on
9898both normal and pathological cases), but 4 and 6 weren't significantly worse.
9999
100- Historical: Reimer Behrends contributed the idea of using a polynomial-based
100+ Historical: Reimer Behrends contributed the idea of using a polynomial-based
101101approach, using repeated multiplication by x in GF(2**n) where an irreducible
102102polynomial for each table size was chosen such that x was a primitive root.
103103Christian Tismer later extended that to use division by x instead, as an
104104efficient way to get the high bits of the hash code into play. This scheme
105- also gave excellent collision statistics, but was more expensive: two
106- if-tests were required inside the loop; computing "the next" index took about
107- the same number of operations but without as much potential parallelism
108- (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
109- above, and then shifting perturb can be done while the table index is being
110- masked); and the dictobject struct required a member to hold the table's
111- polynomial. In Tim's experiments the current scheme ran faster, produced
112- equally good collision statistics, needed less code & used less memory.
105+ also gave excellent collision statistics, but was more expensive: two if-tests
106+ were required inside the loop; computing "the next" index took about the same
107+ number of operations but without as much potential parallelism (e.g.,
108+ computing 5*j can go on at the same time as computing 1+perturb in the above,
109+ and then shifting perturb can be done while the table index is being masked);
110+ and the dictobject struct required a member to hold the table's polynomial.
111+ In Tim's experiments the current scheme ran faster, produced equally good
112+ collision statistics, needed less code & used less memory.
113113
114114Theoretical Python 2.5 headache: hash codes are only C "long", but
115115sizeof(Py_ssize_t) > sizeof(long) may be possible. In that case, and if a
@@ -223,9 +223,9 @@ probe indices are computed as explained earlier.
223223
224224All arithmetic on hash should ignore overflow.
225225
226- ( The details in this version are due to Tim Peters, building on many past
226+ The details in this version are due to Tim Peters, building on many past
227227contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
228- Christian Tismer) .
228+ Christian Tismer.
229229
230230lookdict() is general-purpose, and may return NULL if (and only if) a
231231comparison raises an exception (this was new in Python 2.5).
@@ -1485,7 +1485,10 @@ dict_equal(dictobject *a, dictobject *b)
14851485 /* temporarily bump aval's refcount to ensure it stays
14861486 alive until we're done with it */
14871487 Py_INCREF (aval );
1488+ /* ditto for key */
1489+ Py_INCREF (key );
14881490 bval = PyDict_GetItemWithError ((PyObject * )b , key );
1491+ Py_DECREF (key );
14891492 if (bval == NULL ) {
14901493 Py_DECREF (aval );
14911494 if (PyErr_Occurred ())
0 commit comments