
Commit dc5f6b2

Got test_mutants.py working. One set of changes was straightforward:
use __eq__ instead of __cmp__. The other change is unexplained: with a random hash code as before, it would run forever; with a constant hash code, it fails quickly. This found a refcount bug in dict_equal() -- I wonder if that bug is also present in 2.5...
1 parent 801f0d7
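Why switching to a constant hash code makes the test fail quickly rather than run forever: with every key hashing to the same value, each dict lookup must walk one long collision chain and call `__eq__` at every probe, so the comparison path (the point of the test) is exercised constantly. A minimal sketch of that effect (the class name and counter are illustrative, not from the test):

```python
# Illustrative sketch: a constant __hash__ forces all keys into one
# collision chain, so lookups must fall back on __eq__ to tell keys apart.
class Key:
    eq_calls = 0  # counts __eq__ invocations across all instances

    def __init__(self, i):
        self.i = i

    def __hash__(self):
        return 42  # constant, like the patched Horrid.__hash__ below

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.i == other.i

d = {Key(i): i for i in range(8)}
value = d[Key(3)]  # probing colliding slots runs __eq__ repeatedly
```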

2 files changed: 32 additions & 23 deletions


Lib/test/test_mutants.py

Lines changed: 10 additions & 4 deletions
@@ -27,7 +27,7 @@
 # ran it. Indeed, at the start, the driver never got beyond 6 iterations
 # before the test died.
 
-# The dicts are global to make it easy to mutate them from within functions.
+# The dicts are global to make it easy to mutate tham from within functions.
 dict1 = {}
 dict2 = {}
 
@@ -88,15 +88,22 @@ def __init__(self, i):
         # have any systematic relationship between comparison outcomes
         # (based on self.i and other.i) and relative position within the
         # hash vector (based on hashcode).
-        self.hashcode = random.randrange(1000000000)
+        # XXX This is no longer effective.
+        ##self.hashcode = random.randrange(1000000000)
 
     def __hash__(self):
-        return self.hashcode
+        ##return self.hashcode
+        return 42
 
     def __eq__(self, other):
         maybe_mutate()   # The point of the test.
         return self.i == other.i
 
+    def __ne__(self, other):
+        raise RuntimeError("I didn't expect some kind of Spanish inquisition!")
+
+    __lt__ = __le__ = __gt__ = __ge__ = __ne__
+
     def __repr__(self):
         return "Horrid(%d)" % self.i
 
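The patch above pins the hash to 42 and makes every comparison other than `__eq__` raise, so the test fails loudly if dict comparison ever strays from `__eq__`. A standalone sketch of that protocol (`maybe_mutate()` is omitted so this runs on its own):

```python
# Sketch of the patched comparison protocol: during dict comparison only
# __hash__ and __eq__ should ever be called, so __ne__ and the ordering
# methods are booby-trapped.
class Horrid:
    def __init__(self, i):
        self.i = i

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return self.i == other.i

    def __ne__(self, other):
        raise RuntimeError("I didn't expect some kind of Spanish inquisition!")

    __lt__ = __le__ = __gt__ = __ge__ = __ne__

d1 = {Horrid(i): i for i in range(4)}
d2 = {Horrid(i): i for i in range(4)}
equal = d1 == d2  # uses only __hash__/__eq__, so no RuntimeError

try:
    Horrid(0) < Horrid(1)  # any ordering comparison should blow up
    ordering_raised = False
except RuntimeError:
    ordering_raised = True
```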
@@ -133,7 +140,6 @@ def test_one(n):
     if verbose:
         print ".",
     c = dict1 == dict2
-    XXX # Can't figure out how to make this work
     if verbose:
         print
 
Objects/dictobject.c

Lines changed: 22 additions & 19 deletions
@@ -24,11 +24,11 @@ function, in the sense of simulating randomness. Python doesn't: its most
 important hash functions (for strings and ints) are very regular in common
 cases:
 
->>> map(hash, (0, 1, 2, 3))
-[0, 1, 2, 3]
->>> map(hash, ("namea", "nameb", "namec", "named"))
-[-1658398457, -1658398460, -1658398459, -1658398462]
->>>
+  >>> map(hash, (0, 1, 2, 3))
+  [0, 1, 2, 3]
+  >>> map(hash, ("namea", "nameb", "namec", "named"))
+  [-1658398457, -1658398460, -1658398459, -1658398462]
+  >>>
 
 This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
 the low-order i bits as the initial table index is extremely fast, and there
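The doctest in this comment is Python 2-era output; in modern CPython the int half still holds (small non-negative ints hash to themselves), while string hashes are salted per process, so the `"namea"` values no longer reproduce. The surviving regularity is easy to check:

```python
# Small non-negative ints still hash to themselves in CPython,
# preserving the regularity the comment describes for the int case.
int_hashes = list(map(hash, (0, 1, 2, 3)))
```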
@@ -39,9 +39,9 @@ gives better-than-random behavior in common cases, and that's very desirable.
 OTOH, when collisions occur, the tendency to fill contiguous slices of the
 hash table makes a good collision resolution strategy crucial. Taking only
 the last i bits of the hash code is also vulnerable: for example, consider
-[i << 16 for i in range(20000)] as a set of keys. Since ints are their own
-hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
-hash code are all 0: they *all* map to the same table index.
+the list [i << 16 for i in range(20000)] as a set of keys. Since ints are
+their own hash codes, and this fits in a dict of size 2**15, the last 15 bits
+of every hash code are all 0: they *all* map to the same table index.
 
 But catering to unusual cases should not slow the usual ones, so we just take
 the last i bits anyway. It's up to collision resolution to do the rest. If
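The pathological key set described here can be checked directly: every key's low 16 bits are zero, so masking with a table size of 2**15 sends all 20000 keys to the same initial slot.

```python
# Every key i << 16 has zero low bits; since ints are their own hash
# codes, hash & (size - 1) is the same table index for all of them.
size = 2 ** 15
keys = [i << 16 for i in range(20000)]
initial_slots = {hash(k) & (size - 1) for k in keys}
```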
@@ -97,19 +97,19 @@ the high-order hash bits have an effect on early iterations. 5 was "the
 best" in minimizing total collisions across experiments Tim Peters ran (on
 both normal and pathological cases), but 4 and 6 weren't significantly worse.
 
-Historical: Reimer Behrends contributed the idea of using a polynomial-based
+Historical: Reimer Behrends contributed the idea of using a polynomial-based
 approach, using repeated multiplication by x in GF(2**n) where an irreducible
 polynomial for each table size was chosen such that x was a primitive root.
 Christian Tismer later extended that to use division by x instead, as an
 efficient way to get the high bits of the hash code into play. This scheme
-also gave excellent collision statistics, but was more expensive: two
-if-tests were required inside the loop; computing "the next" index took about
-the same number of operations but without as much potential parallelism
-(e.g., computing 5*j can go on at the same time as computing 1+perturb in the
-above, and then shifting perturb can be done while the table index is being
-masked); and the dictobject struct required a member to hold the table's
-polynomial. In Tim's experiments the current scheme ran faster, produced
-equally good collision statistics, needed less code & used less memory.
+also gave excellent collision statistics, but was more expensive: two if-tests
+were required inside the loop; computing "the next" index took about the same
+number of operations but without as much potential parallelism (e.g.,
+computing 5*j can go on at the same time as computing 1+perturb in the above,
+and then shifting perturb can be done while the table index is being masked);
+and the dictobject struct required a member to hold the table's polynomial.
+In Tim's experiments the current scheme ran faster, produced equally good
+collision statistics, needed less code & used less memory.
 
 Theoretical Python 2.5 headache: hash codes are only C "long", but
 sizeof(Py_ssize_t) > sizeof(long) may be possible. In that case, and if a
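The recurrence this comment discusses (5*j + 1 + perturb, with perturb shifted right each round so high-order hash bits reach early probes) can be modeled in a few lines of Python. This is a simplified sketch of the documented scheme, not the C code; the exact point at which perturb is shifted varies across CPython versions.

```python
# Simplified model of CPython's probe sequence: start at hash & mask,
# then j = (5*j + 1 + perturb) & mask, shifting perturb right by
# PERTURB_SHIFT each round so high-order hash bits affect early probes.
PERTURB_SHIFT = 5

def probe_sequence(h, mask, nprobes):
    perturb = h
    j = h & mask
    slots = [j]
    for _ in range(nprobes):
        j = (5 * j + 1 + perturb) & mask
        perturb >>= PERTURB_SHIFT
        slots.append(j)
    return slots

# once perturb reaches 0, j = 5*j + 1 (mod 2**i) cycles through every
# slot, so the sequence is guaranteed to find an empty one eventually
slots = probe_sequence(42, 7, 16)
```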
@@ -223,9 +223,9 @@ probe indices are computed as explained earlier.
 
 All arithmetic on hash should ignore overflow.
 
-(The details in this version are due to Tim Peters, building on many past
+The details in this version are due to Tim Peters, building on many past
 contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
-Christian Tismer).
+Christian Tismer.
 
 lookdict() is general-purpose, and may return NULL if (and only if) a
 comparison raises an exception (this was new in Python 2.5).
@@ -1485,7 +1485,10 @@ dict_equal(dictobject *a, dictobject *b)
 			/* temporarily bump aval's refcount to ensure it stays
 			   alive until we're done with it */
 			Py_INCREF(aval);
+			/* ditto for key */
+			Py_INCREF(key);
 			bval = PyDict_GetItemWithError((PyObject *)b, key);
+			Py_DECREF(key);
 			if (bval == NULL) {
 				Py_DECREF(aval);
 				if (PyErr_Occurred())
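This is the refcount bug the commit message mentions: `PyDict_GetItemWithError()` can run arbitrary `__eq__` code, which may mutate dict `a` and drop its reference to `key`, so the borrowed `key` must be temporarily owned, exactly as `aval` already was. A Python-level sketch of the scenario dict_equal() must now survive (names here are illustrative, not from the test suite):

```python
# A key whose __eq__ empties one of the dicts mid-comparison: at the C
# level this used to leave dict_equal() holding a dead borrowed `key`.
mutate = False  # building the dicts must not trigger the mutation yet

class Mutant:
    def __init__(self, i):
        self.i = i

    def __hash__(self):
        return 42  # constant hash: every lookup runs __eq__

    def __eq__(self, other):
        if mutate:
            d1.clear()  # mutate the dict being iterated, mid-comparison
        return self.i == other.i

d1 = {Mutant(i): i for i in range(4)}
d2 = {Mutant(i): i for i in range(4)}
mutate = True
result = d1 == d2  # must complete without crashing the interpreter
```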
