
Commit 1584ae3

Issue python#13165: stringbench is now available in the Tools/stringbench folder.
It used to live in its own SVN project.
1 parent 75d9aca commit 1584ae3


4 files changed (+1560, -0 lines)

Misc/NEWS

Lines changed: 6 additions & 0 deletions
@@ -57,6 +57,12 @@ Tests
 - Issue #14355: Regrtest now supports the standard unittest test loading, and
   will use it if a test file contains no `test_main` method.
 
+Tools / Demos
+-------------
+
+- Issue #13165: stringbench is now available in the Tools/stringbench folder.
+  It used to live in its own SVN project.
+
 
 What's New in Python 3.3.0 Alpha 2?
 ===================================

Tools/README

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,9 @@ scripts         A number of useful single-file programs, e.g. tabnanny.py
                 tabs and spaces, and 2to3, which converts Python 2 code
                 to Python 3 code.
 
+stringbench     A suite of micro-benchmarks for various operations on
+                strings (both 8-bit and unicode).
+
 test2to3        A demonstration of how to use 2to3 transparently in setup.py.
 
 unicode         Tools for generating unicodedata and codecs from unicode.org

Tools/stringbench/README

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
stringbench is a set of performance tests comparing byte string
operations with unicode operations. The two string implementations
are loosely based on each other, and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:

@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    # Look up the bound method once, outside the timed loop
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters. The first is a short
description of how the code works; in most cases this is a Python code
snippet. It is not the code that is actually run, because the real
code is hand-optimized to focus on the method being tested.

The second parameter is a group title. All benchmarks with the same
group title are listed together. This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

The last parameter is a count. Each benchmark loops over the algorithm
either 100 or 1000 times, depending on the algorithm's performance. The
reported time is the time per benchmark call (that is, for the whole
loop), so the reader needs the count to scale it to a single operation.

These parameters become function attributes.
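
A minimal sketch of how such a decorator could store its three parameters as function attributes, and how benchmarks could then be listed together by group title. This is not the code from stringbench.py: the attribute names `comment`, `group` and `repeat_count`, the `REGISTRY` list, the `benchmarks_by_group` helper, the `_RANGE_1000` stand-in and the two example benchmarks are all hypothetical.

```python
from collections import defaultdict

# Hypothetical registry of every decorated benchmark, in definition order.
REGISTRY = []


def bench(description, group, repeat_count):
    """Store the three bench parameters as attributes of the decorated function."""
    def decorator(func):
        func.comment = description        # short description, usually a code snippet
        func.group = group                # group title; same title => listed together
        func.repeat_count = repeat_count  # iterations of the inner loop (100 or 1000)
        REGISTRY.append(func)
        return func
    return decorator


def benchmarks_by_group(funcs):
    """Map each group title to its benchmark functions, keeping their order."""
    groups = defaultdict(list)
    for func in funcs:
        groups[func.group].append(func)
    return dict(groups)


_RANGE_1000 = range(1000)  # hypothetical stand-in for the real loop range


@bench("t in s", "find substring", 1000)
def in_operator(STR):
    s, t = STR("Andrew"), STR("drew")
    for _ in _RANGE_1000:
        t in s


@bench("s.find(t)", "find substring", 1000)
def find_method(STR):
    s, t = STR("Andrew"), STR("drew")
    s_find = s.find  # hoist the method lookup out of the timed loop
    for _ in _RANGE_1000:
        s_find(t)


# List the benchmarks the way the output groups them.
for title, funcs in benchmarks_by_group(REGISTRY).items():
    print("==========", title)
    for func in funcs:
        print(f"{func.comment} (*{func.repeat_count})")
```

Calling `in_operator(str)` runs the loop on unicode strings, and `in_operator(lambda s: s.encode("ascii"))` runs it on byte strings. How stringbench itself supplies the byte-string constructor for STR is not described here, so that lambda is an assumption.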

Here is an example of the output:

========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings. The third is a
percentage, 100 * (byte time / unicode time): values above 100 mean
the unicode version was faster than the byte-string version, and values
below 100 mean it was slower.
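
As a worked check of the third column, here is a small sketch; the helper names `percent_column` and `verdict` are hypothetical, not part of stringbench. Because the first two columns are printed rounded to two decimals, recomputing the percentage from them can differ from the printed value in the last digit (38.54 / 41.60 gives 92.6, while the sample output shows 92.7).

```python
def percent_column(bytes_ms, unicode_ms):
    """Third output column: 100 * byte time / unicode time."""
    return 100.0 * bytes_ms / unicode_ms


def verdict(bytes_ms, unicode_ms):
    """Read the percentage: above 100 means unicode ran faster."""
    pct = percent_column(bytes_ms, unicode_ms)
    if pct > 100.0:
        return "unicode faster"
    if pct < 100.0:
        return "unicode slower"
    return "equal"


# First row of the sample output: count newlines (*100)
print(f"{percent_column(38.54, 41.60):.1f}", verdict(38.54, 41.60))  # prints: 92.6 unicode slower
```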

The last column contains the code snippet and the repeat count for the
internal benchmark loop.
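
Since each reported time covers one whole benchmark call, dividing it by the repeat count gives an approximate per-operation cost (loop overhead included). A sketch that parses a result row in the format of the sample output; the `parse_row` helper and its regular expression are assumptions based only on that sample, not on stringbench's own code.

```python
import re

# Matches: <bytes ms> <unicode ms> <percent> <snippet> (*<repeat count>)
ROW_RE = re.compile(
    r"^\s*(?P<bytes>[\d.]+)\s+(?P<unicode>[\d.]+)\s+(?P<pct>[\d.]+)\s+"
    r"(?P<snippet>.*?)\s+\(\*(?P<count>\d+)\)\s*$"
)


def parse_row(line):
    """Split one result row and convert its times to microseconds per operation."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"not a result row: {line!r}")
    count = int(m["count"])
    return {
        "snippet": m["snippet"],
        "count": count,
        # Reported times are milliseconds per benchmark call; divide by the
        # loop count and convert ms -> us to get the cost of one operation.
        "bytes_us_per_op": float(m["bytes"]) / count * 1000.0,
        "unicode_us_per_op": float(m["unicode"]) / count * 1000.0,
    }


row = parse_row(r'38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)')
print(f'{row["bytes_us_per_op"]:.1f} us per count() call on bytes')  # prints: 385.4 us ...
```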

The times are computed with 'timeit.py', which repeats the test more
and more times until the total time exceeds 0.2 seconds, and then
reports the best time for a single iteration.
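
The measurement strategy described above can be approximated with the standard library's `timeit.Timer`. This is a sketch of the idea, not stringbench's actual timing code; the `best_time_per_call` name, the powers-of-ten growth and the default of 3 repeats are assumptions.

```python
import timeit


def best_time_per_call(stmt, setup="pass", threshold=0.2, repeat=3):
    """Return the best time, in seconds, for one execution of `stmt`.

    The loop count grows by powers of ten until a single timing run takes
    at least `threshold` seconds; the run is then repeated and the fastest
    total is divided by the loop count.
    """
    timer = timeit.Timer(stmt, setup)
    number = 1
    while True:
        elapsed = timer.timeit(number)
        if elapsed >= threshold:
            break
        number *= 10
    # Keep the fastest run: it is the one least disturbed by other activity.
    runs = [elapsed] + timer.repeat(repeat=repeat - 1, number=number)
    return min(runs) / number


if __name__ == "__main__":
    # Hypothetical comparison of the same operation on byte and unicode strings.
    b = best_time_per_call("s.find(t)", setup='s = b"A" * 1000; t = b"A"')
    u = best_time_per_call("s.find(t)", setup='s = "A" * 1000; t = "A"')
    print(f"bytes {b * 1e6:.3f} us  unicode {u * 1e6:.3f} us  {100 * b / u:.1f}%")
```

On Python 3.6 and later, `timeit.Timer.autorange()` applies a similar rule (it tries loop counts 1, 2, 5, 10, 20, 50 and so on until the total time is at least 0.2 seconds), so it could replace the manual loop.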

The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes. For example:

4079.83 5432.25 75.1    TOTAL

However, this number has no real meaning, since it weights every test
equally.
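
One way to see the problem with an unweighted total: the sketch below sums the four sample rows from the example output (the `total_row` helper is hypothetical). Because every test's time is simply added in, the single slow newline-counting benchmark supplies most of both totals, so the overall percentage (about 93.0) tracks that one test, while the average of the four printed per-row percentages is about 98.3.

```python
# Sample rows copied from the example output: (bytes ms, unicode ms, printed %)
ROWS = [
    (38.54, 41.60, 92.7),   # ...text.with.2000.newlines.count("\n") (*100)
    (1.14, 1.18, 96.8),     # ("A"*1000).find("A") (*1000)
    (0.44, 0.41, 105.6),    # "A" in "A"*1000 (*1000)
    (1.15, 1.17, 98.1),     # ("A"*1000).index("A") (*1000)
]


def total_row(rows):
    """Sum each time column and compute 100 * total bytes / total unicode."""
    bytes_total = sum(r[0] for r in rows)
    unicode_total = sum(r[1] for r in rows)
    return bytes_total, unicode_total, 100.0 * bytes_total / unicode_total


bytes_total, unicode_total, overall_pct = total_row(ROWS)
mean_pct = sum(r[2] for r in ROWS) / len(ROWS)
share = ROWS[0][0] / bytes_total  # fraction of the byte total from one test

print(f"{bytes_total:.2f} {unicode_total:.2f} {overall_pct:.1f} TOTAL")  # 41.27 44.36 93.0 TOTAL
print(f"mean of per-row percentages: {mean_pct:.1f}")                     # 98.3
print(f"count-newlines share of byte total: {share:.0%}")                 # 93%
```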
