Skip to content

Make PyStr.hash an AtomicI64#2586

Merged
coolreader18 merged 5 commits intomasterfrom
coolreader18/hash-neg1-to-neg2
May 22, 2021
Merged

Make PyStr.hash an AtomicI64#2586
coolreader18 merged 5 commits intomasterfrom
coolreader18/hash-neg1-to-neg2

Conversation

@coolreader18
Copy link
Copy Markdown
Member

Copy link
Copy Markdown
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to copy this behavior for real compatibility issues? when i worked on hash()es, I couldn't find any reason except for internal hard compitibility at the moment.
(but yes, hard compatibility is good when possible)

@fanninpm
Copy link
Copy Markdown
Contributor

There are instances in the CPython source where a return value of -1 is automatically transformed into -2.

@youknowone
Copy link
Copy Markdown
Member

Yes, they do it to use -1 as marker for 'error raised' meaning. But in RustPython, we returns PyResult<PyHash> for that purpose.

@coolreader18
Copy link
Copy Markdown
Member Author

Yeah, or just Option<PyHash> I think. I'm not really sure about internal hard compatibility @youknowone, I guess one reason could be to use less memory - if we did use -1 as a sentinel, something like str.hash could be 8 bytes instead of 16 bytes for the Option<i64>

@coolreader18 coolreader18 force-pushed the coolreader18/hash-neg1-to-neg2 branch from dafdeee to f31f88f Compare May 19, 2021 23:45
@coolreader18
Copy link
Copy Markdown
Member Author

@youknowone I changed PyStr to use -1 to mark "no hash" - it turns out that not only is Option<i64> 16 bytes, but AtomicCell<Option<i64>> isn't even atomic - there aren't 128 byte atomics, so it was just falling back to a global lock; this should improve performance w.r.t. hashes/dicts and especially with threading disabled, since it's now a Cell<i64> instead of a AtomicI64

@coolreader18 coolreader18 force-pushed the coolreader18/hash-neg1-to-neg2 branch from f31f88f to 3889529 Compare May 20, 2021 00:14
@coolreader18
Copy link
Copy Markdown
Member Author

Pretty solid albeit small improvement:

Benchmark #1: ./rustpython-norm benchmarks/pystone.py
  Time (mean ± σ):      2.843 s ±  0.143 s    [User: 2.831 s, System: 0.006 s]
  Range (min … max):    2.557 s …  2.979 s    10 runs
 
Benchmark #2: ./rustpython-opt benchmarks/pystone.py
  Time (mean ± σ):      2.768 s ±  0.170 s    [User: 2.753 s, System: 0.009 s]
  Range (min … max):    2.498 s …  3.008 s    10 runs
 
Summary
  './rustpython-opt benchmarks/pystone.py' ran
    1.03 ± 0.08 times faster than './rustpython-norm benchmarks/pystone.py'

no-threading:

Benchmark #1: ./rustpython-norm benchmarks/pystone.py
  Time (mean ± σ):      2.139 s ±  0.049 s    [User: 2.126 s, System: 0.008 s]
  Range (min … max):    2.064 s …  2.220 s    10 runs
 
Benchmark #2: ./rustpython-opt benchmarks/pystone.py
  Time (mean ± σ):      2.113 s ±  0.054 s    [User: 2.103 s, System: 0.006 s]
  Range (min … max):    2.059 s …  2.197 s    10 runs
 
Summary
  './rustpython-opt benchmarks/pystone.py' ran
    1.01 ± 0.03 times faster than './rustpython-norm benchmarks/pystone.py'

I should run these again, my machine is pretty busy atm.

@coolreader18
Copy link
Copy Markdown
Member Author

coolreader18 commented May 20, 2021

Here's standalone benchmarks for checking value.is_ascii() to try and use value.len() and skip value.chars().count(), on a AMD Ryzen 7 3750H. strlen_bench.rs

                        time:   [1.6801 ns 1.7228 ns 1.7687 ns]
1byte_strlen/no_is_ascii/10                                                                             
                        time:   [5.4741 ns 5.6102 ns 5.7743 ns]
1byte_strlen/with_is_ascii/100                                                                             
                        time:   [6.6191 ns 6.7985 ns 6.9790 ns]
1byte_strlen/no_is_ascii/100                                                                             
                        time:   [40.217 ns 41.288 ns 42.490 ns]
1byte_strlen/with_is_ascii/1000                                                                            
                        time:   [56.007 ns 57.763 ns 59.625 ns]
1byte_strlen/no_is_ascii/1000                                                                            
                        time:   [368.47 ns 374.10 ns 381.12 ns]
1byte_strlen/with_is_ascii/10000                                                                            
                        time:   [369.02 ns 377.03 ns 386.70 ns]
1byte_strlen/no_is_ascii/10000                                                                             
                        time:   [3.6249 us 3.6864 us 3.7540 us]
1byte_strlen/with_is_ascii/20000                                                                             
                        time:   [718.57 ns 736.15 ns 757.25 ns]
1byte_strlen/no_is_ascii/20000                                                                             
                        time:   [7.1541 us 7.3091 us 7.4953 us]

2byte_strlen/with_is_ascii/10                                                                             
                        time:   [9.2915 ns 9.4337 ns 9.5949 ns]
2byte_strlen/no_is_ascii/10                                                                             
                        time:   [8.9496 ns 9.0865 ns 9.2336 ns]
2byte_strlen/with_is_ascii/100                                                                            
                        time:   [72.130 ns 73.058 ns 74.143 ns]
2byte_strlen/no_is_ascii/100                                                                            
                        time:   [73.956 ns 75.010 ns 76.300 ns]
2byte_strlen/with_is_ascii/1000                                                                             
                        time:   [720.52 ns 732.36 ns 748.13 ns]
2byte_strlen/no_is_ascii/1000                                                                             
                        time:   [713.86 ns 717.47 ns 722.36 ns]
2byte_strlen/with_is_ascii/10000                                                                             
                        time:   [7.2186 us 7.2894 us 7.3671 us]
2byte_strlen/no_is_ascii/10000                                                                             
                        time:   [7.2393 us 7.3343 us 7.4443 us]
2byte_strlen/with_is_ascii/20000                                                                             
                        time:   [14.677 us 15.007 us 15.365 us]
2byte_strlen/no_is_ascii/20000                                                                             
                        time:   [14.236 us 14.392 us 14.579 us]

3byte_strlen/with_is_ascii/10                                                                             
                        time:   [12.764 ns 12.889 ns 13.038 ns]
3byte_strlen/no_is_ascii/10                                                                             
                        time:   [12.517 ns 12.703 ns 12.912 ns]
3byte_strlen/with_is_ascii/100                                                                            
                        time:   [108.43 ns 110.01 ns 111.85 ns]
3byte_strlen/no_is_ascii/100                                                                            
                        time:   [107.93 ns 109.27 ns 111.02 ns]
3byte_strlen/with_is_ascii/1000                                                                             
                        time:   [1.0822 us 1.1010 us 1.1219 us]
3byte_strlen/no_is_ascii/1000                                                                             
                        time:   [1.0716 us 1.0809 us 1.0927 us]
3byte_strlen/with_is_ascii/10000                                                                             
                        time:   [10.758 us 10.913 us 11.093 us]
3byte_strlen/no_is_ascii/10000                                                                             
                        time:   [10.652 us 10.764 us 10.912 us]
3byte_strlen/with_is_ascii/20000                                                                             
                        time:   [21.706 us 22.077 us 22.488 us]
3byte_strlen/no_is_ascii/20000                                                                             
                        time:   [21.326 us 21.544 us 21.826 us]

4byte_strlen/with_is_ascii/10                                                                             
                        time:   [16.324 ns 16.568 ns 16.859 ns]
4byte_strlen/no_is_ascii/10                                                                             
                        time:   [16.303 ns 16.497 ns 16.732 ns]
4byte_strlen/with_is_ascii/100                                                                            
                        time:   [147.86 ns 149.24 ns 150.91 ns]
4byte_strlen/no_is_ascii/100                                                                            
                        time:   [147.07 ns 148.80 ns 150.92 ns]
4byte_strlen/with_is_ascii/1000                                                                             
                        time:   [1.4426 us 1.4656 us 1.4913 us]
4byte_strlen/no_is_ascii/1000                                                                             
                        time:   [1.4225 us 1.4331 us 1.4472 us]
4byte_strlen/with_is_ascii/10000                                                                             
                        time:   [14.131 us 14.294 us 14.497 us]
4byte_strlen/no_is_ascii/10000                                                                             
                        time:   [14.145 us 14.265 us 14.421 us]
4byte_strlen/with_is_ascii/20000                                                                             
                        time:   [28.315 us 28.636 us 29.018 us]
4byte_strlen/no_is_ascii/20000                                                                             
                        time:   [28.736 us 29.095 us 29.506 us]

assorted_strlen/with_is_ascii/10                                                                             
                        time:   [16.240 ns 16.422 ns 16.643 ns]
assorted_strlen/no_is_ascii/10                                                                             
                        time:   [16.404 ns 16.602 ns 16.837 ns]
assorted_strlen/with_is_ascii/100                                                                            
                        time:   [148.79 ns 150.46 ns 152.52 ns]
assorted_strlen/no_is_ascii/100                                                                            
                        time:   [146.41 ns 148.58 ns 151.14 ns]
assorted_strlen/with_is_ascii/1000                                                                             
                        time:   [1.4228 us 1.4489 us 1.4807 us]
assorted_strlen/no_is_ascii/1000                                                                             
                        time:   [1.4182 us 1.4427 us 1.4715 us]
assorted_strlen/with_is_ascii/10000                                                                             
                        time:   [14.097 us 14.263 us 14.460 us]
assorted_strlen/no_is_ascii/10000                                                                             
                        time:   [14.086 us 14.340 us 14.633 us]
assorted_strlen/with_is_ascii/20000                                                                             
                        time:   [28.212 us 28.611 us 29.070 us]
assorted_strlen/no_is_ascii/20000                                                                             
                        time:   [28.148 us 28.464 us 28.861 us]

@coolreader18
Copy link
Copy Markdown
Member Author

coolreader18 commented May 20, 2021

I'm actually really happy with this, cause now we make a tradeoff the same way CPython does. For CPython, if you have a string that has just one non-ascii character, its size in memory can double or even quadruple depending on how big the codepoint is. Here, we use utf-8 for strings, which is much nicer on space due to being a variable-width encoding, but if a string is non-ascii we lose time rather than space, and only when we directly index into the string. I feel that this is very comparable in terms of costs/benefits to the tradeoff CPython makes, since it really just depends on the type of program running. If a Python program just passes strings around as user input or treats them as atoms, then it would benefit from lower memory usage for all the strings it has. However, if a program has to do text processing on non-ascii characters, it'll take a hit when it needs to index/slice a string, so may want to make the tradeoff to make indexing O(1) in exchange for taking up more memory when there are non-ascii chars. Basically I'm just content that indexing is now O(1) for "common" (ascii) cases.

@coolreader18 coolreader18 changed the title Turn hash() results of -1 into -2 Make PyStr.hash an AtomicU64 May 20, 2021
@coolreader18 coolreader18 force-pushed the coolreader18/hash-neg1-to-neg2 branch from fb35318 to 1b7ccdc Compare May 20, 2021 23:45
@coolreader18 coolreader18 changed the title Make PyStr.hash an AtomicU64 Make PyStr.hash an AtomicI64 May 22, 2021
@coolreader18
Copy link
Copy Markdown
Member Author

Merging this, since I suppose it's already approved.

@coolreader18 coolreader18 merged commit c5f11a7 into master May 22, 2021
@coolreader18 coolreader18 deleted the coolreader18/hash-neg1-to-neg2 branch May 22, 2021 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants