Replies: 5 comments 5 replies
-
Your benchmark is almost entirely a 'number parsing' benchmark (with some overhead due to the JSON structure). It is not a benchmark I would use because you are repeatedly parsing the same number. I recommend using something more realistic.
The simdjson library uses a state-of-the-art number parser, described in this paper:
Notice the title: "a gigabyte per second". Obviously, you can beat a gigabyte per second; you are seemingly achieving 2 GB/s (although you are repeatedly parsing the same number, so be careful). But you are just not going to achieve much higher speed than that. On a single core, I don't think we know how to parse numbers at much greater speed.
Your benchmark does suggest that we could improve integer parsing speed. That's interesting. Let us improve that.
You should get better speed with ClangCL, which is what we recommend. Your numbers are expected, by the way. Please see my blog post: Float-parsing benchmark: Regular Visual Studio, ClangCL and Linux GCC. Here was my conclusion at the time:
If you choose to develop under Windows and you are disappointed by the performance, I recommend reporting it to Microsoft. There is little that I, or anyone involved in simdjson, can do about it. It is in the hands of Microsoft. I have reported disappointing performance to Microsoft engineers at least twice. The one good piece of advice I got from them was to switch to ClangCL, which I recommend you do too.
-
Thanks for your quick, great reply! My Windows MSVC SIMD experience has been mixed; sometimes I get better results than g++/clang++, but it always involves lots of tuning and trial and error. My experience with reporting issues to MS has generally been poor. I've modified the benchmark (updated code attached).
Here are the results for throughput vs. (f, l, r) and compiler, 6 iterations each (error bars indicate the span from min to max throughput). Windows is O2, WSL is O3. Each bar group is for a different compiler; blue is Windows 11, green is WSL2 (Ubuntu). Hatched bars are for get_double(). My computer was otherwise idle when I ran these.
I've also looked at which routine the normal VS2022 compiler is using. int64_t seems to be slower than double only for WSL-g++. I can't explain why 'l' (linspace) is so different from 'r' (random); maybe 'r' hits certain float-parser exception cases and 'l' doesn't?
(*) More on linspace: this was motivated by wanting to save time when creating the 350M-entry JSON string. I create 9859 strings representing values from [min..max], then cycle through them again and again when creating the full JSON. I did it this way to try to get a good mix of digits.
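The pool-and-cycle generation described in the linspace footnote could be sketched like this (the pool size 9859 comes from the comment above; the formatting precision, range handling, and function name are my assumptions, not the author's actual code):

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Sketch of the "linspace" generator: precompute a small pool of value
// strings evenly spaced across [min, max], then cycle through the pool
// while emitting one large JSON array. Assumes pool_size >= 2.
std::string make_linspace_json(double min, double max,
                               size_t pool_size, size_t total_values) {
    std::vector<std::string> pool(pool_size);
    char tmp[64];
    for (size_t i = 0; i < pool_size; ++i) {
        double v = min + (max - min) * double(i) / double(pool_size - 1);
        std::snprintf(tmp, sizeof(tmp), "%.9g", v);  // good mix of digits
        pool[i] = tmp;
    }
    std::string json = "[";
    for (size_t i = 0; i < total_values; ++i) {
        if (i) json += ',';
        json += pool[i % pool_size];  // cycle through the precomputed pool
    }
    json += ']';
    return json;
}
```

Cycling a precomputed pool avoids formatting 350M doubles individually, which is why it is so much faster to build the test input; the trade-off, as the measurements suggest, is that the parser sees only 9859 distinct digit patterns.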
-
Can you elaborate? To my knowledge they should all use exactly the same parsing routines, with slight specialization for MSVC.
-
I thought I'd provide an update. I'm really perplexed. Here's a summary of the current setup:
It's (4) that I'm really focusing on here: comparing similar code in two separate files, compiled into one executable with the standard VS2022 compiler. Here are sample timing results for VS2022. (For comparison, LLVM is 1.54 GB/s.) Yes, it's running ~3x faster outside of simdjson.h. I have no idea why. Below are some Performance Profiler clips.
-
I've done some exploring these past few days. Let me share my findings.
The reason is the routine the benchmark used. I also tried some newer gcc, e.g., gcc 12.0, and I think the result only appears on older gcc, like gcc 8.5. I also wrote some microbenchmarks to test dom::parse; unsurprisingly, parsing doubles is slower.
-
I'm using simdjson (CPU=14700K) to read JSON files with large 2D double arrays, and was wondering if I'm doing so at an expected speed or not.
Here's my simple C++ benchmark:
Using Win11/MSVC (O2, AVX2, c++20), I get:
Using WSL/g++ (O3), I get:
I'm a little surprised that
a) g++/WSL is so much faster (my main use case is Windows)
b) get_double is faster than get_int64 in WSL
I see #2135, but it seems to me that simdjson's parse_double is being used, not anything like from_chars from fast_float?
Any suggestions for JSON-based speed improvement are appreciated.