accurate number parsing by lemire · Pull Request #558 · simdjson/simdjson

lemire · 2020-03-14T01:47:11Z

This PR introduces accurate number parsing. We have ULP of zero.

Fixes #242

To do:

The "#define" should be "constexpr".
Minimize the big arrays (not one per architecture).
Get rid of i!=0 check
In the parse_number function, we have two identical fall-back code paths (nearly copied and pasted)
We can do a slight better job commenting the code logic (e.g., when going from the fast path to the slower path).

…son into jkeiser/structural-iterator

lemire · 2020-03-14T01:52:04Z

Performance results (first line for each file is master, second is this PR):

$ for i in jsonexamples/*.json; do echo $i ; ./parsemaster $i |grep GB |head -1 ; ./parse  $i |grep GB |head -1; done
jsonexamples/apache_builds.json
|    Speed        :  25.1438 ns per block ( 98.29%) -   0.3929 ns per byte -   4.0449 ns per structural -    2.545 GB/s
|    Speed        :  23.9643 ns per block ( 98.20%) -   0.3745 ns per byte -   3.8551 ns per structural -    2.670 GB/s
jsonexamples/canada.json
|    Speed        :  59.3249 ns per block ( 99.35%) -   0.9270 ns per byte -   6.2404 ns per structural -    1.079 GB/s
|    Speed        :  66.3533 ns per block ( 99.42%) -   1.0368 ns per byte -   6.9798 ns per structural -    0.965 GB/s
jsonexamples/citm_catalog.json
|    Speed        :  22.3248 ns per block ( 99.07%) -   0.3488 ns per byte -   4.4305 ns per structural -    2.867 GB/s
|    Speed        :  21.2473 ns per block ( 98.98%) -   0.3320 ns per byte -   4.2167 ns per structural -    3.012 GB/s
jsonexamples/github_events.json
|    Speed        :  23.9244 ns per block ( 97.75%) -   0.3739 ns per byte -   5.2309 ns per structural -    2.674 GB/s
|    Speed        :  23.0246 ns per block ( 97.85%) -   0.3599 ns per byte -   5.0341 ns per structural -    2.779 GB/s
jsonexamples/gsoc-2018.json
|    Speed        :  17.4411 ns per block ( 98.07%) -   0.2725 ns per byte -  11.9580 ns per structural -    3.669 GB/s
|    Speed        :  17.0409 ns per block ( 98.39%) -   0.2663 ns per byte -  11.6835 ns per structural -    3.756 GB/s
jsonexamples/instruments.json
|    Speed        :  28.0224 ns per block ( 98.13%) -   0.4379 ns per byte -   3.5506 ns per structural -    2.284 GB/s
|    Speed        :  26.5388 ns per block ( 98.20%) -   0.4147 ns per byte -   3.3626 ns per structural -    2.412 GB/s
jsonexamples/marine_ik.json
|    Speed        :  64.4501 ns per block ( 99.03%) -   1.0070 ns per byte -   4.6725 ns per structural -    0.993 GB/s
|    Speed        :  63.3550 ns per block ( 98.51%) -   0.9899 ns per byte -   4.5931 ns per structural -    1.010 GB/s
jsonexamples/mesh.json
|    Speed        :  64.1624 ns per block ( 99.12%) -   1.0026 ns per byte -   4.7332 ns per structural -    0.997 GB/s
|    Speed        :  64.6612 ns per block ( 99.01%) -   1.0104 ns per byte -   4.7700 ns per structural -    0.990 GB/s
jsonexamples/mesh.pretty.json
|    Speed        :  38.1874 ns per block ( 99.26%) -   0.5967 ns per byte -   6.1407 ns per structural -    1.676 GB/s
|    Speed        :  40.3659 ns per block ( 99.29%) -   0.6307 ns per byte -   6.4910 ns per structural -    1.585 GB/s
jsonexamples/numbers.json
|    Speed        :  55.4130 ns per block ( 99.14%) -   0.8659 ns per byte -   6.4990 ns per structural -    1.155 GB/s
|    Speed        :  56.8662 ns per block ( 99.16%) -   0.8887 ns per byte -   6.6694 ns per structural -    1.125 GB/s
jsonexamples/random.json
|    Speed        :  40.4087 ns per block ( 98.68%) -   0.6314 ns per byte -   3.6622 ns per structural -    1.584 GB/s
|    Speed        :  38.5309 ns per block ( 98.67%) -   0.6021 ns per byte -   3.4921 ns per structural -    1.661 GB/s
jsonexamples/twitterescaped.json
|    Speed        :  53.1281 ns per block ( 99.00%) -   0.8302 ns per byte -   8.4485 ns per structural -    1.205 GB/s
|    Speed        :  54.2410 ns per block ( 99.24%) -   0.8476 ns per byte -   8.6255 ns per structural -    1.180 GB/s
jsonexamples/twitter.json
|    Speed        :  25.8956 ns per block ( 98.44%) -   0.4046 ns per byte -   4.6240 ns per structural -    2.471 GB/s
|    Speed        :  25.3712 ns per block ( 98.47%) -   0.3964 ns per byte -   4.5304 ns per structural -    2.522 GB/s
jsonexamples/update-center.json
|    Speed        :  31.9209 ns per block ( 98.00%) -   0.4988 ns per byte -   4.1933 ns per structural -    2.005 GB/s
|    Speed        :  30.6059 ns per block ( 98.45%) -   0.4782 ns per byte -   4.0205 ns per structural -    2.091 GB/s

jkeiser · 2020-03-14T02:01:14Z

It's more accurate and speeds it up? Nice!

lemire · 2020-03-14T02:05:01Z

@jkeiser Oh. Not really. I think. You merged your PR in the meantime. Let me update that.

lemire · 2020-03-14T02:09:34Z

@jkeiser New numbers after updating my master to account for your merging...

Same deal, first line for each file is for master, second line is this PR...

$ for i in jsonexamples/*.json; do echo $i ; ./parsemaster $i |grep GB |head -1 ; ./parse  $i |grep GB |head -1; done
jsonexamples/apache_builds.json
|    Speed        :  24.4279 ns per block ( 98.29%) -   0.3817 ns per byte -   3.9297 ns per structural -    2.620 GB/s
|    Speed        :  23.8175 ns per block ( 98.27%) -   0.3722 ns per byte -   3.8315 ns per structural -    2.687 GB/s
jsonexamples/canada.json
|    Speed        :  55.5616 ns per block ( 99.34%) -   0.8682 ns per byte -   5.8446 ns per structural -    1.152 GB/s
|    Speed        :  66.3780 ns per block ( 99.42%) -   1.0372 ns per byte -   6.9824 ns per structural -    0.964 GB/s
jsonexamples/citm_catalog.json
|    Speed        :  21.4793 ns per block ( 99.05%) -   0.3356 ns per byte -   4.2627 ns per structural -    2.980 GB/s
|    Speed        :  21.2645 ns per block ( 98.96%) -   0.3323 ns per byte -   4.2201 ns per structural -    3.010 GB/s
jsonexamples/github_events.json
|    Speed        :  23.4106 ns per block ( 97.88%) -   0.3659 ns per byte -   5.1186 ns per structural -    2.733 GB/s
|    Speed        :  22.8350 ns per block ( 97.86%) -   0.3569 ns per byte -   4.9927 ns per structural -    2.802 GB/s
jsonexamples/gsoc-2018.json
|    Speed        :  17.0689 ns per block ( 98.30%) -   0.2667 ns per byte -  11.7028 ns per structural -    3.749 GB/s
|    Speed        :  17.0445 ns per block ( 98.40%) -   0.2663 ns per byte -  11.6860 ns per structural -    3.755 GB/s
jsonexamples/instruments.json
|    Speed        :  27.0482 ns per block ( 98.11%) -   0.4226 ns per byte -   3.4272 ns per structural -    2.366 GB/s
|    Speed        :  26.4923 ns per block ( 98.24%) -   0.4140 ns per byte -   3.3568 ns per structural -    2.416 GB/s
jsonexamples/marine_ik.json
|    Speed        :  61.5280 ns per block ( 98.63%) -   0.9614 ns per byte -   4.4606 ns per structural -    1.040 GB/s
|    Speed        :  63.3877 ns per block ( 98.69%) -   0.9904 ns per byte -   4.5955 ns per structural -    1.010 GB/s
jsonexamples/mesh.json
|    Speed        :  61.1966 ns per block ( 98.99%) -   0.9563 ns per byte -   4.5145 ns per structural -    1.046 GB/s
|    Speed        :  64.6872 ns per block ( 99.06%) -   1.0108 ns per byte -   4.7720 ns per structural -    0.989 GB/s
jsonexamples/mesh.pretty.json
|    Speed        :  36.9282 ns per block ( 99.19%) -   0.5770 ns per byte -   5.9382 ns per structural -    1.733 GB/s
|    Speed        :  40.3148 ns per block ( 99.27%) -   0.6299 ns per byte -   6.4828 ns per structural -    1.587 GB/s
jsonexamples/numbers.json
|    Speed        :  52.7954 ns per block ( 98.72%) -   0.8250 ns per byte -   6.1920 ns per structural -    1.212 GB/s
|    Speed        :  56.8483 ns per block ( 99.10%) -   0.8884 ns per byte -   6.6673 ns per structural -    1.126 GB/s
jsonexamples/random.json
|    Speed        :  38.6605 ns per block ( 98.59%) -   0.6041 ns per byte -   3.5038 ns per structural -    1.655 GB/s
|    Speed        :  38.5352 ns per block ( 98.66%) -   0.6022 ns per byte -   3.4925 ns per structural -    1.661 GB/s
jsonexamples/twitterescaped.json
|    Speed        :  54.2516 ns per block ( 99.13%) -   0.8477 ns per byte -   8.6272 ns per structural -    1.180 GB/s
|    Speed        :  54.2890 ns per block ( 99.10%) -   0.8483 ns per byte -   8.6331 ns per structural -    1.179 GB/s
jsonexamples/twitter.json
|    Speed        :  25.5915 ns per block ( 98.44%) -   0.3999 ns per byte -   4.5697 ns per structural -    2.501 GB/s
|    Speed        :  25.2051 ns per block ( 98.51%) -   0.3939 ns per byte -   4.5007 ns per structural -    2.539 GB/s
jsonexamples/update-center.json
|    Speed        :  30.6783 ns per block ( 98.39%) -   0.4794 ns per byte -   4.0300 ns per structural -    2.086 GB/s
|    Speed        :  30.5450 ns per block ( 98.50%) -   0.4773 ns per byte -   4.0125 ns per structural -    2.095 GB/s

We take a hit for canada.json, mesh.json, mesh.pretty.json, and numbers.json.

lemire · 2020-03-14T02:11:02Z

@michaeleisel This might be the PR you have been asking for!

jkeiser

It'll take me a longer while to grok the whole thing, and you and Michael have already been reviewing each other for accuracy here. On that score, it seems good as far as I can tell and I know you've done a ton of testing on the actual values. The comments are largely pretty grokkable, too. And compute_float_64's fast path is a thing of beauty!

Consider comments, but this looks good :)

jkeiser · 2020-03-14T02:06:27Z

+// 10^FASTFLOAT_LARGEST_POWER (inclusively). The mantissa is truncated, and
+// never rounded up.
+// Uses about 10KB.
+static const components power_of_ten_components[] = {


Maybe the compiler will already do this, but we're compiling this once for each architecture. Will all these tables show up more than once in the executable? Does it matter? I could imagine putting them into a common top level namespace.

Good point.

jkeiser · 2020-03-14T02:22:54Z

Looks like it's a wash (some you win some you lose), which is amazing for adding exact parsing as a feature.

lemire · 2020-03-14T02:26:02Z

which is amazing for adding exact parsing as a feature.

It is.

I was highly skeptical.

I will fully take into account your comments before merging, and there are few small things (in addition to what you flag) that I want to fix.

…sing mode".

lemire · 2020-03-14T02:41:38Z

$ ./parsingcompetition jsonexamples/numbers.json
getline                                 	:    0.162 cycles per input byte (best)    0.167 cycles per input byte (avg)   19.987 GB/s (error margin: 0.573 GB/s)
simdjson (dynamic mem)                  	:    3.123 cycles per input byte (best)    3.142 cycles per input byte (avg)    1.088 GB/s (error margin: 0.007 GB/s)
simdjson                                	:    3.121 cycles per input byte (best)    3.135 cycles per input byte (avg)    1.089 GB/s (error margin: 0.005 GB/s)
RapidJSON                               	:    6.077 cycles per input byte (best)    6.485 cycles per input byte (avg)    0.560 GB/s (error margin: 0.035 GB/s)
RapidJSON (accurate number parsing)     	:   14.378 cycles per input byte (best)   14.426 cycles per input byte (avg)    0.237 GB/s (error margin: 0.001 GB/s)
RapidJSON (insitu)                      	:    6.482 cycles per input byte (best)    6.518 cycles per input byte (avg)    0.525 GB/s (error margin: 0.003 GB/s)
RapidJSON (insitu, accurate number parsing)	:   14.260 cycles per input byte (best)   14.297 cycles per input byte (avg)    0.239 GB/s (error margin: 0.001 GB/s)
sajson (dynamic mem)                    	:    6.304 cycles per input byte (best)    6.320 cycles per input byte (avg)    0.540 GB/s (error margin: 0.001 GB/s)
sajson                                  	:    5.203 cycles per input byte (best)    5.213 cycles per input byte (avg)    0.654 GB/s (error margin: 0.001 GB/s)
nlohmann-json                           	:   61.433 cycles per input byte (best)   61.533 cycles per input byte (avg)    0.055 GB/s (error margin: 0.000 GB/s)
memcpy                                  	:    0.084 cycles per input byte (best)    0.086 cycles per input byte (avg)   36.831 GB/s (error margin: 0.612 GB/s)

michaeleisel · 2020-03-15T01:26:02Z

i wonder about unit test coverage. it would be cool to have really high unit test coverage, does anyone have experience doing unit testing on gpus? i've heard of others enumerating 2^64 cases for their program using gpus

lemire · 2020-03-16T00:31:39Z

I am now happy with this PR. I just want feedback from @michaeleisel about his idea how how to possibly improve the code routine where we have a good indication that we might be right in-between two floats. I think something clever can be done there...

lemire · 2020-03-16T00:34:48Z

But I also think we should merge this sooner rather than later.

lemire · 2020-03-16T00:42:32Z

@michaeleisel Can you take a last look at your earliest convenience?

michaeleisel · 2020-03-16T00:46:55Z

+// A complement from power_of_ten_components
+// complete to a 128-bit mantissa.
+const uint64_t mantissa_128[] = {0x419ea3bd35385e2d,
+                                 0x52064cac828675b9,


it looks like we can use // clang-format on and // clang-format off to disable formatting for a section, in case you'd be interested in doing that

michaeleisel · 2020-03-16T00:52:16Z

so in src/jsoncharutils.h around line 524, we have small integer powers of 10 that are represented precisely. this is the run of numbers that have one or more 0's at the end. if we're multiplying by one of these, we know that we are indeed precisely between two floating-point numbers. all the other mantissas in that lookup table are rounded down. so, with true power-of-10 mantissa F, our approximation f, and the difference between the two, e > 0, such that F = f + e, we have i * f < i * F. i * f is our approximation and i * F is the true answer, after computing i * f we should round up

michaeleisel · 2020-03-16T00:53:20Z

are we going to merge before adding unit tests from other libraries?

michaeleisel · 2020-03-16T01:14:03Z

to add to that, i don't care either way if those cases are disambiguated. the perf win doesn't seem huge

lemire · 2020-03-16T01:32:25Z

are we going to merge before adding unit tests from other libraries?

Pull requests with more in-depth testing are invited.

This being said, this PR is under as much or more testing than the existing code (in master).

michaeleisel · 2020-03-16T01:32:52Z

fair enough

lemire · 2020-03-16T01:46:52Z

after computing i * f we should round up

Ah. Now I am getting it. We should split up the cases into the exact ones and the rounded down ones. When it is rounded down, you always want to round up when hitting exactly the middle.

When we have the exact mantissa, then it is easy.

Ok. This won't change the performance but I will implement it.

lemire · 2020-03-16T01:49:49Z

fair enough

I am not claiming that there is no bug (there always are, in a sense), but we have reviewed the code thoroughly, as a team, and we have lots of tests.

michaeleisel · 2020-03-16T01:51:02Z

i'm fine with merging

lemire · 2020-03-16T02:29:55Z

I'll merge and come back later with the change we discussed, as it is basically irrelevant from a performance point of view.

Nelson-numerical-software · 2025-07-30T20:25:45Z

  // yet we may want to be able to parse subnormal values.
  // However, we do not want to tolerate NAN or infinite values.
+  //
+  // Values like infinity or NaN are not allowed in the JSON specification.


RapidJSON supports non-standard values like NaN and Inf. Similarly, environments such as MATLAB, Octave, Scilab, and Nelson, which use C++ libraries for JSON parsing also accept these values. For example: jsondecode('NaN') works as expected. While this behavior is not part of the official JSON standard, it is commonly accepted in legacy code and widely supported in practice...

The resulting pseudo-JSON won't be parsed in Python and JavaScript.

It seems much more sensible to fix the JSON generators.

In simdjson, we do support (in an undocumented manner) JSON variants (hidden behind a macro). If you need such an extension, I encourage you to produce a pull request.

It could be quite useful to support additional parsing options, even if they're not strictly standard, similar to fast_float::parse_options options { fast_float::chars_format::json_or_infnan }; :)

jkeiser and others added 8 commits March 12, 2020 14:12

Fix parse benchmarker

15bf6ca

Structural iterator

b6665a4

Simplify atom parsing

b2833af

Structural iterator

61ac6ae

Simplify atom parsing

355b228

Accurate number parsing.

67ddc36

Merge branch 'jkeiser/structural-iterator' of github.com:lemire/simdj…

e89bda6

…son into jkeiser/structural-iterator

Merge branch 'jkeiser/structural-iterator' into dlemire/newnumberparsing

ef9c55c

lemire requested a review from jkeiser March 14, 2020 01:47

Merge branch 'master' into dlemire/newnumberparsing

e5fbdf9

lemire mentioned this pull request Mar 14, 2020

[WIP] Exact float parsing #333

Closed

6 tasks

Updating documentation.

0c46325

jkeiser approved these changes Mar 14, 2020

View reviewed changes

Adding the ability to benchmark RapidJSON in the "accurate number par…

0eff5b2

…sing mode".

lemire commented Mar 14, 2020

View reviewed changes

Comment thread src/generic/numberparsing.h Outdated