Conversation
|
Turns out that @jkeiser fix only partially resolves the issue. I need to do more work. (Work done.) |
|
@strager It is possible that this PR could resolve your issues. If you could check it out, that would be great. I have included your tests as well as extensions of your tests. |
|
@jkeiser For obvious reasons, I would really like your input on this since it is yet another rather large change deep into the front-end. I do think, however, that it is an easy to understand change. |
|
This patch works for my application. 👍 |
|
@strager The feedback is much appreciated. Please keep finding bugs!!! :-) |
|
I am going to merge this eagerly. We can revert some of the changes following feedback by @jkeiser later. |
|
I never verified whether peek/next or advance made any difference performance wise, so either model is fine (I switched back and forth several times, in fact). Did you happen to see a performance change after this? |
|
There is potentially a large performance drop since v0.9.
I will open an issue. |
|
@jkeiser Note that we need a solution. We can't just advance always. But we certainly should study the performance issue. |
|
Interactive plot at https://simdjson.org/plots.html |
The answer is no, I think. Our performance has evolved, but this particular PR did not lead to a performance drop. |

In issue 1632, @jkeiser referred to two commits in an unrelated PR which constitute a fix for issue 1632. I have taken these two commits and turned them into a specific pull request.
To recap, the problem is that if you have a bad JSON input, and if you somehow ask for a bad type (e.g., as for an object where there is no {, or a boolean where there is false or true), then the front-end gets confused. The workaround is to ask for the type before, as a precaution. It works because asking for the type does not advance the iterator... but asking for a value does advance the pointer even when the type is not right.
This PR changes this behavior.
I have had to extend the work done by @jkeiser in a non-trivial manner. My change is rather intrusive, more so that I'd like... but it felt right.
The original On Demand front-end always advance if you are asking for a value (array, object, bool, double) and you are just before it. If you ask for a Boolean value, it will look at the current character, decide whether it is a Boolean or not, and then move forward (whether it is a Boolean or not).
I found this quite confusing when dealing with errored document. So I have moved the code to a peek/advance model.
The general idea of the advance_... methods and the peek_* methods is that you first peek and check that you have desired type. If you do, and only if you do, then you advance.
Suppose you always advance. Look at the 'value' matching the key "shadowable" in the following example...
({"globals":{"a":{"shadowable":[}}}})If the user thinks it is a Boolean and asks for it, then we check the '[', decide it is not a Boolean, but still move into the next character ('}'). Now we are left pointing at '}' right after a '['. And we have not yet reported an error, only that we do not have a Boolean.
If, instead, you just stand your ground until it is content that you know, then you will only even move beyond the '[' if the user tells you that you have an array. So you will be at the '}' character inside the array and, hopefully, you will then catch the error because an array cannot start with '}', but the code processing Boolean values does not know this.
So the contract is: first call 'peek_...' and then call 'advance_...' only if you have determined that it is a type you can handle.
Unfortunately, it makes the code more verbose, longer and maybe more error prone. And there might be (???) a downside performance-wise. It should be streamlined, if not now, then in the future.
This PR also integrates elements of @jkeiser 's code that are meant to do away with input padding but I think it is relatively minor and the offending parts were commented out with explanations.
Fixes #1664
Fixes #1632
Importantly, this is the basis for a new PR that would lift the padding requirement #1665