
Optimize value_iterator::get_string() #2211

Closed
CarlosEduR wants to merge 1 commit into simdjson:master from CarlosEduR:optimization-get_string


@CarlosEduR
Member

Relates to: #1470

@CarlosEduR
Member Author

CarlosEduR commented Jul 11, 2024

@lemire I'd like to confirm this with you before marking the PR ready for review.

I believe the current changes do not cover all the cases we talked about.

If the character \ is not present, we should just return the string_view instance.
If it is present:

  1. if we encounter " before any \, then we also just return the string_view instance;
  2. if there is no " before the \, we fall back on the existing code.

Is that correct?
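The decision described above can be sketched as a single forward scan. This is a minimal illustration, not the PR's code: the name try_unescaped_view is hypothetical, body is assumed to point just past the opening quote, and end is assumed to bound the readable input, so the loop never relies on a null terminator.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper illustrating the proposal. `body` starts just after the
// opening quote; `end` bounds the readable input (no NUL terminator assumed).
// Returns the string contents if the closing quote comes before any backslash,
// otherwise std::nullopt, meaning "fall back to the existing unescaping code".
std::optional<std::string_view> try_unescaped_view(const char* body, const char* end) {
  for (const char* p = body; p != end; ++p) {
    if (*p == '"') {
      // Closing quote seen first: the raw bytes are already the final string.
      return std::string_view(body, static_cast<std::size_t>(p - body));
    }
    if (*p == '\\') {
      // An escape sequence appears first: the string must be unescaped.
      return std::nullopt;
    }
  }
  // No closing quote within bounds: let the existing path report the error.
  return std::nullopt;
}
```

For the input hello" , "x" the function returns a view of hello; for a\nb" (a literal backslash) it returns std::nullopt.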

const char* start = raw_json_string_result.value_unsafe().raw();
const char* json = start;

for (; *json != '\0'; ++json) {
Member

lemire Jul 11, 2024

I don't think you need *json != '\0'; in fact, it is likely incorrect. As far as I can tell, the JSON spec does not forbid null characters inside a string, and it does not require the string to be null-terminated.

@lemire
Member

lemire commented Jul 11, 2024

Let me do some work of my own (next few hours).

@lemire
Member

lemire commented Jan 21, 2026

Reproducing my comment here:

Here is my current view. We should have a simple/cheap inlined function (which can use NEON/SSE2) equivalent to the following...

std::none_of(s.begin(), s.end(),
             [](char c) { return c == '\\'; });

One thing @CarlosEduR did not use is that you can call raw_json_token() to get a std::string_view spanning the whole string, including the quotes. So all we need to do is call raw_json_token(), check with a fast function that there is no backslash, and, if there is none, avoid the copy.
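A minimal sketch of this approach, assuming token holds the raw JSON of a string value as obtained from raw_json_token(), starting at the opening quote. The helper name view_if_no_escapes is hypothetical, it trims trailing JSON whitespace in case the token carries any, and returning std::nullopt stands for "use the existing copying/unescaping path".

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper. `token` is assumed to be the raw JSON for a string value:
// it starts at the opening quote and may be followed by whitespace. If the
// contents hold no backslash, they are returned as a view into the input
// buffer, so no copy is made.
std::optional<std::string_view> view_if_no_escapes(std::string_view token) {
  // Drop any trailing JSON whitespace after the closing quote.
  while (!token.empty() && (token.back() == ' ' || token.back() == '\t' ||
                            token.back() == '\n' || token.back() == '\r')) {
    token.remove_suffix(1);
  }
  // Expect at least the two surrounding quotes.
  if (token.size() < 2 || token.front() != '"' || token.back() != '"') {
    return std::nullopt;
  }
  std::string_view contents = token.substr(1, token.size() - 2);
  // find() keeps the sketch short; a hand-vectorized check would go here.
  if (contents.find('\\') != std::string_view::npos) {
    return std::nullopt;  // escapes present: use the existing unescaping path
  }
  return contents;
}
```

An escaped quote inside the string always comes with a backslash, so the sketch never mistakes it for the end of the contents; such strings simply take the slow path.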

Unfortunately, plain std::none_of won't do, because compilers cannot be trusted to generate good code for it. But that is the idea.

@lemire
Member

lemire commented Apr 10, 2026

Closing this for now.

lemire closed this Apr 10, 2026


2 participants