Skip to content

replace get<T>.tie(v,e) with get(v)#950

Merged
jkeiser merged 3 commits intomasterfrom
jkeiser/tie-type
Jun 21, 2020
Merged

replace get<T>.tie(v,e) with get(v)#950
jkeiser merged 3 commits intomasterfrom
jkeiser/tie-type

Conversation

@jkeiser
Copy link
Copy Markdown
Member

@jkeiser jkeiser commented Jun 19, 2020

This adds a get(v) method that acts as ergonomic shorthand for get().tie(v, error). It returns the error code so you can use it in contorl flow. I like it even better than auto [v,error] as well, in most cases. It simplifies things considerably.

Comparison

  • New version:
    int64_t val;
    if ((error = obj["val"].get(val))) { cerr << error << endl; return; }
    cout << val << endl;
  • Old version with auto (C++14): this feels like it buries the lead (the fact that we're getting an int64_t out)
    auto [val, error] = obj["val"].get<int64_t>();
    if (error) { cerr << error << endl; return; }
    cout << val << endl;
  • Old version with tie (C++11): this is verbose and not control flow friendly
    int64_t val;
    obj["val"].get<int64_t>().tie(val, error);
    if (error) { cerr << error << endl; return; }
    cout << val << endl;
  • New version without if: For comparison, if you don't like stuff in your if statements :)
    int64_t val;
    error = obj["val"].get(val);
    if (error) { cerr << error << endl; return; }
    cout << val << endl;

Changes

  • You don't have to write the type name twice anymore.
  • You can include the call in control flow (as an if or while condition) if you so desire.
  • It's even better than auto in many cases because it's easier see the type of your variable when you declare it (since it's prominent on the left where you are usually looking).
  • I added it to simdjson_result and it feels consistent throughout the library.
  • I had it stop assigning to value if there is an error, and uninitialized warnings revealed a few places where we were using value even when there was an error! I would like to leave this for developer safety--make it impossible to do this without a warning.
    • To do this, I added SIMDJSON_UNREACHABLE and SIMDJSON_ASSUME macros to use in perf tests.

Example

I've converted the README, tools, benchmarks, and a lot of the examples to it.

@jkeiser jkeiser requested a review from lemire June 19, 2020 23:11
@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 19, 2020

This fixes #606.

@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 19, 2020

This also significantly mitigates #949: in most cases, if you're using error codes, you don't need to use templates. There are still cases, though.

@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 19, 2020

One other form I kind of wanted to do was a single-value form:

template<T>
WARN_UNUSED error_code get(T &value);

But I didn't, because the other get() returns true on error. That would make it so when you are using the two-argument form, if (get(v,err)) is how you check for success, but when you use the one-argument form, if (!get(v)) is how you check for success. Having two functions with the same name but opposite result values seemed like a bridge too far.

Having that function but returning a bool seems possible, but I'm not sure we should offer a supported way for users to check for errors but ignore the detailed error code. Though ... there will be plenty of cases where people just roll up all errors into a generic "couldn't read the damn json," so maybe it's OK for them.

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

Let me start with a stupid question... why not do...

if(error = get(val)) {
  // I have the error "error"
}

(To avoid warnings, you may need to do...

if((error = get(val))) {
  // I have the error "error"
}

)

In C++17, you can even do...

if(error_code error;  
     error = get(val)) {
  // I have the error "error"
}

so you don't even need to dirty you code with the error variable.

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

One use might be this...

struct mystruct {
  double x;
  int64_t y;
  std:;string_view z;
};
mystruct s;
if(doc.get(s.x,error) && doc.get(s.y,error) && doc.get(s.z,error)) {
 // got struct!
}

This could still work as...

struct mystruct {
  double x;
  int64_t y;
  std:;string_view z;
};
mystruct s;
if(error = doc.get(s.x) && doc.get(s.y) && doc.get(s.z)) {
 // got struct!
}

but extra code might be needed.

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

I see that a follow-up comment answers my first question.

Comment thread benchmark/bench_dom_api.cpp Outdated
parser.load(NUMBERS_JSON).get<dom::array>().tie(arr, error);
if(error) {
cerr << "could not read " << NUMBERS_JSON << " as an array" << endl;
if (!parser.load(NUMBERS_JSON).get(arr, error)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we went the other way (return error), we would have...

if (error  = parser.load(NUMBERS_JSON).get(arr)) {
}

instead of

if (!parser.load(NUMBERS_JSON).get(arr, error)) {
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked this a lot, so I changed it to this :)

You do need double parentheses to deal with a warning about = in conditions:

if ((error = parser.load(NUMBERS_JSON).get(arr))) {
}

I find it just as readable. More so in some ways: the double paren can serve as a kind of marker that you're seeing one of these error = function() constructions.

And I really like the technique of assigning the error. Otherwise you end up with !function() being success and function() meaning an error, which is just not intuitive.

if (error) { return; }
uint64_t value;
if (!doc["search_metadata"]["count"].get(value, error)) { return; }
if (value != 100) { return; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us compare here... We had....

auto [value, error] = doc["search_metadata"]["count"].get<uint64_t>();
if (error) { return; }

and we are moving to...

uint64_t value;
if (!doc["search_metadata"]["count"].get(value, error)) { return; }

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-comparing, we had:

auto [value, error] = doc["search_metadata"]["count"].get<uint64_t>();
if (error) { return; }

And are now moving to:

uint64_t value;
if ((error = doc["search_metadata"]["count"].get(value)) { return; }

Or, if you don't like stuff in your if statement:

uint64_t value;
auto error = doc["search_metadata"]["count"].get(value);
if (error) { return; }

It's kind of nice to be able to read the types of important variables like uint64_t on the left hand side when you are scanning. I am finding myself turning against the auto [value, error] syntax in favor of this :) What are your thoughts?

Comment thread include/simdjson/dom/element.h Outdated
* @returns true if the value was set, or false if there wsa an error.
*/
template<typename T>
inline bool get(T &value, error_code &error) const noexcept;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned value is not WARN_UNUSED which I think is good.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On purpose :) Since we're not doing the redundant bool thing anymore, the new get() is WARN_UNUSED error_code get(T&).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. I changed it after your suggestions :)

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

There is a lot of if(!...) which I don't like. it is confusing.

I find this construction easier to reason about...

int64_t val;
obj["val"].get(val, error);
if (error) { cerr << error << endl; }

It is close to what we had.... but better...

int64_t val;
obj["val"].tie<int64_t>().get(val, error);
if (error) { cerr << error << endl; }

Comment thread tests/readme_examples_noexceptions.cpp Outdated

// Casting a JSON element to an integer
uint64_t year;
if (!car["year"].get(year, error)) { cerr << error << endl; exit(1); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain hurts when parsing...

if (!car["year"].get(year, error)) { cerr << error << endl; exit(1); }

I understand it at some level....

(By the way, in C++, you can type not instead of !.)

if (not car["year"].get(year, error)) { cerr << error << endl; exit(1); }

Already, to me, it looks better...

Anything to get close to...

if not get year from car[year]

but compared to English, it is not in order....

Copy link
Copy Markdown
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, it is a clear progress.

I am fine with it but I would almost surely not write the code the way you do. My intuition at this point would be to just ignore the bool return.

I'd go back to what we had, but with a bit less code.

bool b;
doc.get(b,error);
if(error) {cout << error << endl;}

To me, it is a lot less confusing than...

bool b;
if(!doc.get(b,error)) {
  cout << error << endl;
}

If your API allows both constructions, I am happy!!!

@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 20, 2020

@lemire let us see what happens if we remove the construction I just added, and replace it with only WARN_UNUSED error_code get<T>(T&). That would remove the confusion of negation, and I hate out parameters anyway ...

(WARN_UNUSED because now the result is the only way to find out there's an error.)

@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 20, 2020

@lemire I removed get(v,e) and replaced it with e = get(v). I also added it to simdjson_result. I really like it a lot, to the point where I think we can stop recommending auto [v, error] = as a standard practice. Still need it in for loops for key/value pairs, but I think that's it!

really_inline simdjson_result(error_code error) noexcept; ///< @private

inline simdjson_result<dom::element_type> type() const noexcept;
really_inline simdjson_result<dom::element_type> type() const noexcept;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that we did not have really_inline until now. I wonder whether this had performance impacts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this more as a consistency thing in a lot of cases: we are still using just inline in a lot of places we don't need to.

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

I think you are getting warnings because...

if (error = size.value["h"].get(height)) { return; }

looks like an error so one has to do...

if ((error = size.value["h"].get(height))) { return; }

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

I really like it a lot

It also looks good to me and if you did not just do that to humour me, if you really like it, then it is a good sign.

@jkeiser jkeiser changed the title Add bool get<T>(T&,err&) to element replace get<T>.tie(v,e) with e = get(v) Jun 20, 2020
@jkeiser jkeiser changed the title replace get<T>.tie(v,e) with e = get(v) replace get<T>.tie(v,e) with get(v) Jun 20, 2020
@lemire lemire self-requested a review June 20, 2020 19:53
Copy link
Copy Markdown
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 20, 2020

I stated it earlier, but with C++17, you can do...

if(error_code e; e = doc.get(val)) {
}

It has the benefit of isolating the declaration.

@jkeiser jkeiser force-pushed the jkeiser/tie-type branch 2 times, most recently from 53174d1 to 94deb92 Compare June 20, 2020 20:04
@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 20, 2020

I really like it a lot

It also looks good to me and if you did not just do that to humour me, if you really like it, then it is a good sign.

Not at all! I actually kind of want to deprecate tie() now. The only place you might need to use auto [a,b] is in a foreach loop over objects or document_stream, and tie won't work there anyway.

I'll suggest that and update documentation to stop suggesting auto [x,y] (except fields) in another PR.

@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 20, 2020

And I LOVE the idea that the best syntax for using simdjson is C++11 compliant, so you don't have to confuse people with alternative syntax :)

@jkeiser jkeiser force-pushed the jkeiser/tie-type branch from 94deb92 to 1b22e4a Compare June 20, 2020 20:10
@jkeiser
Copy link
Copy Markdown
Member Author

jkeiser commented Jun 20, 2020

Oh, the other thing I like about this over auto: it completely eliminates the aliasing warnings auto generates while still being petty readable.

@jkeiser jkeiser force-pushed the jkeiser/tie-type branch from 1b22e4a to ddfe288 Compare June 20, 2020 21:19
@jkeiser jkeiser force-pushed the jkeiser/tie-type branch from ddfe288 to c3d5fe4 Compare June 20, 2020 22:57
@jkeiser jkeiser changed the base branch from jkeiser/get-type to master June 20, 2020 23:00
@lemire
Copy link
Copy Markdown
Member

lemire commented Jun 21, 2020

You have some kind of funny conflict with our main branch which causes failures?

We could deprecate tie.

@jkeiser jkeiser force-pushed the jkeiser/tie-type branch 2 times, most recently from abfbdbb to ddfe288 Compare June 21, 2020 00:36
@jkeiser jkeiser force-pushed the jkeiser/tie-type branch from ddfe288 to a7fc7d4 Compare June 21, 2020 00:57
@jkeiser jkeiser merged commit c25928e into master Jun 21, 2020
@jkeiser jkeiser deleted the jkeiser/tie-type branch June 21, 2020 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants