Skip to content

Commit feb1e7f

Browse files
authored
work on the ondemand iterators (#2590)
* work on the ondemand iterators * guarding two SIMDJSON_ASSUME * simplify following @jkeiser's comment * adding safety rails to the iterators * silencing a warning. * updating the amalgamation files
1 parent 2c7fbc1 commit feb1e7f

18 files changed

+24867
-18235
lines changed

CMakeLists.txt

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -117,7 +117,7 @@ if(SIMDJSON_BUILD_STATIC_LIB)
117117
add_library(simdjson_static STATIC ${SIMDJSON_SOURCES})
118118
add_library(simdjson::simdjson_static ALIAS simdjson_static)
119119
list(APPEND SIMDJSON_LIBRARIES simdjson_static)
120-
120+
121121
# Reuse precompiled headers for static library
122122
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.16")
123123
target_precompile_headers(simdjson_static REUSE_FROM simdjson)
@@ -196,7 +196,12 @@ if(
196196
target_compile_options PRIVATE
197197
$<$<CONFIG:DEBUG>:-Og>
198198
)
199-
endif()
199+
# We still want to enable development checks in Debug mode
200+
simdjson_add_props(
201+
target_compile_definitions PUBLIC
202+
SIMDJSON_DEVELOPMENT_CHECKS
203+
)
204+
endif()
200205

201206
if(SIMDJSON_ENABLE_THREADS)
202207
find_package(Threads REQUIRED)

doc/basics.md

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -354,7 +354,13 @@ the macro `SIMDJSON_DEVELOPMENT_CHECKS` to 1 prior to including
354354
the `simdjson.h` header to enable these additional checks: just make sure you remove the
355355
definition once your code has been tested. When `SIMDJSON_DEVELOPMENT_CHECKS` is set to 1, the
356356
simdjson library runs additional (expensive) tests on your code to help ensure that you are
357-
using the library in a safe manner.
357+
using the library in a safe manner. We add asserts which may halt your program, helping
358+
you find the bad programming pattern.
359+
360+
When `SIMDJSON_DEVELOPMENT_CHECKS`, some of our data structures contain extra data for
361+
tracking explicitly potential programming mistakes. Thus you should not relying on the
362+
size (`sizeof`) of our data structures to be constant: they may change depending on the
363+
compiler settings.
358364

359365
Once your code has been tested, you can then run it in
360366
Release mode: under Visual Studio, it means having the `_DEBUG` macro undefined, and, for other
@@ -437,8 +443,8 @@ support for users who avoid exceptions. See [the simdjson error handling documen
437443

438444
If you know the type of the value, you can cast it right there, too! `for (double value : array) { ... }`.
439445

440-
You may also use explicit iterators: `for(auto i = array.begin(); i != array.end(); i++) {}`. You can check that an array is empty with the condition `auto i = array.begin(); if (i == array.end()) {...}`.
441-
* **Object Iteration:** You can iterate through an object's fields, as well: `for (auto field : object) { ... }`. You may also use explicit iterators : `for(auto i = object.begin(); i != object.end(); i++) { auto field = *i; .... }`. You can check that an object is empty with the condition `auto i = object.begin(); if (i == object.end()) {...}`.
446+
You may also use explicit iterators: `for(auto i = array.begin(); i != array.end(); i++) {}`. You can check that an array is empty with the condition `auto i = array.begin(); if (i == array.end()) {...}`. You should derefence (`*i`) an iterator at most once before incrementing it (`i++`), when compiling in debug mode with development checks, we add asserts to help you identify such a mistake.
447+
* **Object Iteration:** You can iterate through an object's fields, as well: `for (auto field : object) { ... }`.
442448
- `field.unescaped_key()` will get you the unescaped key string as a `std::string_view` instance. E.g., the JSON string `"\u00e1"` becomes the Unicode string `á`. Optionally, you pass `true` as a parameter to the `unescaped_key` method if you want invalid escape sequences to be replaced by a default replacement character (e.g., `\ud800\ud801\ud811`): otherwise bad escape sequences lead to an immediate error.
443449
- `field.escaped_key()` will get you the key string as as a `std::string_view` instance, but unlike `unescaped_key()`, the key is not processed, so no unescaping is done. E.g., the JSON string `"\u00e1"` becomes the Unicode string `\u00e1`. We expect that `escaped_key()` is faster than `field.unescaped_key()`.
444450
- `field.value()` will get you the value, which you can then use all these other methods on.
@@ -451,6 +457,10 @@ support for users who avoid exceptions. See [the simdjson error handling documen
451457

452458
When you are iterating through an object, you are advancing through its keys and values. You should not also access the object or other objects. E.g. within a loop over `myobject`, you should not be accessing `myobject`. The following is an anti-pattern: `for(auto value: myobject) {myobject["mykey"]}`.
453459

460+
We discourage using the object iterators explicitly: `for(auto i = object.begin(); i != object.end(); i++) { auto field = *i; .... }`. In addition to the usual requirement to check against `end()` prior to dereferencing, you must also always dereference the pointer (`*it`) exactly once before you increment it (`it++`). You must also only deference the iterator once (never more than once). When compiling in
461+
debug mode with development checks, we add asserts to help check whether you correctly
462+
dereferenced the pointer before incrementing it.
463+
454464
You should never reset an object as you are iterating through it. The following is an anti-pattern: `for(auto value: myobject) {myobject.reset()}`.
455465
* **Array Index:** Because it is forward-only, you cannot look up an array element by index. Instead,
456466
you should iterate through the array and keep an index yourself. Exceptionally, if need a single value

include/simdjson/common_defs.h

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -289,7 +289,9 @@ namespace std {
289289
// when the compiler is optimizing.
290290
// We only set SIMDJSON_DEVELOPMENT_CHECKS if both __OPTIMIZE__
291291
// and NDEBUG are not defined.
292-
#if !defined(__OPTIMIZE__) && !defined(NDEBUG)
292+
// We recognize _DEBUG as overriding __OPTIMIZE__ so that if both
293+
// __OPTIMIZE__ and _DEBUG are defined, we still set SIMDJSON_DEVELOPMENT_CHECKS.
294+
#if ((!defined(__OPTIMIZE__) || defined(_DEBUG)) && !defined(NDEBUG))
293295
#define SIMDJSON_DEVELOPMENT_CHECKS 1
294296
#endif // __OPTIMIZE__
295297
#endif // _MSC_VER

include/simdjson/error-inl.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
namespace simdjson {
99

1010
inline bool is_fatal(error_code error) noexcept {
11-
return error == TAPE_ERROR || error == INCOMPLETE_ARRAY_OR_OBJECT;
11+
return error == TAPE_ERROR || error == INCOMPLETE_ARRAY_OR_OBJECT || error == OUT_OF_ORDER_ITERATION || error == DEPTH_ERROR;
1212
}
1313

1414
namespace internal {

include/simdjson/generic/ondemand/array_iterator-inl.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,10 @@ simdjson_inline array_iterator::array_iterator(const value_iterator &_iter) noex
1717
{}
1818

1919
simdjson_inline simdjson_result<value> array_iterator::operator*() noexcept {
20+
#if SIMDJSON_DEVELOPMENT_CHECKS
21+
SIMDJSON_ASSUME(!has_been_referenced);
22+
has_been_referenced = true;
23+
#endif
2024
if (iter.error()) { iter.abandon(); return iter.error(); }
2125
return value(iter.child());
2226
}
@@ -27,6 +31,9 @@ simdjson_inline bool array_iterator::operator!=(const array_iterator &) const no
2731
return iter.is_open();
2832
}
2933
simdjson_inline array_iterator &array_iterator::operator++() noexcept {
34+
#if SIMDJSON_DEVELOPMENT_CHECKS
35+
has_been_referenced = false;
36+
#endif
3037
error_code error;
3138
// PERF NOTE this is a safety rail ... users should exit loops as soon as they receive an error, so we'll never get here.
3239
// However, it does not seem to make a perf difference, so we add it out of an abundance of caution.

include/simdjson/generic/ondemand/array_iterator.h

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
#ifndef SIMDJSON_CONDITIONAL_INCLUDE
44
#define SIMDJSON_GENERIC_ONDEMAND_ARRAY_ITERATOR_H
5+
#include <iterator>
56
#include "simdjson/generic/implementation_simdjson_result_base.h"
67
#include "simdjson/generic/ondemand/base.h"
78
#include "simdjson/generic/ondemand/value_iterator.h"
@@ -17,11 +18,17 @@ namespace ondemand {
1718
*
1819
* This is an input_iterator, meaning:
1920
* - It is forward-only
20-
* - * must be called exactly once per element.
21+
* - * must be called at most once per element.
2122
* - ++ must be called exactly once in between each * (*, ++, *, ++, * ...)
2223
*/
2324
class array_iterator {
2425
public:
26+
using iterator_category = std::input_iterator_tag;
27+
using value_type = simdjson_result<value>;
28+
using difference_type = std::ptrdiff_t;
29+
using pointer = void;
30+
using reference = value_type;
31+
2532
/** Create a new, invalid array iterator. */
2633
simdjson_inline array_iterator() noexcept = default;
2734

@@ -65,6 +72,9 @@ class array_iterator {
6572
simdjson_warn_unused simdjson_inline bool at_end() const noexcept;
6673

6774
private:
75+
#if SIMDJSON_DEVELOPMENT_CHECKS
76+
bool has_been_referenced{false};
77+
#endif
6878
value_iterator iter{};
6979

7080
simdjson_inline array_iterator(const value_iterator &iter) noexcept;
@@ -82,6 +92,12 @@ namespace simdjson {
8292

8393
template<>
8494
struct simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::array_iterator> : public SIMDJSON_IMPLEMENTATION::implementation_simdjson_result_base<SIMDJSON_IMPLEMENTATION::ondemand::array_iterator> {
95+
using iterator_category = std::input_iterator_tag;
96+
using value_type = simdjson_result<SIMDJSON_IMPLEMENTATION::ondemand::value>;
97+
using difference_type = std::ptrdiff_t;
98+
using pointer = void;
99+
using reference = value_type;
100+
85101
simdjson_inline simdjson_result(SIMDJSON_IMPLEMENTATION::ondemand::array_iterator &&value) noexcept; ///< @private
86102
simdjson_inline simdjson_result(error_code error) noexcept; ///< @private
87103
simdjson_inline simdjson_result() noexcept = default;

include/simdjson/generic/ondemand/json_iterator-inl.h

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -215,11 +215,10 @@ simdjson_inline void json_iterator::assert_more_tokens(uint32_t required_tokens)
215215
}
216216

217217
simdjson_inline void json_iterator::assert_valid_position(token_position position) const noexcept {
218+
(void)position; // Suppress unused parameter warning
218219
#ifndef SIMDJSON_CLANG_VISUAL_STUDIO
219220
SIMDJSON_ASSUME( position >= &parser->implementation->structural_indexes[0] );
220221
SIMDJSON_ASSUME( position < &parser->implementation->structural_indexes[parser->implementation->n_structural_indexes] );
221-
#else
222-
(void)position; // Suppress unused parameter warning
223222
#endif
224223
}
225224

@@ -358,7 +357,11 @@ simdjson_inline token_position json_iterator::position() const noexcept {
358357
simdjson_inline simdjson_result<std::string_view> json_iterator::unescape(raw_json_string in, bool allow_replacement) noexcept {
359358
#if SIMDJSON_DEVELOPMENT_CHECKS
360359
auto result = parser->unescape(in, _string_buf_loc, allow_replacement);
360+
#if !defined(SIMDJSON_VISUAL_STUDIO) && !defined(SIMDJSON_CLANG_VISUAL_STUDIO)
361+
// Under Visual Studio, the next SIMDJSON_ASSUME fails with: the argument
362+
// has side effects that will be discarded.
361363
SIMDJSON_ASSUME(!parser->string_buffer_overflow(_string_buf_loc));
364+
#endif // !defined(SIMDJSON_VISUAL_STUDIO) && !defined(SIMDJSON_CLANG_VISUAL_STUDIO)
362365
return result;
363366
#else
364367
return parser->unescape(in, _string_buf_loc, allow_replacement);
@@ -368,7 +371,11 @@ simdjson_inline simdjson_result<std::string_view> json_iterator::unescape(raw_js
368371
simdjson_inline simdjson_result<std::string_view> json_iterator::unescape_wobbly(raw_json_string in) noexcept {
369372
#if SIMDJSON_DEVELOPMENT_CHECKS
370373
auto result = parser->unescape_wobbly(in, _string_buf_loc);
374+
#if !defined(SIMDJSON_VISUAL_STUDIO) && !defined(SIMDJSON_CLANG_VISUAL_STUDIO)
375+
// Under Visual Studio, the next SIMDJSON_ASSUME fails with: the argument
376+
// has side effects that will be discarded.
371377
SIMDJSON_ASSUME(!parser->string_buffer_overflow(_string_buf_loc));
378+
#endif // !defined(SIMDJSON_VISUAL_STUDIO) && !defined(SIMDJSON_CLANG_VISUAL_STUDIO)
372379
return result;
373380
#else
374381
return parser->unescape_wobbly(in, _string_buf_loc);

include/simdjson/generic/ondemand/object.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,13 @@ class object {
2727
*/
2828
simdjson_inline object() noexcept = default;
2929

30+
/**
31+
* Get an iterator to the start of the object. We recommend using a range-based for loop.
32+
*
33+
* Using the iterator directly is also possible but error-prone and discouraged. In particular,
34+
* you must dereference the iterator exactly once per iteration (before calling '++').
35+
* Doing otherwise is unsafe and may lead to errors. You are responsible for ensuring
36+
*/
3037
simdjson_inline simdjson_result<object_iterator> begin() noexcept;
3138
simdjson_inline simdjson_result<object_iterator> end() noexcept;
3239
/**

include/simdjson/generic/ondemand/object_iterator-inl.h

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,11 @@ simdjson_inline object_iterator::object_iterator(const value_iterator &_iter) no
2121
{}
2222

2323
simdjson_inline simdjson_result<field> object_iterator::operator*() noexcept {
24+
#if SIMDJSON_DEVELOPMENT_CHECKS
25+
// We must call * once per iteration.
26+
SIMDJSON_ASSUME(!has_been_referenced);
27+
has_been_referenced = true;
28+
#endif
2429
error_code error = iter.error();
2530
if (error) { iter.abandon(); return error; }
2631
auto result = field::start(iter);
@@ -39,6 +44,11 @@ simdjson_inline bool object_iterator::operator!=(const object_iterator &) const
3944
SIMDJSON_PUSH_DISABLE_WARNINGS
4045
SIMDJSON_DISABLE_STRICT_OVERFLOW_WARNING
4146
simdjson_inline object_iterator &object_iterator::operator++() noexcept {
47+
#if SIMDJSON_DEVELOPMENT_CHECKS
48+
// Before calling ++, we must have called *.
49+
SIMDJSON_ASSUME(has_been_referenced);
50+
has_been_referenced = false;
51+
#endif
4252
// TODO this is a safety rail ... users should exit loops as soon as they receive an error.
4353
// Nonetheless, let's see if performance is OK with this if statement--the compiler may give it to us for free.
4454
if (!iter.is_open()) { return *this; } // Iterator will be released if there is an error

include/simdjson/generic/ondemand/object_iterator.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,9 +32,14 @@ class object_iterator {
3232
// Assumes it's being compared with the end. true if depth >= iter->depth.
3333
simdjson_inline bool operator!=(const object_iterator &) const noexcept;
3434
// Checks for ']' and ','
35+
// YOU MUST NOT CALL THIS IF operator* YIELDED AN ERROR.
36+
// YOU MUST NOT CALL THIS WITHOUT A CORRESPONDING operator* CALL.
3537
simdjson_inline object_iterator &operator++() noexcept;
3638

3739
private:
40+
#if SIMDJSON_DEVELOPMENT_CHECKS
41+
bool has_been_referenced{false};
42+
#endif
3843
/**
3944
* The underlying JSON iterator.
4045
*

0 commit comments

Comments
 (0)