perf(spanner): optimize query result decoding#8452
Conversation
There was a problem hiding this comment.
Code Review
This pull request optimizes Spanner decoding performance by extracting regex constants, introducing a pre-compiled column-specific decoder generator (getDecoder), and adding a direct JSON row creation path (_createJsonRow) to bypass wrapper classes. The review feedback highlights critical correctness bugs in codec.ts where logical OR (||) fallbacks incorrectly handle valid falsy values, incomplete bounds checks for PG_OID and INT64 that fail to check against Number.MIN_SAFE_INTEGER, and a potential runtime error when accessing the prototype of columnMetadata if it is a plain object.
cf56674 to
df8089e
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant performance optimizations to the Spanner client's decoding pipeline. Key changes include pre-compiling column-specific decoder functions via 'getDecoder', adding fast-path parsing for standard date and RFC 3339 timestamp strings, optimizing loops, and bypassing class wrappers in JSON mode. Feedback on these changes highlights several correctness issues in 'codec.ts': the date fast-path parser needs to restrict years to '>= 1970' to avoid 2-digit year bugs; nested 'STRUCT' decoders must correctly index nested metadata by field name; empty string field names should be checked using 'name != null' to prevent incorrect fallback; and an unincremented 'index' variable in 'convertFieldsToJson' needs to be replaced with the loop index 'i' to prevent nameless fields from overwriting each other.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant performance optimizations for decoding Spanner values, including fast-path parsing for standard date and UTC timestamp strings, a pre-compiled column-specific decoder (getDecoder), and optimizations in PartialResultStream to bypass class wrappers when creating JSON rows. The review feedback highlights critical improvement opportunities in the fast-path parsing logic, pointing out that using parseInt on unvalidated substrings could lead to silent data corruption for malformed inputs. Additionally, the reviewer identified a bug where struct fields with empty string names would fail to retrieve their associated metadata due to an implicit falsy check.
6d65860 to
34c1f8b
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant performance optimizations for Spanner data decoding. It adds fast-path parsing for standard date and timestamp strings, pre-compiles column-specific decoders via getDecoder to avoid repeated type switch-case evaluations, and implements a direct JSON row creation path in PartialResultStream to bypass wrapper classes. The review feedback highlights critical safety improvements regarding dynamic property access on objects like columnMetadata, val, and columnsMetadata. Specifically, accessing properties using user-defined column names or enum values can accidentally resolve to built-in prototype properties (such as toString or valueOf), leading to runtime errors. Using Object.prototype.hasOwnProperty.call is recommended to safely check for own properties in these cases.
34c1f8b to
e9809d7
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces high-performance fast-path parsing for standard Spanner dates and UTC timestamps, pre-compiles column-specific decoders via getDecoder, and optimizes JSON row creation in PartialResultStream to bypass wrapper classes. Feedback on these changes highlights several correctness and robustness issues: the fast-path date and timestamp parsers should validate constructed date components to prevent silent rollover of invalid dates, and the timestamp parser needs stricter validation against malformed strings. Additionally, the 2-digit year mapping check should include year 0 to ensure it is correctly preserved, and a defensive check should be added in decodeValue_ to prevent runtime errors when the value is undefined.
| const dotIndex = isoString.indexOf('.', 19); | ||
| if (dotIndex !== -1) { | ||
| const subSecondsStr = isoString.substring( | ||
| dotIndex + 1, | ||
| isoString.length - 1, | ||
| ); | ||
| const padded = subSecondsStr.padEnd(9, '0'); | ||
| milliseconds = Number(padded.substring(0, 3)); | ||
| microseconds = Number(padded.substring(3, 6)); | ||
| nanoseconds = Number(padded.substring(6, 9)); | ||
|
|
||
| if ( | ||
| Number.isNaN(milliseconds) || | ||
| Number.isNaN(microseconds) || | ||
| Number.isNaN(nanoseconds) | ||
| ) { | ||
| return new PreciseDate(isoString); | ||
| } | ||
| } | ||
|
|
||
| const utcMillis = Date.UTC( | ||
| year, | ||
| month, | ||
| day, | ||
| hours, | ||
| minutes, | ||
| seconds, | ||
| milliseconds, | ||
| ); | ||
|
|
||
| const preciseDate = new PreciseDate(utcMillis); | ||
| preciseDate.setMicroseconds(microseconds); | ||
| preciseDate.setNanoseconds(nanoseconds); | ||
| return preciseDate; |
There was a problem hiding this comment.
The fast-path parser for PreciseDate has a few correctness issues that can lead to silent acceptance of malformed timestamps or silent rollover of invalid dates:
- If the sub-seconds part contains non-digits after the 9th decimal place (e.g.,
"2021-05-11T16:46:04.872345678abcZ"), the parser will truncate it and successfully parse it instead of falling back to the native constructor (which would produce anInvalid Date). We should validate thatsubSecondsStrcontains only digits usingDIGITS_REGEX. - If there is no dot but the string has extra characters (e.g.,
"2021-05-11T16:46:04abcZ"), the parser will ignore them. We should ensureisoString.length === 20when there is no dot. - Out-of-range components (e.g., month
13or day30in February) will silently roll over viaDate.UTC. We should validate the constructed UTC date components against the parsed components to ensure correctness.
const dotIndex = isoString.indexOf('.', 19);
if (dotIndex !== -1) {
if (dotIndex !== 19) {
return new PreciseDate(isoString);
}
const subSecondsStr = isoString.substring(
dotIndex + 1,
isoString.length - 1,
);
if (!DIGITS_REGEX.test(subSecondsStr)) {
return new PreciseDate(isoString);
}
const padded = subSecondsStr.padEnd(9, '0');
milliseconds = Number(padded.substring(0, 3));
microseconds = Number(padded.substring(3, 6));
nanoseconds = Number(padded.substring(6, 9));
} else if (isoString.length !== 20) {
return new PreciseDate(isoString);
}
const utcMillis = Date.UTC(
year,
month,
day,
hours,
minutes,
seconds,
milliseconds,
);
const dateCheck = new Date(utcMillis);
if (
dateCheck.getUTCFullYear() !== year ||
dateCheck.getUTCMonth() !== month ||
dateCheck.getUTCDate() !== day ||
dateCheck.getUTCHours() !== hours ||
dateCheck.getUTCMinutes() !== minutes ||
dateCheck.getUTCSeconds() !== seconds
) {
return new PreciseDate(isoString);
}
const preciseDate = new PreciseDate(utcMillis);
preciseDate.setMicroseconds(microseconds);
preciseDate.setNanoseconds(nanoseconds);
return preciseDate;
Optimized Row Decoding: Compiles type-specific decoders once per schema and reuses the generated closure for each row, significantly reducing runtime CPU overhead for large result sets.
Optimized STRUCT/JSON Decoding: Refactored the JIT STRUCT compiler to pre-compute field mappings, nameless field filtering, and default option configurations (wrapStructs) at compile-time. Explicitly checks empty-string nameless fields (name !== null && name !== undefined) to ensure correct recursive metadata propagation.
Optimized DATE and TIMESTAMP Parsing: Implemented high-performance fast-path parsers for standard Zulu timestamp and date formats, avoiding regex and split allocation overhead. Hardened these paths using type-safe Number conversions and strict NaN short-circuiting to safely fall back to standard constructors and prevent silent date corruption.