[SQL] Support for TIMESTAMP WITH TIME ZONE#6116
Conversation
mythical-fred
left a comment
There was a problem hiding this comment.
Several blockers; please mark this as draft as the description suggests, or address before merge.
| public void issue4146a() { | ||
| // Validated on postgres | ||
| this.qs(""" | ||
| SELECT TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York' > TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/Los_Angeles'; | ||
| r | ||
| --- | ||
| false | ||
| (1 row) | ||
|
|
||
| SELECT TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' = TIMESTAMP WITH TIME ZONE '2020-01-01 05:10:10 America/Los_Angeles'; | ||
| r | ||
| --- | ||
| true | ||
| (1 row) | ||
|
|
||
| SELECT (TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' - TIMESTAMP WITH TIME ZONE '2020-01-01 05:10:10 America/Los_Angeles') SECONDS; | ||
| r | ||
| --- | ||
| 0 | ||
| (1 row) | ||
|
|
||
| SELECT TIMESTAMP_TRUNC(TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York', HOUR); | ||
| r | ||
| --- | ||
| 2020-01-01 13:00:00 GMT | ||
| (1 row) | ||
|
|
||
| SELECT CAST(TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' AS TIMESTAMP); | ||
| r | ||
| --- | ||
| 2020-01-01 13:10:10 | ||
| (1 row)"""); | ||
| } |
There was a problem hiding this comment.
Test coverage is the headline issue for this PR. TIMESTAMP WITH TIME ZONE is a brand-new type that touches: ~50 extract/floor/ceil/date_trunc/timestamp_trunc functions, casts to/from every numeric type, casts to/from string with multiple formats, comparison operators, arithmetic with ShortInterval/LongInterval, tumble and hop windows, Variant round-trip, and connector serialization (Avro, Parquet, Debezium MySQL/Postgres, Snowflake, Pandas, default JSON). Two end-to-end cases (issue4146, issue4146a) do not exercise any of that, and your own PR description explicitly says "I need to implement many more tests - at least one for each function". Per PR_REVIEW_RULES.md: behavior changes without tests are a hard block.
In particular, please add coverage for: NULL handling, DST transitions (America/New_York around the spring-forward boundary), comparisons across different source zones, casting strings with numeric offsets and IANA names, casting numeric ↔ TIMESTAMP_TZ both directions including overflow/underflow, all extract_*/floor_*/ceil_*/date_trunc_*/timestamp_trunc_* units, tumble/hop window semantics, and a row-trip through each connector format including the new timestamp_tz_format config field (which is currently the only thing distinguishing TIMESTAMP WITH TIME ZONE from plain TIMESTAMP on the wire).
| JsonFlavor::DebeziumMySql => Self { | ||
| time_format: TimeFormat::Micros, | ||
| date_format: DateFormat::DaysSinceEpoch, | ||
| // Why is this missing fractions of second? |
There was a problem hiding this comment.
Stray // Why is this missing fractions of second? comment with a question to nobody. Either answer it (Debezium MySQL really does emit second-precision timestamps in this layout) or open an issue and reference it. Don't ship a TODO disguised as a question.
48d602b to
4e74e03
Compare
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
mythical-fred
left a comment
There was a problem hiding this comment.
Re-review on tip 7edb1df (force-pushed since the prior review on 8b53059). Several of my prior blockers were silently fixed without a reply (good); a few are not yet addressed; and the new force-push introduced regressions in cast_to_TimestampTz_s and a Python test file that won't even parse. PR is now draft, which is the right call.
Prior blockers — status
- timestamp.rs SerializeWithContext using
timestamp_formatinstead oftimestamp_tz_format— fixed. serialize_table_record!(TestStruct[3] {...})field count — fixed (now[4]).- DBSPTypeTimestampTz
getMaxValue8 nines — fixed (6 nines). - CastTests
Americas/New_York— fixed. - TypeCompiler TZ branch warning text + precision constant — fixed (note:
timestampPrecisionsWarnedis still shared between the two branches; minor). - TimestampTz
// Is this right for negative timestamps?TODO + Rustdoc copy-paste — gone (the methods were removed; if/when they come back, please don't reintroduce the bug). - TZ section docs typos / heading — fixed.
- Non-actionable cast error — only partially: type name corrected to TIMESTAMP WITH TIME ZONE, but still no format hint as PR_REVIEW_RULES.md asks (see inline).
- Magic-number
Objects.hash(..., 14)— number changed to16, still a bare magic constant (see inline).
Still open / not addressed
- Test coverage. Same two integration tests as before (
issue4146,issue4146a). Your own description still says "I need to implement many more tests - at least one for each function". This is the headline blocker for the PR. - Plain-TIMESTAMP doc typos
data time/asssumed(datetime.md L196-197) — flagged previously to fix in the same PR; still present. - Stray rhetorical
// Why is this missing fractions of second?comment inserde_config.rs:204.
New issues introduced by this force-push
cast_to_TimestampTz_snow uses onlyDateTime::parse_from_rfc3339. That's a regression: the previous version at least acceptedYYYY-MM-DD HH:MM:SS[.f]andYYYY-MM-DD. RFC3339 also doesn't accept IANA zone names, even though both the docs andDBSPTimestampTzLiteraladvertise IANA support. Net effect: literals accept'2020-01-01 10:10:10 America/New_York'butCAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE)will now fail at runtime.python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py— the new TZ test data is broken in three ways (syntax error + two malformed strings). See inline; this file won't even import.DBSPTimestampTzLiteral.getMicroshas a negative-modulo footgun (see inline) — pre-1970 TZ literals will throw fromLocalDateTime.ofEpochSecondbecausenanoOfSecondmust be non-negative.
Keeping verdict at REQUEST_CHANGES until the test coverage gap and the regressed cast/python paths are addressed.
|
|
||
| #[doc(hidden)] | ||
| pub fn cast_to_TimestampTz_s(value: SqlString) -> SqlResult<TimestampTz> { | ||
| if let Ok(v) = DateTime::parse_from_rfc3339(value.str()) { |
There was a problem hiding this comment.
Two regressions vs. the previous implementation, both visible to users:
- No IANA zone names.
DateTime::parse_from_rfc3339only accepts numeric offsets. ButDBSPTimestampTzLiteralanddocs/sql/datetime.mdboth promise IANA names (America/New_York, etc.). SoTIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York'works as a literal butCAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE)fails. Please usechrono_tz(or equivalent) to handle IANA names, then fall back toparse_from_rfc3339for numeric offsets. - Date-only and space-separated forms dropped. The previous version of this function accepted
YYYY-MM-DD HH:MM:SS[.f](no offset, treated as UTC) andYYYY-MM-DD. RFC3339 requires theTseparator and an offset, so both of those casts now fail. At minimum, please keep the%Y-%m-%d %H:%M:%S%.fand%Y-%m-%dfallbacks (UTC-assumed, matching the literal grammar's "assumed to be in UTC").
A test that exercises each accepted format end-to-end (numeric offset, IANA name, date-only, no-offset) belongs with this fix.
| } | ||
|
|
||
| Err(SqlRuntimeError::from_string(format!( | ||
| "Failed to parse '{value}' as a TIMESTAMP WITH TIME ZONE" |
There was a problem hiding this comment.
Type name is now correct, thanks. Per PR_REVIEW_RULES.md, the error should also tell the user what shape was expected. Suggest:
Failed to parse '{value}' as TIMESTAMP WITH TIME ZONE; expected `YYYY-MM-DD HH:MM:SS[.ffffff][±HH:MM | ZoneName]` or `YYYY-MM-DD`.
|
|
||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(this.mayBeNull, 16); |
There was a problem hiding this comment.
14 → 16 doesn't address the original concern: it's still a bare literal whose meaning isn't obvious. Either give it a name (HASH_DISCRIMINATOR) or drop it entirely — Objects.hash(this.mayBeNull) plus the per-type class identity in equals/sameType is already sufficient.
| long micros = DBSPTimestampLiteral.getMicrosSinceEpoch(this.value.getLocalTimestampString()); | ||
| LocalDateTime dt = LocalDateTime.ofEpochSecond( | ||
| micros / 1_000_000L, | ||
| (int)(micros % 1_000_000L) * 1000, |
There was a problem hiding this comment.
Negative-modulo footgun for pre-1970 timestamps:
micros / 1_000_000Ltruncates toward zero in Java for negativemicros, andmicros % 1_000_000Lis then negative.LocalDateTime.ofEpochSecondrequiresnanoOfSecondto be in[0, 999_999_999], so it throws on any pre-1970 TZ literal whose absolute value isn't an exact multiple of one second.
Use Math.floorDiv / Math.floorMod (or just go through Instant.ofEpochSecond(0, micros * 1000)), and add a regression test with a pre-1970 IANA-zone literal. Same class of bug I flagged on TimestampTz::get_date last round; please be careful with signed integer arithmetic across this PR generally.
| --- | ||
| 2020-01-01 13:10:10 | ||
| (1 row)"""); | ||
| } |
There was a problem hiding this comment.
Same headline finding as my prior review: still only issue4146 + issue4146a. PR description still says "I need to implement many more tests - at least one for each function". This is the blocker. At minimum please add coverage for: extract_* / floor_* / ceil_* / date_trunc_* / timestamp_trunc_* for every unit, casts to/from numerics + decimal (overflow paths included), casts to/from string with all four advertised input shapes (numeric offset, IANA name, no-offset, date-only), NULL handling, DST around America/New_York spring-forward, tumble/hop window semantics, Variant round-trip, and a row-trip per connector format using the new timestamp_tz_format config. The Python TZ tests can also help here once the file below is fixed.
| "id": 1, | ||
| "c1": "2020-06-21 14:00:00", | ||
| "c2": "2023-02-26 18:00:00", | ||
| "c3": None. |
There was a problem hiding this comment.
Hard error: "c3": None. — trailing . after None is a Python syntax error, so this whole module fails to import and every test in aggregate_tests2 that depends on it is broken. Should be "c3": None,.
| "id": 0, | ||
| "c1": "2014-11-05 08:27:00", | ||
| "c2": "2024-12-05 12:45:00", | ||
| "c3": "2024-12-05T12:45:00Z+05:00", |
There was a problem hiding this comment.
Malformed TZ string: "2024-12-05T12:45:00Z+05:00" mixes the UTC marker Z with an explicit +05:00 offset; they're mutually exclusive in RFC3339. Pick one — likely you meant "2024-12-05T12:45:00+05:00". Whichever you pick, double-check it matches the timestamp_tz_format your test expects to assert.
| "id": 1, | ||
| "c1": "2024-12-05 09:15:00", | ||
| "c2": "2014-11-05 16:30:00", | ||
| "c3": "2014-11-05%16:30:00Z+00:00", |
There was a problem hiding this comment.
Two issues on this row: % instead of T (or space) as the date/time separator, and the same Z+00:00 confusion as the row above. Suggest "2014-11-05T16:30:00+00:00" or "2014-11-05T16:30:00Z".
| JsonFlavor::DebeziumMySql => Self { | ||
| time_format: TimeFormat::Micros, | ||
| date_format: DateFormat::DaysSinceEpoch, | ||
| // Why is this missing fractions of second? |
There was a problem hiding this comment.
Still here from prior review. Either answer the question (does Debezium MySQL really emit second-precision timestamps in this layout? — if so, say so) or open an issue and reference it. Don't ship a TODO disguised as a question.
Fixes #4146
This PR finally adds support for the last important missing SQL type
TIMESTAMP WITH TIME ZONEThere are multiple ways to implement this; I have chosen a version which provides similar capabilities with most databases: timestamps with time zones are stored at runtime as simply UTC timestamps. Thus they are comparable to each other. No time zone information is preserved at runtime, though.
I will mark this PR as a draft for two reasons:
The Rust serialization code will need to be audited for compatibility with other standard formats (e.g., Debezium).