Skip to content

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116

Draft
mihaibudiu wants to merge 1 commit into
feldera:mainfrom
mihaibudiu:issue4146
Draft

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116
mihaibudiu wants to merge 1 commit into
feldera:mainfrom
mihaibudiu:issue4146

Conversation

@mihaibudiu
Copy link
Copy Markdown
Contributor

@mihaibudiu mihaibudiu commented Apr 24, 2026

Fixes #4146

This PR finally adds support for the last important missing SQL type TIMESTAMP WITH TIME ZONE
There are multiple ways to implement this; I have chosen a version which provides similar capabilities with most databases: timestamps with time zones are stored at runtime as simply UTC timestamps. Thus they are comparable to each other. No time zone information is preserved at runtime, though.

I will mark this PR as a draft for two reasons:

The Rust serialization code will need to be audited for compatibility with other standard formats (e.g., Debezium).

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several blockers; please mark this as draft as the description suggests, or address before merge.

Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/casts.rs
Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread docs.feldera.com/docs/sql/datetime.md Outdated
Comment thread docs.feldera.com/docs/sql/types.md Outdated
Comment on lines +962 to +994
public void issue4146a() {
// Validated on postgres
this.qs("""
SELECT TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York' > TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/Los_Angeles';
r
---
false
(1 row)

SELECT TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' = TIMESTAMP WITH TIME ZONE '2020-01-01 05:10:10 America/Los_Angeles';
r
---
true
(1 row)

SELECT (TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' - TIMESTAMP WITH TIME ZONE '2020-01-01 05:10:10 America/Los_Angeles') SECONDS;
r
---
0
(1 row)

SELECT TIMESTAMP_TRUNC(TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York', HOUR);
r
---
2020-01-01 13:00:00 GMT
(1 row)

SELECT CAST(TIMESTAMP WITH TIME ZONE '2020-01-01 08:10:10 America/New_York' AS TIMESTAMP);
r
---
2020-01-01 13:10:10
(1 row)""");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage is the headline issue for this PR. TIMESTAMP WITH TIME ZONE is a brand-new type that touches: ~50 extract/floor/ceil/date_trunc/timestamp_trunc functions, casts to/from every numeric type, casts to/from string with multiple formats, comparison operators, arithmetic with ShortInterval/LongInterval, tumble and hop windows, Variant round-trip, and connector serialization (Avro, Parquet, Debezium MySQL/Postgres, Snowflake, Pandas, default JSON). Two end-to-end cases (issue4146, issue4146a) do not exercise any of that, and your own PR description explicitly says "I need to implement many more tests - at least one for each function". Per PR_REVIEW_RULES.md: behavior changes without tests are a hard block.

In particular, please add coverage for: NULL handling, DST transitions (America/New_York around the spring-forward boundary), comparisons across different source zones, casting strings with numeric offsets and IANA names, casting numeric ↔ TIMESTAMP_TZ both directions including overflow/underflow, all extract_*/floor_*/ceil_*/date_trunc_*/timestamp_trunc_* units, tumble/hop window semantics, and a row-trip through each connector format including the new timestamp_tz_format config field (which is currently the only thing distinguishing TIMESTAMP WITH TIME ZONE from plain TIMESTAMP on the wire).

JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray // Why is this missing fractions of second? comment with a question to nobody. Either answer it (Debezium MySQL really does emit second-precision timestamps in this layout) or open an issue and reference it. Don't ship a TODO disguised as a question.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from 48d602b to 4e74e03 Compare April 25, 2026 05:46
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review on tip 7edb1df (force-pushed since the prior review on 8b53059). Several of my prior blockers were silently fixed without a reply (good); a few are not yet addressed; and the new force-push introduced regressions in cast_to_TimestampTz_s and a Python test file that won't even parse. PR is now draft, which is the right call.

Prior blockers — status

  • timestamp.rs SerializeWithContext using timestamp_format instead of timestamp_tz_format — fixed.
  • serialize_table_record!(TestStruct[3] {...}) field count — fixed (now [4]).
  • DBSPTypeTimestampTz getMaxValue 8 nines — fixed (6 nines).
  • CastTests Americas/New_York — fixed.
  • TypeCompiler TZ branch warning text + precision constant — fixed (note: timestampPrecisionsWarned is still shared between the two branches; minor).
  • TimestampTz // Is this right for negative timestamps? TODO + Rustdoc copy-paste — gone (the methods were removed; if/when they come back, please don't reintroduce the bug).
  • TZ section docs typos / heading — fixed.
  • Non-actionable cast error — only partially: type name corrected to TIMESTAMP WITH TIME ZONE, but still no format hint as PR_REVIEW_RULES.md asks (see inline).
  • Magic-number Objects.hash(..., 14) — number changed to 16, still a bare magic constant (see inline).

Still open / not addressed

  • Test coverage. Same two integration tests as before (issue4146, issue4146a). Your own description still says "I need to implement many more tests - at least one for each function". This is the headline blocker for the PR.
  • Plain-TIMESTAMP doc typos data time / asssumed (datetime.md L196-197) — flagged previously to fix in the same PR; still present.
  • Stray rhetorical // Why is this missing fractions of second? comment in serde_config.rs:204.

New issues introduced by this force-push

  • cast_to_TimestampTz_s now uses only DateTime::parse_from_rfc3339. That's a regression: the previous version at least accepted YYYY-MM-DD HH:MM:SS[.f] and YYYY-MM-DD. RFC3339 also doesn't accept IANA zone names, even though both the docs and DBSPTimestampTzLiteral advertise IANA support. Net effect: literals accept '2020-01-01 10:10:10 America/New_York' but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) will now fail at runtime.
  • python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py — the new TZ test data is broken in three ways (syntax error + two malformed strings). See inline; this file won't even import.
  • DBSPTimestampTzLiteral.getMicros has a negative-modulo footgun (see inline) — pre-1970 TZ literals will throw from LocalDateTime.ofEpochSecond because nanoOfSecond must be non-negative.

Keeping verdict at REQUEST_CHANGES until the test coverage gap and the regressed cast/python paths are addressed.


#[doc(hidden)]
pub fn cast_to_TimestampTz_s(value: SqlString) -> SqlResult<TimestampTz> {
if let Ok(v) = DateTime::parse_from_rfc3339(value.str()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two regressions vs. the previous implementation, both visible to users:

  1. No IANA zone names. DateTime::parse_from_rfc3339 only accepts numeric offsets. But DBSPTimestampTzLiteral and docs/sql/datetime.md both promise IANA names (America/New_York, etc.). So TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York' works as a literal but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) fails. Please use chrono_tz (or equivalent) to handle IANA names, then fall back to parse_from_rfc3339 for numeric offsets.
  2. Date-only and space-separated forms dropped. The previous version of this function accepted YYYY-MM-DD HH:MM:SS[.f] (no offset, treated as UTC) and YYYY-MM-DD. RFC3339 requires the T separator and an offset, so both of those casts now fail. At minimum, please keep the %Y-%m-%d %H:%M:%S%.f and %Y-%m-%d fallbacks (UTC-assumed, matching the literal grammar's "assumed to be in UTC").

A test that exercises each accepted format end-to-end (numeric offset, IANA name, date-only, no-offset) belongs with this fix.

}

Err(SqlRuntimeError::from_string(format!(
"Failed to parse '{value}' as a TIMESTAMP WITH TIME ZONE"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type name is now correct, thanks. Per PR_REVIEW_RULES.md, the error should also tell the user what shape was expected. Suggest:

Failed to parse '{value}' as TIMESTAMP WITH TIME ZONE; expected `YYYY-MM-DD HH:MM:SS[.ffffff][±HH:MM | ZoneName]` or `YYYY-MM-DD`.


@Override
public int hashCode() {
return Objects.hash(this.mayBeNull, 16);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1416 doesn't address the original concern: it's still a bare literal whose meaning isn't obvious. Either give it a name (HASH_DISCRIMINATOR) or drop it entirely — Objects.hash(this.mayBeNull) plus the per-type class identity in equals/sameType is already sufficient.

Comment on lines +134 to +137
long micros = DBSPTimestampLiteral.getMicrosSinceEpoch(this.value.getLocalTimestampString());
LocalDateTime dt = LocalDateTime.ofEpochSecond(
micros / 1_000_000L,
(int)(micros % 1_000_000L) * 1000,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative-modulo footgun for pre-1970 timestamps:

  • micros / 1_000_000L truncates toward zero in Java for negative micros, and micros % 1_000_000L is then negative.
  • LocalDateTime.ofEpochSecond requires nanoOfSecond to be in [0, 999_999_999], so it throws on any pre-1970 TZ literal whose absolute value isn't an exact multiple of one second.

Use Math.floorDiv / Math.floorMod (or just go through Instant.ofEpochSecond(0, micros * 1000)), and add a regression test with a pre-1970 IANA-zone literal. Same class of bug I flagged on TimestampTz::get_date last round; please be careful with signed integer arithmetic across this PR generally.

---
2020-01-01 13:10:10
(1 row)""");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same headline finding as my prior review: still only issue4146 + issue4146a. PR description still says "I need to implement many more tests - at least one for each function". This is the blocker. At minimum please add coverage for: extract_* / floor_* / ceil_* / date_trunc_* / timestamp_trunc_* for every unit, casts to/from numerics + decimal (overflow paths included), casts to/from string with all four advertised input shapes (numeric offset, IANA name, no-offset, date-only), NULL handling, DST around America/New_York spring-forward, tumble/hop window semantics, Variant round-trip, and a row-trip per connector format using the new timestamp_tz_format config. The Python TZ tests can also help here once the file below is fixed.

"id": 1,
"c1": "2020-06-21 14:00:00",
"c2": "2023-02-26 18:00:00",
"c3": None.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard error: "c3": None. — trailing . after None is a Python syntax error, so this whole module fails to import and every test in aggregate_tests2 that depends on it is broken. Should be "c3": None,.

"id": 0,
"c1": "2014-11-05 08:27:00",
"c2": "2024-12-05 12:45:00",
"c3": "2024-12-05T12:45:00Z+05:00",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Malformed TZ string: "2024-12-05T12:45:00Z+05:00" mixes the UTC marker Z with an explicit +05:00 offset; they're mutually exclusive in RFC3339. Pick one — likely you meant "2024-12-05T12:45:00+05:00". Whichever you pick, double-check it matches the timestamp_tz_format your test expects to assert.

"id": 1,
"c1": "2024-12-05 09:15:00",
"c2": "2014-11-05 16:30:00",
"c3": "2014-11-05%16:30:00Z+00:00",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues on this row: % instead of T (or space) as the date/time separator, and the same Z+00:00 confusion as the row above. Suggest "2014-11-05T16:30:00+00:00" or "2014-11-05T16:30:00Z".

JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still here from prior review. Either answer the question (does Debezium MySQL really emit second-precision timestamps in this layout? — if so, say so) or open an issue and reference it. Don't ship a TODO disguised as a question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for 'timestamp with time zone'

2 participants