BUG: raise OverflowError on datetime64 unit conversion overflow#31085
BUG: raise OverflowError on datetime64 unit conversion overflow#31085ngoldbaum merged 6 commits intonumpy:mainfrom
Conversation
The fast-path datetime cast functions in dtype_transfer.c performed `dt * num / denom` without checking whether the multiplication would overflow int64. This caused silent wraparound for large datetime64 values cast to finer-grained units (e.g. datetime64[s] -> datetime64[ns] for dates beyond ~2262). Add a precomputed overflow boundary check before the multiplication in both _strided_to_strided_datetime_cast and _aligned_strided_to_strided_datetime_cast. Also fix the long-standing TODO in NpyDatetime_ConvertDatetime64ToDatetimeStruct where dt *= meta->num could overflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add exact reproducers from the issue reports as test cases: - numpygh-16352: datetime64[h] -> datetime64[ns] upconversion overflow - numpygh-22346: datetime64[s] -> M8[m] downconversion overflow near INT64_MIN Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Let me know if you'd like a hand fixing the tests. |
The single symmetric overflow_limit was too conservative for positive values when denom > 1, causing false OverflowErrors for downconversions (e.g. ns->ms) with large positive values. Split into pos_limit (INT64_MAX / num) and neg_limit ((INT64_MAX - denom + 1) / num) since only negative values risk the additional (denom - 1) subtraction. Also update test_time_to_time to accept OverflowError for extreme values (INT64_MAX) that truly overflow the conversion factor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ngoldbaum
left a comment
There was a problem hiding this comment.
I asked claude Opus to review this PR and it had a couple minor suggestions that I passed on. I think the overflow checking code is correct.
I guess Pandas doesn't really care about it, but I'm pretty sure timedeltas have the exact same bug, see datetime.c lines 3179 to 3185. Maybe Claude could fix that too? Also totally OK to punt on that if you don't want to do it.
| * Positive values compute dt * num / denom (no subtraction risk). | ||
| * Negative values compute (dt * num - (denom - 1)) / denom, so | ||
| * the limit must also account for the (denom - 1) subtraction. | ||
| */ |
There was a problem hiding this comment.
Maybe add to this comment that NPY_DATETIME_NAT is NPY_MIN_INT64, which is -NPY_MAX_INT64 - 1, and the formula below avoids generating a value that overflows to NPY_DATETIME_NAT.
Maybe also add a test that the exact boundary value triggers an OverflowError. e.g. right now on current main:
>>> np.datetime64(9223372036, "s").astype('datetime64[ns]')
np.datetime64('2262-04-11T23:47:16.000000000')
>>> np.datetime64(9223372037, "s").astype('datetime64[ns]')
np.datetime64('1677-09-21T00:12:43.290448384')
…lta, add tests - Use npy_mul_with_overflow_longlong from templ_common.h instead of hand-rolled division check in NpyDatetime_ConvertDatetime64ToDatetimeStruct - Add overflow checking to cast_timedelta_to_timedelta (same silent-wrap bug) - Expand dtype_transfer.c comments to explain NPY_DATETIME_NAT / NPY_MIN_INT64 - Add exact boundary tests (9223372036s OK, 9223372037s overflows) - Add num != 1 code path tests (datetime64[2s] metadata) - Add timedelta-specific tests: boundary, negative, scalar cast path, NaT Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
I had my claude implement your claude's suggestions.
for timedeltas we wrap the dt64 code and catch any exception to re-raise with a td64-specific message |
There was a problem hiding this comment.
LGTM! This is mergeable as is. The test failure is unrelated (#31129).
I have one comment to reduce the code and comment duplication a little. Define the following helper in a header shared by datetime.c and dtype_transfer.c:
static inline int
_datetime_scale_with_overflow_check(
npy_int64 *dt, npy_int64 num, npy_int64 denom)
{
if (*dt == NPY_DATETIME_NAT) {
return 0;
}
/*
* long comment that's repeated twice in the PR
*/
npy_int64 pos_limit = NPY_MAX_INT64 / num;
npy_int64 neg_limit = (NPY_MAX_INT64 - denom + 1) / num;
if (*dt > pos_limit || *dt < -neg_limit) {
PyErr_SetString(PyExc_OverflowError,
"Overflow when converting between "
"datetime64 units");
return -1;
}
if (*dt < 0) {
*dt = (*dt * num - (denom - 1)) / denom;
}
else {
*dt = *dt * num / denom;
}
return 0;
}and then use it to reduce the boilerplate
(totally untested, please double-check)
|
I guess it also needs a parameter for whether it's a datetime or a timedelta conversion... |
Consolidate the overflow-checked datetime/timedelta scaling logic into a shared static inline helper in _datetime.h, replacing three duplicated implementations across dtype_transfer.c and datetime.c. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| if (*dt > pos_limit || *dt < -neg_limit) { | ||
| PyErr_SetString(PyExc_OverflowError, | ||
| "Overflow when converting between " | ||
| "datetime64 units"); |
There was a problem hiding this comment.
It could be a timedelta too. You need a parameter to format this error message correctly.
Add type_name parameter to _datetime_scale_with_overflow_check so timedelta conversions report "timedelta64 units" instead of "datetime64 units". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ngoldbaum
left a comment
There was a problem hiding this comment.
Thanks for working with me on this! Looks like a clear improvement now and I don't see any remaining issues.
Claude wrote this, I reviewed it manually. But as with #31023, my C-fu is weak. The prompt was pretty directly to adapt the pandas implementation.
closes #16352
closes #22346