Skip to content

BUG: raise OverflowError on datetime64 unit conversion overflow#31085

Merged
ngoldbaum merged 6 commits intonumpy:mainfrom
jbrockmendel:bug-overflowsafe
Apr 9, 2026
Merged

BUG: raise OverflowError on datetime64 unit conversion overflow#31085
ngoldbaum merged 6 commits intonumpy:mainfrom
jbrockmendel:bug-overflowsafe

Conversation

@jbrockmendel
Copy link
Copy Markdown
Contributor

@jbrockmendel jbrockmendel commented Mar 28, 2026

Claude wrote this, I reviewed it manually. But as with #31023, my C-fu is weak. The prompt was pretty directly to adapt the pandas implementation.

closes #16352
closes #22346

jbrockmendel and others added 2 commits March 28, 2026 11:30
The fast-path datetime cast functions in dtype_transfer.c performed
`dt * num / denom` without checking whether the multiplication would
overflow int64. This caused silent wraparound for large datetime64
values cast to finer-grained units (e.g. datetime64[s] -> datetime64[ns]
for dates beyond ~2262).

Add a precomputed overflow boundary check before the multiplication in
both _strided_to_strided_datetime_cast and
_aligned_strided_to_strided_datetime_cast. Also fix the long-standing
TODO in NpyDatetime_ConvertDatetime64ToDatetimeStruct where
dt *= meta->num could overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add exact reproducers from the issue reports as test cases:
- numpygh-16352: datetime64[h] -> datetime64[ns] upconversion overflow
- numpygh-22346: datetime64[s] -> M8[m] downconversion overflow near INT64_MIN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ngoldbaum ngoldbaum added this to the 2.4.5 release milestone Apr 1, 2026
@ngoldbaum
Copy link
Copy Markdown
Member

Let me know if you'd like a hand fixing the tests.

The single symmetric overflow_limit was too conservative for positive
values when denom > 1, causing false OverflowErrors for downconversions
(e.g. ns->ms) with large positive values. Split into pos_limit
(INT64_MAX / num) and neg_limit ((INT64_MAX - denom + 1) / num) since
only negative values risk the additional (denom - 1) subtraction.

Also update test_time_to_time to accept OverflowError for extreme
values (INT64_MAX) that truly overflow the conversion factor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked claude Opus to review this PR and it had a couple minor suggestions that I passed on. I think the overflow checking code is correct.

I guess Pandas doesn't really care about it, but I'm pretty sure timedeltas have the exact same bug, see datetime.c lines 3179 to 3185. Maybe Claude could fix that too? Also totally OK to punt on that if you don't want to do it.

* Positive values compute dt * num / denom (no subtraction risk).
* Negative values compute (dt * num - (denom - 1)) / denom, so
* the limit must also account for the (denom - 1) subtraction.
*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add to this comment that NPY_DATETIME_NAT is NPY_MIN_INT64, which is -NPY_MAX_INT64 - 1, and the formula below avoids generating a value that overflows to NPY_DATETIME_NAT.

Maybe also add a test that the exact boundary value triggers an OverflowError. e.g. right now on current main:

>>> np.datetime64(9223372036, "s").astype('datetime64[ns]')
np.datetime64('2262-04-11T23:47:16.000000000')
>>> np.datetime64(9223372037, "s").astype('datetime64[ns]')
np.datetime64('1677-09-21T00:12:43.290448384')

Comment thread numpy/_core/tests/test_datetime.py
Comment thread numpy/_core/src/multiarray/datetime.c
…lta, add tests

- Use npy_mul_with_overflow_longlong from templ_common.h instead of
  hand-rolled division check in NpyDatetime_ConvertDatetime64ToDatetimeStruct
- Add overflow checking to cast_timedelta_to_timedelta (same silent-wrap bug)
- Expand dtype_transfer.c comments to explain NPY_DATETIME_NAT / NPY_MIN_INT64
- Add exact boundary tests (9223372036s OK, 9223372037s overflows)
- Add num != 1 code path tests (datetime64[2s] metadata)
- Add timedelta-specific tests: boundary, negative, scalar cast path, NaT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbrockmendel
Copy link
Copy Markdown
Contributor Author

I had my claude implement your claude's suggestions.

I guess Pandas doesn't really care about it, but I'm pretty sure timedeltas have the exact same bug

for timedeltas we wrap the dt64 code and catch any exception to re-raise with a td64-specific message

Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is mergeable as is. The test failure is unrelated (#31129).

I have one comment to reduce the code and comment duplication a little. Define the following helper in a header shared by datetime.c and dtype_transfer.c:

static inline int
_datetime_scale_with_overflow_check(
        npy_int64 *dt, npy_int64 num, npy_int64 denom)
{
    if (*dt == NPY_DATETIME_NAT) {
        return 0;
    }
    /*
     * long comment that's repeated twice in the PR
     */
    npy_int64 pos_limit = NPY_MAX_INT64 / num;
    npy_int64 neg_limit = (NPY_MAX_INT64 - denom + 1) / num;

    if (*dt > pos_limit || *dt < -neg_limit) {
        PyErr_SetString(PyExc_OverflowError,
                "Overflow when converting between "
                "datetime64 units");
        return -1;
    }
    if (*dt < 0) {
        *dt = (*dt * num - (denom - 1)) / denom;
    }
    else {
        *dt = *dt * num / denom;
    }
    return 0;
}

and then use it to reduce the boilerplate

(totally untested, please double-check)

@ngoldbaum
Copy link
Copy Markdown
Member

I guess it also needs a parameter for whether it's a datetime or a timedelta conversion...

Consolidate the overflow-checked datetime/timedelta scaling logic
into a shared static inline helper in _datetime.h, replacing three
duplicated implementations across dtype_transfer.c and datetime.c.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread numpy/_core/src/multiarray/_datetime.h Outdated
if (*dt > pos_limit || *dt < -neg_limit) {
PyErr_SetString(PyExc_OverflowError,
"Overflow when converting between "
"datetime64 units");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a timedelta too. You need a parameter to format this error message correctly.

Add type_name parameter to _datetime_scale_with_overflow_check so
timedelta conversions report "timedelta64 units" instead of
"datetime64 units".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working with me on this! Looks like a clear improvement now and I don't see any remaining issues.

@ngoldbaum ngoldbaum merged commit 849a584 into numpy:main Apr 9, 2026
83 of 85 checks passed
@jbrockmendel jbrockmendel deleted the bug-overflowsafe branch April 9, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: dt64[s].astype("M8[m]") overflow ENH: overflow-safe astype for datetime64/timedelta64 unit conversion

3 participants