Bug report
The C accelerator for datetime.fromisoformat() and time.fromisoformat() accepts a
decimal separator (. or ,) that is followed by zero fractional digits when a
timezone designator (+, -, or Z) comes next. The pure-Python implementation
(_pydatetime) correctly rejects these strings with
ValueError: Invalid isoformat string.
Reproduction
>>> import datetime
>>> datetime.datetime.fromisoformat('2020-01-01T12:34:56.+05:00')
datetime.datetime(2020, 1, 1, 12, 34, 56, tzinfo=datetime.timezone(datetime.timedelta(seconds=18000)))
>>> datetime.time.fromisoformat('12:34:56.Z')
datetime.time(12, 34, 56, tzinfo=datetime.timezone.utc)
The C implementation parses these and silently sets microsecond=0. The pure-Python
implementation raises:
>>> import _pydatetime
>>> _pydatetime.datetime.fromisoformat('2020-01-01T12:34:56.+05:00')
Traceback (most recent call last):
...
ValueError: Invalid isoformat string: '2020-01-01T12:34:56.+05:00'
C-vs-pure-Python divergence
| Input |
C accelerator |
Pure-Python (_pydatetime) |
'2020-01-01T12:34:56.+05:00' |
parses, microsecond=0 |
ValueError |
'2020-01-01T12:34:56.Z' |
parses, microsecond=0 |
ValueError |
'2020-01-01T12:34:56,+05:00' |
parses, microsecond=0 |
ValueError |
'12:34:56.+05:00' |
parses, microsecond=0 |
ValueError |
'12:34:56,+05:00' |
parses, microsecond=0 |
ValueError |
'12:34:56.Z' |
parses, microsecond=0 |
ValueError |
Control cases that already agree (both reject / both accept), unchanged:
| Input |
C / pure-Python |
'12:34:56.' |
both ValueError |
'12:34:56.5' |
both time(12, 34, 56, 500000) |
'12:34:56+05:00' |
both accept (no fraction) |
Which side is wrong
The C accelerator is the lenient (wrong) side; pure-Python is correct.
- ISO 8601 §4.2.2.4 and RFC 3339 §5.6: a decimal sign in a time fraction must be
followed by at least one fractional digit. An empty fraction is not valid.
- The C implementation already rejects the standalone trailing decimal separator
('12:34:56.'), so rejecting '12:34:56.+05:00' is consistent with its own
end-of-string behavior; the lenient outcome here is incidental rather than intentional.
Root cause
In Modules/_datetimemodule.c, parse_hh_mm_ss_ff() consumes the ./, separator and
then computes to_parse as the number of fractional-digit characters. When a timezone
designator follows immediately, the time substring handed to the function ends right after
the separator, so the generic end-of-substring check
(if (p >= p_end) return c != '\0';) fires before the decimal-separator branch and
returns "trailing content present" (1) rather than flagging an empty fraction. The caller
then accepts the value because a timezone is present. The standalone-. case is rejected
only because that same end-of-string check happens to map to a ValueError in the
no-timezone path.
Suggested fix
Reject parse_hh_mm_ss_ff() when a ./, separator is seen but no fractional digit
follows, by handling the decimal-separator case before the generic end-of-substring check.
This matches the pure-Python behavior and the spec. A patch and regression tests are ready.
Alternative the maintainers may prefer: make pure-Python lenient instead. The spec
(≥1 digit required) and the C implementation's own rejection of the standalone trailing
separator both argue for fixing C to reject, so that is what the attached patch does.
Environment
Reproduced on current main (3.16.0a0). Affects all maintained branches that ship the C
accelerator.
Linked PRs
Bug report
The C accelerator for
datetime.fromisoformat()andtime.fromisoformat()accepts adecimal separator (
.or,) that is followed by zero fractional digits when atimezone designator (
+,-, orZ) comes next. The pure-Python implementation(
_pydatetime) correctly rejects these strings withValueError: Invalid isoformat string.Reproduction
The C implementation parses these and silently sets
microsecond=0. The pure-Pythonimplementation raises:
C-vs-pure-Python divergence
_pydatetime)'2020-01-01T12:34:56.+05:00'ValueError'2020-01-01T12:34:56.Z'ValueError'2020-01-01T12:34:56,+05:00'ValueError'12:34:56.+05:00'ValueError'12:34:56,+05:00'ValueError'12:34:56.Z'ValueErrorControl cases that already agree (both reject / both accept), unchanged:
'12:34:56.'ValueError'12:34:56.5'time(12, 34, 56, 500000)'12:34:56+05:00'Which side is wrong
The C accelerator is the lenient (wrong) side; pure-Python is correct.
followed by at least one fractional digit. An empty fraction is not valid.
(
'12:34:56.'), so rejecting'12:34:56.+05:00'is consistent with its ownend-of-string behavior; the lenient outcome here is incidental rather than intentional.
Root cause
In
Modules/_datetimemodule.c,parse_hh_mm_ss_ff()consumes the./,separator andthen computes
to_parseas the number of fractional-digit characters. When a timezonedesignator follows immediately, the time substring handed to the function ends right after
the separator, so the generic end-of-substring check
(
if (p >= p_end) return c != '\0';) fires before the decimal-separator branch andreturns "trailing content present" (1) rather than flagging an empty fraction. The caller
then accepts the value because a timezone is present. The standalone-
.case is rejectedonly because that same end-of-string check happens to map to a
ValueErrorin theno-timezone path.
Suggested fix
Reject
parse_hh_mm_ss_ff()when a./,separator is seen but no fractional digitfollows, by handling the decimal-separator case before the generic end-of-substring check.
This matches the pure-Python behavior and the spec. A patch and regression tests are ready.
Alternative the maintainers may prefer: make pure-Python lenient instead. The spec
(≥1 digit required) and the C implementation's own rejection of the standalone trailing
separator both argue for fixing C to reject, so that is what the attached patch does.
Environment
Reproduced on current
main(3.16.0a0). Affects all maintained branches that ship the Caccelerator.
Linked PRs