Commit a721aba

Issue #26331: Implement the parsing part of PEP 515.
Thanks to Georg Brandl for the patch.
1 parent ee73a65 commit a721aba

22 files changed, with 742 additions and 204 deletions.

Doc/library/decimal.rst

Lines changed: 7 additions & 3 deletions
@@ -345,7 +345,7 @@ Decimal objects
    *value* can be an integer, string, tuple, :class:`float`, or another :class:`Decimal`
    object. If no *value* is given, returns ``Decimal('0')``. If *value* is a
    string, it should conform to the decimal numeric string syntax after leading
-   and trailing whitespace characters are removed::
+   and trailing whitespace characters, as well as underscores throughout, are removed::
 
       sign ::= '+' | '-'
       digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
@@ -394,6 +394,10 @@ Decimal objects
       :class:`float` arguments raise an exception if the :exc:`FloatOperation`
       trap is set. By default the trap is off.
 
+   .. versionchanged:: 3.6
+      Underscores are allowed for grouping, as with integral and floating-point
+      literals in code.
+
    Decimal floating point objects share many properties with the other built-in
    numeric types such as :class:`float` and :class:`int`. All of the usual math
    operations and special methods apply. Likewise, decimal objects can be
@@ -1075,8 +1079,8 @@ In addition to the three supplied contexts, new contexts can be created with the
          Decimal('4.44')
 
       This method implements the to-number operation of the IBM specification.
-      If the argument is a string, no leading or trailing whitespace is
-      permitted.
+      If the argument is a string, no leading or trailing whitespace or
+      underscores are permitted.
 
    .. method:: create_decimal_from_float(f)
 
Doc/library/functions.rst

Lines changed: 13 additions & 3 deletions
@@ -271,6 +271,9 @@ are always available. They are listed here in alphabetical order.
 
    The complex type is described in :ref:`typesnumeric`.
 
+   .. versionchanged:: 3.6
+      Grouping digits with underscores as in code literals is allowed.
+
 
 .. function:: delattr(object, name)
 
@@ -531,11 +534,14 @@ are always available. They are listed here in alphabetical order.
 
    The float type is described in :ref:`typesnumeric`.
 
-.. index::
-   single: __format__
-   single: string; format() (built-in function)
+   .. versionchanged:: 3.6
+      Grouping digits with underscores as in code literals is allowed.
 
 
+.. index::
+   single: __format__
+   single: string; format() (built-in function)
+
 .. function:: format(value[, format_spec])
 
    Convert a *value* to a "formatted" representation, as controlled by
@@ -702,6 +708,10 @@ are always available. They are listed here in alphabetical order.
       :meth:`base.__int__ <object.__int__>` instead of :meth:`base.__index__
       <object.__index__>`.
 
+   .. versionchanged:: 3.6
+      Grouping digits with underscores as in code literals is allowed.
+
+
 .. function:: isinstance(object, classinfo)
 
    Return true if the *object* argument is an instance of the *classinfo*
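
The ``versionchanged`` notes in this file say that ``complex()``, ``float()`` and ``int()`` accept underscore-grouped digits from Python 3.6 on. The following is a minimal sketch of that behaviour, assuming Python 3.6 or later. The helper name ``rejects`` is made up for this illustration and is not part of the commit.

```python
def rejects(func, text, *args):
    """Return True if func(text, *args) raises ValueError."""
    try:
        func(text, *args)
    except ValueError:
        return True
    return False


# Single underscores between digits are accepted and ignored.
assert int("1_000_000") == 1000000
assert float("1_000.5") == 1000.5
assert complex("1_0j") == 10j

# With base 0, int() follows literal syntax, which allows one
# underscore directly after the base prefix.
assert int("0x_ff", 0) == 255

# Doubled, leading, or trailing underscores are rejected.
for bad in ("1__000", "_1000", "1000_"):
    assert rejects(int, bad)
    assert rejects(float, bad)
```

The same rule applies to all three constructors: an underscore is only valid when it sits between two digits, or right after a base prefix in the case of ``int()``.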

Doc/reference/lexical_analysis.rst

Lines changed: 29 additions & 16 deletions
@@ -721,20 +721,24 @@ Integer literals
 Integer literals are described by the following lexical definitions:
 
 .. productionlist::
-   integer: `decimalinteger` | `octinteger` | `hexinteger` | `bininteger`
-   decimalinteger: `nonzerodigit` `digit`* | "0"+
+   integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger`
+   decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")*
+   bininteger: "0" ("b" | "B") (["_"] `bindigit`)+
+   octinteger: "0" ("o" | "O") (["_"] `octdigit`)+
+   hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
-   octinteger: "0" ("o" | "O") `octdigit`+
-   hexinteger: "0" ("x" | "X") `hexdigit`+
-   bininteger: "0" ("b" | "B") `bindigit`+
+   bindigit: "0" | "1"
    octdigit: "0"..."7"
    hexdigit: `digit` | "a"..."f" | "A"..."F"
-   bindigit: "0" | "1"
 
 There is no limit for the length of integer literals apart from what can be
 stored in available memory.
 
+Underscores are ignored for determining the numeric value of the literal. They
+can be used to group digits for enhanced readability. One underscore can occur
+between digits, and after base specifiers like ``0x``.
+
 Note that leading zeros in a non-zero decimal number are not allowed. This is
 for disambiguation with C-style octal literals, which Python used before version
 3.0.
@@ -743,6 +747,10 @@ Some examples of integer literals::
 
    7 2147483647 0o177 0b100110111
    3 79228162514264337593543950336 0o377 0xdeadbeef
+   100_000_000_000 0b_1110_0101
+
+.. versionchanged:: 3.6
+   Underscores are now allowed for grouping purposes in literals.
 
 
 .. _floating:
@@ -754,23 +762,28 @@ Floating point literals are described by the following lexical definitions:
 
 .. productionlist::
    floatnumber: `pointfloat` | `exponentfloat`
-   pointfloat: [`intpart`] `fraction` | `intpart` "."
-   exponentfloat: (`intpart` | `pointfloat`) `exponent`
-   intpart: `digit`+
-   fraction: "." `digit`+
-   exponent: ("e" | "E") ["+" | "-"] `digit`+
+   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
+   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
+   digitpart: `digit` (["_"] `digit`)*
+   fraction: "." `digitpart`
+   exponent: ("e" | "E") ["+" | "-"] `digitpart`
 
 Note that the integer and exponent parts are always interpreted using radix 10.
 For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
-allowed range of floating point literals is implementation-dependent. Some
-examples of floating point literals::
+allowed range of floating point literals is implementation-dependent. As in
+integer literals, underscores are supported for digit grouping.
+
+Some examples of floating point literals::
 
-   3.14 10. .001 1e100 3.14e-10 0e0
+   3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93
 
 Note that numeric literals do not include a sign; a phrase like ``-1`` is
 actually an expression composed of the unary operator ``-`` and the literal
 ``1``.
 
+.. versionchanged:: 3.6
+   Underscores are now allowed for grouping purposes in literals.
+
 
 .. _imaginary:
 
@@ -780,15 +793,15 @@ Imaginary literals
 Imaginary literals are described by the following lexical definitions:
 
 .. productionlist::
-   imagnumber: (`floatnumber` | `intpart`) ("j" | "J")
+   imagnumber: (`floatnumber` | `digitpart`) ("j" | "J")
 
 An imaginary literal yields a complex number with a real part of 0.0. Complex
 numbers are represented as a pair of floating point numbers and have the same
 restrictions on their range. To create a complex number with a nonzero real
 part, add a floating point number to it, e.g., ``(3+4j)``. Some examples of
 imaginary literals::
 
-   3.14j 10.j 10j .001j 1e100j 3.14e-10j
+   3.14j 10.j 10j .001j 1e100j 3.14e-10j 3.14_15_93j
 
 
 .. _operators:
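
The ``decinteger`` production in this file can be read as a regular expression. The following is only a sketch of that one rule, not the tokenizer's actual implementation. The names ``DECINTEGER`` and ``is_valid_literal`` are invented here, and Python 3.6 or later is assumed.

```python
import re

# decinteger: nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
DECINTEGER = re.compile(r"(?:[1-9](?:_?[0-9])*|0+(?:_?0)*)")


def is_valid_literal(source):
    """Ask the real compiler whether `source` parses as an expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# The regex and the compiler should agree on each decimal integer candidate.
for text in ("100_000_000_000", "0_0", "1__0", "1_", "0_7"):
    assert bool(DECINTEGER.fullmatch(text)) == is_valid_literal(text)

# Underscores do not change the value of a literal.
assert 0b_1110_0101 == 0b11100101
assert 3.14_15_93 == 3.141593
assert 3.14_15_93j == 3.141593j
```

A leading underscore such as ``_1`` is left out of the comparison on purpose: the compiler accepts it, but as an identifier rather than a number.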

Doc/whatsnew/3.6.rst

Lines changed: 23 additions & 0 deletions
@@ -124,6 +124,29 @@ Windows improvements:
 New Features
 ============
 
+.. _pep-515:
+
+PEP 515: Underscores in Numeric Literals
+========================================
+
+Prior to PEP 515, there was no support for writing long numeric
+literals with some form of separator to improve readability. For
+instance, how big is ``1000000000000000``? With :pep:`515`, though,
+you can use underscores to separate digits as desired to make numeric
+literals easier to read: ``1_000_000_000_000_000``. Underscores can be
+used with other numeric literals beyond integers, e.g.
+``0x_FF_FF_FF_FF``.
+
+Single underscores are allowed between digits and after any base
+specifier. More than a single underscore in a row, leading, or
+trailing underscores are not allowed.
+
+.. seealso::
+
+   :pep:`515` - Underscores in Numeric Literals
+      PEP written by Georg Brandl & Serhiy Storchaka.
+
+
 .. _pep-523:
 
 PEP 523: Adding a frame evaluation API to CPython

Include/pystrtod.h

Lines changed: 4 additions & 0 deletions
@@ -19,6 +19,10 @@ PyAPI_FUNC(char *) PyOS_double_to_string(double val,
                                          int *type);
 
 #ifndef Py_LIMITED_API
+PyAPI_FUNC(PyObject *) _Py_string_to_number_with_underscores(
+    const char *str, Py_ssize_t len, const char *what, PyObject *obj, void *arg,
+    PyObject *(*innerfunc)(const char *, Py_ssize_t, void *));
+
 PyAPI_FUNC(double) _Py_parse_inf_or_nan(const char *p, char **endptr);
 #endif
 
Lib/_pydecimal.py

Lines changed: 5 additions & 5 deletions
@@ -589,7 +589,7 @@ def __new__(cls, value="0", context=None):
         # From a string
         # REs insist on real strings, so we can too.
         if isinstance(value, str):
-            m = _parser(value.strip())
+            m = _parser(value.strip().replace("_", ""))
             if m is None:
                 if context is None:
                     context = getcontext()
@@ -4125,7 +4125,7 @@ def _set_rounding(self, type):
         This will make it round up for that operation.
         """
         rounding = self.rounding
-        self.rounding= type
+        self.rounding = type
         return rounding
 
     def create_decimal(self, num='0'):
@@ -4134,10 +4134,10 @@ def create_decimal(self, num='0'):
         This method implements the to-number operation of the
         IBM Decimal specification."""
 
-        if isinstance(num, str) and num != num.strip():
+        if isinstance(num, str) and (num != num.strip() or '_' in num):
             return self._raise_error(ConversionSyntax,
-                                     "no trailing or leading whitespace is "
-                                     "permitted.")
+                                     "trailing or leading whitespace and "
+                                     "underscores are not permitted.")
 
         d = Decimal(num, context=self)
         if d._isnan() and len(d._int) > self.prec - self.clamp:
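
The two code paths changed in this file behave differently on purpose: the ``Decimal`` constructor strips underscores, while ``Context.create_decimal`` rejects them because it implements the specification's to-number operation. The following is a short sketch of that difference, assuming Python 3.6 or later and a context that traps ``InvalidOperation``.

```python
from decimal import Context, Decimal, InvalidOperation

# The constructor removes underscores before parsing.
assert str(Decimal("1_3.3e4_0")) == "1.33E+41"
assert Decimal("1_0_0_0") == Decimal("1000")

# create_decimal() is strict: no surrounding whitespace, no underscores.
ctx = Context(traps=[InvalidOperation])
for bad in (" 1234", "12_34"):
    try:
        ctx.create_decimal(bad)
    except InvalidOperation:
        pass
    else:
        raise AssertionError("expected InvalidOperation for %r" % bad)

# A clean string is accepted.
assert ctx.create_decimal("1234") == Decimal("1234")
```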

Lib/test/test_complex.py

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,7 @@
 import unittest
 from test import support
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 
 from random import random
 from math import atan2, isnan, copysign
@@ -377,6 +379,18 @@ def __complex__(self):
         self.assertAlmostEqual(complex(complex1(1j)), 2j)
         self.assertRaises(TypeError, complex, complex2(1j))
 
+    def test_underscores(self):
+        # check underscores
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertEqual(complex(lit), eval(lit))
+                self.assertEqual(complex(lit), complex(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'xXoObB'):
+                self.assertRaises(ValueError, complex, lit)
+
     def test_hash(self):
         for x in range(-30, 30):
             self.assertEqual(hash(x), hash(complex(x, 0)))

Lib/test/test_decimal.py

Lines changed: 10 additions & 0 deletions
@@ -554,6 +554,10 @@ def test_explicit_from_string(self):
         self.assertEqual(str(Decimal(' -7.89')), '-7.89')
         self.assertEqual(str(Decimal(" 3.45679 ")), '3.45679')
 
+        # underscores
+        self.assertEqual(str(Decimal('1_3.3e4_0')), '1.33E+41')
+        self.assertEqual(str(Decimal('1_0_0_0')), '1000')
+
         # unicode whitespace
         for lead in ["", ' ', '\u00a0', '\u205f']:
             for trail in ["", ' ', '\u00a0', '\u205f']:
@@ -578,6 +582,9 @@ def test_explicit_from_string(self):
         # embedded NUL
         self.assertRaises(InvalidOperation, Decimal, "12\u00003")
 
+        # underscores don't prevent errors
+        self.assertRaises(InvalidOperation, Decimal, "1_2_\u00003")
+
     @cpython_only
     def test_from_legacy_strings(self):
         import _testcapi
@@ -772,6 +779,9 @@ def test_explicit_context_create_decimal(self):
         self.assertRaises(InvalidOperation, nc.create_decimal, "xyz")
         self.assertRaises(ValueError, nc.create_decimal, (1, "xyz", -25))
         self.assertRaises(TypeError, nc.create_decimal, "1234", "5678")
+        # no whitespace and underscore stripping is done with this method
+        self.assertRaises(InvalidOperation, nc.create_decimal, " 1234")
+        self.assertRaises(InvalidOperation, nc.create_decimal, "12_34")
 
         # too many NaN payload digits
         nc.prec = 3

Lib/test/test_float.py

Lines changed: 23 additions & 1 deletion
@@ -1,4 +1,3 @@
-
 import fractions
 import operator
 import os
@@ -9,6 +8,8 @@
 import unittest
 
 from test import support
+from test.test_grammar import (VALID_UNDERSCORE_LITERALS,
+                               INVALID_UNDERSCORE_LITERALS)
 from math import isinf, isnan, copysign, ldexp
 
 INF = float("inf")
@@ -60,6 +61,27 @@ def test_float(self):
         float(b'.' + b'1'*1000)
         float('.' + '1'*1000)
 
+    def test_underscores(self):
+        for lit in VALID_UNDERSCORE_LITERALS:
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertEqual(float(lit), eval(lit))
+                self.assertEqual(float(lit), float(lit.replace('_', '')))
+        for lit in INVALID_UNDERSCORE_LITERALS:
+            if lit in ('0_7', '09_99'):  # octals are not recognized here
+                continue
+            if not any(ch in lit for ch in 'jJxXoObB'):
+                self.assertRaises(ValueError, float, lit)
+        # Additional test cases; nan and inf are never valid as literals,
+        # only in the float() constructor, but we don't allow underscores
+        # in or around them.
+        self.assertRaises(ValueError, float, '_NaN')
+        self.assertRaises(ValueError, float, 'Na_N')
+        self.assertRaises(ValueError, float, 'IN_F')
+        self.assertRaises(ValueError, float, '-_INF')
+        self.assertRaises(ValueError, float, '-INF_')
+        # Check that we handle bytes values correctly.
+        self.assertRaises(ValueError, float, b'0_.\xff9')
+
     def test_non_numeric_input_types(self):
         # Test possible non-numeric types for the argument x, including
         # subclasses of the explicitly documented accepted types.
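
The extra cases in ``test_underscores`` pin down that the special strings ``nan`` and ``inf``, which only ``float()`` accepts, never tolerate underscores. The following is a brief standalone sketch of those checks, assuming Python 3.6 or later. ``raises_value_error`` is a hypothetical helper, not part of the test suite.

```python
import math


def raises_value_error(text):
    """Return True if float(text) raises ValueError."""
    try:
        float(text)
    except ValueError:
        return True
    return False


# Without underscores, the special values parse as usual.
assert math.isnan(float("NaN"))
assert float("-INF") == -math.inf

# Underscores in or around nan/inf are rejected, as is a bytes value
# that mixes an underscore with an invalid byte.
for bad in ("_NaN", "Na_N", "IN_F", "-_INF", "-INF_", b"0_.\xff9"):
    assert raises_value_error(bad)
```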
