==================
 dumps() function
==================

.. currentmodule:: rapidjson

.. testsetup::

   from rapidjson import (dumps, loads, BM_NONE, BM_UTF8, DM_NONE, DM_ISO8601,
                          DM_UNIX_TIME, DM_ONLY_SECONDS, DM_IGNORE_TZ, DM_NAIVE_IS_UTC,
                          DM_SHIFT_TO_UTC, IM_ANY_ITERABLE, IM_ONLY_LISTS, MM_ANY_MAPPING,
                          MM_ONLY_DICTS, MM_COERCE_KEYS_TO_STRINGS, MM_SORT_KEYS,
                          NM_NATIVE, NM_DECIMAL, NM_NAN, PM_NONE, PM_COMMENTS,
                          PM_TRAILING_COMMAS, UM_NONE, UM_CANONICAL, UM_HEX, WM_COMPACT,
                          WM_PRETTY, WM_SINGLE_LINE_ARRAY)

.. function:: dumps(obj, *, skipkeys=False, ensure_ascii=True, write_mode=WM_COMPACT, \
                    indent=4, default=None, sort_keys=False, number_mode=None, \
                    datetime_mode=None, uuid_mode=None, bytes_mode=BM_UTF8, \
                    iterable_mode=IM_ANY_ITERABLE, mapping_mode=MM_ANY_MAPPING, \
                    allow_nan=True)

   Encode the given Python `obj` instance into a ``JSON`` string.

   :param obj: the value to be serialized
   :param bool skipkeys: whether invalid :class:`dict` keys will be skipped
   :param bool ensure_ascii: whether the output should contain only ASCII
                             characters
   :param int write_mode: enable particular pretty print behaviors
   :param indent: indentation width or string to produce pretty printed JSON
   :param callable default: a function that gets called for objects that can't
                            otherwise be serialized
   :param bool sort_keys: whether dictionary keys should be sorted
                          alphabetically
   :param int number_mode: enable particular behaviors in handling numbers
   :param int datetime_mode: how should :class:`datetime`, :class:`time` and
                             :class:`date` instances be handled
   :param int uuid_mode: how should :class:`UUID` instances be handled
   :param int bytes_mode: how should :class:`bytes` instances be handled
   :param int iterable_mode: how should `iterable` values be handled
   :param int mapping_mode: how should `mapping` values be handled
   :param bool allow_nan: *compatibility* flag equivalent to ``number_mode=NM_NAN``
   :returns: A Python :class:`str` instance.


   .. _skip-invalid-keys:
   .. rubric:: `skipkeys`

   If `skipkeys` is true (default: ``False``), then dict keys that are not strings will be
   skipped instead of raising a :exc:`TypeError`:

   .. doctest::

      >>> dumps({(0,): 'empty tuple', True: 'a true value'})
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: keys must be strings
      >>> dumps({(0,): 'empty tuple', True: 'a true value'},
      ...       skipkeys=True)
      '{}'

   .. note:: `skipkeys` is a backward compatible alias of new
             ``MM_SKIP_NON_STRING_KEYS`` :ref:`mapping mode <mapping_mode>`.

   .. _ensure-ascii:
   .. rubric:: `ensure_ascii`

   If `ensure_ascii` is true (the default), the output is guaranteed to have all incoming
   non-ASCII characters escaped.  If `ensure_ascii` is false, these characters will be
   output as-is:

   .. doctest::

      >>> dumps('The symbol for the Euro currency is €')
      '"The symbol for the Euro currency is \\u20AC"'
      >>> dumps('The symbol for the Euro currency is €',
      ...       ensure_ascii=False)
      '"The symbol for the Euro currency is €"'

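   The escapes above follow the JSON rule for non-ASCII characters: each one is written
   as a ``\uXXXX`` sequence of its UTF-16 code unit(s), so characters outside the Basic
   Multilingual Plane become a surrogate pair. A minimal sketch of that rule in plain
   Python follows; ``escape_non_ascii`` is a hypothetical helper, not part of
   ``python-rapidjson``. It handles only the non-ASCII part (quotes, backslashes and
   control characters are left untouched) and uses upper-case hex digits to match the
   output shown above:

   .. code-block:: python

      def escape_non_ascii(text):
          """Escape every non-ASCII character of `text` as a JSON \\uXXXX sequence.

          Hypothetical helper: it mirrors the idea of ensure_ascii=True but does
          not escape quotes, backslashes or control characters.
          """
          out = []
          for char in text:
              code = ord(char)
              if code < 0x80:
                  # Plain ASCII is emitted unchanged
                  out.append(char)
              elif code <= 0xFFFF:
                  # Basic Multilingual Plane: a single UTF-16 code unit
                  out.append('\\u%04X' % code)
              else:
                  # Astral character: split into a UTF-16 surrogate pair
                  code -= 0x10000
                  high = 0xD800 + (code >> 10)
                  low = 0xDC00 + (code & 0x3FF)
                  out.append('\\u%04X\\u%04X' % (high, low))
          return ''.join(out)


      print(escape_non_ascii('The symbol for the Euro currency is €'))
      # The symbol for the Euro currency is \u20AC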

   .. _write-mode:
   .. rubric:: `write_mode`

   The `write_mode` controls how ``python-rapidjson`` emits JSON: by default it is
   :data:`WM_COMPACT`, which produces the most compact JSON representation:

   .. doctest::

      >>> dumps([1, 2, {'three': 3, 'four': 4}])
      '[1,2,{"three":3,"four":4}]'

   With :data:`WM_PRETTY` it will use ``RapidJSON``\ 's ``PrettyWriter``, with a default
   `indent` (see below) of four spaces:

   .. doctest::

      >>> print(dumps([1, 2, {'three': 3, 'four': 4}],
      ...       write_mode=WM_PRETTY))
      [
          1,
          2,
          {
              "three": 3,
              "four": 4
          }
      ]

   With :data:`WM_SINGLE_LINE_ARRAY` arrays will be kept on a single line:

   .. doctest::

      >>> print(dumps([1, 2, 'three', [4, 5]],
      ...       write_mode=WM_SINGLE_LINE_ARRAY))
      [1, 2, "three", [4, 5]]
      >>> print(dumps([1, 2, {'three': 3, 'four': 4}],
      ...       write_mode=WM_SINGLE_LINE_ARRAY))
      [1, 2, {
              "three": 3,
              "four": 4
          }]

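   For comparison, the two default layouts have the same shape as what the standard
   library ``json`` module produces with compact separators and with ``indent=4``. This
   sketch uses only ``json``, so it runs without ``python-rapidjson``; it illustrates the
   shape of :data:`WM_COMPACT` and :data:`WM_PRETTY` output, not how ``RapidJSON``
   generates it. The standard library has no counterpart of :data:`WM_SINGLE_LINE_ARRAY`.

   .. code-block:: python

      import json

      data = [1, 2, {'three': 3, 'four': 4}]

      # No whitespace after item and key separators, like WM_COMPACT
      compact = json.dumps(data, separators=(',', ':'))
      print(compact)  # [1,2,{"three":3,"four":4}]

      # Four spaces per nesting level, like WM_PRETTY with its default indent
      pretty = json.dumps(data, indent=4)
      print(pretty)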

   .. rubric:: `indent`

   The `indent` parameter may be either a non-negative integer or a string: in the former
   case it specifies a number of spaces, while in the latter the string may contain zero
   or more ASCII *whitespace* characters (space, tab ``\t``, newline ``\n`` and
   carriage-return ``\r``), all identical (that is, ``"\n\t"`` is not accepted).

   The integer, or the length of the string, determines how many spaces (or how many
   repetitions of the string's character) will be used to indent each nesting level when
   `write_mode` is not :data:`WM_COMPACT`; it defaults to 4. Specifying a value different
   from ``None`` automatically sets `write_mode` to :data:`WM_PRETTY`, unless it is given
   explicitly.

   By setting `indent` to 0, each array item (when `write_mode` is not
   :data:`WM_SINGLE_LINE_ARRAY`) and each dictionary value will be followed by a newline. A
   positive integer means that each *level* will be indented by that many spaces:

   .. doctest::

      >>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=0))
      [
      1,
      2,
      {
      "three": 3,
      "four": 4
      }
      ]
      >>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=2))
      [
        1,
        2,
        {
          "three": 3,
          "four": 4
        }
      ]
      >>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=""))
      [
      1,
      2,
      {
      "three": 3,
      "four": 4
      }
      ]
      >>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent="  "))
      [
        1,
        2,
        {
          "three": 3,
          "four": 4
        }
      ]
      >>> print(dumps([1, 2, {'three': 3, 'four': 4}],
      ...       indent="\t").replace('\t', '→ '))
      [
      → 1,
      → 2,
      → {
      → → "three": 3,
      → → "four": 4
      → }
      ]

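   The rules for `indent` values can be restated as a small normalization step. The
   function below is a hypothetical sketch written from the description above, not the
   library's own validation code; it turns a non-``None`` `indent` into the character to
   repeat and the number of repetitions per level:

   .. code-block:: python

      def normalize_indent(indent):
          """Return (char, count) for a non-None `indent`, per the rules above.

          Hypothetical helper: an int means that many spaces, a str must consist
          of one ASCII whitespace character repeated zero or more times.
          """
          if isinstance(indent, int):
              if indent < 0:
                  raise ValueError('indent must be a non-negative integer')
              return ' ', indent
          if not isinstance(indent, str):
              raise TypeError('indent must be an int or a str')
          if indent == '':
              # Empty string: newlines but no indentation at all
              return ' ', 0
          char = indent[0]
          if char not in ' \t\n\r':
              raise ValueError('only space, tab, newline and carriage-return are allowed')
          if indent != char * len(indent):
              # Mixed characters such as "\n\t" are rejected
              raise ValueError('all indent characters must be the same')
          return char, len(indent)


      print(normalize_indent(2))     # (' ', 2)
      print(normalize_indent('\t'))  # ('\t', 1)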

   .. rubric:: `default`

   The `default` argument may be used to specify a custom serializer for objects that are
   not otherwise handled. If specified, it should be a function that gets called for such
   objects and either returns a JSON-encodable version of the object itself or raises a
   :exc:`TypeError`:

   .. doctest::

      >>> class Point(object):
      ...   def __init__(self, x, y):
      ...     self.x = x
      ...     self.y = y
      ...
      >>> point = Point(1,2)
      >>> dumps(point)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: <__main__.Point object at …> is not JSON serializable
      >>> def point_jsonifier(obj):
      ...   if isinstance(obj, Point):
      ...     return {'x': obj.x, 'y': obj.y}
      ...   else:
      ...     raise TypeError('%r is not JSON serializable' % (obj,))
      ...
      >>> dumps(point, default=point_jsonifier)
      '{"x":1,"y":2}'

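   When `default` has to handle several custom types, a chain of ``isinstance`` checks can
   grow unwieldy. A minimal sketch of one alternative uses ``functools.singledispatch``
   from the standard library; ``to_jsonable`` is a hypothetical name, and the example
   calls the standard library ``json`` module so that it runs without
   ``python-rapidjson``. The same callable would be passed to :func:`dumps` as
   ``default=to_jsonable``:

   .. code-block:: python

      import json
      from functools import singledispatch


      class Point(object):
          def __init__(self, x, y):
              self.x = x
              self.y = y


      @singledispatch
      def to_jsonable(obj):
          # Fallback for unregistered types: signal that obj cannot be encoded
          raise TypeError('%r is not JSON serializable' % (obj,))


      @to_jsonable.register(Point)
      def _(obj):
          # Registered converter: a Point becomes a plain dict
          return {'x': obj.x, 'y': obj.y}


      print(json.dumps(Point(1, 2), default=to_jsonable, separators=(',', ':')))
      # {"x":1,"y":2}

   Each new type then needs only one more ``register`` call, and anything unregistered
   still ends in a :exc:`TypeError`.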

   .. _sort-keys:
   .. rubric:: `sort_keys`

   When `sort_keys` is true (default: ``False``), the JSON representation of Python
   dictionaries is sorted by key:

   .. doctest::

      >>> data = {'a': 'A', 'c': 'C', 'i': 'I', 'd': 'D'}
      >>> dumps(data, sort_keys=True)
      '{"a":"A","c":"C","d":"D","i":"I"}'

   .. note:: `sort_keys` is a backward compatible alias of new ``MM_SORT_KEYS``
             :ref:`mapping mode <mapping_mode>`.

   .. doctest::

      >>> dumps(data, mapping_mode=MM_SORT_KEYS)
      '{"a":"A","c":"C","d":"D","i":"I"}'

   The default setting, on modern snakes (that is, on `Python >= 3.7`__), preserves
   original dictionary insertion order:

   .. doctest::

      >>> dumps(data)
      '{"a":"A","c":"C","i":"I","d":"D"}'

   __ https://mail.python.org/pipermail/python-dev/2017-December/151283.html

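   For ASCII keys like these, sorting by key gives the same text as rebuilding every
   mapping from its keys in sorted order. The helper below is a hypothetical sketch of that
   idea (not how the library implements `sort_keys`); it relies on Python's ordinary string
   ordering and on dictionaries preserving insertion order:

   .. code-block:: python

      import json


      def sorted_keys_copy(obj):
          """Return a copy of `obj` in which every dict has its keys sorted."""
          if isinstance(obj, dict):
              return {key: sorted_keys_copy(obj[key]) for key in sorted(obj)}
          if isinstance(obj, list):
              return [sorted_keys_copy(item) for item in obj]
          return obj


      data = {'a': 'A', 'c': 'C', 'i': 'I', 'd': 'D'}
      print(json.dumps(sorted_keys_copy(data), separators=(',', ':')))
      # {"a":"A","c":"C","d":"D","i":"I"}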

   .. _dumps-number-mode:
   .. rubric:: `number_mode`

   The `number_mode` argument selects different behaviors in handling numeric values.

   By default *non-numbers* (``nan``, ``inf``, ``-inf``) will be serialized as their
   JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``), because ``NM_NAN`` is
   *on* by default (**NB**: this is *not* compliant with the ``JSON`` standard):

   .. doctest::

      >>> nan = float('nan')
      >>> inf = float('inf')
      >>> dumps([nan, inf])
      '[NaN,Infinity]'
      >>> dumps([nan, inf], number_mode=NM_NAN)
      '[NaN,Infinity]'

   By explicitly setting `number_mode`, or by using the compatibility option `allow_nan`,
   you can avoid that and get a ``ValueError`` exception instead:

   .. doctest::

      >>> dumps([nan, inf], number_mode=NM_NATIVE)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: Out of range float values are not JSON compliant
      >>> dumps([nan, inf], allow_nan=False)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: Out of range float values are not JSON compliant

   Likewise, by default :class:`Decimal` instances cause a ``TypeError`` exception:

   .. doctest::

      >>> from decimal import Decimal
      >>> pi = Decimal('3.1415926535897932384626433832795028841971')
      >>> dumps(pi)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: Decimal(…) is not JSON serializable

   while using :data:`NM_DECIMAL` they will be serialized as their textual representation
   like any other float value:

   .. doctest::

      >>> dumps(pi, number_mode=NM_DECIMAL)
      '3.1415926535897932384626433832795028841971'

   Yet another possible flag affects how numeric values are passed to the underlying
   RapidJSON_ library: by default they are serialized to their string representation by
   the module itself, so they are virtually of unlimited precision:

   .. doctest::

      >>> dumps(123456789012345678901234567890)
      '123456789012345678901234567890'

   With :data:`NM_NATIVE` their binary values will be passed directly instead: this is
   somewhat faster, but subject to the limits of the underlying C ``long long`` and
   ``double`` types:

   .. doctest::

      >>> dumps(123456789012345678901234567890, number_mode=NM_NATIVE)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      OverflowError: int too big to convert

   These flags can be combined together:

   .. doctest::

      >>> fast_and_precise = NM_NATIVE | NM_DECIMAL | NM_NAN
      >>> dumps([-1, nan, pi], number_mode=fast_and_precise)
      '[-1,NaN,3.1415926535897932384626433832795028841971]'

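   The precision limits mentioned above can be checked in plain Python. A :class:`float`
   (a C ``double``) keeps only about 16 significant digits, which is why
   :data:`NM_DECIMAL` uses the textual representation of :class:`Decimal` values; and a C
   ``long long`` is *assumed* here to be 64 bits wide, as on mainstream platforms.
   ``fits_long_long`` is a hypothetical helper, not part of the library:

   .. code-block:: python

      from decimal import Decimal

      pi = Decimal('3.1415926535897932384626433832795028841971')

      # Converting to a binary double drops most of the digits
      print(str(pi))          # 3.1415926535897932384626433832795028841971
      print(repr(float(pi)))  # 3.141592653589793

      # Above 2**53, consecutive integers are no longer all representable as doubles
      print(float(2**53 + 1) == float(2**53))  # True

      LONG_LONG_MIN, LONG_LONG_MAX = -2**63, 2**63 - 1  # assuming a 64-bit long long


      def fits_long_long(value):
          """Tell whether an int fits the assumed signed 64-bit range."""
          return LONG_LONG_MIN <= value <= LONG_LONG_MAX


      print(fits_long_long(2**63 - 1))                       # True
      print(fits_long_long(123456789012345678901234567890))  # False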

   .. _dumps-datetime-mode:
   .. rubric:: `datetime_mode`

   By default :class:`date`, :class:`datetime` and :class:`time` instances are not
   serializable:

   .. doctest::

      >>> from datetime import datetime
      >>> right_now = datetime(2016, 8, 28, 13, 14, 52, 277256)
      >>> date = right_now.date()
      >>> time = right_now.time()
      >>> dumps({'date': date, 'time': time, 'timestamp': right_now})
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: datetime(…) is not JSON serializable

   When `datetime_mode` is set to :data:`DM_ISO8601` those values are serialized using the
   common `ISO 8601`_ format:

   .. doctest::

      >>> dumps(['date', date, 'time', time, 'timestamp', right_now],
      ...       datetime_mode=DM_ISO8601)
      '["date","2016-08-28","time","13:14:52.277256","timestamp","2016-08-28T13:14:52.277256"]'

   The `right_now` value is a naïve datetime (because it does not carry timezone
   information) and is normally assumed to be in the local timezone, whatever your system
   thinks it is. When you instead *know* that your values, even though naïve, are actually
   in the UTC_ timezone, you can use the :data:`DM_NAIVE_IS_UTC` flag to inform RapidJSON
   about that:

   .. doctest::

      >>> mode = DM_ISO8601 | DM_NAIVE_IS_UTC
      >>> dumps(['time', time, 'timestamp', right_now], datetime_mode=mode)
      '["time","13:14:52.277256+00:00","timestamp","2016-08-28T13:14:52.277256+00:00"]'

   A variant is :data:`DM_SHIFT_TO_UTC`, which *shifts* all datetime values to the UTC_
   timezone before serializing them:

   .. doctest::

      >>> from datetime import timedelta, timezone
      >>> here = timezone(timedelta(hours=2))
      >>> now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)
      >>> dumps(now, datetime_mode=DM_ISO8601)
      '"2016-08-28T20:31:11.084418+02:00"'
      >>> mode = DM_ISO8601 | DM_SHIFT_TO_UTC
      >>> dumps(now, datetime_mode=mode)
      '"2016-08-28T18:31:11.084418+00:00"'

   With :data:`DM_IGNORE_TZ` the timezone, if present, is simply omitted:

   .. doctest::

      >>> mode = DM_ISO8601 | DM_IGNORE_TZ
      >>> dumps(now, datetime_mode=mode)
      '"2016-08-28T20:31:11.084418"'

   Another :ref:`one-way only <no-unix-time-loads>` alternative format is `Unix time`_:
   with :data:`DM_UNIX_TIME`, :class:`date`, :class:`datetime` and :class:`time` instances
   are serialized as a number of seconds, counted since the ``EPOCH`` for the first two
   kinds and since midnight for the last:

   .. doctest::

      >>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
      >>> dumps([now, now.date(), now.time()], datetime_mode=mode)
      '[1472409071.084418,1472342400.0,73871.084418]'
      >>> unixtime = float(dumps(now, datetime_mode=mode))
      >>> datetime.fromtimestamp(unixtime, here) == now
      True

   Combining it with :data:`DM_ONLY_SECONDS` will produce integer values instead,
   dropping the *microseconds*:

   .. doctest::

      >>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC | DM_ONLY_SECONDS
      >>> dumps([now, now.date(), now.time()], datetime_mode=mode)
      '[1472409071,1472342400,73871]'

   It can also be combined with :data:`DM_SHIFT_TO_UTC` to obtain the timestamp of the
   corresponding UTC_ time:

   .. doctest::

      >>> mode = DM_UNIX_TIME | DM_SHIFT_TO_UTC
      >>> dumps(now, datetime_mode=mode)
      '1472409071.084418'

   As above, when you know that your values are in the UTC_ timezone, you can use the
   :data:`DM_NAIVE_IS_UTC` flag to get the right result:

   .. doctest::

      >>> a_long_time_ago = datetime(1968, 3, 18, 9, 10, 0, 0)
      >>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
      >>> dumps([a_long_time_ago, a_long_time_ago.date(), a_long_time_ago.time()],
      ...       datetime_mode=mode)
      '[-56472600.0,-56505600.0,33000.0]'

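   To make the effect of the individual flags concrete, the sketch below reproduces some of
   the outputs above with the standard library ``datetime`` module alone. It is an
   approximation written for this page, not the library's implementation:
   ``iso_text`` and ``unix_seconds`` are hypothetical helpers, ``iso_text`` handles only
   :class:`datetime` values, its keyword arguments merely loosely correspond to
   :data:`DM_NAIVE_IS_UTC`, :data:`DM_SHIFT_TO_UTC` and :data:`DM_IGNORE_TZ`, and
   ``unix_seconds`` always treats naïve values as UTC:

   .. code-block:: python

      from datetime import date, datetime, time, timedelta, timezone

      EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


      def iso_text(value, naive_is_utc=False, shift_to_utc=False, ignore_tz=False):
          """ISO 8601 text for a datetime, loosely mimicking the DM_* flags."""
          if value.tzinfo is None and naive_is_utc:
              value = value.replace(tzinfo=timezone.utc)   # ~ DM_NAIVE_IS_UTC
          if value.tzinfo is not None and shift_to_utc:
              value = value.astimezone(timezone.utc)       # ~ DM_SHIFT_TO_UTC
          if ignore_tz:
              value = value.replace(tzinfo=None)           # ~ DM_IGNORE_TZ
          return value.isoformat()


      def unix_seconds(value):
          """Seconds since the EPOCH (datetime, date) or since midnight (time)."""
          if isinstance(value, datetime):  # must precede the date check
              if value.tzinfo is None:
                  value = value.replace(tzinfo=timezone.utc)  # naive taken as UTC
              return (value - EPOCH).total_seconds()
          if isinstance(value, date):
              midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
              return (midnight - EPOCH).total_seconds()
          if isinstance(value, time):
              return (value.hour * 3600 + value.minute * 60 + value.second
                      + value.microsecond / 1e6)
          raise TypeError('%r is not a date, datetime or time' % (value,))


      here = timezone(timedelta(hours=2))
      now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)
      print(iso_text(now))                     # 2016-08-28T20:31:11.084418+02:00
      print(iso_text(now, shift_to_utc=True))  # 2016-08-28T18:31:11.084418+00:00
      print(iso_text(now, ignore_tz=True))     # 2016-08-28T20:31:11.084418
      print(unix_seconds(datetime(1968, 3, 18, 9, 10)))  # -56472600.0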

   .. _dumps-uuid-mode:
   .. rubric:: `uuid_mode`

   Likewise, to handle :class:`UUID` instances there are two modes that can be selected
   with the `uuid_mode` argument, both using the string representation of their values:

   .. doctest::

      >>> from uuid import uuid4
      >>> random_uuid = uuid4()
      >>> dumps(random_uuid)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: UUID(…) is not JSON serializable
      >>> dumps(random_uuid, uuid_mode=UM_CANONICAL) # doctest: +SKIP
      '"be576345-65b5-4fc2-92c5-94e2f82e38fd"'
      >>> dumps(random_uuid, uuid_mode=UM_HEX) # doctest: +SKIP
      '"be57634565b54fc292c594e2f82e38fd"'

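   The two textual forms shown above correspond to what the standard :class:`uuid.UUID`
   class provides: ``str()`` gives the canonical hyphenated form and the ``hex``
   attribute the bare 32-digit form. A short sketch, using the fixed value from the
   example instead of a random one; ``uuid_as_text`` is a hypothetical `default` callable
   showing how the canonical form could be obtained with the standard library ``json``
   module, which has no equivalent of `uuid_mode`:

   .. code-block:: python

      import json
      from uuid import UUID

      value = UUID('be576345-65b5-4fc2-92c5-94e2f82e38fd')

      print(str(value))  # be576345-65b5-4fc2-92c5-94e2f82e38fd
      print(value.hex)   # be57634565b54fc292c594e2f82e38fd


      def uuid_as_text(obj):
          # Canonical text for UUID instances, refuse everything else
          if isinstance(obj, UUID):
              return str(obj)
          raise TypeError('%r is not JSON serializable' % (obj,))


      print(json.dumps([value], default=uuid_as_text))
      # ["be576345-65b5-4fc2-92c5-94e2f82e38fd"]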

   .. _dumps-bytes-mode:
   .. rubric:: `bytes_mode`

   By default all :class:`bytes` instances are assumed to be ``UTF-8`` encoded strings,
   and acted on accordingly:

   .. doctest::

      >>> ascii_string = 'ciao'
      >>> bytes_string = b'cio\xc3\xa8'
      >>> unicode_string = 'cioè'
      >>> dumps([ascii_string, bytes_string, unicode_string])
      '["ciao","cio\\u00E8","cio\\u00E8"]'

   Sometimes you may prefer a different approach, explicitly disabling that behavior with
   the :data:`BM_NONE` mode:

   .. doctest::

      >>> dumps([ascii_string, bytes_string, unicode_string],
      ...       bytes_mode=BM_NONE)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: b'cio\xc3\xa8' is not JSON serializable
      >>> my_bytes_handler = lambda b: b.decode('UTF-8').upper()
      >>> dumps([ascii_string, bytes_string, unicode_string],
      ...       bytes_mode=BM_NONE, default=my_bytes_handler)
      '["ciao","CIO\\u00C8","cio\\u00E8"]'

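   When :class:`bytes` values hold binary data rather than text, a common alternative is
   Base64. With :data:`BM_NONE` such a conversion can live in the `default` callable; the
   sketch below shows one way, with ``bytes_as_base64`` being a hypothetical helper. It is
   exercised through the standard library ``json`` module so that it runs on its own; that
   module never treats :class:`bytes` as text, so it always calls `default` for them:

   .. code-block:: python

      import base64
      import json


      def bytes_as_base64(obj):
          # Encode raw bytes as ASCII-only Base64 text; refuse anything else
          if isinstance(obj, (bytes, bytearray)):
              return base64.b64encode(obj).decode('ascii')
          raise TypeError('%r is not JSON serializable' % (obj,))


      payload = b'cio\xc3\xa8'
      encoded = json.dumps({'payload': payload}, default=bytes_as_base64)
      print(encoded)  # {"payload": "Y2lvw6g="}

      # The reading side must know the field is Base64 and decode it back
      decoded = base64.b64decode(json.loads(encoded)['payload'])
      print(decoded == payload)  # True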

   .. _dumps-iterable-mode:
   .. rubric:: `iterable_mode`

   By default a value that implements the `iterable` protocol gets encoded as a ``JSON``
   array:

   .. doctest::

      >>> from time import localtime, struct_time
      >>> lt = localtime()
      >>> dumps(lt) # doctest: +SKIP
      '[2020,11,28,19,55,40,5,333,0]'
      >>> class MyList(list):
      ...   pass
      >>> ml = MyList((1,2,3))
      >>> dumps(ml)
      '[1,2,3]'

   When that's not appropriate, for example because you want to encode them differently,
   you may set `iterable_mode` to ``IM_ONLY_LISTS``:

   .. doctest::

      >>> dumps(lt, iterable_mode=IM_ONLY_LISTS)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: <time.struct_time …> is not JSON serializable
      >>> dumps(ml, iterable_mode=IM_ONLY_LISTS)
      Traceback (most recent call last):
        ...
      TypeError: [1, 2, 3] is not JSON serializable

   and then handle them with the `default` argument:

   .. doctest::

      >>> def ts_or_ml(obj):
      ...   if isinstance(obj, struct_time):
      ...     return {'__class__': 'time.struct_time', '__init__': list(obj)}
      ...   elif isinstance(obj, MyList):
      ...     return [i*2 for i in obj]
      ...   else:
      ...     raise ValueError('%r is not JSON serializable' % (obj,))
      >>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml) # doctest: +SKIP
      '{"__class__":"time.struct_time","__init__":[2020,11,28,19,55,40,5,333,0]}'
      >>> dumps(ml, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml)
      '[2,4,6]'

   Obviously, in such a case the value returned by the `default` callable **must not**
   be or contain a ``tuple``:

   .. doctest::

      >>> def bad_timestruct(obj):
      ...   if isinstance(obj, struct_time):
      ...     return {'__class__': 'time.struct_time', '__init__': tuple(obj)}
      ...   else:
      ...     raise ValueError('%r is not JSON serializable' % (obj,))
      >>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=bad_timestruct)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: (…) is not JSON serializable

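   Tagged objects such as those produced by `ts_or_ml` only pay off if the reading side
   turns them back into the original type. A minimal sketch of that reverse step,
   assuming the ``__class__``/``__init__`` convention used above; ``revive`` is a
   hypothetical hook, shown with the standard library ``json.loads`` and its
   ``object_hook`` argument:

   .. code-block:: python

      import json
      import time


      def revive(obj):
          # object_hook: called with every decoded JSON object, as a dict
          if obj.get('__class__') == 'time.struct_time':
              # struct_time accepts a sequence of nine integers
              return time.struct_time(obj['__init__'])
          return obj


      encoded = '{"__class__":"time.struct_time","__init__":[2020,11,28,19,55,40,5,333,0]}'
      restored = json.loads(encoded, object_hook=revive)
      print(type(restored).__name__, restored.tm_year, restored.tm_yday)
      # struct_time 2020 333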

   .. _dumps-mapping-mode:
   .. rubric:: `mapping_mode`

   By default a value that implements the `mapping` protocol gets encoded as a ``JSON``
   object:

   .. doctest::

      >>> from collections import Counter
      >>> d = {"a":1,"b":2,"c":3}
      >>> c = Counter(d)
      >>> dumps([c, d])
      '[{"a":1,"b":2,"c":3},{"a":1,"b":2,"c":3}]'

   When that's not appropriate, for example because you want to encode them differently,
   you may set `mapping_mode` to ``MM_ONLY_DICTS``:

   .. doctest::

      >>> dumps(d, mapping_mode=MM_ONLY_DICTS)
      '{"a":1,"b":2,"c":3}'
      >>> dumps(c, mapping_mode=MM_ONLY_DICTS)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: Counter(…) is not JSON serializable

   and then handle them with the `default` argument:

   .. doctest::

      >>> def counter(obj):
      ...   if isinstance(obj, Counter):
      ...     return {'__class__': 'collections.Counter', '__init__': dict(obj)}
      ...   else:
      ...     raise ValueError('%r is not JSON serializable' % obj)
      >>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=counter)
      '{"__class__":"collections.Counter","__init__":{"a":1,"b":2,"c":3}}'

   Obviously, in such a case the value returned by the `default` callable **must not**
   be or contain mappings other than plain ``dict``\ s:

   .. doctest::

      >>> from collections import OrderedDict
      >>> def bad_counter(obj):
      ...   if isinstance(obj, Counter):
      ...     return {'__class__': 'collections.Counter', '__init__': OrderedDict(obj)}
      ...   else:
      ...     raise ValueError('%r is not JSON serializable' % (obj,))
      >>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=bad_counter)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: OrderedDict([('a', 1), ('b', 2), ('c', 3)]) is not JSON serializable

   Normally, dumping a dictionary containing *non-string* keys raises a ``TypeError``
   exception:

   .. doctest::

      >>> dumps({-1: 'minus-one'})
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: keys must be strings

   By setting `mapping_mode` to ``MM_COERCE_KEYS_TO_STRINGS``, such keys will be converted
   to their string representation:

   .. doctest::

      >>> dumps({-1: 'minus-one', True: "good", False: "bad", None: "ugly"},
      ...       mapping_mode=MM_COERCE_KEYS_TO_STRINGS)
      '{"-1":"minus-one","True":"good","False":"bad","None":"ugly"}'

   Alternatively, by providing a `default` function you can have finer control over how
   they should be encoded. For example, the following mimics the default behavior of the
   standard library ``json`` module:

   .. doctest::

      >>> def mimic_stdlib_json(obj):
      ...   if isinstance(obj, dict):
      ...     result = {}
      ...     for key in obj:
      ...       if key is True:
      ...         result['true'] = obj[key]
      ...       elif key is False:
      ...         result['false'] = obj[key]
      ...       elif key is None:
      ...         result['null'] = obj[key]
      ...       elif isinstance(key, str):
      ...         result[key] = obj[key]
      ...       elif isinstance(key, (int, float)):
      ...         result[str(key)] = obj[key]
      ...       else:
      ...         raise TypeError('keys must be str, int, float, bool or None')
      ...     return result
      ...   else:
      ...     raise ValueError('%r is not JSON serializable' % (obj,))
      >>> dumps({True: 'good', False: 'bad', None: 'ugly'},
      ...       default=mimic_stdlib_json)
      '{"true":"good","false":"bad","null":"ugly"}'

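   The two approaches spell the special keys differently: the
   ``MM_COERCE_KEYS_TO_STRINGS`` output shown earlier matches Python's ``str()`` of each
   key (``"True"``, ``"None"``), while the standard library ``json`` module, which
   `mimic_stdlib_json` imitates, uses the JSON literals (``"true"``, ``"null"``). A sketch
   comparing the two with the standard library only; the dictionary comprehension merely
   reproduces the coerced output and is not the library's code:

   .. code-block:: python

      import json

      data = {-1: 'minus-one', True: 'good', False: 'bad', None: 'ugly'}

      # Same keys as the MM_COERCE_KEYS_TO_STRINGS example: str() of each key
      coerced = json.dumps({str(key): value for key, value in data.items()},
                           separators=(',', ':'))
      print(coerced)  # {"-1":"minus-one","True":"good","False":"bad","None":"ugly"}

      # The standard library turns bool and None keys into JSON literals instead
      print(json.dumps(data, separators=(',', ':')))
      # {"-1":"minus-one","true":"good","false":"bad","null":"ugly"}
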
   .. warning:: This can lead to infinite recursion if the `default` function returns a
                dictionary that still contains *non-string* keys:

                .. doctest::

                   >>> dumps({True: 'vero', False: 'falso'},
                   ...       default=lambda map: map)
                   Traceback (most recent call last):
                     File "<stdin>", line 1, in <module>
                   RecursionError: maximum recursion depth exceeded