
Performance issue in BytesQueueBuffer.get() byte slicing #3710

@phenylshima

Description

Subject

BytesQueueBuffer.get() is slow when downloading gzipped, unchunked data with decode_content=True.

Profiling with line_profiler shows that most of the time in BytesQueueBuffer.get() is spent on this line:

left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
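In CPython, a partial slice of a `bytes` object allocates a new object and copies the selected bytes, so `chunk[remaining:]` costs time proportional to the tail that is left in the chunk, not to `remaining`. A `memoryview` slice, by contrast, is a zero-copy view onto the same buffer. A minimal illustration, independent of urllib3 (the 32 MiB chunk size is arbitrary):

```python
import timeit

# Arbitrary large chunk, standing in for a big piece of decoded data.
chunk = bytes(32 * 1024 * 1024)
remaining = 1024

# Partial bytes slice: allocates a new object and copies len(chunk) - remaining bytes.
copy_tail = chunk[remaining:]

# memoryview slice: a view onto the same underlying buffer, no copy of the data.
view = memoryview(chunk)
view_tail = view[remaining:]

assert copy_tail == view_tail  # same contents
assert view_tail.obj is chunk  # the view still refers to the original bytes object

t_bytes = timeit.timeit(lambda: chunk[remaining:], number=20)
t_view = timeit.timeit(lambda: view[remaining:], number=20)
print(f"bytes slice: {t_bytes:.4f} s, memoryview slice: {t_view:.6f} s")
```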

Environment

OS Linux-6.12.48-1-MANJARO-x86_64-with-glibc2.42
Python 3.12.5
OpenSSL 3.0.14 4 Jun 2024
urllib3 0.1.dev4309

I also reproduced this with Python 3.14.

Since this is a performance issue, here is the CPU and memory information for reference:

CPU: 13th Gen Intel(R) Core(TM) i5-13500

> free
               total        used        free      shared  buff/cache   available
Mem:           31833       11164       15246        1254        7134       20669
Swap:              0           0           0

Steps to Reproduce

  1. Clone https://gist.github.com/phenylshima/0af2d0692db455d2d040805289689fef
  2. Install urllib3==2.5.0, fastapi==0.121.1, and uvicorn==0.38.0
  3. Start test_server.py
  4. Run test_client.py and compare the "Content retrieval time" lines.

Expected Behavior

Both "Content retrieval time" measurements should be roughly the same.

Actual Behavior

Content retrieval with decode_content=True is about 9x slower (2.08 s vs. 0.22 s) than reading the whole body with decode_content=False and decompressing it afterwards:

Response time: 0.001905202865600586 seconds
Content retrieval time: 2.0758426189422607 seconds
Response time: 0.0009431838989257812 seconds
Content retrieval and decompression time: 0.2243025302886963 seconds
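The comparison above can be roughly approximated without the FastAPI server. The sketch below is a stdlib-only model, not urllib3 code: `NaiveQueueBuffer` mirrors the profiled get() loop, while `stream_decoded`, the 8 MiB all-zero payload, and the 64 KiB read size are assumptions chosen so that the decoder emits one large decoded chunk. It decodes the gzip body once in one shot and once incrementally through the naive buffer, then checks that both produce the same bytes:

```python
import gzip
import io
import time
import zlib
from collections import deque


class NaiveQueueBuffer:
    """Simplified model of the profiled BytesQueueBuffer.get() loop (not urllib3 code)."""

    def __init__(self) -> None:
        self.buffer: deque[bytes] = deque()
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def put(self, data: bytes) -> None:
        self.buffer.append(data)
        self._size += len(data)

    def get(self, n: int) -> bytes:
        fetched = 0
        ret = io.BytesIO()
        while fetched < n and self.buffer:
            remaining = n - fetched
            chunk = self.buffer.popleft()
            if remaining < len(chunk):
                # Same pattern as the profiled line: chunk[remaining:] copies the whole tail.
                left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
                ret.write(left_chunk)
                self.buffer.appendleft(right_chunk)
                self._size -= remaining
                break
            ret.write(chunk)
            self._size -= len(chunk)
            fetched += len(chunk)
        return ret.getvalue()


def stream_decoded(compressed: bytes, read_size: int, amt: int) -> bytes:
    """Hypothetical helper: decode gzip incrementally and hand out amt-sized pieces."""
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS expects a gzip header
    buf = NaiveQueueBuffer()
    pieces = []
    for start in range(0, len(compressed), read_size):
        decoded = decoder.decompress(compressed[start:start + read_size])
        if decoded:
            buf.put(decoded)
        while len(buf) >= amt:
            pieces.append(buf.get(amt))
    tail = decoder.flush()
    if tail:
        buf.put(tail)
    while len(buf) > 0:
        pieces.append(buf.get(amt))
    return b"".join(pieces)


payload = bytes(8 * 1024 * 1024)  # assumed payload: 8 MiB of zeros, highly compressible
compressed = gzip.compress(payload)

t0 = time.perf_counter()
one_shot = gzip.decompress(compressed)
t1 = time.perf_counter()
streamed = stream_decoded(compressed, read_size=2**16, amt=2**16)
t2 = time.perf_counter()

assert one_shot == streamed == payload
print(f"one-shot: {t1 - t0:.3f} s, streamed through naive buffer: {t2 - t1:.3f} s")
```

On a highly compressible payload the incremental path should be noticeably slower, because each get() copies the remaining tail of the large decoded chunk; the exact ratio depends on the payload and on how large the decoded chunks are.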

Profiling

These timings include significant line_profiler overhead, but the slicing line still stands out: it accounts for 73.3% of the time spent in get(), at about 13 µs per hit versus roughly 0.3 µs for the other lines.
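The hit counts in the line_profiler result hint at why: the slicing branch ran 134,217 times while the whole-chunk branch ran only 144 times, so each buffered chunk was split hundreds of times on average. If a chunk of S bytes is drained n bytes at a time, every call before the last copies the entire remaining tail, so the total copied grows roughly like S²/(2n) rather than S. The sketch below counts this under the simplifying assumption that n divides S; `tail_bytes_copied` and `closed_form` are hypothetical helpers illustrating the cost model, not measurements:

```python
def tail_bytes_copied(chunk_size: int, n: int) -> int:
    """Simulate draining one chunk n bytes per get() with the slicing pattern.

    Returns the total number of bytes copied by the right-hand slice chunk[remaining:].
    """
    copied = 0
    length = chunk_size
    while length > n:  # the slicing branch runs only while remaining < chunk_length
        length -= n
        copied += length  # chunk[remaining:] allocates and copies the new tail
    return copied


def closed_form(chunk_size: int, n: int) -> int:
    # With m = S / n: sum_{k=1}^{m-1} (S - k*n) = S * (m - 1) / 2
    m = chunk_size // n
    return chunk_size * (m - 1) // 2


S, n = 8 * 1024 * 1024, 64 * 1024
assert tail_bytes_copied(S, n) == closed_form(S, n)
print(tail_bytes_copied(S, n) / S)  # 63.5 bytes copied per payload byte
```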

line_profiler result
Timer unit: 1e-06 s

Total time: 2.37476 s
File: /home/user/0af2d0692db455d2d040805289689fef/.venv/lib/python3.12/site-packages/urllib3/response.py
Function: BytesQueueBuffer.get at line 282

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   282                                               @line_profiler.profile
   283                                               def get(self, n: int) -> bytes:
   284    134218      39514.2      0.3      1.7          if n == 0:
   285                                                       return b""
   286    134218      40398.6      0.3      1.7          elif not self.buffer:
   287                                                       raise RuntimeError("buffer is empty")
   288    134218      38182.3      0.3      1.6          elif n < 0:
   289                                                       raise ValueError("n should be > 0")
   290                                           
   291    134218      37615.7      0.3      1.6          fetched = 0
   292    134218      49670.3      0.4      2.1          ret = io.BytesIO()
   293    134361      39164.8      0.3      1.6          while fetched < n:
   294    134361      38041.0      0.3      1.6              remaining = n - fetched
   295    134361      38083.3      0.3      1.6              chunk = self.buffer.popleft()
   296    134361      38311.4      0.3      1.6              chunk_length = len(chunk)
   297    134361      34412.0      0.3      1.4              if remaining < chunk_length:
   298    134217    1740781.8     13.0     73.3                  left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
   299    134217      57110.2      0.4      2.4                  ret.write(left_chunk)
   300    134217      39958.4      0.3      1.7                  self.buffer.appendleft(right_chunk)
   301    134217      38834.6      0.3      1.6                  self._size -= remaining
   302    134217      35020.9      0.3      1.5                  break
   303                                                       else:
   304       144         80.0      0.6      0.0                  ret.write(chunk)
   305       144         49.4      0.3      0.0                  self._size -= chunk_length
   306       144         44.4      0.3      0.0              fetched += chunk_length
   307                                           
   308       144         39.9      0.3      0.0              if not self.buffer:
   309         1          0.3      0.3      0.0                  break
   310                                           
   311    134218      69447.3      0.5      2.9          return ret.getvalue()

Total time: 4.33477 s
File: /home/user/0af2d0692db455d2d040805289689fef/.venv/lib/python3.12/site-packages/urllib3/response.py
Function: HTTPResponse.stream at line 1071

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1071                                               @line_profiler.profile
  1072                                               def stream(
  1073                                                   self, amt: int | None = 2**16, decode_content: bool | None = None
  1074                                               ) -> typing.Generator[bytes]:
  1075                                                   """
  1076                                                   A generator wrapper for the read() method. A call will block until
  1077                                                   ``amt`` bytes have been read from the connection or until the
  1078                                                   connection is closed.
  1079                                           
  1080                                                   :param amt:
  1081                                                       How much of the content to read. The generator will return up to
  1082                                                       much data per iteration, but may return less. This is particularly
  1083                                                       likely when using compressed data. However, the empty string will
  1084                                                       never be returned.
  1085                                           
  1086                                                   :param decode_content:
  1087                                                       If True, will attempt to decode the body based on the
  1088                                                       'content-encoding' header.
  1089                                                   """
  1090         1          0.9      0.9      0.0          if self.chunked and self.supports_chunked_reads():
  1091                                                       yield from self.read_chunked(amt, decode_content=decode_content)
  1092                                                   else:
  1093    134219     206651.4      1.5      4.8              while not is_fp_closed(self._fp) or len(self._decoded_buffer) > 0:
  1094    134218    4038064.2     30.1     93.2                  data = self.read(amt=amt, decode_content=decode_content)
  1095                                           
  1096    134218      38546.2      0.3      0.9                  if data:
  1097    134218      51505.8      0.4      1.2                      yield data

  2.37 seconds - /home/user/0af2d0692db455d2d040805289689fef/.venv/lib/python3.12/site-packages/urllib3/response.py:282 - BytesQueueBuffer.get
  4.33 seconds - /home/user/0af2d0692db455d2d040805289689fef/.venv/lib/python3.12/site-packages/urllib3/response.py:1071 - HTTPResponse.stream
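One possible direction, sketched under the assumption that only the buffer's observable behavior (put(), get(n), len()) matters: keep an offset into the head chunk and copy only the bytes that get() returns, so each call costs O(n) instead of O(remaining tail). `OffsetQueueBuffer` and `_head_offset` are hypothetical names; this is not urllib3's API or an actual patch:

```python
import io
from collections import deque


class OffsetQueueBuffer:
    """Hypothetical BytesQueueBuffer variant that never re-slices the head chunk.

    It remembers how many bytes of the head chunk were already handed out and
    copies only the bytes that get() returns.
    """

    def __init__(self) -> None:
        self.buffer: deque[bytes] = deque()
        self._head_offset = 0  # bytes of self.buffer[0] already consumed
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def put(self, data: bytes) -> None:
        if data:  # skip empty chunks so the head chunk always has unread bytes
            self.buffer.append(data)
            self._size += len(data)

    def get(self, n: int) -> bytes:
        if n == 0:
            return b""
        if not self.buffer:
            raise RuntimeError("buffer is empty")
        if n < 0:
            raise ValueError("n should be > 0")

        ret = io.BytesIO()
        fetched = 0
        while fetched < n and self.buffer:
            chunk = self.buffer[0]
            available = len(chunk) - self._head_offset
            take = min(n - fetched, available)
            # memoryview slicing is zero-copy; BytesIO.write copies only `take` bytes.
            ret.write(memoryview(chunk)[self._head_offset:self._head_offset + take])
            fetched += take
            if take == available:
                self.buffer.popleft()
                self._head_offset = 0
            else:
                self._head_offset += take
        self._size -= fetched
        return ret.getvalue()


# Usage: two chunks drained in 64 KiB pieces, including a piece spanning both chunks.
data = bytes(range(256)) * 4096  # 1 MiB of patterned test data
buf = OffsetQueueBuffer()
buf.put(data[:300_000])
buf.put(data[300_000:])
pieces = []
while len(buf):
    pieces.append(buf.get(65_536))
assert b"".join(pieces) == data
```

The trade-off is memory: a large head chunk stays alive until it is fully consumed, whereas the current code frees the consumed prefix (after temporarily holding both the old chunk and the copied tail).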

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
