
fix: preserve newlines in multipart form data decoding#8144

Open
Krishnachaitanyakc wants to merge 3 commits into mitmproxy:main from Krishnachaitanyakc:fix/multipart-decode-preserve-newlines

@Krishnachaitanyakc

Summary

Fixes #4466: multipart form data values containing newlines (\n or \r\n) were silently stripped during decoding, corrupting both text fields with line breaks and binary file uploads (e.g. JPEG images).

  • Root cause: decode_multipart used bytes.splitlines() to parse each part and then joined body lines with b"", which discarded all newline characters from the body content.
  • Fix: Replace the splitlines() + join approach with bytes.find() to locate the header/body separator (\r\n\r\n or \n\n), keeping the body as a single contiguous bytes object. Only the trailing CRLF/LF that RFC 2046 defines as part of the boundary delimiter is stripped.
  • Tests: Updated the existing test_decode to use RFC-compliant CRLF delimiters and added new test cases for:
    • Embedded CRLF newlines in values
    • Bare LF newlines in values
    • Binary content with embedded \r\n and \n bytes
    • Encode/decode round-trip key recovery
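To illustrate the root cause, here is a minimal, self-contained sketch (not mitmproxy's code) that contrasts a splitlines()-and-join approach with a find-based header/body split on a single part. The helper names `old_body` and `new_body` are hypothetical, and `old_body` only approximates the pre-fix logic as described above.

```python
# One multipart part as it sits between two boundary delimiters:
# headers, a blank line, then a body that itself contains CRLF and LF.
part = (
    b'Content-Disposition: form-data; name="note"\r\n'
    b"\r\n"
    b"line one\r\nline two\nline three\r\n"
)


def old_body(part: bytes) -> bytes:
    # Approximation of the pre-fix approach: split into lines, then join
    # the body lines with an empty bytes string, losing every line break.
    lines = part.splitlines()
    blank = lines.index(b"")  # first empty line ends the headers
    return b"".join(lines[blank + 1:])


def new_body(part: bytes) -> bytes:
    # Locate whichever blank-line separator (CRLF or bare LF) comes first.
    found = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = part.find(sep)
        if pos != -1:
            found.append((pos, len(sep)))
    if not found:
        return b""
    pos, sep_len = min(found)
    body = part[pos + sep_len:]  # body stays one contiguous bytes object
    # Strip only the single line break that belongs to the next delimiter.
    if body.endswith(b"\r\n"):
        return body[:-2]
    if body.endswith(b"\n"):
        return body[:-1]
    return body
```

`old_body(part)` yields `b"line oneline twoline three"`, while `new_body(part)` keeps the embedded `\r\n` and `\n` intact.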

Test plan

  • Verify existing test_decode passes with CRLF-delimited payloads
  • Verify test_decode_with_lf passes for bare-LF payloads
  • Verify test_decode_content_preserves_newlines catches the original bug (value a\r\nb not flattened to ab)
  • Verify test_decode_binary_content ensures binary data with embedded \r\n/\n bytes is preserved
  • Verify test_decode_roundtrip confirms encode then decode recovers field names
  • Verify test_encode still passes unchanged
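The test-plan cases can be exercised against a small stand-alone decoder. This is a hypothetical sketch, not the patched decode_multipart: it takes the boundary directly instead of parsing a Content-Type header, extracts field names with a simple regex, and assumes the boundary never appears inside a body (which RFC 2046 requires of the sender).

```python
import re

# Hypothetical helper: matches name="..." but not filename="...".
NAME_RE = re.compile(rb'\bname="([^"]*)"')


def split_part(part: bytes) -> tuple[bytes, bytes]:
    """Split one part into (headers, body) at the first blank line,
    CRLF or bare LF, stripping only the delimiter's trailing line break."""
    found = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = part.find(sep)
        if pos != -1:
            found.append((pos, len(sep)))
    if not found:
        return part, b""  # no blank line: treat everything as headers
    pos, sep_len = min(found)
    headers, body = part[:pos], part[pos + sep_len:]
    if body.endswith(b"\r\n"):
        body = body[:-2]
    elif body.endswith(b"\n"):
        body = body[:-1]
    return headers, body


def decode_multipart_sketch(boundary: bytes, content: bytes) -> list[tuple[bytes, bytes]]:
    """Hypothetical decoder returning (field name, value) pairs."""
    fields = []
    # Anything before the first delimiter is preamble and is ignored.
    for part in content.split(b"--" + boundary)[1:]:
        if part.startswith(b"--"):
            break  # closing delimiter "--boundary--"
        # Drop the line break that terminates the delimiter line itself.
        if part.startswith(b"\r\n"):
            part = part[2:]
        elif part.startswith(b"\n"):
            part = part[1:]
        headers, body = split_part(part)
        match = NAME_RE.search(headers)
        if match:
            fields.append((match.group(1), body))
    return fields


# CRLF payload: a text value with an embedded CRLF plus binary bytes.
crlf_payload = (
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="text"\r\n'
    b"\r\n"
    b"a\r\nb\r\n"
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="file"; filename="x.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"\xff\xd8\r\n\x00\n\xff\xd9\r\n"
    b"--xyz--\r\n"
)

# Bare-LF payload with an embedded LF in the value.
lf_payload = (
    b"--xyz\n"
    b'Content-Disposition: form-data; name="text"\n'
    b"\n"
    b"a\nb\n"
    b"--xyz--\n"
)
```

Both payloads should decode with every embedded newline and binary byte preserved.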

Krishnachaitanyakc and others added 3 commits March 24, 2026 22:32
The previous `decode_multipart` implementation used `splitlines()` to
parse each part and then joined the body lines with `b""`, which
silently stripped all newline characters (`\n`, `\r\n`) from field
values.  This corrupted any multipart payload whose values contained
embedded newlines, including plain text with line breaks and binary
file uploads (e.g. JPEG images whose raw bytes happen to include
0x0A / 0x0D).

The fix locates the header/body separator (`\r\n\r\n` or `\n\n`) with
`bytes.find` and keeps the body as a single contiguous `bytes` object.
Only the trailing CRLF (or LF) that RFC 2046 defines as part of the
boundary delimiter is stripped; all other newlines inside the body are
preserved.

Tests are updated to use RFC-compliant CRLF delimiters and new cases
are added for embedded newlines, bare-LF newlines, binary content, and
encode/decode round-tripping.

Fixes mitmproxy#4466

The encoder appended an extra empty-bytes element after each value,
producing `value\r\n\r\n` before the next boundary instead of the
RFC 2046-compliant `value\r\n`.  This caused decode_multipart (which
correctly strips only one trailing CRLF) to return values with a
spurious `\r\n` suffix, breaking the test_set_multipart_form round-trip
test in test_http.py.

Remove the extra `hdrs.append(b"")` from the encoder and update the
encode/round-trip tests to match the corrected output format.
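The following sketch shows why the extra empty element mattered. It assumes an encoder that collects lines in a list named `hdrs` and joins them with CRLF; only the list name comes from the commit message, and `encode_sketch` and `single_value` are hypothetical helpers, not mitmproxy's encoder.

```python
def encode_sketch(
    boundary: bytes,
    parts: list[tuple[bytes, bytes]],
    legacy_extra_blank: bool = False,
) -> bytes:
    hdrs: list[bytes] = []
    for name, value in parts:
        hdrs.append(b"--" + boundary)
        hdrs.append(b'Content-Disposition: form-data; name="' + name + b'"')
        hdrs.append(b"")  # blank line between part headers and body
        hdrs.append(value)
        if legacy_extra_blank:
            hdrs.append(b"")  # the extra element that the commit removes
    hdrs.append(b"--" + boundary + b"--")
    hdrs.append(b"")  # trailing CRLF after the close delimiter
    return b"\r\n".join(hdrs)


def single_value(encoded: bytes, boundary: bytes) -> bytes:
    # Body of a one-part payload, stripping exactly one trailing CRLF
    # the way an RFC 2046-following decoder would.
    start = encoded.index(b"\r\n\r\n") + 4
    end = encoded.index(b"--" + boundary + b"--")
    body = encoded[start:end]
    return body[:-2] if body.endswith(b"\r\n") else body


fixed = encode_sketch(b"xyz", [(b"a", b"1")])
legacy = encode_sketch(b"xyz", [(b"a", b"1")], legacy_extra_blank=True)
```

With the extra element, the value is followed by `\r\n\r\n`, so a decoder that strips one CRLF recovers `1\r\n` instead of `1`: the spurious suffix that broke the round-trip test.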
@Krishnachaitanyakc
Author

@mhils can you please review this?


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error parsing multipart with newline in the content

1 participant