fix: preserve newlines in multipart form data decoding by Krishnachaitanyakc · Pull Request #8144 · mitmproxy/mitmproxy

Krishnachaitanyakc · 2026-03-25T02:32:50Z

Summary

Fixes #4466: when multipart form data values contained newlines (\n or \r\n), those newlines were silently stripped during decoding. This corrupted text fields with line breaks and binary file uploads (e.g. JPEG images).

Root cause: decode_multipart used bytes.splitlines() to parse each part and then joined body lines with b"", which discarded all newline characters from the body content.
Fix: Replace the splitlines() + join approach with bytes.find() to locate the header/body separator (\r\n\r\n or \n\n), keeping the body as a single contiguous bytes object. Only the trailing CRLF/LF that RFC 2046 defines as part of the boundary delimiter is stripped.
Tests: Updated the existing test_decode to use RFC-compliant CRLF delimiters and added new test cases for:
- Embedded CRLF newlines in values
- Bare LF newlines in values
- Binary content with embedded \r\n and \n bytes
- Encode/decode round-trip key recovery
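The root cause and fix described above can be illustrated with a minimal, self-contained sketch. The two helper functions below are hypothetical and written only for this illustration; they are not the actual mitmproxy code. Each takes the raw bytes of a single part (headers, blank line, body, and the trailing line break that precedes the next boundary). Choosing the earliest of the two separators is an assumption of this sketch, not something the PR description states.

```python
def body_via_splitlines(part: bytes) -> bytes:
    """Lossy approach: split into lines, then join the body lines with b""."""
    lines = part.splitlines()
    blank = lines.index(b"")            # first empty line ends the headers
    return b"".join(lines[blank + 1:])  # every newline in the body is dropped


def body_via_find(part: bytes) -> bytes:
    """Preserving approach: locate the header/body separator with bytes.find."""
    hits = [(part.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, size) for pos, size in hits if pos != -1]
    if not hits:
        return b""                      # no separator: treat as having no body
    pos, size = min(hits)               # assumption: earliest separator wins
    body = part[pos + size:]
    # Strip only the single trailing line break that belongs to the delimiter.
    if body.endswith(b"\r\n"):
        body = body[:-2]
    elif body.endswith(b"\n"):
        body = body[:-1]
    return body


crlf_part = b'Content-Disposition: form-data; name="f"\r\n\r\na\r\nb\r\n'
lf_binary_part = b'Content-Disposition: form-data; name="f"\n\n\xff\xd8\r\n\x00\n\x01\n'

print(body_via_splitlines(crlf_part))  # b'ab' (the embedded CRLF is lost)
print(body_via_find(crlf_part))        # b'a\r\nb'
print(body_via_find(lf_binary_part))   # b'\xff\xd8\r\n\x00\n\x01'
```

The second function keeps the body as one contiguous slice, so bytes such as 0x0A and 0x0D inside binary content are left alone.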

Test plan

- Verify existing test_decode passes with CRLF-delimited payloads
- Verify test_decode_with_lf passes for bare-LF payloads
- Verify test_decode_content_preserves_newlines catches the original bug (value a\r\nb not flattened to ab)
- Verify test_decode_binary_content ensures binary data with embedded \r\n/\n bytes is preserved
- Verify test_decode_roundtrip confirms encode then decode recovers field names
- Verify test_encode still passes unchanged

The previous `decode_multipart` implementation used `splitlines()` to parse each part and then joined the body lines with `b""`, which silently stripped all newline characters (`\n`, `\r\n`) from field values. This corrupted any multipart payload whose values contained embedded newlines, including plain text with line breaks and binary file uploads (e.g. JPEG images whose raw bytes happen to include 0x0A / 0x0D).

The fix locates the header/body separator (`\r\n\r\n` or `\n\n`) with `bytes.find` and keeps the body as a single contiguous `bytes` object. Only the trailing CRLF (or LF) that RFC 2046 defines as part of the boundary delimiter is stripped; all other newlines inside the body are preserved.

Tests are updated to use RFC-compliant CRLF delimiters and new cases are added for embedded newlines, bare-LF newlines, binary content, and encode/decode round-tripping.

Fixes mitmproxy#4466

The encoder appended an extra empty-bytes element after each value, producing `value\r\n\r\n` before the next boundary instead of the RFC 2046-compliant `value\r\n`. This caused `decode_multipart` (which correctly strips only one trailing CRLF) to return values with a spurious `\r\n` suffix, breaking the `test_set_multipart_form` round-trip test in `test_http.py`.

Remove the extra `hdrs.append(b"")` from the encoder and update the encode/round-trip tests to match the corrected output format.
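To make the encoder issue concrete, here is a small sketch of how one extra empty element in a CRLF-joined list turns into a spurious `\r\n` suffix after decoding. `encode_sketch`, `decode_sketch`, and the `extra_empty` flag are hypothetical names invented for this illustration; they only approximate the behaviour described in the commit message and are not the mitmproxy implementation.

```python
import re


def encode_sketch(boundary: bytes, fields, extra_empty: bool = False) -> bytes:
    """Hypothetical encoder; extra_empty=True mimics the described bug."""
    lines = []
    for key, value in fields:
        lines.append(b"--" + boundary)
        lines.append(b'Content-Disposition: form-data; name="' + key + b'"')
        lines.append(b"")          # blank line separating headers from body
        lines.append(value)
        if extra_empty:
            lines.append(b"")      # extra element: yields value\r\n\r\n
    lines.append(b"--" + boundary + b"--")
    return b"\r\n".join(lines) + b"\r\n"


def decode_sketch(boundary: bytes, content: bytes):
    """Hypothetical decoder that strips exactly one trailing CRLF per part."""
    fields = []
    for part in content.split(b"--" + boundary)[1:]:
        if part.startswith(b"--"):     # closing delimiter reached
            break
        if part.startswith(b"\r\n"):   # line break ending the boundary line
            part = part[2:]
        head, sep, body = part.partition(b"\r\n\r\n")
        if not sep:
            continue
        if body.endswith(b"\r\n"):
            body = body[:-2]
        match = re.search(rb'name="([^"]*)"', head)
        if match:
            fields.append((match.group(1), body))
    return fields


fields = [(b"k", b"a\r\nb")]
good = decode_sketch(b"xyz", encode_sketch(b"xyz", fields))
buggy = decode_sketch(b"xyz", encode_sketch(b"xyz", fields, extra_empty=True))
print(good)   # [(b'k', b'a\r\nb')]
print(buggy)  # [(b'k', b'a\r\nb\r\n')]
```

With the extra element removed, the round trip returns the original value unchanged, which is the behaviour the updated tests expect.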

Krishnachaitanyakc · 2026-04-06T06:58:48Z

@mhils can you please review this?

Krishnachaitanyakc and others added 3 commits March 24, 2026 22:32

[autofix.ci] apply automated fixes

95ab9d2

Krishnachaitanyakc wants to merge 3 commits into mitmproxy:main from Krishnachaitanyakc:fix/multipart-decode-preserve-newlines
