fix(api): strip NUL bytes from LLM output fields before API response by firecrawl-spring[bot] · Pull Request #3221 · firecrawl/firecrawl

firecrawl-spring · 2026-03-24T12:46:43Z

Summary

LLM-generated output fields (summary, extract/json, answer) could contain NUL bytes (\u0000) when the model mishandles special characters like ² (U+00B2). These NUL bytes pass through to the API response unchanged, causing downstream failures for users storing results in PostgreSQL text columns (which reject NUL bytes).

Changes

Added removeNulBytes() helper in llmExtract.ts that strips NUL bytes from strings, and uses JSON.stringify with a replacer to handle nested arrays/objects without manual recursion
Applied sanitization to all three LLM output assignment paths:
- document.extract / document.json (extract data)
- document.summary (summary generation)
- document.answer (query generation)

This is consistent with existing sanitization patterns in the codebase:

sanitizeMetadataValue() in build-document.ts strips control chars via .replace(/[\x00-\x1f\x7f]/g, '')
sanitizeString() in log_job.ts strips NUL via .replace(/\u0000/g, '')

Updates since last revision

Simplified removeNulBytes() implementation: replaced manual recursion through arrays and objects with JSON.stringify's built-in replacer function. This eliminates the separate array/object branches while maintaining identical behavior. The replacer operates on actual string values (not serialized form), so there are no edge cases with escaped characters.

Test plan

TypeScript compilation passes (no new errors introduced)
Unit-tested removeNulBytes() logic with strings containing \u0000, nested objects, arrays, null/undefined, numbers, and literal \u0000 text — all pass correctly

Review & Testing Checklist for Human

Verify the JSON.stringify replacer approach doesn't silently drop or alter non-string values in deeply nested extract output (e.g. numbers, booleans, nulls inside arrays of objects)
Confirm the JSON round-trip doesn't lose data for any real-world extract schemas — LLM output should always be JSON-serializable, but worth a sanity check against a complex extraction
Test with a page known to trigger NUL bytes in LLM output (e.g. content with ², ³, or other superscript characters) to verify the fix end-to-end

Notes

The JSON.stringify/JSON.parse round-trip is safe for LLM extract output since it's already JSON-parsed data (no Dates, undefined values, or class instances to lose)
The formatting change to modelChain in performQuery is from prettier (pre-commit hook), not a functional change

Related Pylon Ticket

https://app.usepylon.com/issues?issueNumber=25035

Link to Devin session: https://app.devin.ai/sessions/dac229a885894c6d8d87db047fe8d440
Requested by: @Chadha93

LLM-generated output (summary, extract/json, answer) could contain NUL bytes (\u0000) when the model mishandles special characters like ² (U+00B2). These NUL bytes pass through to the API response unchanged, causing downstream failures for users storing results in PostgreSQL text columns (which reject NUL bytes). Add removeNulBytes() helper that recursively strips NUL bytes from strings, arrays, and objects, and apply it to all three LLM output assignment paths in llmExtract.ts. Co-Authored-By: micahstairs <micah@sideguide.dev>

cubic-dev-ai

No issues found across 1 file

blacksmith-sh · 2026-03-24T12:51:05Z

Found 6 test failures on Blacksmith runners:

Failures

Test	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns 404 when scrape job does not exist`	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns 404 when scrape job does not exist`	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns 404 when scrape job does not exist`	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns 404 when scrape job does not exist`	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns replay-context error when scrape data is not ret` `ained`	View Logs
`src/tests/snips/v2/scrape-browser.test.ts/` `Scrape browser interact replay returns replay-context error when scrape data is not ret` `ained`	View Logs

Replace manual recursion through arrays and objects with JSON.stringify's built-in replacer function, which handles all nesting automatically. This eliminates the separate array/object branches while maintaining identical behavior. Co-Authored-By: gaurav <gauravchadha1676@gmail.com>

cubic-dev-ai

0 issues found across 1 file (changes from recent commits).

firecrawl-spring Bot requested a review from micahstairs March 24, 2026 12:46

cubic-dev-ai Bot approved these changes Mar 24, 2026

View reviewed changes

cubic-dev-ai Bot approved these changes Mar 25, 2026

View reviewed changes

Chadha93 approved these changes Mar 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): strip NUL bytes from LLM output fields before API response#3221

fix(api): strip NUL bytes from LLM output fields before API response#3221
firecrawl-spring[bot] wants to merge 2 commits intomainfrom
fix/sanitize-llm-output-nul-bytes

firecrawl-spring Bot commented Mar 24, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

blacksmith-sh Bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

firecrawl-spring Bot commented Mar 24, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Updates since last revision

Test plan

Review & Testing Checklist for Human

Notes

Related Pylon Ticket

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

blacksmith-sh Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Failures

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

firecrawl-spring Bot commented Mar 24, 2026 •

edited by devin-ai-integration Bot

Loading

blacksmith-sh Bot commented Mar 24, 2026 •

edited

Loading