fix(mcp): pass ensure_ascii=False to json.dumps in mcp_bridge.py by devteamaegis · Pull Request #1990 · unclecode/crawl4ai

devteamaegis · 2026-05-28T16:31:03Z

Summary

All three json.dumps() calls in deploy/docker/mcp_bridge.py were using the
default ensure_ascii=True, which escapes every non-ASCII codepoint (CJK, emoji,
accented characters) to \uXXXX sequences.

For CJK content this inflates token counts by 2.5–3x, increasing LLM costs and
eating context budget unnecessarily.

The HTTP REST API already returns native UTF-8 — MCP tool results should behave
the same way.
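The escaping behavior described above can be reproduced with the standard library alone. The sketch below uses a made-up `payload` dictionary (not data from the project) and compares the two serializations; it shows character counts only and does not measure tokens.

```python
import json

# Hypothetical sample payload, used only for illustration.
payload = {"title": "你好，世界", "note": "café"}

# Default behavior: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are emitted as-is.
native = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(native)
print(len(escaped), len(native))

# Both strings decode back to the same object, so no information is lost.
assert json.loads(escaped) == json.loads(native) == payload
```

Both outputs are valid JSON; the difference is only in how non-ASCII characters are written in the serialized text.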

Fix

# Before (default ensure_ascii=True escapes non-ASCII output)
return [t.TextContent(type="text", text=json.dumps(err))]
return [t.TextContent(type="text", text=json.dumps(res, default=str))]  # this form appears twice

# After
return [t.TextContent(type="text", text=json.dumps(err, ensure_ascii=False))]
return [t.TextContent(type="text", text=json.dumps(res, default=str, ensure_ascii=False))]  # both occurrences

Tests

Added tests/unit/test_mcp_bridge_ensure_ascii.py:

- Baseline sanity check for ensure_ascii=False behavior
- AST-level check that every json.dumps call in mcp_bridge.py includes ensure_ascii=False
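An AST-level check of this kind could look roughly like the following. This is a minimal sketch, not the test added in the PR: the helper name `dumps_calls_missing_flag` and the inline `sample` source are hypothetical, and a real test would presumably read the contents of mcp_bridge.py instead.

```python
import ast

def dumps_calls_missing_flag(source: str) -> list[int]:
    """Return line numbers of json.dumps(...) calls lacking ensure_ascii=False."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only the attribute form `json.dumps(...)`.
        is_json_dumps = (
            isinstance(func, ast.Attribute)
            and func.attr == "dumps"
            and isinstance(func.value, ast.Name)
            and func.value.id == "json"
        )
        if not is_json_dumps:
            continue
        # The keyword must be present and be the literal False.
        has_flag = any(
            kw.arg == "ensure_ascii"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is False
            for kw in node.keywords
        )
        if not has_flag:
            missing.append(node.lineno)
    return sorted(missing)

# Hypothetical stand-in for the file under test.
sample = (
    "import json\n"
    "a = json.dumps(err)\n"
    "b = json.dumps(res, default=str, ensure_ascii=False)\n"
)
print(dumps_calls_missing_flag(sample))
```

Because the check parses the source rather than importing it, it can run without the module's runtime dependencies. It would not catch aliased imports such as `from json import dumps`, which is a limitation of this sketch.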

Fixes #1962

`NlpSentenceChunking.chunk()` was returning `list(set(sens))`. Python's `set` is unordered, so the returned chunks were in arbitrary order, not document order, and duplicate sentences were silently discarded.

Fix: return `sens` directly, which preserves the order produced by `nltk.sent_tokenize` and keeps any intentional duplicates.

Adds two regression tests that verify ordering and duplicate preservation without requiring a full crawl4ai install.

Fixes unclecode#1909
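The effect of the `list(set(...))` round trip can be seen without nltk or crawl4ai. In the sketch below, `sens` is a hand-written stand-in for tokenizer output, not the result of an actual `nltk.sent_tokenize` call.

```python
# Hypothetical stand-in for the sentence list produced by the tokenizer.
sens = [
    "First sentence.",
    "Second sentence.",
    "First sentence.",   # intentional duplicate
    "Third sentence.",
]

# Old behavior: duplicates are dropped and the order is not guaranteed.
deduped = list(set(sens))

# Fixed behavior: the list is returned as-is.
preserved = list(sens)

print(len(deduped), len(preserved))
print(preserved == sens)
```

The length difference shows the lost duplicate; the ordering of `deduped` is deliberately not printed because it can vary between runs.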

All three json.dumps() calls in deploy/docker/mcp_bridge.py were using the default ensure_ascii=True, which escapes every non-ASCII codepoint to \uXXXX sequences. For CJK content this inflates token counts by 2.5-3x, raising costs and eating context budget for nothing. The HTTP REST API already returns UTF-8 natively; MCP tool results should behave the same.

Changes:
- json.dumps(err) → json.dumps(err, ensure_ascii=False)
- json.dumps(res, default=str) ×2 → json.dumps(res, default=str, ensure_ascii=False)

Adds two unit tests:
- test_cjk_not_escaped_in_json_dumps: baseline sanity check
- test_mcp_bridge_serialize_uses_ensure_ascii_false: AST-level verification that every json.dumps call in mcp_bridge.py passes ensure_ascii=False

Fixes unclecode#1962

Ishaan Samantray added 2 commits May 28, 2026 12:29

devteamaegis wants to merge 2 commits into unclecode:main from devteamaegis:fix/mcp-bridge-ensure-ascii-1962
