
feat: add opt-in GCF output format for 62% fewer tokens on tool responses #1290

Open
blackwell-systems wants to merge 1 commit into CodeGraphContext:main from blackwell-systems:feat/gcf-output-format


@blackwell-systems commented Jun 20, 2026


Summary

  • Add CGC_OUTPUT_FORMAT=gcf env var to encode tool responses using GCF instead of JSON
  • GCF is an optional dependency (pip install gcf-python); when it is not installed, responses silently fall back to JSON
  • Zero behavior change without the env var
  • README updated with usage docs and MCP client config example

Why this matters for CodeGraphContext

Code graph query results (symbols, callers, callees, complexity, class hierarchies) are pure structured data with consistent schemas and repeated field names. This is the exact data shape where GCF saves the most tokens: keys declared once in a header, values pipe-delimited per row.

Measured savings (on CGC data shapes)

| Query type | JSON tokens | GCF tokens | Savings |
|---|---:|---:|---:|
| find_callers (12 results) | 976 | 352 | 63.9% |
| find_dead_code (15 functions) | 763 | 304 | 60.2% |
| Cypher query (20 records) | 1,342 | 516 | 61.5% |
| complex_functions (10 results) | 670 | 223 | 66.7% |
| class_hierarchy (8 classes) | 536 | 211 | 60.6% |
| Overall | 4,287 | 1,606 | 62.5% |

Measured with tiktoken (cl100k_base) on response shapes matching analyze_code_relationships, find_dead_code, execute_cypher_query, find_most_complex_functions, and class hierarchy queries.
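Each Savings value is 1 - GCF/JSON for its row, and the Overall row compares summed token counts instead of averaging the five percentages. The sketch below recomputes the table from its own counts. The counts are copied from the table, and the helper names are illustrative only:

```python
from typing import List, Tuple

# (query type, JSON tokens, GCF tokens), copied from the table above.
MEASUREMENTS: List[Tuple[str, int, int]] = [
    ("find_callers", 976, 352),
    ("find_dead_code", 763, 304),
    ("cypher_query", 1342, 516),
    ("complex_functions", 670, 223),
    ("class_hierarchy", 536, 211),
]


def savings_pct(json_tokens: int, gcf_tokens: int) -> float:
    """Share of JSON tokens removed by GCF, as a percentage."""
    return (1 - gcf_tokens / json_tokens) * 100


def overall_savings(rows: List[Tuple[str, int, int]]) -> float:
    """Token-weighted savings: total GCF tokens vs. total JSON tokens."""
    total_json = sum(j for _, j, _ in rows)
    total_gcf = sum(g for _, _, g in rows)
    return savings_pct(total_json, total_gcf)


def mean_of_row_savings(rows: List[Tuple[str, int, int]]) -> float:
    """Unweighted mean of the per-row percentages, for comparison."""
    return sum(savings_pct(j, g) for _, j, g in rows) / len(rows)


if __name__ == "__main__":
    for name, j, g in MEASUREMENTS:
        print(f"{name:18} {j:6} {g:6} {savings_pct(j, g):5.1f}%")
    print(f"{'overall':18} {overall_savings(MEASUREMENTS):5.1f}%")
    print(f"{'mean of rows':18} {mean_of_row_savings(MEASUREMENTS):5.1f}%")
```

Run as written, this reproduces every Savings value in the table and the 62.5% overall figure. The unweighted mean of the five rows is slightly higher (about 62.6%). The difference arises because the token-weighted figure gives the most influence to the largest response, the Cypher query, and that row saves less than the row average.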

Comprehension: LLMs read GCF better than JSON

GCF doesn't just use fewer tokens; models understand it more accurately:

  • 23 runs, 10 models, 3 providers (Anthropic, OpenAI, Google: Claude, GPT, Gemini)
  • Against TOON, GCF wins 22 runs, ties 1, and loses 0
  • 100% accuracy on Claude Opus 4.6, Claude Sonnet 4.6, Gemini 2.5 Pro, Gemini 3.1 Pro, Gemini 3.5 Flash
  • 100% accuracy on general structured data across every frontier model tested

No format instructions or primers needed. The model reads GCF natively.

Full eval methodology: eval/README.md

What is GCF?

GCF (Graph Compact Format) encodes structured data with positional fields: keys declared once in the header, values pipe-delimited per row.

GCF was originally extracted from a code intelligence engine (knowing) with the same graph-structured data patterns as CodeGraphContext.
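A toy encoder makes the header-once, pipe-delimited-rows idea concrete. This is explicitly not the GCF grammar: the real format, including its header syntax, typing, and escaping rules, is defined by GCF and implemented in gcf-python. The function `toy_positional_encode`, its backslash escaping, and the sample records are all hypothetical. They only show why repeated keys stop costing tokens once they move into a header:

```python
import json
from typing import Any, Dict, List


def _escape(value: Any) -> str:
    # Escape backslashes first, then the delimiter and newlines, so a value
    # containing "|" or a line break cannot be mistaken for a field boundary.
    text = str(value)
    return text.replace("\\", "\\\\").replace("|", "\\|").replace("\n", "\\n")


def toy_positional_encode(records: List[Dict[str, Any]]) -> str:
    """Toy positional encoding (NOT real GCF): header once, one row per record."""
    if not records:
        return ""
    keys = list(records[0])
    lines = ["|".join(_escape(k) for k in keys)]
    for record in records:
        # Positional fields only work if every record has the same schema.
        if set(record) != set(keys):
            raise ValueError("all records must share the same keys")
        lines.append("|".join(_escape(record[k]) for k in keys))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up caller records, for illustration only.
    callers = [
        {"name": "parse", "file": "src/a.py", "line": 12},
        {"name": "load", "file": "src/b.py", "line": 40},
    ]
    print(toy_positional_encode(callers))
    # name|file|line
    # parse|src/a.py|12
    # load|src/b.py|40
    print(len(json.dumps(callers)), "JSON chars vs",
          len(toy_positional_encode(callers)), "positional chars")
```

The comparison at the end counts characters, not tokens. The token figures in the table above come from tiktoken on real CGC response shapes.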

Changes

| File | What |
|---|---|
| src/codegraphcontext/utils/gcf_encoder.py | New module: lazy-loads gcf-python, encodes or falls back to JSON |
| src/codegraphcontext/server.py | Tool result serialization uses encode_response() instead of json.dumps() |
| pyproject.toml | gcf-python added as optional dependency (pip install codegraphcontext[gcf]) |
| README.md | Usage docs with env var and MCP client config example |
| tests/unit/tools/test_gcf_encoder.py | 7 tests for config, encoding, size comparison, and fallback |

Usage

```shell
# Install the optional dependency
pip install gcf-python

# Enable via environment variable
CGC_OUTPUT_FORMAT=gcf codegraphcontext mcp start
```
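Because the fallback is silent, it can be hard to tell whether a running server is actually emitting GCF. The following is a small preflight check. It assumes the rules are exactly the ones stated in this PR: GCF only when CGC_OUTPUT_FORMAT=gcf and gcf-python is installed. `effective_output_format` is a hypothetical helper, not part of CodeGraphContext. It looks up the package by its pip distribution name through importlib.metadata:

```python
import os
from importlib import metadata
from typing import Mapping, Optional


def gcf_installed() -> bool:
    """True if the gcf-python distribution is installed in this environment."""
    try:
        metadata.version("gcf-python")  # lookup by pip distribution name
        return True
    except metadata.PackageNotFoundError:
        return False


def effective_output_format(env: Optional[Mapping[str, str]] = None) -> str:
    """Predict which format tool responses will use: "gcf" or "json".

    Assumes GCF is used only when CGC_OUTPUT_FORMAT is exactly "gcf" AND
    gcf-python is installed; every other case yields JSON.
    """
    env = os.environ if env is None else env
    if env.get("CGC_OUTPUT_FORMAT") == "gcf" and gcf_installed():
        return "gcf"
    return "json"


if __name__ == "__main__":
    print(f"Tool responses will be encoded as: {effective_output_format()}")
```

Run it with the same interpreter and environment variables that the MCP client uses to launch the server. A shell where the variable happens to be set says nothing about the env block in the client config.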

Test plan

  • 7 new tests for GCF encoder (config, encoding, size comparison, fallback)
  • Encoder module verified standalone (6 assertions, all pass)
  • No changes to existing tool logic or response shapes
  • Fallback to JSON verified when gcf-python not installed
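The fallback bullet can be exercised without uninstalling anything. Mapping a module name to None in sys.modules makes importing it raise ImportError, and mapping it to a stand-in object makes the import succeed. The sketch below shows such a test with a self-contained stand-in function. `encode_or_fallback` and the `gcf` import name are hypothetical, and this is not one of the PR's seven tests:

```python
import json
import sys
import types
import unittest
from unittest import mock


def encode_or_fallback(data):
    """Stand-in encoder: GCF if importable, else JSON.

    HYPOTHETICAL: "gcf" and gcf.encode are placeholder names, not
    gcf-python's documented API.
    """
    try:
        import gcf  # placeholder import name
    except ImportError:
        return json.dumps(data)
    return gcf.encode(data)


class FallbackTest(unittest.TestCase):
    def test_falls_back_to_json_when_package_missing(self):
        # A None entry in sys.modules makes "import gcf" raise ImportError.
        with mock.patch.dict(sys.modules, {"gcf": None}):
            out = encode_or_fallback({"name": "parse", "line": 12})
        self.assertEqual(json.loads(out), {"name": "parse", "line": 12})

    def test_uses_encoder_when_package_present(self):
        # An object placed in sys.modules is returned by "import gcf".
        fake = types.SimpleNamespace(encode=lambda data: "GCF-ENCODED")
        with mock.patch.dict(sys.modules, {"gcf": fake}):
            out = encode_or_fallback({"name": "parse"})
        self.assertEqual(out, "GCF-ENCODED")
```

pytest collects unittest.TestCase subclasses, so tests written this way would run alongside the existing suite.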

Adoption

GCF is already in production at OmniRoute (6.1K stars), NetClaw (556 stars), NeuroNest (commercial IDE), Open Data Products SDK (Linux Foundation), and ctx (510 stars). Full adopter list: gcformat.com/ecosystem/adopters

Notes

Happy to refactor the implementation however you see fit. If you'd prefer a different config surface (CLI flag, config file setting, per-tool toggle), just let me know and I'll adjust.


Add CGC_OUTPUT_FORMAT=gcf environment variable to encode tool responses
using GCF (Graph Compact Format) instead of JSON.

Measured on CodeGraphContext's own data shapes:
- find_callers (12 results): 63.9% fewer tokens
- find_dead_code (15 functions): 60.2% fewer tokens
- Cypher query (20 records): 61.5% fewer tokens
- complex_functions (10 results): 66.7% fewer tokens
- class_hierarchy (8 classes): 60.6% fewer tokens
- Overall: 62.5% reduction

GCF is an optional dependency (pip install gcf-python). Falls back to
JSON silently if not installed. Zero behavior change without the env var.

vercel Bot commented Jun 20, 2026


@blackwell-systems is attempting to deploy a commit to the shashankss1205's projects Team on Vercel.

A member of the Team first needs to authorize it.


Projects

Status: Backlog tasks
