---
name: debug-codex-requests
description: Codex request diagnostics: capture a proxied run, inspect the provider request, and summarize payload, instructions, tools, overrides, and throughput.
triggers:
- debug-codex-requests
- "debug codex requests"
- "inspect codex request"
- "capture codex api request"
- "codex proxy log"
- "codex request payload"
- "codex system prompt"
- "codex tool schema"
- "debug codex outbound requests"
---
Use this skill when you need evidence from the real provider request, not a guessed explanation.
Default mode is execute + summarize.
Unless the user explicitly asked for instructions only:
- Run the diagnostic workflow locally.
- Finalize and inspect the proxy log or benchmark summary.
- Return findings to the user.
A task is not done when you have only:
- started the proxy
- run `codex exec`
- printed commands for the user to run manually
A task is done only when all of these are true:
- a proxied Codex run or benchmark was attempted
- the resulting log or run summary was inspected with the bundled scripts
- the user received a concise answer with findings, actual model used, fallbacks, and remaining blockers
Never end in tutorial mode when local execution was possible.
- Treat the proxy log as the source of truth.
- Prefer `codex exec` over interactive `codex`.
- Prefer a fresh temp directory such as `/tmp/debug-codex-requests-<timestamp>`.
- Use bundled scripts and return findings. Do not stop at shell examples.
- Keep `--dump-context` opt-in. Use it only when the user explicitly wants prompt/context payload.
- If the user wants the system prompt imposed by the agent environment, use `--dump-context` and inspect `system_prompt`.
- Throughput is a separate benchmark workflow. Do not infer tokens-per-second from a normal run.
- The main throughput path is the multi-phase benchmark on the real budget resolved from `~/.codex/config.toml`.
- Use subagents only when the user explicitly asked for subagents or isolated worker execution. Otherwise run the same workflow locally.
- Even with redaction enabled, assume dumped context may still contain sensitive user content.
Resolve bundled resources relative to this file and use absolute paths:
- `<proxy-script>`: `scripts/codex_proxy.py`
- `<inspect-script>`: `scripts/inspect_proxy_log.py`
- `<benchmark-script>`: `scripts/run_codex_benchmark.py`
- `<log-fields-ref>`: `references/log-fields.md`
- `<benchmark-ref>`: `references/throughput-benchmark.md`
- `<codex-config>`: `~/.codex/config.toml`
Choose exactly one flow:
- Capture only: default when the user asks to debug requests, payload shape, exposed tools, provider overrides, or model selection.
- System prompt / context: only when the user asks for `system_prompt`, `instructions`, or sanitized `input`.
- Throughput: only when the user explicitly asks for TTFT, speed, tokens-per-second, or near-context behavior.
If the request is underspecified, default to capture only and return a result.
Always inspect <codex-config> before building the run.
Model selection precedence:
- A model name the user wrote in the prompt, exact or approximate.
- A profile name the user wrote in the prompt, if no separate model name was given.
- The default Codex behavior from `<codex-config>` when the user specified neither.
Approximate matching:
- lowercase the user text
- remove spaces, underscores, hyphens, and dots
- try exact configured model names
- try exact profile names
- try normalized exact matches
- try a strong alias or substring match such as `spark` -> `gpt-5.3-codex-spark`
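The matching steps above can be sketched in Python. This is a minimal sketch, not the skill's actual resolver: the `ALIASES` table is illustrative, and real model and profile names come from `<codex-config>`.

```python
# Illustrative alias table; real entries would come from <codex-config>.
ALIASES = {"spark": "gpt-5.3-codex-spark"}

def normalize(text):
    """Lowercase and strip spaces, underscores, hyphens, and dots."""
    return "".join(c for c in text.lower() if c not in " _-.")

def resolve_model(user_text, models, profiles):
    """Apply the matching steps in order; return a model name or None."""
    if user_text in models:                      # exact configured model name
        return user_text
    if user_text in profiles:                    # exact profile name
        return profiles[user_text]
    wanted = normalize(user_text)
    for m in models:                             # normalized exact match
        if normalize(m) == wanted:
            return m
    if wanted in ALIASES:                        # strong alias match
        return ALIASES[wanted]
    for m in models:                             # substring match as last resort
        if wanted in normalize(m):
            return m
    return None
```

A `None` result means the user text matched nothing configured, so default Codex behavior applies.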
Command construction rules:
- Use `-p <profile>` when the chosen model already matches that profile's configured model and you want that profile's surrounding settings.
- Use `-m <model>` when the user explicitly named a different model or when no profile cleanly represents the chosen model.
- If you fall back to `gpt-5.4`, set `model_reasoning_effort=medium` for that diagnostic run.
Fallback policy:
- Start with the user-requested model if one was provided.
- If that model is unavailable, unloaded, or the provider returns `model not found`, retry with `gpt-5.3-codex-spark`.
- If Spark is unavailable, retry with `gpt-5.4` and `medium` reasoning.
- Report `requested_model`, `resolved_model`, `actual_model`, and `fallback_reason` in the final summary.
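The fallback chain above can be sketched as follows. `try_model` is a hypothetical callable standing in for one proxied run attempt; in this simplified sketch `resolved_model` and `actual_model` collapse to the model that succeeded.

```python
def run_with_fallback(requested, try_model):
    """Walk the fallback chain; `try_model(model)` returns True on success."""
    chain = [
        (requested, None),
        ("gpt-5.3-codex-spark", "requested model unavailable"),
        ("gpt-5.4", "spark unavailable"),  # pair with model_reasoning_effort=medium
    ]
    tried = set()
    for model, reason in chain:
        if model is None or model in tried:
            continue
        tried.add(model)
        if try_model(model):
            return {
                "requested_model": requested,
                "resolved_model": model,
                "actual_model": model,
                "fallback_reason": None if model == requested else reason,
            }
    return {
        "requested_model": requested,
        "resolved_model": None,
        "actual_model": None,
        "fallback_reason": "all fallback paths failed",
    }
```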
Rules:
- `model not found` is not a blocker if a fallback succeeds.
- Do not stop to ask the user which fallback to use.
- If the current proxied provider path cannot serve a fallback model, switch to another already-available local OpenAI-compatible provider path if your environment exposes one.
- Ask about model choice only if every fallback path failed.
Normal capture:
```
python3 <proxy-script> -p <proxy-port> -t <target-port> -o <log-file>
```

Context capture:

```
python3 <proxy-script> -p <proxy-port> -t <target-port> -o <log-file> --dump-context
```

`--dump-context` stores: `instructions`, `system_prompt`, `input`, `context_redacted`, `redaction_summary`.
1. Inspect `<codex-config>`.
2. Resolve `flow`, `requested_model`, `resolved_model`, and whether `--dump-context` is needed.
3. Start the proxy.
4. Run a fresh `codex exec` through the proxy.
5. Confirm the proxy log contains the request attempt.
6. Stop the proxy cleanly so the JSON array is finalized.
7. Inspect the log or benchmark summary with the bundled scripts.
8. Return a result summary to the user.
Do not skip steps 6 or 7.
Default prompt:
Reply with exactly PONG and do not call any tools.
Without a profile:
```
codex exec --skip-git-repo-check --ephemeral --json \
  -C <workdir> \
  -m <model> \
  -c model_provider=ollama-proxy \
  -c 'model_providers.ollama-proxy.name="Proxy"' \
  -c 'model_providers.ollama-proxy.base_url="http://127.0.0.1:<proxy-port>/v1"' \
  '<prompt>'
```

With a profile:
```
codex exec --skip-git-repo-check --ephemeral --json \
  -C <workdir> \
  -p <profile> \
  -c 'profiles.<profile>.model_provider="ollama-proxy"' \
  -c 'model_providers.ollama-proxy.name="Proxy"' \
  -c 'model_providers.ollama-proxy.base_url="http://127.0.0.1:<proxy-port>/v1"' \
  '<prompt>'
```

Defaults:
- `<workdir>`: the user's requested repo, else the current working directory
- `<profile>`: the named profile from `<codex-config>` when the user provided one and it matches the chosen model
- `<model>`: the chosen model after applying user intent and fallback rules
- `<prompt>`: a short one-shot prompt unless the user is testing a specific request shape
If the selected path points at Ollama and successful completion matters, verify the model exists in `ollama list` before the first run. If it does not, continue with the fallback chain instead of asking the user.
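The pre-check can be sketched by parsing captured `ollama list` output, assuming the usual tabular form with a header row and the model name (possibly tagged) in the first column:

```python
def model_in_ollama_list(listing, model):
    """Check whether `model` appears in captured `ollama list` stdout.

    The first line is assumed to be the column header. Tags compare
    loosely, so `name` matches `name:latest`.
    """
    for line in listing.splitlines()[1:]:
        if not line.strip():
            continue
        name = line.split()[0]
        if name == model or name.split(":")[0] == model:
            return True
    return False
```

If this returns False, continue with the fallback chain rather than asking the user.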
Use this path when the user wants the environment-imposed prompt or the actual sanitized payload.
Workflow:
- Start the proxy with `--dump-context`.
- Run a fresh `codex exec` through the proxy.
- Stop the proxy cleanly.
- Inspect the log with one of these commands:
Show only the environment/system prompt:
```
python3 <inspect-script> --show-system-prompt <log-file>
```

Show the full sanitized capture:

```
python3 <inspect-script> --show-context <log-file>
```

Interpretation:
- `system_prompt` is the sanitized top-level prompt that Codex actually sent in `instructions`.
- Use `--show-system-prompt` when the user only cares about agent-environment instructions.
- Use `--show-context` when the user also wants the sanitized `input`.
- Report whether `--dump-context` was used, whether `system_prompt` was captured, and whether redaction changed anything.
- Do not claim `system_prompt` or sanitized `input` were captured unless this run used `--dump-context`.
Use this path only when the user explicitly wants speed, TTFT, or tokens-per-second.
Default benchmark path:
```
python3 <benchmark-script> \
  --workdir <workdir> \
  --profile <profile> \
  --proxy-base-url "http://127.0.0.1:<proxy-port>/v1" \
  --proxy-log <log-file> \
  --events-log <events-log-file> \
  --summary-out <summary-file> \
  --prompt-file <prompt-file>
```

What this means:
- Multi-phase mode is the default.
- The near-context probe is enabled by default in multi-phase mode.
- The runner resolves the real budget from `<codex-config>`.
- Prefer the selected profile budget, such as `profiles.oss` resolving to `model_auto_compact_token_limit`.
- Use manual `--context-budget-tokens` or `--context-window-tokens` only for smoke tests or deliberate experiments.
Fallback modes:
Quick one-shot benchmark:
```
python3 <benchmark-script> \
  --workflow-mode single-phase \
  --workdir <workdir> \
  --profile <profile> \
  --proxy-base-url "http://127.0.0.1:<proxy-port>/v1" \
  --proxy-log <log-file> \
  --events-log <events-log-file> \
  --summary-out <summary-file> \
  --prompt-file <prompt-file>
```

Disable near-context but keep multi-phase:
```
python3 <benchmark-script> \
  --workdir <workdir> \
  --profile <profile> \
  --proxy-base-url "http://127.0.0.1:<proxy-port>/v1" \
  --proxy-log <log-file> \
  --events-log <events-log-file> \
  --summary-out <summary-file> \
  --prompt-file <prompt-file> \
  --no-near-context
```

Rules:
- Use the same measured prompt for the cold probe and warm runs.
- Report cold TTFT separately from warm generation speed.
- Treat each phase as valid only when its proxy slice contains exactly one successful request.
- Large near-context prompts are sent over `stdin` by the runner. Do not try to pass giant prompt text as a shell argument.
- Read `<benchmark-ref>` when choosing or adjusting benchmark prompts.
- Resolve the model first and preserve the chosen profile only when it still represents the chosen model.
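The phase-validity rule above can be sketched as a check over a proxy-log slice. The `status` field name is an assumption for illustration; consult `<log-fields-ref>` for the real schema.

```python
def phase_is_valid(entries):
    """A phase slice is valid only when it holds exactly one successful request.

    `entries` is the list of log records belonging to one phase; a record
    is assumed to carry an HTTP `status` field (see <log-fields-ref>).
    """
    successes = [e for e in entries if e.get("status") == 200]
    return len(successes) == 1
```

Two or more successes in one slice usually indicate retries, which should be reported explicitly rather than averaged into the phase.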
After the diagnostic or benchmark run, stop the proxy cleanly so the JSON array is finalized.
If the proxy was interrupted, inspect with `--repair`.
Standard inspection:
```
python3 <inspect-script> <log-file>
```

Benchmark inspection:

```
python3 <inspect-script> --run-summary <summary-file> <log-file>
```

Context inspection:

```
python3 <inspect-script> --show-context <log-file>
python3 <inspect-script> --show-system-prompt <log-file>
```

Do not recommend `--show-context` or `--show-system-prompt` unless this run used `--dump-context`.

Raw JSON only when needed:

```
python3 -m json.tool <log-file>
```

Answer the user's actual debugging question, not just the raw log and not a tutorial.
Minimum final summary:
- `flow`
- `requested_model`
- `resolved_model`
- `actual_model`
- `fallback_reason` or `none`
- `log_path`
- request path and model
- whether streaming was enabled
- total bytes
- `instructions_chars`, `tools_chars`, and `input_chars`
- `num_tools`, `num_input_items`, `tool_types`, and function `tool_names`
- any forwarding error, proxy exception, retry pattern, or provider rejection
- a one or two sentence conclusion answering the user's question
When the user asked for system prompt or context, also report:
- whether `--dump-context` was used
- whether `system_prompt` was captured
- whether redaction changed anything
- whether the captured `system_prompt` confirms the user's suspicion
When the user asked for throughput, also report:
- cold `ttft_ms`
- warm generation speed
- warm end-to-end speed
- near-context ratio and behavior score when enabled
- whether the run or phase was valid
If comparing runs, present deltas instead of two unrelated summaries.
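Presenting deltas can be as simple as diffing the numeric fields shared by two run summaries; the field names in the usage below are illustrative, not the summary file's guaranteed schema.

```python
def summary_deltas(before, after, keys):
    """Return after-minus-before deltas for the numeric fields in `keys`
    that both summaries contain, for a single comparison report."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}
```

For example, `summary_deltas(run_a, run_b, ["ttft_ms", "warm_tps"])` yields one dict of signed changes instead of two unrelated summaries.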
Ask the user only when one of these is true:
- no request reached the proxy after all reasonable local attempts
- all fallback models and available provider paths failed
- a destructive or ambiguous benchmark variant was requested
- the user wants a specific port or provider shape that cannot be inferred
Otherwise run the workflow and return the result.
- Never answer with a step-by-step manual guide if the commands were runnable locally.
- Never treat a successful `codex exec` as the end. Inspection is mandatory.
- Never leave a background proxy running when you answer unless the user explicitly asked to keep it alive.
- Never claim `system_prompt` or `input` were captured unless `--dump-context` was used.
- If the proxy logged a request and the upstream run failed, the capture still succeeded. Summarize the captured failure.
- Repeated near-identical requests usually mean retries. Report that explicitly.
- If the context budget cannot be resolved from `<codex-config>`, use explicit overrides only as a fallback.
- If the user wants the current conversation context on purpose, use a local run or a forked worker instead of a fresh worker.
- Read `<log-fields-ref>` for field meanings and interpretation hints.
- Read `<benchmark-ref>` for prompt guidance and benchmark validity rules.