|
| 1 | +--- |
| 2 | +name: adk-debug |
| 3 | +description: Use when debugging ADK agents, inspecting sessions, testing agent behavior, troubleshooting tool calls, event flow issues, or diagnosing LLM/model problems. |
| 4 | +--- |
| 5 | + |
| 6 | +# Debugging ADK Agents |
| 7 | + |
| 8 | +Two debugging modes: `adk web` (browser UI + API) and `adk run` (CLI). |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## Mode 1: adk web (Browser UI + REST API) |
| 13 | + |
| 14 | +Best for: visual inspection, session management, multi-turn testing. |
| 15 | + |
| 16 | +### Dev server workflow |
| 17 | + |
| 18 | +Before starting a server, ask the user: |
| 19 | +1. **Is there already a running `adk web` server?** If yes, use it |
| 20 | + (check with `curl -s http://localhost:8000/health`). |
| 21 | +2. **If not**, start one. Use `run_in_background` so it doesn't |
| 22 | + block. **Remember to shut it down when debugging is done.** |
| 23 | + |
| 24 | +```bash |
| 25 | +# Check if server is already running |
| 26 | +curl -s http://localhost:8000/health |
| 27 | + |
| 28 | +# Start server (if not running) |
| 29 | +adk web path/to/agents_dir # default: http://localhost:8000 |
| 30 | +adk web -v path/to/agents_dir # verbose (DEBUG level) |
| 31 | +adk web --reload_agents path/to/agents_dir # auto-reload on file changes |
| 32 | + |
| 33 | +# Shut down when done (if you started it) |
| 34 | +# Kill the background process or Ctrl+C |
| 35 | +``` |
| 36 | + |
| 37 | +Web UI: `http://localhost:8000/dev-ui/` |
| 38 | + |
| 39 | +### Session inspection via curl |
| 40 | + |
| 41 | +```bash |
| 42 | +# List sessions |
| 43 | +curl -s http://localhost:8000/apps/{app_name}/users/{user_id}/sessions | python3 -m json.tool |
| 44 | + |
| 45 | +# Get full session with events |
| 46 | +curl -s http://localhost:8000/apps/{app_name}/users/{user_id}/sessions/{session_id} | python3 -m json.tool |
| 47 | +``` |
| 48 | + |
| 49 | +Do NOT delete sessions after debugging — the user may want to |
| 50 | +inspect them in the web UI. |
| 51 | + |
| 52 | +### Summarize events |
| 53 | + |
| 54 | +Fetch the session JSON and write a Python script to summarize |
| 55 | +it. Do NOT use hardcoded inline scripts — the JSON schema may |
| 56 | +change. Instead, fetch the raw JSON first: |
| 57 | + |
| 58 | +```bash |
| 59 | +curl -s http://localhost:8000/apps/{app_name}/users/{user_id}/sessions/{session_id} | python3 -m json.tool |
| 60 | +``` |
| 61 | + |
| 62 | +Then write a script based on the actual structure you see. |
| 63 | +Key fields to look for in each event: `author`, `branch`, |
| 64 | +`content.parts` (text, functionCall, functionResponse), |
| 65 | +`output`, `actions` (transferToAgent, requestTask, finishTask), |
| 66 | +`nodeInfo.path`. |
| 67 | + |
| 68 | +### Send test messages via curl |
| 69 | + |
| 70 | +```bash |
| 71 | +SESSION=$(curl -s -X POST http://localhost:8000/apps/{app_name}/users/test/sessions \ |
| 72 | + -H "Content-Type: application/json" -d '{}' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])") |
| 73 | + |
| 74 | +curl -N -X POST http://localhost:8000/run_sse \ |
| 75 | + -H "Content-Type: application/json" \ |
| 76 | + -d "{\"app_name\":\"{app_name}\",\"user_id\":\"test\",\"session_id\":\"$SESSION\", |
| 77 | + \"new_message\":{\"role\":\"user\",\"parts\":[{\"text\":\"your message here\"}]}, |
| 78 | + \"streaming\":false}" |
| 79 | +``` |
| 80 | + |
| 81 | +### Debug endpoints (traces) |
| 82 | + |
| 83 | +```bash |
| 84 | +# Trace for a specific event |
| 85 | +curl -s http://localhost:8000/debug/trace/{event_id} | python3 -m json.tool |
| 86 | + |
| 87 | +# All traces for a session |
| 88 | +curl -s http://localhost:8000/debug/trace/session/{session_id} | python3 -m json.tool |
| 89 | + |
| 90 | +# Health check |
| 91 | +curl -s http://localhost:8000/health |
| 92 | +``` |
| 93 | + |
| 94 | +### Extract LLM content history |
| 95 | + |
| 96 | +Fetch trace data and inspect the `call_llm` spans. The LLM |
| 97 | +request/response are in span attributes: |
| 98 | + |
| 99 | +```bash |
| 100 | +curl -s http://localhost:8000/debug/trace/session/{session_id} | python3 -m json.tool |
| 101 | +``` |
| 102 | + |
| 103 | +Look for spans with `name: "call_llm"` and inspect their |
| 104 | +`attributes.gcp.vertex.agent.llm_request` (JSON string of the |
| 105 | +full request including `contents`, `config`, `model`). |
| 106 | + |
| 107 | +### Key span attributes |
| 108 | + |
| 109 | +| Attribute | Description | |
| 110 | +|-----------|-------------| |
| 111 | +| `gcp.vertex.agent.llm_request` | Full LLM request JSON (contents, config, model) | |
| 112 | +| `gcp.vertex.agent.llm_response` | Full LLM response JSON | |
| 113 | +| `gcp.vertex.agent.event_id` | Event ID — correlate with session events | |
| 114 | +| `gen_ai.request.model` | Model name | |
| 115 | +| `gen_ai.usage.input_tokens` | Input token count | |
| 116 | +| `gen_ai.usage.output_tokens` | Output token count | |
| 117 | +| `gen_ai.response.finish_reasons` | Stop reason | |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## Mode 2: adk run (CLI) |
| 122 | + |
| 123 | +Best for: quick testing, scripting, CI/CD, headless debugging. |
| 124 | + |
| 125 | +### Run interactively |
| 126 | + |
| 127 | +```bash |
| 128 | +adk run path/to/my_agent # interactive prompts |
| 129 | +adk run -v path/to/my_agent # verbose logging |
| 130 | +``` |
| 131 | + |
| 132 | +### Event printing utility |
| 133 | + |
| 134 | +```python |
| 135 | +from google.adk.utils._debug_output import print_event |
| 136 | + |
| 137 | +print_event(event, verbose=False) # text responses only |
| 138 | +print_event(event, verbose=True) # tool calls, code execution, inline data |
| 139 | +``` |
| 140 | + |
| 141 | +Location: `src/google/adk/utils/_debug_output.py` |
| 142 | + |
| 143 | +### Programmatic debugging |
| 144 | + |
| 145 | +```python |
| 146 | +from google.adk import Agent, Runner |
| 147 | +from google.adk.sessions import InMemorySessionService |
| 148 | + |
| 149 | +agent = Agent(name="test", model="gemini-2.5-flash", instruction="...") |
| 150 | +runner = Runner(app_name="test", agent=agent, session_service=InMemorySessionService()) |
| 151 | + |
| 152 | +session = runner.session_service.create_session_sync(app_name="test", user_id="u") |
| 153 | +for event in runner.run(user_id="u", session_id=session.id, new_message="hello"): |
| 154 | + print(f"{event.author}: {event.content}") |
| 155 | + if event.actions.transfer_to_agent: |
| 156 | + print(f" -> transfer to {event.actions.transfer_to_agent}") |
| 157 | + if event.output: |
| 158 | + print(f" -> output: {event.output}") |
| 159 | +``` |
| 160 | + |
| 161 | +--- |
| 162 | + |
| 163 | +## Logging |
| 164 | + |
| 165 | +Shared across both modes. |
| 166 | + |
| 167 | +Set log level with `--log_level` (DEBUG, INFO, WARNING, ERROR, CRITICAL) or `-v` for DEBUG. |
| 168 | +Logs write to `/tmp/agents_log/`. Tail latest: `tail -F /tmp/agents_log/agent.latest.log` |
| 169 | +Logger name: `google_adk`. Setup: `src/google/adk/cli/utils/logs.py` |
| 170 | + |
| 171 | +| Env Variable | Effect | |
| 172 | +|---|---| |
| 173 | +| `ADK_CAPTURE_MESSAGE_CONTENT_IN_SPANS` | Include prompt/response in traces (default: `true`) | |
| 174 | +| `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` | Enable prompt/response in OTEL spans | |
| 175 | +| `GOOGLE_CLOUD_PROJECT` | Required for `--trace_to_cloud` | |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Common Issues |
| 180 | + |
| 181 | +### 1. Agent outputs raw JSON instead of calling tools |
| 182 | + |
| 183 | +**Symptom:** Agent with `output_schema` dumps JSON text instead of calling tools. |
| 184 | +**Cause:** `output_schema` sets `response_schema` on the LLM config, activating controlled generation (JSON-only mode). |
| 185 | +**Check:** Look for `response_mime_type: "application/json"` in the LLM request. |
| 186 | +**Location:** `src/google/adk/agents/llm/basic.py` |
| 187 | + |
| 188 | +### 2. Events missing from session / not visible to plugins |
| 189 | + |
| 190 | +**Symptom:** Events from sub-agents don't appear in plugin callbacks or runner event stream. |
| 191 | +**Cause:** Direct `append_event` calls inside components bypass the runner's event loop. |
| 192 | +**Check:** Only the runner (`runners.py`) should call `append_event`. Components should yield events. |
| 193 | + |
| 194 | +### 3. `NameError: name 'X' is not defined` at runtime |
| 195 | + |
| 196 | +**Symptom:** `{"error": "name 'SomeClass' is not defined"}` |
| 197 | +**Cause:** Class imported under `TYPE_CHECKING` but used at runtime (e.g., `isinstance()`). |
| 198 | +**Fix:** Move import outside `TYPE_CHECKING` or use a local import. |
| 199 | + |
| 200 | +### 4. Sub-agent doesn't have context from parent conversation |
| 201 | + |
| 202 | +**Symptom:** Sub-agent only sees its own input, not the parent's history. |
| 203 | +**Cause:** Branch isolation — sub-agents on a branch only see events on that branch. |
| 204 | +**Fix:** Write the sub-agent's `description` to prompt the parent to include context in delegation input. |
| 205 | + |
| 206 | +### 5. Agent validation errors at startup |
| 207 | + |
| 208 | +**Symptom:** `ValueError` on agent construction. |
| 209 | +**Common causes:** |
| 210 | +- `"All tools must be set via LlmAgent.tools."` — Don't pass tools via `generate_content_config` |
| 211 | +- `"System instruction must be set via LlmAgent.instruction."` — Don't set via `generate_content_config` |
| 212 | +- `"Response schema must be set via LlmAgent.output_schema."` — Don't set via `generate_content_config` |
| 213 | +**Location:** `src/google/adk/agents/llm_agent.py` — `validate_generate_content_config` |
| 214 | + |
| 215 | +### 6. LLM calls exceeding limit |
| 216 | + |
| 217 | +**Symptom:** `LlmCallsLimitExceededError: Max number of llm calls limit of N exceeded` |
| 218 | +**Cause:** `run_config.max_llm_calls` limit reached. |
| 219 | +**Fix:** Increase `max_llm_calls` in `RunConfig`, or investigate why the agent is looping. |
| 220 | +**Location:** `src/google/adk/agents/invocation_context.py` |
| 221 | + |
| 222 | +### 7. Tool errors silently swallowed |
| 223 | + |
| 224 | +**Symptom:** Tool call fails but agent continues without expected result. |
| 225 | +**Cause:** Errors are caught and returned as function response text. Set `on_tool_error_callback` to customize. |
| 226 | +**Check:** Look for error text in function response events. |
| 227 | + |
| 228 | +### 8. Agent not loading / not discovered |
| 229 | + |
| 230 | +**Symptom:** `adk web` doesn't list the agent, or returns 404. |
| 231 | +**Cause:** Agent directory must follow convention: |
| 232 | +``` |
| 233 | +my_agent/ |
| 234 | + __init__.py # MUST contain: from . import agent |
| 235 | + agent.py # MUST define: root_agent = Agent(...) OR app = App(...) |
| 236 | +``` |
| 237 | + |
| 238 | +### 9. Sync tool blocking the event loop |
| 239 | + |
| 240 | +**Symptom:** Agent hangs or becomes very slow. |
| 241 | +**Cause:** Sync tools run in a thread pool (max 4 workers). All workers busy → new tool calls block. |
| 242 | +**Fix:** Make tools async if they do I/O. |
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## LLM Finish Reasons |
| 247 | + |
| 248 | +- `STOP` — normal completion |
| 249 | +- `MAX_TOKENS` — output truncated (increase `max_output_tokens`) |
| 250 | +- `SAFETY` — blocked by safety filters |
| 251 | +- `RECITATION` — blocked for recitation |
| 252 | + |
| 253 | +--- |
| 254 | + |
| 255 | +## Event Flow Architecture |
| 256 | + |
| 257 | +``` |
| 258 | +User message |
| 259 | + -> Runner.run_async() |
| 260 | + -> Runner._exec_with_plugin() # persists events, runs plugins |
| 261 | + -> agent.run_async() # yields events |
| 262 | + -> LlmAgent._run_async_impl() |
| 263 | + -> _Mesh.run_node_impl() # multi-agent orchestration |
| 264 | + -> _SingleLlmAgent # reason-act loop (Workflow) |
| 265 | + -> call_llm # LLM request + response |
| 266 | + -> execute_tools # tool dispatch |
| 267 | +``` |
| 268 | + |
| 269 | +--- |
| 270 | + |
| 271 | +## Callback Chain |
| 272 | + |
| 273 | +**Before model call:** PluginManager `run_before_model_callback()` → agent `canonical_before_model_callbacks` |
| 274 | +**After model call:** PluginManager `run_after_model_callback()` → agent `canonical_after_model_callbacks` |
| 275 | +**Before/after tool call:** PluginManager `run_before_tool_callback()` / `run_after_tool_callback()` → agent callbacks |
| 276 | + |
| 277 | +--- |
| 278 | + |
| 279 | +## Key Files for Debugging |
| 280 | + |
| 281 | +| Area | File | |
| 282 | +|---|---| |
| 283 | +| Runner event loop | `src/google/adk/runners.py` | |
| 284 | +| LLM request building | `src/google/adk/agents/llm/basic.py` | |
| 285 | +| Tool dispatch | `src/google/adk/agents/llm/execute_tools_node.py` | |
| 286 | +| Multi-agent orchestration | `src/google/adk/agents/llm/mesh.py` | |
| 287 | +| Content/context building | `src/google/adk/agents/llm/contents.py` | |
| 288 | +| Task content processor | `src/google/adk/agents/llm/task/task_contents_processor.py` | |
| 289 | +| Agent config + validation | `src/google/adk/agents/llm_agent.py` | |
| 290 | +| Event model | `src/google/adk/events/event.py` | |
| 291 | +| Session services | `src/google/adk/sessions/` | |
| 292 | +| Invocation context | `src/google/adk/agents/invocation_context.py` | |
| 293 | +| Web server + debug endpoints | `src/google/adk/cli/adk_web_server.py` | |
| 294 | +| Debug output printer | `src/google/adk/utils/_debug_output.py` | |
| 295 | + |
| 296 | +--- |
| 297 | + |
| 298 | +## Debugging Checklist |
| 299 | + |
| 300 | +1. **Start with logs** — `-v` flag, check `/tmp/agents_log/agent.latest.log` |
| 301 | +2. **Inspect the session** — curl endpoints (`adk web`) or print events (`adk run`) |
| 302 | +3. **Check event actions** — `transfer_to_agent`, `request_task`, `finish_task`, `escalate` |
| 303 | +4. **Check event.output** — single_turn and task agents set output here |
| 304 | +5. **Check traces** — `/debug/trace/session/{id}` for model/token usage |
| 305 | +6. **Verify agent structure** — `__init__.py` imports, `root_agent` or `app` defined |
| 306 | +7. **Check tool responses** — look for error text in function response events |
| 307 | +8. **Check LLM finish reason** — `STOP`, `MAX_TOKENS`, `SAFETY` |
| 308 | +9. **Test in isolation** — create a minimal agent with just the problem tool/config |
0 commit comments