A unified ADK agent for document processing that combines an end-to-end inference pipeline with an interactive learning system for continuous improvement -- all in a single self-contained agent.
Current reference implementation: Invoice Processing.
| Property | Value |
|---|---|
| Interaction Type | Conversational |
| Complexity | Advanced |
| Agent Type | Single Agent (dual-mode) |
| Vertical | Finance / Document Processing |
| ADK Pattern | LlmAgent + 18 FunctionTools |
| Model | gemini-2.5-flash |
| Framework | Google Agent Development Kit (ADK) |
| Component | Description |
|---|---|
| Dual-Mode Prompt | Single agent supports both Inference and Learning modes, selectable at session start with seamless switching |
| 9-Agent Acting Pipeline | Classification, extraction, 4-phase validation, transformation, output generation, audit logging |
| 3-Layer Investigation | Deterministic checks (Layer 1), LLM-powered rule discovery with SHA-256 caching (Layer 2), per-group ultra-conservative validation (Layer 3) |
| ALF Correction Engine | Collect-Plan-Execute pipeline: deterministic condition matching (24 operators), scope-based mutual exclusion, hybrid execution (LLM + deterministic) |
| Impact Assessment | Evaluates proposed rules against all existing cases to detect collateral matches before committing |
| Rule Management | Schema validation, conflict detection, backup on write, ID auto-assignment |
| Session Logging | Full audit trail of SME interactions, rule proposals, and approvals |
| Schema-Driven Eval | Two-layer evaluation: deterministic field comparison + optional LLM-as-judge |
| Domain-Agnostic Config | All domain knowledge in master_data.yaml -- swap to adapt to any document type |
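The 3-Layer Investigation row mentions that Layer 2 caches LLM rule discovery by SHA-256. Presumably this is backed by the `data/rule_discovery_cache.json` file in the package. Below is a minimal sketch of a content-addressed cache of that kind. It assumes the key is the SHA-256 of a canonical JSON encoding of the discovery inputs. `cache_key` and `get_or_discover` are hypothetical names, and what `investigate_agent_reconst.py` actually hashes may differ.

```python
import hashlib
import json
from typing import Any, Callable


def cache_key(payload: dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order never changes the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_discover(
    cache: dict[str, Any],
    payload: dict[str, Any],
    discover: Callable[[dict[str, Any]], Any],
) -> Any:
    """Return a cached result; call the expensive discover function (e.g. an LLM) only on a miss."""
    key = cache_key(payload)
    if key not in cache:
        cache[key] = discover(payload)
    return cache[key]
```

Sorting keys and fixing the separators makes logically identical inputs hash to the same key. Reordering fields therefore does not trigger a new LLM call. The `cache` dict can be persisted with `json.dump` between runs.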
- Black box acting agent -- the acting agent is never modified; all evolution happens downstream (ALF, learning)
- Self-contained agent -- all data, libraries, and test cases live inside the agent package
- Layered corrections -- deterministic rules first, LLM only when needed, human approval always
- Configuration over code -- domain knowledge lives in master_data.yaml, not in source code
- Human governance -- every correction rule requires SME review and approval
- Backward compatible -- all components fall back to hardcoded invoice defaults when no master data is available
Inference tools (2):
- `list_inference_cases()` -- discover available cases
- `run_inference(case_id, skip_investigation="true"|"false")` -- run the Acting -> Investigation -> ALF pipeline (Investigation is optional)
Learning tools (16):
- `list_cases()`, `load_case(case_id)` -- browse and review processed cases
- `discover_safe_rule(case_id, sme_feedback)` -- generate a rule with an automatic safety loop (validate -> assess impact -> auto-tighten)
- `revise_safe_rule(case_id, rule_json, sme_feedback)` -- revise a rule with the same safety loop
- `build_rule_discovery_context(case_id, sme_feedback)` -- build LLM context for manual rule generation
- `build_rule_revision_context(case_id, rule_json, feedback, impact)` -- build context for manual revision
- `validate_rule(rule_json)`, `check_conflicts(rule_json)` -- schema and conflict checks
- `assess_impact(conditions_json, target_case_id)` -- cross-case impact analysis
- `write_rule(rule_json, mode)`, `delete_rule(rule_id)` -- persist/remove rules in `rule_base.json`
- `get_existing_rules()`, `get_existing_scopes()`, `get_next_rule_id()` -- rule base queries
- `format_rule_display(rule_json)` -- human-readable rule formatting
- `log_session_event(event_type, data_json)`, `save_session()` -- audit logging
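`discover_safe_rule` and `revise_safe_rule` wrap rule generation in a safety loop: validate, assess impact, then auto-tighten. Below is a minimal sketch of the impact and tightening steps. It rests on several assumptions:

- Cases are flat dicts of field values.
- Conditions are `(field, operator, value)` triples.
- Only four operators are implemented.
- Tightening pins extra fields to the target case's values.

`matches`, `assess_impact` and `auto_tighten` are hypothetical names. The real logic in `core/safe_rule_orchestrator.py` and `core/impact_assessor.py` may tighten rules quite differently (for example via the LLM).

```python
from typing import Any

# (field, operator, expected_value): a simplified stand-in for the real rule schema
Condition = tuple[str, str, Any]

# Illustrative subset of operators; the ALF engine reportedly supports 24.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
    "less_than": lambda actual, expected: isinstance(actual, (int, float)) and actual < expected,
    "in": lambda actual, expected: actual in expected,
}


def matches(case: dict[str, Any], conditions: list[Condition]) -> bool:
    """A rule matches a case only if every condition holds (logical AND)."""
    return all(OPERATORS[op](case.get(field), value) for field, op, value in conditions)


def assess_impact(cases: dict[str, dict[str, Any]], conditions: list[Condition], target_id: str) -> list[str]:
    """Ids of non-target cases the rule would also match (collateral matches)."""
    return [cid for cid, case in cases.items() if cid != target_id and matches(case, conditions)]


def auto_tighten(
    cases: dict[str, dict[str, Any]],
    conditions: list[Condition],
    target_id: str,
    candidate_fields: list[str],
    max_rounds: int = 3,
) -> tuple[list[Condition], list[str]]:
    """Pin extra fields to the target case's values until no collateral matches remain.

    Returns the tightened conditions plus any collateral still left;
    a non-empty second element means the rule should not be written.
    """
    tightened = list(conditions)
    target = cases[target_id]
    for field in candidate_fields[:max_rounds]:
        if not assess_impact(cases, tightened, target_id):
            break
        tightened.append((field, "equals", target.get(field)))
    return tightened, assess_impact(cases, tightened, target_id)


cases = {
    "case_005": {"decision_phase4": "REJECT", "invoice.total_amount": 1069.32, "invoice.service_category": "HVAC"},
    "case_x": {"decision_phase4": "REJECT", "invoice.total_amount": 480.00, "invoice.service_category": "CLEANING"},  # made-up case
}
rule = [("decision_phase4", "equals", "REJECT"), ("invoice.total_amount", "less_than", 2000)]
print(assess_impact(cases, rule, "case_005"))  # ['case_x']
```

Adding a condition that the target itself satisfies guarantees that the target keeps matching. Each round can only shrink the set of collateral matches.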
```
Agent: Welcome to Invoice Processing! Which mode would you like to work in?
       (1) Inference -- process cases, or (2) Learning -- review & create rules?
You: 1
Agent: Great, we're in Inference mode.
You: Run case_002 and case_005, skip the critic
Agent: Inference pipeline completed for case_002:
  Stage 1 -- Acting Agent: ACCEPT (Pending Payment)
    Vendor: Precision Tech Co Pty Ltd | Invoice: INV-00002 ($555.50 AUD)
  Investigation: SKIPPED
  Stage 2 -- ALF: 0 rules evaluated -- output unchanged.
  Final output: data/agent_output/case_002/
Agent: Inference pipeline completed for case_005:
  Stage 1 -- Acting Agent: REJECT
    Vendor: QuickFix HVAC Repairs | Invoice: INV-00005 ($1,069.32 AUD)
    Rejection Phase: Phase 4 (Step 4.3: Labour hours within authorization)
    Rejection Reason: Invoice does not match work authorization
      (Invoice: 2.5h, WAF authorized hours: 0)
  Investigation: SKIPPED
  Stage 2 -- ALF: 0 rules evaluated -- output unchanged.
  Final output: data/agent_output/case_005/
```
The SME reviews case_005. The acting agent rejected at Phase 4 Step 4.3 because the invoice bills 2.5 labour hours but no WAF was submitted to authorize them. The calculations are correct (totals match, GST is 10%) -- this is a work authorization issue, not a calculation error. The SME identifies this as a policy exception: emergency maintenance jobs under $2,000 are exempt from WAF requirements per updated procurement policy.
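The SME's observation that the case_005 calculations are correct can be checked by hand. The worked example below uses `Decimal` to avoid binary floating-point error. It assumes GST is 10% of the subtotal, rounded half-up to the cent; the acting agent's exact rounding rule is not documented here. `check_invoice_totals` is a hypothetical helper, not part of the package.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def check_invoice_totals(line_items, gst_rate, stated_subtotal, stated_gst, stated_total, tolerance=CENT):
    """Recompute subtotal, GST and total from (quantity, unit_price) pairs and compare to stated values."""
    subtotal = sum((qty * price for qty, price in line_items), Decimal("0"))
    gst = (subtotal * gst_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    total = subtotal + gst
    return {
        "subtotal_ok": abs(subtotal - stated_subtotal) <= tolerance,
        "gst_ok": abs(gst - stated_gst) <= tolerance,
        "total_ok": abs(total - stated_total) <= tolerance,
    }


# case_005 figures from the walkthrough: 3 x $219.87 + 2.5h x $125 = $972.11
result = check_invoice_totals(
    line_items=[(Decimal("3"), Decimal("219.87")), (Decimal("2.5"), Decimal("125"))],
    gst_rate=Decimal("0.10"),
    stated_subtotal=Decimal("972.11"),
    stated_gst=Decimal("97.21"),
    stated_total=Decimal("1069.32"),
)
print(result)  # {'subtotal_ok': True, 'gst_ok': True, 'total_ok': True}
```

All three checks pass. This confirms that the rejection came from the missing work authorization, not from arithmetic.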
```
You: Switch to learning
Agent: Switched to Learning mode.
You: Load case_005
Agent: Case case_005 loaded:
  Decision: REJECT
  Rejection reason: 'Invoice does not match work authorization'
  Failed at: Phase 4, Step 4.3: Labour hours within authorization
  Evidence: Invoice: 2.5h, WAF authorized hours: 0
  Vendor: QuickFix HVAC Repairs (ABN: 51824753556)
  Invoice: INV-00005 ($1,069.32 AUD), dated 2026-02-11
  Line Items: Gasket and seal kit (3 x $219.87) + Technician labour (2.5h x $125)
  Subtotal: $972.11 | GST: $97.21 | Total: $1,069.32
  Has WAF: No (waf_count: 0)
  Validation phases:
    Phase 1 (Intake): CONTINUE (4/4 passed)
    Phase 2 (PO/Invoice): CONTINUE (1/1 passed)
    Phase 3 (Status/Date): CONTINUE (2/2 passed)
    Phase 4 (Totals/EWAF): REJECT (2/3 passed)
      FAILED Step 4.3: Labour hours within authorization
You: This is an emergency HVAC repair. Our updated procurement policy
     exempts maintenance jobs under $2,000 from the WAF requirement.
     Create a rule to accept these cases.
Agent: Generating rule via discover_safe_rule...
  === Proposed Rule: ALF-001 ===
  Name: WAF Exemption for Emergency Maintenance Under $2,000
  Scope: waf_exemption | Priority: 50
  Conditions:
    1. decision_phase4 equals "REJECT"
    2. phase4.rejection_template contains "work authorization"
    3. invoice.total_amount less_than 2000
  Actions:
    1. set_field
       Invoice Processing.Invoice Status = "Pending Payment"
       Invoice Processing.Rejection Reason = ""
       Invoice Processing.Rejection Phase = ""
  Metadata:
    Root cause: Policy exception -- updated procurement policy
                not yet reflected in rules book
  Impact Assessment:
    Case case_005: MATCH (target)
    4 other cases: NO MATCH (safe)
  Would you like to approve, revise, or discard?
You: Add a condition for maintenance service categories only.
Agent: Revising via revise_safe_rule...
  Revised Conditions:
    1. decision_phase4 equals "REJECT"
    2. phase4.rejection_template contains "work authorization"
    3. invoice.total_amount less_than 2000
    4. invoice.service_category in ["HVAC", "ELECTRICAL", "PLUMBING"]
  Impact: case_005 MATCH (HVAC, $1,069.32), 4 others safe.
You: Approve
Agent: Rule ALF-001 written. Total rules: 1.
```
ALF rules handle exceptions not covered by the rules book -- not bugs in the acting agent. Rules like ALF-001 use low-effort deterministic actions (set_field) to patch output fields directly without re-running pipeline stages.
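The sketch below shows how the approved ALF-001 rule could be evaluated deterministically and applied as `set_field` patches. It rests on these assumptions:

- The rule's dict layout (`conditions`, `actions`, `field`, `operator`, `value`) only mirrors the rule display above. It is not the real `rule_base.json` schema, which `core/rule_writer.py` defines.
- Facts are a flat dict keyed by dotted names.
- Only four operators are implemented.

`rule_matches` and `apply_set_fields` are hypothetical names.

```python
import copy
from typing import Any

# Hypothetical layout mirroring the ALF-001 display; not the real rule_base.json schema.
ALF_001 = {
    "id": "ALF-001",
    "conditions": [
        {"field": "decision_phase4", "operator": "equals", "value": "REJECT"},
        {"field": "phase4.rejection_template", "operator": "contains", "value": "work authorization"},
        {"field": "invoice.total_amount", "operator": "less_than", "value": 2000},
        {"field": "invoice.service_category", "operator": "in", "value": ["HVAC", "ELECTRICAL", "PLUMBING"]},
    ],
    "actions": [
        {"type": "set_field", "field": "Invoice Processing.Invoice Status", "value": "Pending Payment"},
        {"type": "set_field", "field": "Invoice Processing.Rejection Reason", "value": ""},
        {"type": "set_field", "field": "Invoice Processing.Rejection Phase", "value": ""},
    ],
}

# Four of the operators; the engine reportedly supports 24.
OPERATORS = {
    "equals": lambda a, b: a == b,
    "contains": lambda a, b: isinstance(a, str) and b in a,
    "less_than": lambda a, b: isinstance(a, (int, float)) and a < b,
    "in": lambda a, b: a in b,
}


def rule_matches(rule: dict[str, Any], facts: dict[str, Any]) -> bool:
    """All conditions must hold; evaluation is pure Python, no LLM call."""
    return all(OPERATORS[c["operator"]](facts.get(c["field"]), c["value"]) for c in rule["conditions"])


def apply_set_fields(rule: dict[str, Any], output: dict[str, Any]) -> dict[str, Any]:
    """Return a patched copy of the output instead of re-running any pipeline stage."""
    patched = copy.deepcopy(output)
    for action in rule["actions"]:
        if action["type"] == "set_field":
            section, _, key = action["field"].partition(".")
            patched.setdefault(section, {})[key] = action["value"]
    return patched


# Illustrative facts for case_005 (the rejection_template text is assumed).
facts = {
    "decision_phase4": "REJECT",
    "phase4.rejection_template": "Invoice does not match work authorization",
    "invoice.total_amount": 1069.32,
    "invoice.service_category": "HVAC",
}
output = {"Invoice Processing": {"Invoice Status": "<original>", "Rejection Reason": "<original>", "Rejection Phase": "<original>"}}
if rule_matches(ALF_001, facts):
    output = apply_set_fields(ALF_001, output)
print(output["Invoice Processing"]["Invoice Status"])  # Pending Payment
```

Because the patch is a plain field overwrite on a copy, the original acting-agent output stays untouched. This matches the "black box acting agent" principle.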
The diagram above illustrates the three-zone architecture of the Invoice Processing agent:
- Zone 1 -- The Constitution Architecture: The Reconstructed Rules Book serves as the agent's "constitution" -- the single source of truth and transparency. It governs how both the Acting Agent and the Critic Agent (Investigation) behave, ensuring all decisions are traceable back to documented rules.
- Zone 2 -- The Runtime Inference Pipeline: An input invoice flows through three sequential stages. First, the Acting Agent processes it through a 4-step internal pipeline (Classify, Extract, Validate, Reason). Next, the Critic Agent (Investigation) audits the Acting Agent's output against the constitution, with the ability to STOP or allow continuation. Finally, the ALF engine (Adaptive Learning Framework) checks its Rule Base for matching correction rules and applies a Collect-Plan-Execute pipeline to produce the final approved output.
- Zone 3 -- The Learning & Evolution Loop: When a Human Expert (SME) flags an error in the final output, they provide feedback to the Rule Learning Agent (RLA). The RLA generates a new exception rule that is written into the ALF Rule Base. On subsequent inference runs, the ALF engine automatically applies this correction. Over time, a Periodic System Review promotes frequent exception rules into permanent changes to the Acting Agent itself (green arrow), closing the evolution loop.
```
exemplary_data/
(input PDFs + ground truth)
|
v
+-------------------------------+
| Invoice Processing |
| (single LlmAgent, 18 tools)|
+-------------------------------+
| |
INFERENCE MODE LEARNING MODE
| |
run_inference() load_case() + SME feedback
| |
+----+----+----+ discover_safe_rule()
| | | | |
v v v v generate -> validate -> assess -> write
[Act][Inv][ALF] | |
| | | | rule_base.json
| | | | (new/updated rules)
v v v v |
data/agent_output/ +----------+
data/alf_output/ |
v
Next inference run
picks up new rules

ACTING PIPELINE (9 agents):
  PDF -> Classify -> Extract -> Phase1 -> Phase2 -> Phase3 -> Phase4
      -> Transform -> Output -> Audit

INVESTIGATION (3 layers):
  Layer 1: Deterministic (data source, bypass, tolerance)
  Layer 2: LLM rule discovery (cached by SHA-256)
  Layer 3: Per-group validation (ultra-conservative LLM)

ALF ENGINE (Collect-Plan-Execute):
  Collect: Evaluate rules deterministically (24 operators, scope exclusion)
  Plan:    Categorize actions into 3 tiers
  Execute: Tier 1 LLM pipeline continuation | Tier 2 LLM field patch | Tier 3 deterministic
```
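The Collect and Plan steps of the ALF engine can be sketched as follows. Collect keeps at most one matching rule per scope (mutual exclusion), and Plan groups the surviving rules' actions into the three tiers. The sketch makes these assumptions:

- Within a scope, the rule with the higher `priority` value wins. The real tie-break may differ.
- The action types `continue_pipeline` and `llm_patch_field` are hypothetical names; only `set_field` appears in this README.

`MatchedRule`, `collect` and `plan` are likewise hypothetical, not the API of `alf_engine.py`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MatchedRule:
    rule_id: str
    scope: str
    priority: int
    actions: list[dict] = field(default_factory=list)


# Hypothetical action-type -> tier mapping.
ACTION_TIERS = {
    "continue_pipeline": 1,  # Tier 1: LLM pipeline continuation (hypothetical name)
    "llm_patch_field": 2,    # Tier 2: LLM field patch (hypothetical name)
    "set_field": 3,          # Tier 3: deterministic
}


def collect(matched: list[MatchedRule]) -> list[MatchedRule]:
    """Scope-based mutual exclusion: keep one rule per scope (assumed: highest priority wins)."""
    winners: dict[str, MatchedRule] = {}
    for rule in matched:
        current = winners.get(rule.scope)
        if current is None or rule.priority > current.priority:
            winners[rule.scope] = rule
    return list(winners.values())


def plan(rules: list[MatchedRule]) -> dict[int, list[tuple[str, dict]]]:
    """Group every action of the surviving rules by execution tier."""
    tiers: dict[int, list[tuple[str, dict]]] = defaultdict(list)
    for rule in rules:
        for action in rule.actions:
            tiers[ACTION_TIERS[action["type"]]].append((rule.rule_id, action))
    return dict(tiers)
```

In this sketch, rules in different scopes can all fire, while rules that share a scope (such as two competing `waf_exemption` rules) cannot both apply.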
```
invoice-processing/
├── invoice_processing/           # Python package (fully self-contained)
│   ├── __init__.py               # Exports root_agent
│   ├── agent.py                  # LlmAgent + run_inference pipeline + root_agent
│   ├── prompt.py                 # Dual-mode instruction prompt
│   ├── tools/
│   │   └── tools.py              # 18 FunctionTools (inference + learning)
│   ├── shared_libraries/
│   │   ├── master_data_loader.py # Domain config loader
│   │   ├── invoice_master_data.yaml
│   │   ├── alf_engine.py         # ALF correction engine (87 KB)
│   │   ├── acting/
│   │   │   └── general_invoice_agent.py  # 9-agent pipeline (61 KB)
│   │   └── investigation/
│   │       └── investigate_agent_reconst.py  # 3-layer validation
│   ├── core/                     # Learning logic
│   │   ├── config.py             # Central path/LLM configuration
│   │   ├── case_loader.py        # Load processed case artifacts
│   │   ├── impact_assessor.py    # Rule impact analysis across all cases
│   │   ├── rule_writer.py        # Rule validation, conflict detection, persistence
│   │   ├── rule_discoverer.py    # LLM-driven rule generation
│   │   ├── safe_rule_orchestrator.py  # Programmatic safety loop for rule discovery
│   │   ├── session_logger.py     # Audit logging
│   │   └── prompts.py            # LLM prompt templates for rule discovery
│   ├── data/                     # Runtime data (inside agent package)
│   │   ├── agent_output/         # Per-case processing artifacts
│   │   ├── alf_output/           # ALF-corrected outputs
│   │   ├── investigation_output/ # Investigation reports
│   │   ├── eval_results/         # Evaluation results
│   │   ├── learning_sessions/    # Session logs
│   │   ├── rule_base.json        # ALF correction rules
│   │   ├── reconstructed_rules_book.md
│   │   └── rule_discovery_cache.json
│   ├── exemplary_data/           # Test cases with PDFs and ground truth
│   │   └── case_001/ ... case_005/
│   └── sub_agents/
├── deployment/
├── eval/
│   ├── eval.py                   # Schema-driven ground truth evaluation
│   └── compare_postprocessing.py # ALF before/after diff
├── tests/
├── pyproject.toml
├── .env.example
└── README.md                     # This file
```
- Python 3.10+
- uv for dependency management
- Google Cloud project with Vertex AI API enabled
- Google ADK
- GCP authentication configured
```bash
# Navigate to the agent directory
cd agents/invoice-processing

# Install dependencies using uv
uv sync

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your GCP project ID:
# PROJECT_ID=your-gcp-project-id
# LOCATION=us-central1

# Authenticate with GCP
gcloud auth application-default login
```

See `.env.example` for the full list. Key variables:
| Variable | Default | Description |
|---|---|---|
| `PROJECT_ID` | (required) | GCP project ID |
| `LOCATION` | `us-central1` | GCP region |
| `GEMINI_FLASH_MODEL` | `gemini-2.5-flash` | Model for the ADK agent |
| `GEMINI_PRO_MODEL` | `gemini-2.5-pro` | Model for ALF pipeline continuation and investigation |
| `API_CALL_DELAY_SECONDS` | `1.0` | Delay in seconds between API calls (rate limiting) |
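`core/config.py` centralizes path and LLM configuration, but its code is not shown here. Below is a minimal sketch of reading the variables above with the documented defaults. `Settings` and `load_settings` are hypothetical names, and the real module may structure this differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    project_id: str
    location: str
    flash_model: str
    pro_model: str
    api_call_delay_seconds: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the key variables, applying the defaults listed in the table."""
    env = os.environ if env is None else env
    project_id = env.get("PROJECT_ID")
    if not project_id:
        raise ValueError("PROJECT_ID is required")
    return Settings(
        project_id=project_id,
        location=env.get("LOCATION", "us-central1"),
        flash_model=env.get("GEMINI_FLASH_MODEL", "gemini-2.5-flash"),
        pro_model=env.get("GEMINI_PRO_MODEL", "gemini-2.5-pro"),
        api_call_delay_seconds=float(env.get("API_CALL_DELAY_SECONDS", "1.0")),
    )
```

A caller could then honour the rate limit with `time.sleep(settings.api_call_delay_seconds)` between model calls. Passing an explicit mapping instead of `os.environ` keeps the function easy to test.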
```bash
# Navigate back to the agents directory
cd ..

# Launch via ADK web UI
adk web invoice_processing
# Open http://127.0.0.1:8000 in your browser
# Select "invoice_processing" from the app list
```

The agent will greet you with a mode selection prompt. Choose Inference to process cases or Learning to review and create rules.

```bash
# ADK CLI mode (terminal-based chat instead of the web UI, run from agents directory)
adk run invoice_processing
```

| What to change | Where |
|---|---|
| Agent instructions and conversation behavior | invoice_processing/prompt.py -- edit INVOICE_PROCESSING_INSTRUCTION |
| Pipeline stages, gating logic, stage ordering | invoice_processing/agent.py -- edit run_inference() |
| Which tools are available to the agent | invoice_processing/agent.py -- edit the tools=[] list in root_agent |
| Model selection | invoice_processing/agent.py -- change model= parameter in root_agent |
- Add your function to `invoice_processing/tools/tools.py`:

  ```python
  def my_new_tool(param: str) -> dict:
      """Description shown to the LLM. Args documented here."""
      # your logic
      return {"result": "..."}
  ```

- Import and register it in `invoice_processing/agent.py`:

  ```python
  from invoice_processing.tools.tools import my_new_tool

  root_agent = LlmAgent(
      # ... existing name, model, and instruction arguments ...
      tools=[
          # ... existing tools ...
          my_new_tool,
      ],
  )
  ```

- Update the prompt in `invoice_processing/prompt.py` to tell the agent when and how to use the new tool.
| What to change | How |
|---|---|
| Domain configuration | Replace shared_libraries/invoice_master_data.yaml with your domain's YAML. The MasterData class provides typed accessors for 11 sections: document types, extraction schemas, taxonomies, validation pipeline, output schema, eval comparison groups, and more. |
| Validation rules | Edit data/reconstructed_rules_book.md -- the "constitution" that the investigation layer validates against. |
| ALF correction rules | Edit data/rule_base.json directly, or use Learning mode to create rules interactively. |
| Test cases | Add new case folders to exemplary_data/ with PDFs and optional ground truth Postprocessing_Data.json. |
- Create a new `your_domain_master_data.yaml` following the schema in `invoice_master_data.yaml`
- Replace the acting agent in `shared_libraries/acting/` with your domain's processing pipeline
- Update `data/reconstructed_rules_book.md` with your domain's validation rules
- Add test cases to `exemplary_data/`
- All framework components (ALF, investigation, eval) adapt automatically via the master data configuration -- no code changes are needed in those components
The evaluation framework lives in eval/ and provides schema-driven assessment of agent output quality.
Two-layer evaluation (implemented in eval/eval.py):
| Layer | Type | Cost | Description |
|---|---|---|---|
| Layer 1: Deterministic | Field-by-field comparison | Free | Compares agent output against ground truth using comparison groups defined in master data. Instant, reproducible, zero cost. |
| Layer 2: LLM-as-Judge | Holistic alignment | ~1 API call/case | Single Gemini call per case producing an overall alignment verdict. Optional (--skip-llm to disable). |
- Field-level match rates per comparison group (header fields, line items, totals, tax, decision)
- Decision alignment: does the agent's ACCEPT/REJECT match ground truth?
- Financial tolerance: configurable threshold for numeric comparisons (default: $0.02)
- LLM verdicts (when enabled):
  - `ALIGNED` -- output matches ground truth in all material respects
  - `PARTIALLY_ALIGNED` -- correct decision but some field differences
  - `NOT_ALIGNED` -- wrong decision or critical data errors
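Layer 1 compares fields within the comparison groups using a financial tolerance. Below is a minimal sketch of that comparison. It assumes numeric fields match within the tolerance (default $0.02) and all other fields match after whitespace stripping. `values_match` and `compare_group` are hypothetical names, and `eval/eval.py` may normalize values differently. Numbers go through `Decimal(str(...))` so that a $0.02 difference is not misjudged by binary floating point.

```python
from decimal import Decimal
from typing import Any


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def values_match(expected: Any, actual: Any, tolerance: float) -> bool:
    """Numbers match within the tolerance; other values must be equal after stripping whitespace."""
    if _is_number(expected) and _is_number(actual):
        diff = abs(Decimal(str(expected)) - Decimal(str(actual)))
        return diff <= Decimal(str(tolerance))
    return str(expected).strip() == str(actual).strip()


def compare_group(
    fields: list[str],
    ground_truth: dict[str, Any],
    output: dict[str, Any],
    tolerance: float = 0.02,
) -> dict[str, Any]:
    """Field-level match rate for one comparison group (e.g. totals or header fields)."""
    mismatches = [f for f in fields if not values_match(ground_truth.get(f), output.get(f), tolerance)]
    rate = (len(fields) - len(mismatches)) / len(fields) if fields else 1.0
    return {"match_rate": rate, "mismatches": mismatches}
```

For example, a GST of $97.23 against a ground truth of $97.21 passes at the default tolerance but fails with `tolerance=0.01`. This is the kind of difference the `--tolerance` flag controls.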
```bash
# Full evaluation (deterministic + LLM)
uv run eval/eval.py \
  --ground-truth agents/invoice-processing/invoice_processing/exemplary_data \
  --agent-output agents/invoice-processing/invoice_processing/data/agent_output

# Deterministic only (no LLM, no cost)
uv run eval/eval.py \
  --ground-truth agents/invoice-processing/invoice_processing/exemplary_data \
  --agent-output agents/invoice-processing/invoice_processing/data/agent_output \
  --skip-llm

# Single case evaluation
uv run eval/eval.py --case case_001

# Custom financial tolerance
uv run eval/eval.py \
  --ground-truth agents/invoice-processing/invoice_processing/exemplary_data \
  --agent-output agents/invoice-processing/invoice_processing/data/agent_output \
  --tolerance 0.05

# Compare original vs ALF-revised output (before/after diff)
python agents/invoice-processing/eval/compare_postprocessing.py
```

Results are saved to `invoice_processing/data/eval_results/`.
To deploy Invoice Processing to a cloud environment, follow the ADK Samples Integration instructions to deploy via Google Agents CLI.
See deployment/README.md for details.
In local development, all data lives inside the agent package (invoice_processing/data/ and invoice_processing/exemplary_data/). For production deployment, these directories should be replaced with Google Cloud Storage (GCS) buckets so that:
- Incoming invoice cases are read from a bucket where upstream systems or users upload PDFs
- Intermediate and final outputs are written to a bucket for downstream consumption
- Rule base and rules book are stored in a bucket accessible to SMEs for review and editing
| Local Path | GCS Bucket Path | Direction | Description |
|---|---|---|---|
| `invoice_processing/exemplary_data/` | `gs://{BUCKET}/incoming_cases/` | Read | Invoice PDFs and supporting documents uploaded by users or upstream systems |
| `invoice_processing/data/agent_output/` | `gs://{BUCKET}/agent_output/` | Write | Per-case intermediate artifacts (classification, extraction, validation, etc.) |
| `invoice_processing/data/alf_output/` | `gs://{BUCKET}/alf_output/` | Write | ALF-corrected final outputs |
| `invoice_processing/data/investigation_output/` | `gs://{BUCKET}/investigation_output/` | Write | Investigation compliance reports |
| `invoice_processing/data/eval_results/` | `gs://{BUCKET}/eval_results/` | Write | Evaluation results |
| `invoice_processing/data/learning_sessions/` | `gs://{BUCKET}/learning_sessions/` | Write | SME session logs |
| `invoice_processing/data/rule_base.json` | `gs://{BUCKET}/config/rule_base.json` | Read/Write | ALF correction rules (user-facing) |
| `invoice_processing/data/reconstructed_rules_book.md` | `gs://{BUCKET}/config/reconstructed_rules_book.md` | Read | Validation rules constitution (user-facing) |
| `invoice_processing/data/rule_discovery_cache.json` | `gs://{BUCKET}/config/rule_discovery_cache.json` | Read/Write | Cached rule discovery results |
```
gs://your-invoice-processing-bucket/
├── incoming_cases/                  # Upload invoices here
│   ├── case_001/
│   │   ├── invoice.pdf
│   │   └── waf.pdf
│   ├── case_002/
│   │   └── invoice.pdf
│   └── .../
│
├── agent_output/                    # Agent writes intermediate artifacts here
│   ├── case_001/
│   │   ├── 01_classification.json
│   │   ├── 02_extraction.json
│   │   ├── ...
│   │   └── Postprocessing_Data.json
│   └── .../
│
├── alf_output/                      # ALF writes corrected outputs here
│   ├── case_001/
│   │   ├── Postprocessing_Data.json
│   │   └── alf_audit_log.json
│   └── .../
│
├── investigation_output/            # Investigation reports
├── eval_results/                    # Evaluation results
├── learning_sessions/               # SME session logs
│
└── config/                          # User-facing configuration files
    ├── rule_base.json               # SMEs review and approve rules here
    ├── reconstructed_rules_book.md  # Validation rules (editable by admins)
    └── rule_discovery_cache.json    # Cached rule discovery
```
Add the following to `.env` for production GCS integration:

```bash
# GCS Integration (production mode)
GCS_ENABLED=true
GCS_BUCKET=your-invoice-processing-bucket
GCS_INPUT_PREFIX=incoming_cases   # where incoming invoice cases are uploaded
GCS_OUTPUT_PREFIX=agent_output    # where agent writes intermediate artifacts
GCS_ALF_PREFIX=alf_output         # where ALF writes corrected outputs
GCS_CONFIG_PREFIX=config          # where rule_base.json and rules_book.md live
```

The following files need modification to support GCS I/O instead of local file paths:
| File | What to Change |
|---|---|
| `invoice_processing/core/config.py` | Add GCS path resolution: when `GCS_ENABLED=true`, resolve `DATA_DIR`, `AGENTIC_FLOW_OUT`, `ALF_OUT_DIR`, `RULE_BASE_PATH`, `RULES_BOOK_PATH`, and `SESSIONS_DIR` to GCS paths instead of local paths |
| `invoice_processing/agent.py` | Update `run_inference()` to read source cases from GCS (`gs://{BUCKET}/incoming_cases/{case_id}/`) and write outputs to GCS |
| `invoice_processing/shared_libraries/acting/general_invoice_agent.py` | Replace local `OUTPUT_BASE_DIR` file I/O with GCS reads/writes using the `google-cloud-storage` client |
| `invoice_processing/shared_libraries/investigation/investigate_agent_reconst.py` | Update `AGENT_OUTPUT_DIR`, `INVESTIGATION_OUTPUT_DIR`, and `RULES_BOOK_PATH` to read from/write to GCS |
| `invoice_processing/shared_libraries/alf_engine.py` | Update `ALF_OUT_DIR` and rule base loading to use GCS |
| `invoice_processing/core/rule_writer.py` | Update `RULE_BASE_PATH` reads/writes and backup logic for GCS |
| `invoice_processing/tools/tools.py` | Update `EXEMPLARY_DIR` and `DATA_DIR` to support GCS paths |
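As a starting point for the `config.py` change, here is a minimal sketch of path resolution that switches on `GCS_ENABLED`. It uses the prefix variables from the `.env` block and the local layout from the mapping table. `resolve_data_path`, `LOCAL_DIRS` and `PREFIX_VARS` are hypothetical names covering only four of the locations, and the sketch only builds paths. Actual reads and writes would still go through the `google-cloud-storage` client.

```python
from pathlib import Path
from typing import Mapping

# Logical location -> directory inside the agent package (from the mapping table).
LOCAL_DIRS = {
    "incoming_cases": "exemplary_data",
    "agent_output": "data/agent_output",
    "alf_output": "data/alf_output",
    "config": "data",
}
# Logical location -> (.env variable, default prefix).
PREFIX_VARS = {
    "incoming_cases": ("GCS_INPUT_PREFIX", "incoming_cases"),
    "agent_output": ("GCS_OUTPUT_PREFIX", "agent_output"),
    "alf_output": ("GCS_ALF_PREFIX", "alf_output"),
    "config": ("GCS_CONFIG_PREFIX", "config"),
}


def resolve_data_path(kind: str, relative: str, env: Mapping[str, str], package_dir: Path) -> str:
    """Return a gs:// URI when GCS_ENABLED=true, otherwise a path inside the agent package."""
    if env.get("GCS_ENABLED", "false").lower() == "true":
        var, default = PREFIX_VARS[kind]
        prefix = env.get(var, default).strip("/")
        return f"gs://{env['GCS_BUCKET']}/{prefix}/{relative}"
    return str(package_dir / LOCAL_DIRS[kind] / relative)
```

A resolved `gs://` URI would then be split into bucket and object name and opened with `storage.Client().bucket(name).blob(path)`, using `download_as_text()` or `upload_from_string()`. Keeping resolution separate from I/O lets the local and GCS modes share the rest of the code.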
In production, the following files should be accessible to SMEs and administrators through the GCS bucket (or a UI built on top of it):
| File | Audience | Access | Purpose |
|---|---|---|---|
| `config/rule_base.json` | SMEs, Admins | Read/Write | Review, approve, and manually edit ALF correction rules |
| `config/reconstructed_rules_book.md` | Admins | Read/Write | Update the validation rules constitution that the investigation layer validates against |
| `alf_output/{case_id}/Postprocessing_Data.json` | AP Team | Read | Review ALF-corrected invoice decisions |
| `agent_output/{case_id}/Postprocessing_Data.json` | AP Team | Read | Review original agent decisions before ALF correction |
| `learning_sessions/*.json` | Admins | Read | Audit trail of SME rule creation sessions |
The agent ships with 5 sample invoice cases in exemplary_data/:
| Case | Vendor | Total | Acting Decision | Phase | Scenario |
|---|---|---|---|---|---|
| case_001 | FastTrack Logistics | $733.70 | REJECT | Phase 3 | Vendor tax ID invalid |
| case_002 | Precision Tech Co Pty Ltd | $555.50 | ACCEPT | -- | Preventative maintenance, 3 line items, all valid |
| case_005 | QuickFix HVAC Repairs | $1,069.32 | REJECT | Phase 4 | Labour hours not authorized -- no WAF submitted (Step 4.3) |
