AI-powered PDF form filling with voice support for Indian languages. Upload a PDF, let a vision LLM detect blank fields, fill them via text or voice, and download the completed form.
Stack: React + FastAPI + Gemini 3 Flash + Sarvam AI + PyMuPDF
graph LR
subgraph "Frontend"
React["React SPA"]
end
subgraph "Backend"
API["FastAPI"]
end
subgraph "AI Services"
Gemini["Gemini Flash<br/>Form Detection"]
Sarvam["Sarvam AI<br/>Voice I/O"]
end
subgraph "Databricks Platform"
Volumes["Unity Catalog<br/>Volumes"]
Delta["Delta Lake<br/>Tables"]
SQL["SQL<br/>Warehouse"]
end
React -->|"REST API"| API
API -->|"Vision API"| Gemini
API -->|"TTS / STT"| Sarvam
API -->|"File Storage"| Volumes
API -->|"SQL Connector"| SQL
SQL -->|"Read/Write"| Delta
┌─────────────────────────────────────────────────┐
│ React Frontend (Vite) │
│ ├── PDF upload (drag & drop) │
│ ├── Bounding box preview (CSS overlays) │
│ ├── Field editor (text + voice) │
│ └── Analytics dashboard (recharts) │
├─────────────────────────────────────────────────┤
│ FastAPI Backend │
│ ├── /api/upload — PDF intake │
│ ├── /api/analyze — LLM field detection │
│ ├── /api/generate — Filled PDF output │
│ ├── /api/tts, /stt — Sarvam AI voice I/O │
│ └── /api/analytics — Delta Lake queries │
├─────────────────────────────────────────────────┤
│ Python Utils │
│ ├── llm_helper.py — Gemini 3 Flash via OpenRouter │
│ ├── pdf_processor.py — PyMuPDF (fitz) │
│ ├── sarvam_helper.py — Indian language TTS/STT│
│ └── databricks_storage.py — Unity Catalog / Delta Lake │
└─────────────────────────────────────────────────┘
- Python 3.11+
- Node.js 18+ and npm
- uv (recommended) or pip
- An OpenRouter API key (for Gemini 3 Flash vision model)
- (Optional) A Sarvam AI API key (for Indian language voice features)
git clone https://github.com/<your-org>/databricks-hack.git
cd databricks-hack/databricks_appCreate a .env file:
OPENROUTER_API_KEY="sk-or-v1-..."
SARVAM_API_KEY="sk_..." # optional, for voice features# Using uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
# Or using pip
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txtcd frontend
npm install
npm run build # builds to ../static/
cd ..# Load .env and start
export $(grep -v '^#' .env | xargs)
uvicorn main:app --host 0.0.0.0 --port 8000 --reloadOpen http://localhost:8000 in your browser.
In a separate terminal, for live frontend development:
cd frontend
npm run dev # starts Vite dev server on port 5173The Vite dev server proxies /api/* requests to localhost:8000 (configured in vite.config.js). Use http://localhost:5173 during frontend development.
- Upload a PDF form (drag & drop or click to browse)
- Analyze — AI detects all blank input fields (underlines, boxes, checkboxes)
- Fill — Enter values via text input or toggle voice mode for speech-to-text
- Download — Generate and download the filled PDF
./deploy.shThe script handles everything:
- Installs Databricks CLI if missing
- Authenticates (reads
DATABRICKS_TOKENenv var or prompts) - Pushes secrets from
.envto thepdf-form-fillerscope - Binds resources (secrets + SQL warehouse) to the app
- Builds the React frontend
- Stages, uploads, and deploys
# 1. Install Databricks CLI
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
# 2. Authenticate
databricks auth login --host https://dbc-cec27335-fd8f.cloud.databricks.com
# 3. Push secrets
databricks secrets put-secret pdf-form-filler openrouter-api-key --string-value "sk-or-v1-..."
databricks secrets put-secret pdf-form-filler sarvam-api-key --string-value "sk_..."
# 4. Build frontend
cd frontend && npm run build && cd ..
# 5. Upload (exclude dev artifacts)
rsync -a --delete \
--exclude='.venv' --exclude='frontend/node_modules' \
--exclude='__pycache__' --exclude='*.pyc' --exclude='.env' \
. /tmp/deploy-stage/
databricks workspace import-dir /tmp/deploy-stage \
/Workspace/Users/<you>/pdf-form-filler --overwrite
# 6. Bind resources
databricks apps update pdf-form-filler --json '{
"resources": [
{"name":"openrouter-api-key","secret":{"scope":"pdf-form-filler","key":"openrouter-api-key","permission":"READ"}},
{"name":"sarvam-api-key","secret":{"scope":"pdf-form-filler","key":"sarvam-api-key","permission":"READ"}},
{"name":"sql-warehouse","sql_warehouse":{"id":"<WAREHOUSE_ID>","permission":"CAN_USE"}}
]
}'
# 7. Deploy
databricks apps deploy pdf-form-filler \
--source-code-path /Workspace/Users/<you>/pdf-form-fillerdatabricks_app/
├── main.py # FastAPI entry point (serves static + API)
├── app.yaml # Databricks Apps runtime config
├── requirements.txt # Python dependencies
├── deploy.sh # One-command deploy script
├── .env # API keys (not committed)
├── api/
│ ├── pdf.py # /api/upload, /api/analyze, /api/generate
│ ├── voice.py # /api/tts, /api/stt, /api/clean-value
│ └── analytics.py # /api/analytics/*
├── utils/
│ ├── llm_helper.py # Gemini 3 Flash vision LLM (field detection)
│ ├── pdf_processor.py # PDF ↔ image, text overlay (PyMuPDF)
│ ├── sarvam_helper.py # Sarvam AI TTS/STT
│ └── databricks_storage.py # Unity Catalog Volumes + Delta Lake
├── frontend/
│ ├── package.json
│ ├── vite.config.js
│ └── src/
│ ├── App.jsx
│ ├── api.js # Fetch wrappers for /api/*
│ ├── styles/index.css
│ └── components/
│ ├── form-filler/ # FormFiller, PdfPreview, FieldEditor, VoiceInput
│ ├── analytics/ # Analytics dashboard
│ ├── help/ # Help page
│ └── layout/ # Navbar, StatusBadge
├── static/ # Built frontend (generated by npm run build)
└── setup/
└── create_tables.sql # Delta Lake table definitions
| Component | Technology | Purpose |
|---|---|---|
| Vision LLM | Gemini 3 Flash (via OpenRouter) | Detect form fields with bounding boxes |
| PDF Processing | PyMuPDF (fitz) | PDF → image, text overlay |
| Voice Input | Sarvam AI | Indian language TTS/STT (10 languages) |
| Frontend | React 18 + Vite | Interactive form filling UI |
| Backend | FastAPI + Uvicorn | REST API serving |
| Storage | Unity Catalog Volumes | PDF file storage |
| Analytics | Delta Lake | Form submission tracking |
| Charts | Recharts | Analytics visualizations |
English (India), Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Punjabi
| Variable | Required | Description |
|---|---|---|
OPENROUTER_API_KEY |
Yes | OpenRouter API key for Gemini 3 Flash |
SARVAM_API_KEY |
No | Sarvam AI key for voice features |
SQL_WAREHOUSE_ID |
No | Databricks SQL warehouse (for analytics on Databricks) |
LLM_MODEL_NAME |
No | Override vision model (default: google/gemini-3-flash-preview) |
LLM_API_BASE |
No | Override API base URL (default: OpenRouter) |