manogyasingh/bharatbricks-hackathon

AI PDF Form Filler

AI-powered PDF form filling with voice support for Indian languages. Upload a PDF, let a vision LLM detect blank fields, fill them via text or voice, and download the completed form.

Stack: React + FastAPI + Gemini 3 Flash + Sarvam AI + PyMuPDF

Architecture

graph LR
    subgraph "Frontend"
        React["React SPA"]
    end
    subgraph "Backend"
        API["FastAPI"]
    end
    subgraph "AI Services"
        Gemini["Gemini Flash<br/>Form Detection"]
        Sarvam["Sarvam AI<br/>Voice I/O"]
    end
    subgraph "Databricks Platform"
        Volumes["Unity Catalog<br/>Volumes"]
        Delta["Delta Lake<br/>Tables"]
        SQL["SQL<br/>Warehouse"]
    end
    React -->|"REST API"| API
    API -->|"Vision API"| Gemini
    API -->|"TTS / STT"| Sarvam
    API -->|"File Storage"| Volumes
    API -->|"SQL Connector"| SQL
    SQL -->|"Read/Write"| Delta
┌──────────────────────────────────────────────────────────┐
│  React Frontend (Vite)                                   │
│  ├── PDF upload (drag & drop)                            │
│  ├── Bounding box preview (CSS overlays)                 │
│  ├── Field editor (text + voice)                         │
│  └── Analytics dashboard (recharts)                      │
├──────────────────────────────────────────────────────────┤
│  FastAPI Backend                                         │
│  ├── /api/upload     — PDF intake                        │
│  ├── /api/analyze    — LLM field detection               │
│  ├── /api/generate   — Filled PDF output                 │
│  ├── /api/tts, /stt  — Sarvam AI voice I/O               │
│  └── /api/analytics  — Delta Lake queries                │
├──────────────────────────────────────────────────────────┤
│  Python Utils                                            │
│  ├── llm_helper.py      — Gemini 3 Flash via OpenRouter  │
│  ├── pdf_processor.py   — PyMuPDF (fitz)                 │
│  ├── sarvam_helper.py   — Indian language TTS/STT        │
│  └── databricks_storage.py — Unity Catalog / Delta Lake  │
└──────────────────────────────────────────────────────────┘
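
The bounding box preview and the text overlay both depend on mapping the coordinates returned by the vision model onto the page. The sketch below illustrates one way that mapping could work, assuming the model reports boxes in pixels of the rendered page image with a top-left origin. The `DetectedField` class, both function names, and the coordinate convention are hypothetical; they are not taken from `llm_helper.py` or `pdf_processor.py`.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # x0, y0, x1, y1

PDF_POINTS_PER_INCH = 72.0  # PDF user space uses 72 points per inch


@dataclass(frozen=True)
class DetectedField:
    """Hypothetical shape of one detected blank field."""
    label: str
    bbox_px: BBox  # pixels in the rendered page image, origin top-left


def px_to_pdf_points(bbox_px: BBox, dpi: float) -> BBox:
    """Scale a pixel box from an image rendered at `dpi` into PDF points."""
    x0, y0, x1, y1 = (v * PDF_POINTS_PER_INCH / dpi for v in bbox_px)
    return (x0, y0, x1, y1)


def css_overlay(bbox_px: BBox, image_size: Tuple[int, int]) -> Dict[str, str]:
    """Express a pixel box as percentage offsets for an absolutely positioned overlay."""
    width, height = image_size
    x0, y0, x1, y1 = bbox_px
    return {
        "left": f"{100 * x0 / width:.2f}%",
        "top": f"{100 * y0 / height:.2f}%",
        "width": f"{100 * (x1 - x0) / width:.2f}%",
        "height": f"{100 * (y1 - y0) / height:.2f}%",
    }


field = DetectedField("Name", (150.0, 300.0, 450.0, 330.0))
print(px_to_pdf_points(field.bbox_px, dpi=150))
print(css_overlay(field.bbox_px, image_size=(1500, 2000)))
```

Percentage offsets keep the overlay aligned when the preview image is resized in the browser, while the point-based box is what a PDF library needs when writing text onto the page.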

Prerequisites

  • Python 3.11+
  • Node.js 18+ and npm
  • uv (recommended) or pip
  • An OpenRouter API key (for Gemini 3 Flash vision model)
  • (Optional) A Sarvam AI API key (for Indian language voice features)

Quick Start (Local)

1. Clone and enter the project

git clone https://github.com/<your-org>/databricks-hack.git
cd databricks-hack/databricks_app

2. Set up environment variables

Create a .env file:

OPENROUTER_API_KEY="sk-or-v1-..."
SARVAM_API_KEY="sk_..."          # optional, for voice features
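
A file in this format mixes quoted values with trailing `#` comments, so any tool that reads it needs to handle both. As a minimal sketch (not the loader the app itself uses, which this README does not describe), the standard-library `shlex` module can parse such lines; the `parse_env` helper name is hypothetical.

```python
import shlex
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, honoring shell-style quotes and '#' comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        # shlex removes the quotes and drops everything after an unquoted '#'.
        for token in shlex.split(line, comments=True):
            name, sep, value = token.partition("=")
            if sep and name.isidentifier():
                values[name] = value
    return values


SAMPLE = '''OPENROUTER_API_KEY="sk-or-v1-..."
SARVAM_API_KEY="sk_..."          # optional, for voice features
'''

print(parse_env(SAMPLE))
```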

3. Install Python dependencies

# Using uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

# Or using pip
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

4. Install frontend dependencies and build

cd frontend
npm install
npm run build      # builds to ../static/
cd ..

5. Run the server

# Export every variable in .env (handles quotes and inline comments), then start
set -a && source .env && set +a
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Open http://localhost:8000 in your browser.

6. (Optional) Frontend dev mode with hot reload

In a separate terminal, for live frontend development:

cd frontend
npm run dev        # starts Vite dev server on port 5173

The Vite dev server proxies /api/* requests to localhost:8000 (configured in vite.config.js). Use http://localhost:5173 during frontend development.

Usage

  1. Upload a PDF form (drag & drop or click to browse)
  2. Analyze — AI detects all blank input fields (underlines, boxes, checkboxes)
  3. Fill — Enter values via text input or toggle voice mode for speech-to-text
  4. Download — Generate and download the filled PDF

Deploy to Databricks Apps

One-command deploy

./deploy.sh

The script handles everything:

  • Installs Databricks CLI if missing
  • Authenticates (reads DATABRICKS_TOKEN env var or prompts)
  • Pushes secrets from .env to the pdf-form-filler scope
  • Binds resources (secrets + SQL warehouse) to the app
  • Builds the React frontend
  • Stages, uploads, and deploys
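
The secret-pushing step can be pictured as a mapping from `.env` variable names to secret keys in the `pdf-form-filler` scope. The following is a rough sketch of that idea, not the contents of `deploy.sh`: the helper names are hypothetical, and the naming rule (lowercase, underscores to hyphens) is inferred from the key names used in the manual deploy commands.

```python
from typing import Dict, List

SCOPE = "pdf-form-filler"  # secret scope named in the deploy steps


def secret_key_name(env_name: str) -> str:
    """Map an env var name to a secret key, e.g. SARVAM_API_KEY -> sarvam-api-key."""
    return env_name.lower().replace("_", "-")


def put_secret_commands(env: Dict[str, str], scope: str = SCOPE) -> List[List[str]]:
    """Build one `databricks secrets put-secret` argv per non-empty API key."""
    return [
        ["databricks", "secrets", "put-secret", scope,
         secret_key_name(name), "--string-value", value]
        for name, value in env.items()
        if name.endswith("_API_KEY") and value
    ]


if __name__ == "__main__":
    demo_env = {"OPENROUTER_API_KEY": "sk-or-v1-...", "SARVAM_API_KEY": ""}
    for argv in put_secret_commands(demo_env):
        # Print everything except the secret value itself.
        print(" ".join(argv[:-1]), "<redacted>")
```

Each returned list can be passed to `subprocess.run(argv, check=True)`; building argument lists rather than shell strings avoids quoting problems with key values.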

Manual deploy

# 1. Install Databricks CLI
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

# 2. Authenticate
databricks auth login --host https://dbc-cec27335-fd8f.cloud.databricks.com

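# 3. Push secrets (create the scope first; skip this line if it already exists)
databricks secrets create-scope pdf-form-filler
databricks secrets put-secret pdf-form-filler openrouter-api-key --string-value "sk-or-v1-..."
databricks secrets put-secret pdf-form-filler sarvam-api-key --string-value "sk_..."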

# 4. Build frontend
cd frontend && npm run build && cd ..

# 5. Upload (exclude dev artifacts)
rsync -a --delete \
  --exclude='.venv' --exclude='frontend/node_modules' \
  --exclude='__pycache__' --exclude='*.pyc' --exclude='.env' \
  . /tmp/deploy-stage/
databricks workspace import-dir /tmp/deploy-stage \
  /Workspace/Users/<you>/pdf-form-filler --overwrite

# 6. Bind resources (the app must already exist; create it once with `databricks apps create pdf-form-filler`)
databricks apps update pdf-form-filler --json '{
  "resources": [
    {"name":"openrouter-api-key","secret":{"scope":"pdf-form-filler","key":"openrouter-api-key","permission":"READ"}},
    {"name":"sarvam-api-key","secret":{"scope":"pdf-form-filler","key":"sarvam-api-key","permission":"READ"}},
    {"name":"sql-warehouse","sql_warehouse":{"id":"<WAREHOUSE_ID>","permission":"CAN_USE"}}
  ]
}'

# 7. Deploy
databricks apps deploy pdf-form-filler \
  --source-code-path /Workspace/Users/<you>/pdf-form-filler
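
The resource-binding payload in step 6 is easy to mistype by hand. As an illustrative sketch, it can be generated and serialized from Python instead; `build_resources` is a hypothetical helper, and the structure simply mirrors the JSON shown in the manual steps.

```python
import json
from typing import Any, Dict, Sequence


def build_resources(
    warehouse_id: str,
    scope: str = "pdf-form-filler",
    secret_keys: Sequence[str] = ("openrouter-api-key", "sarvam-api-key"),
) -> Dict[str, Any]:
    """Build the `resources` payload passed to `databricks apps update --json`."""
    resources = [
        {"name": key, "secret": {"scope": scope, "key": key, "permission": "READ"}}
        for key in secret_keys
    ]
    resources.append(
        {"name": "sql-warehouse",
         "sql_warehouse": {"id": warehouse_id, "permission": "CAN_USE"}}
    )
    return {"resources": resources}


# "<WAREHOUSE_ID>" is a placeholder, as in the manual command.
print(json.dumps(build_resources("<WAREHOUSE_ID>"), indent=2))
```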

Project Structure

databricks_app/
├── main.py                 # FastAPI entry point (serves static + API)
├── app.yaml                # Databricks Apps runtime config
├── requirements.txt        # Python dependencies
├── deploy.sh               # One-command deploy script
├── .env                    # API keys (not committed)
├── api/
│   ├── pdf.py              # /api/upload, /api/analyze, /api/generate
│   ├── voice.py            # /api/tts, /api/stt, /api/clean-value
│   └── analytics.py        # /api/analytics/*
├── utils/
│   ├── llm_helper.py       # Gemini 3 Flash vision LLM (field detection)
│   ├── pdf_processor.py    # PDF ↔ image, text overlay (PyMuPDF)
│   ├── sarvam_helper.py    # Sarvam AI TTS/STT
│   └── databricks_storage.py  # Unity Catalog Volumes + Delta Lake
├── frontend/
│   ├── package.json
│   ├── vite.config.js
│   └── src/
│       ├── App.jsx
│       ├── api.js          # Fetch wrappers for /api/*
│       ├── styles/index.css
│       └── components/
│           ├── form-filler/  # FormFiller, PdfPreview, FieldEditor, VoiceInput
│           ├── analytics/    # Analytics dashboard
│           ├── help/         # Help page
│           └── layout/       # Navbar, StatusBadge
├── static/                 # Built frontend (generated by npm run build)
└── setup/
    └── create_tables.sql   # Delta Lake table definitions

Key Technologies

| Component      | Technology                      | Purpose                                 |
|----------------|---------------------------------|-----------------------------------------|
| Vision LLM     | Gemini 3 Flash (via OpenRouter) | Detect form fields with bounding boxes  |
| PDF Processing | PyMuPDF (fitz)                  | PDF → image, text overlay               |
| Voice Input    | Sarvam AI                       | Indian language TTS/STT (10 languages)  |
| Frontend       | React 18 + Vite                 | Interactive form filling UI             |
| Backend        | FastAPI + Uvicorn               | REST API serving                        |
| Storage        | Unity Catalog Volumes           | PDF file storage                        |
| Analytics      | Delta Lake                      | Form submission tracking                |
| Charts         | Recharts                        | Analytics visualizations                |

Supported Voice Languages

English (India), Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Punjabi
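
Speech APIs usually identify languages by BCP-47 tags rather than display names. The lookup below is a small sketch of how the ten names above could be mapped to such tags. The codes are the standard language and region subtags and are an assumption here; the exact identifiers Sarvam AI expects should be confirmed against its documentation. `language_code` is a hypothetical helper.

```python
# Display name (lowercased) -> assumed BCP-47 tag.
LANGUAGE_CODES = {
    "english (india)": "en-IN",
    "hindi": "hi-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "bengali": "bn-IN",
    "gujarati": "gu-IN",
    "punjabi": "pa-IN",
}


def language_code(name: str) -> str:
    """Return the tag for a supported language name, ignoring case and padding."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported voice language: {name!r}") from None


print(language_code("Hindi"))
```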

Environment Variables

| Variable           | Required | Description                                                     |
|--------------------|----------|-----------------------------------------------------------------|
| OPENROUTER_API_KEY | Yes      | OpenRouter API key for Gemini 3 Flash                           |
| SARVAM_API_KEY     | No       | Sarvam AI key for voice features                                |
| SQL_WAREHOUSE_ID   | No       | Databricks SQL warehouse (for analytics on Databricks)          |
| LLM_MODEL_NAME     | No       | Override vision model (default: google/gemini-3-flash-preview)  |
| LLM_API_BASE       | No       | Override API base URL (default: OpenRouter)                     |
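
To make the required and optional variables concrete, here is a minimal sketch of reading them at startup. It is not the app's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and the OpenRouter base URL is an assumption, since the table only says the default is OpenRouter.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "google/gemini-3-flash-preview"    # default listed in the table
DEFAULT_API_BASE = "https://openrouter.ai/api/v1"  # assumed OpenRouter endpoint


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    sarvam_api_key: Optional[str]    # None disables voice features
    sql_warehouse_id: Optional[str]  # None disables analytics on Databricks
    llm_model_name: str
    llm_api_base: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, failing fast when the required key is missing."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=api_key,
        sarvam_api_key=env.get("SARVAM_API_KEY") or None,
        sql_warehouse_id=env.get("SQL_WAREHOUSE_ID") or None,
        llm_model_name=env.get("LLM_MODEL_NAME") or DEFAULT_MODEL,
        llm_api_base=env.get("LLM_API_BASE") or DEFAULT_API_BASE,
    )


print(load_settings({"OPENROUTER_API_KEY": "sk-or-v1-..."}).llm_model_name)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.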
