Skip to content

nus-mtechse-dataontology/DataOntology

Repository files navigation

DataOntology

A Natural Language Query (NLQ) to SQL pipeline — ask questions in plain English, get answers from a flight database.

Run on Local using Docker

pre-requisites

  • aws installation is setup using the script given and configure AWS profile
  • Run RDS connect script to start the secure tunnel to connect to RDS

running the app

  • docker build --no-cache -t data-ontology-api .
  • docker run -p 9000:8000 -e GEMINI_API_KEY="" -e DB_PASSWORD="<REPLACE-WITH-DB-PASSWORD" data-ontology-api

Running tests

  • curl -X POST http://localhost:9000/query/query -H "Content-Type: application/json" -d '{"request_id":"test-1","question":"What is the cheapest return flight from SIN to BKK from 2026-06-01 to 2026-06-15?","start_date":"2026-06-01","end_date":"2026-06-15","origin":"SIN","destination":"BKK"}'

Run on local using uv

Prerequisites

  • Python 3.14 or newer
  • Git
  • uv (Python environment and package manager)

Setup

  1. Install uv if not already available:
python3.14 -m pip install --user uv
  1. Clone the repository:
git clone git@github.com:nus-mtechse-dataontology/DataOntology.git
  1. Set up the Python environment:
uv venv
uv pip install -e ".[dev]"
  1. Create a .env file in the repository root:
GEMINI_API_KEY=your_gemini_api_key_here
DB_PATH=resources/flights.db

Running the app

uv run python src/main.py

The server starts at http://127.0.0.1:8000

Querying the API

POST /query/query — Submit a natural language question:

curl -X POST http://localhost:8000/query/query \
  -H "Content-Type: application/json" \
  -d '{"request_id": "test-1", "question": "What is the cheapest return flight from SIN to BKK?"}'

GET /query/get_query — Health check:

curl http://localhost:8000/query/get_query

Running tests

uv run pytest

All tests should pass on a clean checkout (168 tests).

Environment Variables

Variable Required Default Description
GEMINI_API_KEY Yes API key for Google Gemini LLM
GEMINI_MODEL No gemini-3-flash-preview Gemini model name
DB_PATH No resources/flights.db Path to SQLite database
SEMANTIC_MODEL_PATH No src/ontology/semantic_layer.json Path to semantic model

Project Structure

src/
├── main.py                          # Application entry point
├── lifecycle_hooks/startup.py       # Dependency wiring (Orchestrator setup)
├── endpoints/routes/
│   ├── query/query_routes.py        # POST /query/query endpoint
│   └── telegram/telegram_routes.py  # Telegram webhook endpoint
├── orchestrator/
│   ├── orchestrator.py              # 7-stage NLQ pipeline runner
│   ├── response_builder.py          # Formats results for users
│   └── error_response_builder.py    # Standardised error responses
├── prompt_builder/                  # Builds LLM prompts from templates
├── llm_gateway/providers/           # LLM integrations (Gemini, OpenAI)
├── validators/
│   ├── syntactic/                   # Parses LLM JSON → QueryPlan
│   └── semantic/                    # Validates intent, params, formats
├── compiler/sql_compiler.py         # Compiles QueryPlan → parameterised SQL
├── execution/sql_executor.py        # Executes SQL against SQLite
├── ontology/                        # Semantic model files (JSON)
├── models/                          # Shared Pydantic data contracts
└── configurations/                  # App, admin, logger config

tests/
├── unit/                            # Component-level tests
└── integration/                     # Seam tests and full pipeline tests

Pipeline Flow

User Question → PromptBuilder → LLM Gateway (Gemini)
  → SyntacticValidator → SemanticValidator
  → SQLCompiler → SQLExecutor → ResponseBuilder → Answer

Development Workflow

  • Contracts are defined in models/
  • Follow contract-first and test-driven development
  • Write or update tests before implementing logic
  • Ensure all tests pass before opening a pull request

API Docs


Local E2E Testing (Telegram Bot)

One-time setup

  1. Create a Telegram bot via @BotFather and save the TELEGRAM_BOT_TOKEN
  2. Install ngrok and authenticate:
brew install ngrok
ngrok config add-authtoken <your-ngrok-authtoken>
  1. Set vault credentials for local Postgres:
echo -n "postgres" > vault/postgres.user
echo -n "postgres" > vault/postgres.password

Every session

  1. Start local Postgres:
docker run --name dataontology \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=data_ontology \
  -p 5432:5432 -d postgres

If the container already exists: docker start dataontology

  1. Load seed data:
psql -h localhost -U postgres -d data_ontology -f resources/seed_local.sql
  1. Start the server:
export GEMINI_API_KEY=<your-gemini-api-key>
export TELEGRAM_BOT_TOKEN=<your-telegram-bot-token>
uv run python src/main.py
  1. Start ngrok in a separate terminal:
ngrok http 8000

Copy the https://....ngrok-free.dev URL from the output.

  1. Register the webhook with Telegram (required each session — ngrok URL changes on restart):
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://<ngrok-url>/telegram/webhook"}'

Test questions

Send these to your bot in Telegram to verify each intent, or run them automatically (see below):

Intent Question
Cheapest flight on route What is the cheapest flight from SIN to BKK between 1 June and 30 June 2025?
Destinations under budget Where can I fly from SIN for under 300 SGD between 1 June and 30 June 2025?
Destinations by country From SIN, which airports in Thailand can I fly to in June 2025?
All fare options Show me all fare options from SIN to BKK between 1 June and 30 June 2025
Airlines on route Which airlines fly from SIN to BKK in June 2025?
Last seat urgency Are there any almost-full flights from SIN to BKK in June 2025?

Expected results

  • Cheapest flight — 6 records, AirAsia Economy at SGD 89 cheapest
  • Under budget — 3 destinations (KUL, BKK, CNX), NRT excluded as it exceeds 300 SGD
  • Thailand airports — 2 airports (BKK, CNX)
  • Fare options — 6 records across Economy and Business
  • Airlines — 4 airlines (AirAsia, Malaysia Airlines, Thai Airways, Singapore Airlines)
  • Last seat urgency — 5 flights with ≤5 seats remaining

Running golden questions automatically

The same questions are codified as e2e tests and can be run directly against the orchestrator (no Telegram or ngrok needed):

uv run pytest -m e2e -v

Requires local Postgres running with seed data and GEMINI_API_KEY set. These tests are excluded from the default uv run pytest run and must be triggered deliberately.

About

No description, website, or topics provided.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors