A Natural Language Query (NLQ) to SQL pipeline — ask questions in plain English, get answers from a flight database.
- aws installation is setup using the script given and configure AWS profile
- Run RDS connect script to start the secure tunnel to connect to RDS
- docker build --no-cache -t data-ontology-api .
- docker run -p 9000:8000 -e GEMINI_API_KEY="" -e DB_PASSWORD="<REPLACE-WITH-DB-PASSWORD" data-ontology-api
- curl -X POST http://localhost:9000/query/query -H "Content-Type: application/json" -d '{"request_id":"test-1","question":"What is the cheapest return flight from SIN to BKK from 2026-06-01 to 2026-06-15?","start_date":"2026-06-01","end_date":"2026-06-15","origin":"SIN","destination":"BKK"}'
- Python 3.14 or newer
- Git
- uv (Python environment and package manager)
- Install uv if not already available:
python3.14 -m pip install --user uv- Clone the repository:
git clone git@github.com:nus-mtechse-dataontology/DataOntology.git- Set up the Python environment:
uv venv
uv pip install -e ".[dev]"- Create a
.envfile in the repository root:
GEMINI_API_KEY=your_gemini_api_key_here
DB_PATH=resources/flights.dbuv run python src/main.pyThe server starts at http://127.0.0.1:8000
POST /query/query — Submit a natural language question:
curl -X POST http://localhost:8000/query/query \
-H "Content-Type: application/json" \
-d '{"request_id": "test-1", "question": "What is the cheapest return flight from SIN to BKK?"}'GET /query/get_query — Health check:
curl http://localhost:8000/query/get_queryuv run pytestAll tests should pass on a clean checkout (168 tests).
| Variable | Required | Default | Description |
|---|---|---|---|
GEMINI_API_KEY |
Yes | — | API key for Google Gemini LLM |
GEMINI_MODEL |
No | gemini-3-flash-preview |
Gemini model name |
DB_PATH |
No | resources/flights.db |
Path to SQLite database |
SEMANTIC_MODEL_PATH |
No | src/ontology/semantic_layer.json |
Path to semantic model |
src/
├── main.py # Application entry point
├── lifecycle_hooks/startup.py # Dependency wiring (Orchestrator setup)
├── endpoints/routes/
│ ├── query/query_routes.py # POST /query/query endpoint
│ └── telegram/telegram_routes.py # Telegram webhook endpoint
├── orchestrator/
│ ├── orchestrator.py # 7-stage NLQ pipeline runner
│ ├── response_builder.py # Formats results for users
│ └── error_response_builder.py # Standardised error responses
├── prompt_builder/ # Builds LLM prompts from templates
├── llm_gateway/providers/ # LLM integrations (Gemini, OpenAI)
├── validators/
│ ├── syntactic/ # Parses LLM JSON → QueryPlan
│ └── semantic/ # Validates intent, params, formats
├── compiler/sql_compiler.py # Compiles QueryPlan → parameterised SQL
├── execution/sql_executor.py # Executes SQL against SQLite
├── ontology/ # Semantic model files (JSON)
├── models/ # Shared Pydantic data contracts
└── configurations/ # App, admin, logger config
tests/
├── unit/ # Component-level tests
└── integration/ # Seam tests and full pipeline tests
User Question → PromptBuilder → LLM Gateway (Gemini)
→ SyntacticValidator → SemanticValidator
→ SQLCompiler → SQLExecutor → ResponseBuilder → Answer
- Contracts are defined in
models/ - Follow contract-first and test-driven development
- Write or update tests before implementing logic
- Ensure all tests pass before opening a pull request
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc/
- Health: http://127.0.0.1:8000/ontology/actuator/health/liveness
- Create a Telegram bot via @BotFather and save the
TELEGRAM_BOT_TOKEN - Install ngrok and authenticate:
brew install ngrok
ngrok config add-authtoken <your-ngrok-authtoken>- Set vault credentials for local Postgres:
echo -n "postgres" > vault/postgres.user
echo -n "postgres" > vault/postgres.password- Start local Postgres:
docker run --name dataontology \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=data_ontology \
-p 5432:5432 -d postgresIf the container already exists: docker start dataontology
- Load seed data:
psql -h localhost -U postgres -d data_ontology -f resources/seed_local.sql- Start the server:
export GEMINI_API_KEY=<your-gemini-api-key>
export TELEGRAM_BOT_TOKEN=<your-telegram-bot-token>
uv run python src/main.py- Start ngrok in a separate terminal:
ngrok http 8000Copy the https://....ngrok-free.dev URL from the output.
- Register the webhook with Telegram (required each session — ngrok URL changes on restart):
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
-H "Content-Type: application/json" \
-d '{"url": "https://<ngrok-url>/telegram/webhook"}'Send these to your bot in Telegram to verify each intent, or run them automatically (see below):
| Intent | Question |
|---|---|
| Cheapest flight on route | What is the cheapest flight from SIN to BKK between 1 June and 30 June 2025? |
| Destinations under budget | Where can I fly from SIN for under 300 SGD between 1 June and 30 June 2025? |
| Destinations by country | From SIN, which airports in Thailand can I fly to in June 2025? |
| All fare options | Show me all fare options from SIN to BKK between 1 June and 30 June 2025 |
| Airlines on route | Which airlines fly from SIN to BKK in June 2025? |
| Last seat urgency | Are there any almost-full flights from SIN to BKK in June 2025? |
- Cheapest flight — 6 records, AirAsia Economy at SGD 89 cheapest
- Under budget — 3 destinations (KUL, BKK, CNX), NRT excluded as it exceeds 300 SGD
- Thailand airports — 2 airports (BKK, CNX)
- Fare options — 6 records across Economy and Business
- Airlines — 4 airlines (AirAsia, Malaysia Airlines, Thai Airways, Singapore Airlines)
- Last seat urgency — 5 flights with ≤5 seats remaining
The same questions are codified as e2e tests and can be run directly against the orchestrator (no Telegram or ngrok needed):
uv run pytest -m e2e -vRequires local Postgres running with seed data and GEMINI_API_KEY set. These tests are excluded from the default uv run pytest run and must be triggered deliberately.