Interpretable ML + Modern React — built for clinicians and patients alike
A full-stack clinical decision support system that surfaces early diabetes risk signals from routine patient data.
Combines an interpretable ML model with a modern React frontend, presenting results tailored for both clinicians and patients.
Warning
Medical Disclaimer — This system is intended for educational and research purposes only. It does not provide medical diagnoses and should not be used as a substitute for professional medical advice.
- Why Clinical Insight Engine?
- Key Features
- Architecture
- Tech Stack
- Getting Started
- Project Structure
- API Reference
- ML Pipeline
- Single-Patient Prediction
- Environment Variables
- Troubleshooting
- Roadmap
- Contributing
- Contributors
Diabetes affects over 500 million adults worldwide, yet early risk signals are often buried in routine clinical data. Clinical Insight Engine bridges that gap:
| Problem | Our Approach |
|---|---|
| Risk models are opaque black boxes | Interpretable Logistic Regression with per-feature impact scores |
| Results are one-size-fits-all | Dual-view output — detailed for clinicians, simplified for patients |
| Predictions lack context | Confidence-aware assessments with actionable follow-up recommendations |
| Patient data sits in silos | Longitudinal tracking with full assessment history |
Collects clinically relevant inputs:
Age · Gender · Hypertension · Heart Disease · Smoking History · BMI · HbA1c · Blood Glucose
| 🩻 Clinician View | 🧑‍⚕️ Patient View |
|---|---|
- Stores assessments with full timestamps
- Enables longitudinal patient risk tracking over time
- Interactive bar charts for factor contributions
- Diabetes correlation heatmap for data exploration
graph TB
subgraph Client["🖥️ Client — React + TypeScript"]
UI["Risk Assessment Form"]
CV["Clinician View"]
PV["Patient View"]
VIZ["Data Visualizations"]
HIST["Assessment History"]
end
subgraph Server["⚙️ Server — Express.js"]
API["REST API Routes"]
VAL["Zod Validation"]
ORM["Drizzle ORM"]
PY["Python Bridge"]
end
subgraph ML["🧠 ML Pipeline — Python"]
PROC["Data Preprocessing"]
MODEL["Logistic Regression"]
INTERP["Feature Interpretation"]
CACHE["Model Cache (pickle)"]
end
subgraph DB["🗄️ PostgreSQL"]
ASSESS["Assessments Table"]
end
Client -->|"HTTP Requests"| API
API --> VAL --> ORM
API --> PY -->|"spawn process"| ML
ORM --> DB
ML -->|"risk scores + factors"| PY
CACHE -.->|"load cached model"| MODEL
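The Python side of the bridge in the diagram above can be pictured roughly as follows. This is only a sketch: the message format between Express and Python is not documented here, so the JSON-over-stdin/stdout protocol, the `status` envelope, and the names `run_bridge` and `predict` are all assumptions made for illustration.

```python
import io
import json
import sys


def run_bridge(stdin, stdout, predict) -> int:
    """Read one JSON request, call `predict`, write one JSON reply.

    Returns a process exit code (0 on success, 1 on error) so the parent
    process can detect failures. `predict` is a hypothetical callable that
    takes the decoded payload and returns a JSON-serializable result.
    """
    try:
        payload = json.load(stdin)
        result = predict(payload)
    except (ValueError, KeyError) as exc:  # invalid JSON or missing fields
        json.dump({"status": "error", "message": str(exc)}, stdout)
        return 1
    json.dump({"status": "success", "result": result}, stdout)
    return 0


if __name__ == "__main__":
    # Placeholder predictor that echoes the number of fields received.
    sys.exit(run_bridge(sys.stdin, sys.stdout, lambda p: {"fields": len(p)}))
```

Keeping the reply on stdout as a single JSON document makes it straightforward for the spawning process to parse, whatever the real payload shape turns out to be.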
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript | UI framework with type safety |
| | Vite | Lightning-fast dev server & bundler |
| | Tailwind CSS | Utility-first styling with dark mode |
| | TanStack Query | Server state & cache management |
| | React Hook Form + Zod | Form handling with schema validation |
| | Recharts | Interactive data visualizations |
| | Framer Motion | Smooth UI animations |
| Backend | Express.js | REST API server |
| | Drizzle ORM | Type-safe database queries |
| | PostgreSQL 14+ | Relational data storage |
| | Zod | Runtime schema validation |
| ML Pipeline | Python 3.10+ | ML runtime environment |
| | scikit-learn | Logistic Regression model |
| | pandas / NumPy | Data manipulation & preprocessing |
| | pickle | Model & scaler caching |
| Tool | Version | Check | Download |
|---|---|---|---|
| Node.js | 18+ LTS | `node -v` | nodejs.org |
| npm | 9+ | `npm -v` | bundled with Node |
| Python | 3.10+ | `python3 --version` | python.org |
| PostgreSQL | 14+ | `psql --version` | postgresql.org |
| Git | Any | `git --version` | git-scm.com |
| Docker | 20+ | `docker --version` | docker.com |
| Docker Compose | 2+ | `docker compose version` | bundled with Docker |
If Docker is installed, you can skip the manual installation of Node.js, Python, and PostgreSQL and start the application with a single command, run from the project root:
docker compose up
This command will:
- Spin up a PostgreSQL 16 database container with persistent storage.
- Build the app container including Node.js 20 and a Python 3 virtual environment with all scikit-learn/pandas dependencies.
- Wait for the database to be healthy, then run migrations (`npm run db:push`).
- Automatically seed the database with sample clinical assessments (in development mode).
- Launch the full-stack server with live-reloading (HMR) enabled.
Once started, open your browser and navigate to:
- Web App & REST API: http://localhost:3000
To stop the services while preserving your data:
docker compose down
To stop the services and completely reset the database (deleting persistent volumes):
docker compose down -v
If you update package.json or requirements.txt dependencies, trigger a clean rebuild:
docker compose up --build

git clone https://github.com/gopaljilab/Clinical-Insight-Engine.git
cd Clinical-Insight-Engine
npm install
Linux / macOS
cp .env.example .env
Windows (PowerShell)
Copy-Item .env.example .env
Windows (Command Prompt)
copy .env.example .env
If .env.example doesn't exist, create .env manually and add:
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/clinical_insight_engine
🧪 Developer Authentication Setup (optional)
For local frontend authentication testing, create a .env.local file (git-ignored):
NODE_ENV=development
NEXT_PUBLIC_APP_URL=http://localhost:3000
DEV_CLINICIAN_EMAIL=developer@cardioguard.local
DEV_CLINICIAN_PASSWORD=DevSecurePassword123!
NEXT_PUBLIC_LOCAL_ENCRYPTION_KEY=your_local_32_character_secret_key_here
Rules of thumb:
- 🔒 `.env` → database & server secrets only
- 🔒 `.env.local` → local seeded credentials only (never commit)
- Restart the dev server after editing `.env.local` so Vite reloads variables
- Never paste demo credentials into UI, docs, screenshots, or PRs

To test the login flow locally:
- Start the app with `npm run dev`
- Open `http://localhost:5173`
- Click Login or Go to App
- Enter your `.env.local` seeded credentials
- Complete the simulated OTP step
- You'll be redirected to `/dashboard`

In development mode, the login form shows a small amber notice reminding you to use local seeded credentials. This banner and the `DEV_*` variables are never exposed in production builds.
🐧 Linux (Ubuntu / Debian)
# Install PostgreSQL
sudo apt update && sudo apt install postgresql postgresql-contrib
# Start & enable the service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Create database & set password
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'postgres';"
sudo -u postgres psql -c "CREATE DATABASE clinical_insight_engine;"
🍎 macOS (Homebrew)
# Install PostgreSQL
brew install postgresql
# Start the service
brew services start postgresql
# Create database & set password
psql postgres -c "ALTER USER postgres WITH PASSWORD 'postgres';"
psql postgres -c "CREATE DATABASE clinical_insight_engine;"
🪟 Windows
- Download and install PostgreSQL from postgresql.org/download/windows
- During installation, use:
  - Username: `postgres`
  - Password: `postgres`
  - Port: `5432`
- Create a database named `clinical_insight_engine` using pgAdmin or the PostgreSQL CLI.
- Update your `.env` file:
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/clinical_insight_engine
Push the database schema:
npm run db:push
The server runs a PostgreSQL preflight check on startup. If you see `Database startup check failed`, verify that:
- PostgreSQL service is running
- `DATABASE_URL` in `.env` is correct
- The migration above has been run
- Port `5432` is not blocked
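Two of the checks above (a well-formed `DATABASE_URL` and a reachable port) can be scripted before starting the server. The sketch below uses only the Python standard library; the helper names `parse_database_url` and `port_is_open` are illustrative and are not part of the project. It tests TCP reachability only, not credentials or whether the schema has been pushed.

```python
import socket
from urllib.parse import urlsplit


def parse_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into its parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,  # PostgreSQL default port
        "database": parts.path.lstrip("/"),
    }


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    info = parse_database_url(
        "postgresql://postgres:postgres@localhost:5432/clinical_insight_engine"
    )
    print(info, "reachable:", port_is_open(info["host"], info["port"]))
```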
🐧 Linux / 🍎 macOS
# Create virtual environment
python3 -m venv .venv
# Activate
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
🪟 Windows (PowerShell)
# Create virtual environment
py -m venv .venv
# Activate
.\.venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
If the dataset already exists in the project:
# Linux / macOS
cp attached_assets/diabetes_dataset.csv ./diabetes_dataset.csv
# Windows (PowerShell)
Copy-Item attached_assets/diabetes_dataset.csv ./diabetes_dataset.csv
If the dataset is missing, generate synthetic data:
# Linux / macOS
python3 -c "from analyze import create_synthetic_data; create_synthetic_data()"
# Windows
py -c "from analyze import create_synthetic_data; create_synthetic_data()"
# Start the full-stack dev server
npm run dev
| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:3000 |
Stop the dev server:
Ctrl + C
Deactivate the Python environment:
deactivate
Clinical-Insight-Engine/
│
├── client/ # React frontend
│ └── src/
│ ├── components/ # Reusable UI components
│ ├── pages/ # Route-level page components
│ ├── hooks/ # Custom React hooks
│ │ ├── use-assessments.ts # TanStack Query hooks for API calls
│ │ └── use-toast.ts # Toast notification state
│ ├── lib/ # Utilities & API client
│ │ ├── queryClient.ts # Global fetch config + React Query setup
│ │ └── utils.ts # cn() Tailwind class merge utility
│ └── utils/
│ ├── search_filters.ts # Patient search & filter logic
│ └── date_fix.ts # Safe date parser helper
│
├── server/ # Express.js backend
│ ├── index.ts # Server entry point & startup
│ ├── routes.ts # API route definitions
│ ├── storage.ts # Data access layer (DB queries)
│ ├── db.ts # Drizzle ORM + PostgreSQL pool
│ ├── static.ts # Serves built React frontend
│ ├── vite.ts # Vite dev server integration (HMR)
│ └── db_fix.ts # Clean process exit on DB errors
│
├── shared/ # Shared between client & server
│ ├── schema.ts # Drizzle DB schema + Zod types
│ └── routes.ts # Shared API request/response schemas
│
├── script/
│ └── build.ts # esbuild + Vite production build script
│
├── attached_assets/ # Static assets (dataset, images)
│ └── diabetes_dataset.csv
│
├── analyze.py # ML pipeline — training & inference
├── main.py # Python entry point
├── diabetes_dataset.csv # Training dataset (root copy)
├── correlation_heatmap.png # Diabetes feature correlation heatmap
├── patient.json # Sample patient input for CLI prediction
│
├── drizzle.config.ts # Drizzle ORM configuration
├── vite.config.ts # Vite bundler configuration
├── tailwind.config.ts # Tailwind CSS configuration
├── tsconfig.json # TypeScript configuration
├── postcss.config.js # PostCSS configuration
├── components.json # shadcn/ui component registry
├── pyproject.toml # Python project metadata
├── requirements.txt # Python dependencies
├── package.json # Node.js dependencies & scripts
├── package-lock.json # Locked dependency versions
├── uv.lock # uv Python lock file
│
├── README.md # Project documentation
├── ANALYSIS_README.md # ML analysis documentation
├── CONTRIBUTING.md # Contribution guidelines
└── CODE_OF_CONDUCT.md # Community code of conduct
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Application health check endpoint for monitoring |
| POST | `/api/assessments` | Submit a new risk assessment |
| GET | `/api/assessments` | Retrieve assessment history |
| GET | `/api/assessments/:id` | Get a specific assessment by ID |
| POST | `/api/ingest/fhir` | Ingest a FHIR R4 JSON bundle |
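For scripted use, the `POST /api/assessments` call can also be prepared from Python with the standard library. This is a minimal sketch: the field names are taken from the curl example in this section, treating all eight as required is an assumption (the server's Zod schema is the authority), and `build_assessment_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

# Field names as they appear in the curl example; "required" is an assumption.
REQUIRED_FIELDS = (
    "gender", "age", "hypertension", "heartDisease",
    "smokingHistory", "bmi", "hba1cLevel", "bloodGlucoseLevel",
)


def build_assessment_request(payload: dict,
                             base_url: str = "http://localhost:3000") -> Request:
    """Build (but do not send) a POST request for a new risk assessment."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return Request(
        url=f"{base_url}/api/assessments",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, passing the returned object to `urllib.request.urlopen` sends the request.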
# Health Check
curl -X GET http://localhost:3000/health
# Submit Assessment
curl -X POST http://localhost:3000/api/assessments \
-H "Content-Type: application/json" \
-d '{
"gender": "Female",
"age": 52,
"hypertension": true,
"heartDisease": false,
"smokingHistory": "former",
"bmi": 30.1,
"hba1cLevel": 6.4,
"bloodGlucoseLevel": 148
}'
The FHIR ingestion endpoint accepts standard FHIR R4 JSON bundles containing patient demographics, clinical vitals and lab values, and clinical notes. It processes the following resource types:
- Patient: Extracts `id`, `name`, `gender` (mapped to `Male`/`Female`), and calculates the patient's `age` from `birthDate`.
- Observation: Extracts clinical values such as BMI, HbA1c, and Blood Glucose, and flags `hypertension` and `heartDisease` using LOINC codes and display terms.
- DocumentReference: Extracts note titles, descriptions, and decoded base64 attachments, merging them into a unified clinical note transcript.
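The Patient mapping described above can be sketched as follows. This is not the project's parser: `extract_patient` and `age_from_birth_date` are hypothetical names, and the sketch assumes `birthDate` is a full `YYYY-MM-DD` date (FHIR also allows year-only and year-month values, which would need extra handling).

```python
from datetime import date

# FHIR administrative gender codes mapped to the labels used by the model.
GENDER_MAP = {"male": "Male", "female": "Female"}


def age_from_birth_date(birth_date: str, today: date) -> int:
    """Age in completed years on `today`."""
    born = date.fromisoformat(birth_date)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def extract_patient(resource: dict, today: date) -> dict:
    """Pull id, name, gender, and age out of a FHIR Patient resource."""
    first_name = (resource.get("name") or [{}])[0]
    full_name = " ".join(
        [*first_name.get("given", []), first_name.get("family", "")]
    ).strip()
    return {
        "id": resource.get("id"),
        "name": full_name,
        "gender": GENDER_MAP.get(str(resource.get("gender", "")).lower()),
        "age": age_from_birth_date(resource["birthDate"], today),
    }
```

Passing `today` explicitly keeps the age calculation deterministic, which makes the function easier to test.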
To ensure clinical decisions are traceable and verifiable, the pipeline extracts source citations for key clinical features. When note text is found in DocumentReference entries, the parser:
- Performs regex-based vitals matching and keyword scanning for Hypertension (e.g. BP measurements like `145/90` or keywords like `hypertension`), Heart Disease (e.g. `CAD`, `myocardial infarction`), and Smoking History (e.g. `former smoker`, `never smoked`).
- Extracts the exact sentence snippet enclosing the evidence (`source_snippet`).
- Computes the zero-indexed character bounds `[start, end]` within the raw concatenated text (`source_index`).
- If no evidence is found, these values are returned as `null`.
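One way to picture the citation steps above is the sketch below. The patterns and the sentence-splitting rule are simplified assumptions, not the project's actual rules, and `find_citation` is a hypothetical name. The end index is treated as exclusive, which is consistent with the bounds shown in the sample response in this section.

```python
import re

# Illustrative patterns only; the real keyword lists are likely broader.
HYPERTENSION = re.compile(r"\b\d{2,3}/\d{2,3}\b|\bhypertension\b", re.IGNORECASE)
HEART_DISEASE = re.compile(r"\b(?:CAD|myocardial infarction)\b", re.IGNORECASE)
SMOKING = re.compile(r"\b(?:former smoker|never smoked|quit smoking)\b", re.IGNORECASE)

# A sentence ends at ., ! or ? followed by whitespace or end of text,
# so decimals such as "6.4" are not treated as sentence breaks.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def find_citation(note: str, pattern: re.Pattern) -> dict:
    """Return the sentence enclosing the first match and its [start, end) bounds."""
    match = pattern.search(note)
    if match is None:
        return {"source_snippet": None, "source_index": None}
    start = 0
    for boundary in SENTENCE_END.finditer(note, 0, match.start()):
        start = boundary.end()  # keep the last sentence break before the match
    while start < match.start() and note[start].isspace():
        start += 1  # skip the whitespace that follows the previous sentence
    closing = SENTENCE_END.search(note, match.end())
    end = closing.start() if closing else len(note)
    return {"source_snippet": note[start:end], "source_index": [start, end]}
```

Because the indices refer to the raw note text, a viewer can highlight `note[start:end]` directly without re-running the search.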
A successful FHIR ingestion response returns the extracted clinical note and explainable insights:
{
"status": "success",
"id": 42,
"clinical_note": "Routine visit. BP reading 145/95 noted. Quit smoking last year.",
"explainable_insights": [
{
"insight": "Patient shows signs of hypertension",
"source_snippet": "BP reading 145/95 noted",
"source_index": [15, 38]
},
{
"insight": "Patient shows signs of heart disease",
"source_snippet": null,
"source_index": null
},
{
"insight": "Patient has a smoking history (former)",
"source_snippet": "Quit smoking last year",
"source_index": [40, 62]
}
]
}
On the Clinician View tab of the results page, the clinical note is rendered in an interactive viewer:
- Interactive Highlights: Clicking any cited insight automatically highlights the matching text in the note.
- Auto-Scroll: The highlighted source text is scrolled smoothly into view.
- Keyboard Navigation:
- Use Arrow Down / Arrow Right to move to the next cited insight.
- Use Arrow Up / Arrow Left to move to the previous cited insight.
- Press Escape to clear the selection and highlight.
curl -X POST http://localhost:3000/api/ingest/fhir \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Bundle",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Patient",
"id": "pat-123",
"name": [
{
"use": "official",
"given": ["John", "Edward"],
"family": "Smith"
}
],
"gender": "male",
"birthDate": "1980-01-01"
}
},
{
"resource": {
"resourceType": "Observation",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "39156-5",
"display": "Body Mass Index"
}
]
},
"valueQuantity": {
"value": 24.5,
"unit": "kg/m2"
}
}
}
]
}'
The machine learning pipeline (`analyze.py`) implements an interpretable risk assessment model:
graph LR
A["📂 Raw Data"] --> B["🧹 Cleaning & Validation"]
B --> C["⚙️ Feature Engineering"]
C --> D["📏 StandardScaler"]
D --> E["📊 Logistic Regression"]
E --> F["🎯 Risk Score 0–100%"]
E --> G["📋 Feature Importance"]
F --> H["💾 Cached Model"]
G --> H
| Step | Details |
|---|---|
| Data Cleaning | Flags unrealistic values (BMI < 10, glucose < 50, HbA1c < 3) and replaces them with medians |
| Encoding | Gender → binary; Smoking history → one-hot encoding |
| Scaling | StandardScaler on age, BMI, HbA1c, blood glucose |
| Model | LogisticRegression with balanced class weights |
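To make the cleaning and interpretation steps concrete, here is a dependency-free sketch. It is not the code in `analyze.py`: it assumes the replacement median is computed over the plausible values only, and that a feature's impact is its coefficient multiplied by its scaled value, which is the usual reading of a logistic regression. The key names and coefficients in the usage comment are invented for illustration.

```python
import math
from statistics import median

# Lower plausibility bounds from the table above (key names are illustrative).
MIN_PLAUSIBLE = {"bmi": 10.0, "blood_glucose_level": 50.0, "hba1c_level": 3.0}


def replace_implausible_with_median(values, minimum):
    """Replace entries below `minimum` with the median of the plausible entries."""
    plausible = [v for v in values if v >= minimum]
    if not plausible:
        return list(values)  # nothing trustworthy to impute from
    fill = median(plausible)
    return [v if v >= minimum else fill for v in values]


def risk_with_contributions(scaled_features, coefficients, intercept):
    """Return (risk in percent, per-feature contributions to the log-odds).

    Contributions are ordered by absolute size, largest first.
    """
    contributions = {
        name: coefficients[name] * value
        for name, value in scaled_features.items()
    }
    log_odds = intercept + sum(contributions.values())
    risk_percent = 100.0 / (1.0 + math.exp(-log_odds))
    ranked = dict(
        sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    )
    return risk_percent, ranked


# Example with made-up coefficients:
# risk_with_contributions({"hba1c_level": 1.0, "age": -0.5},
#                         {"hba1c_level": 2.0, "age": 1.0}, intercept=0.0)
```

Because the contributions add up to the log-odds, each one can be shown as a bar whose sign indicates whether the factor pushes the risk up or down.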
To prevent ingestion, extraction, NLP, and prediction pipelines from crashing or truncating notes when encountering legacy character sets or invalid sequences, a robust text sanitization layer is integrated at all boundaries (dataset imports, API payloads, CLI inputs, and daemon loops).
Clinical records are typically aggregated from disparate Electronic Health Records (EHR) systems, legacy laboratory reports, and clinician templates. These exports often use legacy encodings (e.g., Windows CP1252, ISO-8859-1) or copy-pasted smart quotes/dashes from word processors. If these raw streams are processed directly by modern UTF-8 parsers without sanitization, they raise UnicodeDecodeError exceptions, crash the daemon, or silently truncate vital note data.
- Safe Byte Decoding: Decodes byte streams as UTF-8. Malformed sequences are logged as warnings and replaced rather than raising fatal exceptions, and fallbacks to CP1252/Latin-1 are triggered when needed.
- Unicode Normalization: Normalizes all characters to Unicode Normalization Form KC (NFKC).
- Null Byte Removal: Strips null bytes (`\x00`) to prevent C-level string truncation bugs in downstream tools.
- Control Character Cleanup: Discards non-printable control characters (Unicode categories `Cc` and `Cf`) while preserving formatting whitespace (`\t`, `\n`, `\r`).
- Smart Quote & Dash Normalization: Converts curly quotes (`“`, `”`, `‘`, `’`) and typographic dashes (`–`, `—`) to standard ASCII equivalents.
- Unusual Whitespace Normalization: Normalizes zero-width spaces (`\u200b`), non-breaking spaces (`\xa0`), and other Unicode spaces into standard ASCII spaces or empty strings.
- Medical Symbol Preservation: Preserves essential medical symbols like degrees (`°`), micro/mu (`μ`), plus-minus (`±`), and percentages (`%`) to maintain data integrity.
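The rules above can be approximated with the standard library alone. The sketch below is an illustration, not the project's sanitizer: `decode_bytes` and `sanitize_text` are hypothetical names, and it falls back through encodings in a fixed order (UTF-8, CP1252, Latin-1) rather than replacing individual malformed bytes.

```python
import logging
import unicodedata

logger = logging.getLogger(__name__)

# Curly quotes and typographic dashes mapped to ASCII equivalents.
ASCII_PUNCTUATION = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
    "\u2013": "-", "\u2014": "-",
})
KEEP_WHITESPACE = {"\t", "\n", "\r"}


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 first, then CP1252, then Latin-1 (which accepts any byte)."""
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            logger.warning("input is not valid %s, trying next encoding", encoding)
    return raw.decode("latin-1")


def sanitize_text(value) -> str:
    """Normalize text so downstream parsers see clean, predictable characters."""
    text = decode_bytes(value) if isinstance(value, (bytes, bytearray)) else str(value)
    text = unicodedata.normalize("NFKC", text)  # also turns \xa0 into a space
    text = text.translate(ASCII_PUNCTUATION)
    cleaned = []
    for char in text:
        category = unicodedata.category(char)
        # Drop control (Cc) and format (Cf) characters, including \x00 and
        # \u200b, but keep tab, newline, and carriage return.
        if category in ("Cc", "Cf") and char not in KEEP_WHITESPACE:
            continue
        # Any remaining Unicode space separator becomes a plain ASCII space.
        cleaned.append(" " if category == "Zs" else char)
    return "".join(cleaned)
```

Symbols such as `°` and `±` pass through unchanged because NFKC does not alter them and they are not in a removed category. NFKC does map the micro sign (U+00B5) to the Greek letter mu (U+03BC), which matches the "micro/mu" wording above.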
# Linux/macOS
python3 analyze.py
# Windows
py analyze.py
Create a patient JSON file:
{
"gender": "Female",
"age": 52,
"hypertension": true,
"heartDisease": false,
"smokingHistory": "former",
"bmi": 30.1,
"hba1cLevel": 6.4,
"bloodGlucoseLevel": 148
}
Run prediction:
# Linux/macOS
python3 analyze.py predict_file patient.json
# Windows
py analyze.py predict_file patient.json
| Variable | File | Description |
|---|---|---|
| `DATABASE_URL` | `.env` | PostgreSQL connection string |
| `NODE_ENV` | `.env.local` | Set to `development` for local dev features |
| `SESSION_SECRET` | `.env` | Required in production for signed Express sessions |
| `DEV_CLINICIAN_EMAIL` | `.env.local` | Seeded clinician email (dev only) |
| `DEV_CLINICIAN_PASSWORD` | `.env.local` | Seeded clinician password (dev only) |
| `NEXT_PUBLIC_LOCAL_ENCRYPTION_KEY` | `.env.local` | Local encryption key (dev only) |
| `ENABLE_PHI_REDACTION` | `.env` | Enable privacy-preserving PHI redaction (defaults to `true`) |
Security: `.env.local` is git-ignored and should never be committed. Production builds do not expose dev credentials.
Request limits: JSON and URL-encoded API payloads are limited to `256kb` by default. Add route-specific upload handling before increasing this global limit.
Production sessions: When the app runs behind a TLS-terminating reverse proxy or load balancer, Express trusts one proxy hop in production so secure session cookies are issued from `X-Forwarded-Proto: https` requests.
"PostgreSQL is unreachable"
- Verify PostgreSQL is running: `sudo systemctl status postgresql` (Linux) or `brew services list` (macOS)
- Confirm `DATABASE_URL` in `.env` matches your local credentials
- Ensure port `5432` is not blocked by another process
- Check that the `clinical_insight_engine` database exists
"Database startup check failed"
- Run `npm run db:push` to create/update the required tables
- Verify your `.env` file is in the project root (not inside `server/` or `client/`)
Python model errors
- Ensure the virtual environment is activated: `source .venv/bin/activate`
- Verify dependencies: `pip install -r requirements.txt`
- If `diabetes_dataset.csv` is missing, copy it: `cp attached_assets/diabetes_dataset.csv ./`
- Or generate synthetic data: `python3 -c "from analyze import create_synthetic_data; create_synthetic_data()"`
Port conflicts
- The dev server defaults to port 5173 (Vite)
- If occupied, Vite will automatically pick the next available port
- Check for processes: `lsof -i :5173` (Linux/macOS) or `netstat -ano | findstr :5173` (Windows)
- 📈 Longitudinal patient risk tracking across visits
- 💡 Counterfactual reasoning — "What single change reduces risk most?"
- 🔬 Cohort discovery and population-level insights
- 🏥 Integration with Electronic Health Records (EHR)
- ⚖️ Advanced bias detection and ML fairness metrics
- ☁️ Cloud deployment (Vercel / Render)
We love contributions! Whether it's a bug fix, a new feature, or improved docs — every PR makes a difference.
- Fork the repository
- Create your feature branch (`git checkout -b feat/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feat/amazing-feature`)
- Open a Pull Request
Please read our Contributing Guide and Code of Conduct before submitting.
Gopal Gupta Computer Science Engineer · Full-Stack Developer · Data Science & ML Enthusiast
Built with ❤️ for better preventive healthcare
⭐ Star this repo if you find it useful — it helps others discover the project!
- All schema changes must go through drizzle-kit generate.
