- Introduction
- Key Features
- Architecture Overview
- Installation and Setup
- API Endpoints
- Directory Structure
Vortex is an integrated financial platform that combines reinforcement learning-based portfolio optimization, retrieval-augmented generation for financial research, and comprehensive KYC verification into a unified system. Built on Pathway's real-time data processing capabilities and leveraging modern LLM architectures, Vortex addresses critical challenges in automated portfolio management including risk-adaptive allocation, explainable AI decisions, and regulatory compliance through robust KYC workflows. The platform features a Next.js dashboard for user interaction, a Flask orchestration layer, and microservices for each specialized domain.
- Challenge: Traditional portfolio systems lack adaptability to individual risk profiles and market dynamics.
- Solution: PPO-based reinforcement learning with Heterogeneous Graph Attention Networks (HGAT) enables dynamic portfolio allocation that adapts to market conditions and user risk tolerance.
- Monthly Fine-tuning: Policy Transfer Regularization (PTR) enables continuous model improvement without catastrophic forgetting.
- Explainability: LangGraph-based agents generate human-readable explanations for portfolio decisions.
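As a rough illustration of risk-adaptive allocation, the reward an RL agent optimizes can trade expected return off against volatility, scaled by the user's risk tolerance. The function below is an illustrative sketch only; the names and the exact formula are placeholders, not SmartFolio's actual reward or API:

```python
from statistics import pstdev

def portfolio_reward(weights, returns, risk_tolerance=0.5):
    """Illustrative reward: portfolio return minus a volatility penalty.

    A lower risk_tolerance increases the penalty, mimicking how a PPO
    policy could be conditioned on a user's risk profile.
    (Hypothetical formula, not SmartFolio's actual implementation.)
    """
    weighted = [w * r for w, r in zip(weights, returns)]
    port_return = sum(weighted)
    risk_penalty = (1.0 - risk_tolerance) * pstdev(weighted)
    return port_return - risk_penalty

# Equal-weight portfolio over three assets, one step of returns
w = [1 / 3, 1 / 3, 1 / 3]
r = [0.02, -0.01, 0.015]
conservative = portfolio_reward(w, r, risk_tolerance=0.1)
aggressive = portfolio_reward(w, r, risk_tolerance=0.9)
```

The same market step yields a lower reward for the conservative profile, since its volatility penalty is weighted more heavily.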
- Challenge: Generic retrieval systems fail to capture domain-specific nuances in financial documents.
- Solution: Pathway-powered vector store with real-time document ingestion and specialized retrieval tools for financial analysis.
- Multi-Source Retrieval: Aggregates data from news articles, SEC filings, and fundamental data sources.
- Ensemble Scoring: Combines sentiment analysis, technical indicators, and fundamental metrics for stock scoring.
- MCP Server: Model Context Protocol server enables agentic integrations with external tools.
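The ensemble scoring step can be pictured as a weighted combination of normalized signals. The weights and function below are illustrative placeholders, not FinRAG's actual scoring logic:

```python
def ensemble_score(sentiment, technical, fundamental,
                   weights=(0.3, 0.3, 0.4)):
    """Illustrative weighted ensemble; each input is normalized to [0, 1].

    The weight split (0.3/0.3/0.4) is a hypothetical example, not the
    values used by the FinRAG scoring module.
    """
    ws, wt, wf = weights
    return ws * sentiment + wt * technical + wf * fundamental

# Example: a stock with strong sentiment, neutral technicals,
# solid fundamentals
score = ensemble_score(sentiment=0.8, technical=0.6, fundamental=0.7)
```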
- Challenge: Manual document verification is slow, error-prone, and lacks consistency.
- Solution: Automated document parsing using PaddleOCR with cross-document validation and ML-based risk scoring.
- Document Support: PAN Card, Aadhaar Card, and ITR documents with automatic field extraction.
- Video Verification: Selfie video analysis with face matching (Aadhaar-to-PAN, PAN-to-Video) and liveness detection.
- Risk Scoring: ML model evaluates investor risk based on questionnaire responses and financial data.
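Cross-document validation (e.g., Aadhaar-to-PAN name matching) can be approximated with normalized fuzzy string comparison. This is a simplified sketch; the real pipeline works on OCR-extracted fields and uses its own matching thresholds:

```python
from difflib import SequenceMatcher

def normalize(name):
    """Uppercase and collapse whitespace before comparing."""
    return " ".join(name.upper().split())

def names_match(a, b, threshold=0.85):
    """Fuzzy match of a name field extracted from two documents.

    The 0.85 threshold is an illustrative choice, not the KYC
    service's actual setting.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

ok = names_match("Rahul  Sharma", "RAHUL SHARMA")      # case/space noise
mismatch = names_match("Rahul Sharma", "Priya Verma")  # different person
```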
- Challenge: Batch processing introduces latency in portfolio updates and market data ingestion.
- Solution: Kafka-based streaming architecture enables real-time stock data ingestion and processing.
- Stock Data Producer: Streams OHLCV data from multiple sources to Kafka topics.
- Consumer Pipeline: Calculates display features (daily change, trend, volatility) and triggers monthly fine-tuning.
- User Stream: Processes user onboarding data for KYC workflows.
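The display features the consumer computes (daily change, trend, volatility) can be derived from a window of closing prices. A minimal sketch, assuming a plain list of closes rather than the actual Kafka message schema:

```python
import statistics

def display_features(closes):
    """Illustrative consumer-side features over a window of closes."""
    # Percent change between the last two closes
    daily_change = (closes[-1] - closes[-2]) / closes[-2] * 100
    # Crude trend: compare window endpoints
    trend = "up" if closes[-1] > closes[0] else "down"
    # Volatility as the population stdev of simple returns
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    volatility = statistics.pstdev(returns)
    return {"daily_change_pct": round(daily_change, 2),
            "trend": trend,
            "volatility": volatility}

feats = display_features([100.0, 102.0, 101.0, 104.0])
```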
- Install Docker & Docker Compose:
You can install Docker using your platform's package manager (example commands below), then bring up the full stack with docker compose up --build.
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-plugin
- Install Node.js 18+:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 18
nvm use 18
- Install Python 3.10+:
sudo apt-get install python3.10 python3.10-venv python3-pip
- Create and activate virtual environment:
python3.10 -m venv venv
source venv/bin/activate
- Install project dependencies for each module:
pip install -r requirements.txt
cd flask_server
pip install -r requirements.txt
cd ..
cd KYC
pip install -r KYC_requirements.txt
pip install -r VV_requirements.txt
cd ..
cd SmartFolio
pip install -r requirements.txt
cd ..
cd FinRAGFinal
pip install -r requirements.txt
cd ..
cd dashboard
npm i
cd ..
- Copy required data files from the provided zip:
The following folders contain large data files and pre-trained models that are not included in the repository. Extract them from the provided vortex_data.zip file.
# Extract the data zip file (provided separately)
unzip vortex_data.zip
Required folders to copy:
| Folder | Description | Destination |
|---|---|---|
| finrag_tree/ | RAPTOR tree indexes and vector stores | FinRAGFinal/finrag_tree/ |
| dataset_default/ | Training datasets and expert cache (~920MB) | SmartFolio/dataset_default/ |
| checkpoints_risk*/ | Pre-trained model checkpoints for each risk level | SmartFolio/checkpoints_risk*/ |
| data/ | Raw OHLCV data files | data/ |
After extraction, the folder structure should include:
Vortex/
├── data/
│ └── ohlcv_raw_1.csv
├── FinRAGFinal/
│ └── finrag_tree/
│ ├── chroma_vectorstore/
│ ├── pathway_vectorstore/
│ ├── tree.json
│ └── tree.pkl
└── SmartFolio/
├── checkpoints_risk01/
├── checkpoints_risk03/
├── checkpoints_risk05/
├── checkpoints_risk07/
├── checkpoints_risk09/
└── dataset_default/
├── corr/
├── custom/
├── daily_stock_custom/
├── data_train_predict_custom/
├── expert_cache/
└── index_data/
git clone https://github.com/aupc2061/Vortex.git
cd Vortex

Create a .env file in the root directory with the following parameters:
# OpenAI API Key
OPENAI_API_KEY=your_openai_api_key
# Pathway License Keys
PW_LKEY=your_pathway_license_key
PW_LIKEY=your_pathway_license_key
# Encryption Key (for sensitive data)
ENCRYPTION_KEY=your_encryption_key
# MongoDB Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/?appName=AppName
MONGODB_DB_NAME=kyc_app
# Flask Configuration
FLASK_ENV=development
FLASK_HOST=0.0.0.0
FLASK_PORT=8000
FLASK_DEBUG=true
# Security
SECRET_KEY=your_secret_key
# Langfuse (Observability)
LANGFUSE_ENV=default
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
# Service Ports
COMBINED_API_PORT=8000
PAYLOAD_API_PORT=8001
KYCV_MCP_PORT=8123
RISK_MCP_PORT=8124
# Cloudinary Configuration (for document storage)
CLOUDINARY_CLOUD_NAME=your_cloud_name
CLOUDINARY_API_KEY=your_api_key
CLOUDINARY_API_SECRET=your_api_secret
# Portfolio API URL
PORTFOLIO_API_URL=http://localhost:8080
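Services read these variables through standard environment lookups. The helper below is a hypothetical fail-fast accessor for illustration, not part of the Vortex codebase:

```python
import os

def require_env(name, default=None):
    """Hypothetical helper: read a variable, fail fast if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Simulate a value that would normally come from the loaded .env file
os.environ.setdefault("FLASK_PORT", "8000")
port = int(require_env("FLASK_PORT"))
```

Failing at startup on a missing key surfaces misconfiguration immediately, instead of as a confusing runtime error deep inside a service.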
This is the recommended method for running the complete platform.
docker compose up --build

Services Started:
| Service | Port | Description |
|---|---|---|
| Dashboard | 3000 | Next.js Frontend |
| Flask API | 8000 | Main Backend Orchestration |
| FinRAG API | 8002 | RAG Service |
| SmartFolio API | 8080 | Portfolio Service |
| KYC Combined | 8004 | KYC Verification |
| KYC Payload | 8001 | Payload Storage |
| KYC Admin | 8080 | Admin Operations |
Access the Application:
- Dashboard: http://localhost:3000
- API Documentation: http://localhost:8000/docs
Docker Commands:
docker compose up --build

Important: Always activate the virtual environment in every terminal before running any commands:
source venv/bin/activate

cd dashboard
npm i
npm run dev

The dashboard will be available at http://localhost:3000
cd flask_server
python app.py

The API server will be available at http://localhost:8000
set -a; source keys.env; set +a
cd KYC
python MCP_Server_KYCV.py --host 0.0.0.0 --port 8123

set -a; source keys.env; set +a
cd KYC
python MCP_Server_RiskScore.py --host 0.0.0.0 --port 8124

cd SmartFolio
python -m uvicorn api.server:app --host 0.0.0.0 --port 8080

cd SmartFolio
python start_mcp.py

cd FinRAGFinal
python mcp_api.py --host 0.0.0.0 --port 8002

cd FinRAGFinal/Streaming
python gdrive_connector.py

First, start the Kafka services:
cd streaming/docker
docker compose -f docker-compose.kafka.yml up -d

Then run the producers and consumers:
# From project root
python -m streaming.producer.stock_producer
python -m streaming.consumer.stock_consumer

Vortex/
├── dashboard/ # Next.js Frontend
│ ├── app/
│ │ ├── (auth)/ # Authentication Pages
│ │ │ ├── login/
│ │ │ ├── register/
│ │ │ └── forgot-password/
│ │ ├── consumer/ # Consumer Dashboard
│ │ │ ├── dashboard/
│ │ │ └── questionnaire/
│ │ ├── company/ # Admin Dashboard
│ │ │ ├── dashboard/
│ │ │ └── reviews/
│ │ └── admin/ # Admin Panel
│ ├── components/
│ │ ├── ui/ # Reusable UI Components
│ │ └── layout/ # Layout Components
│ └── lib/
│ ├── api.ts # API Client
│ └── utils.ts # Utility Functions
│
├── flask_server/ # Main Backend API
│ ├── app.py # Flask Application
│ ├── config.py # Configuration
│ ├── models.py # Data Models
│ ├── orchestration_helper.py # Portfolio Orchestration
│ ├── payload_store_mongo.py # MongoDB Payload Storage
│ ├── alerts_store_mongo.py # Alert Storage
│ ├── verification.py # KYC Verification Logic
│ ├── setup_mongodb_indexes.py # Database Indexes
│ ├── requirements.txt
│ └── Dockerfile
│
├── SmartFolio/ # Portfolio Management (Git Submodule)
│ ├── api/
│ │ └── server.py # FastAPI Endpoints
│ ├── model/ # PPO Policy Networks
│ ├── trainer/
│ │ ├── irl_trainer.py # IRL Training
│ │ └── ptr_ppo.py # PTR-PPO Implementation
│ ├── explainibility_agents/ # XAI & Trading Agents
│ │ ├── mcp/ # MCP Server
│ │ └── tradingagents/ # Trading Agent Logic
│ ├── gen_data/ # Dataset Builders
│ │ ├── build_dataset_yf.py
│ │ └── update_monthly_dataset.py
│ ├── checkpoints_risk01/ # Risk 0.1 Model
│ ├── checkpoints_risk03/ # Risk 0.3 Model
│ ├── checkpoints_risk05/ # Risk 0.5 Model
│ ├── checkpoints_risk07/ # Risk 0.7 Model
│ ├── checkpoints_risk09/ # Risk 0.9 Model
│ ├── main.py # Training Entry Point
│ ├── Dockerfile
│ └── Dockerfile.mcp
│
├── FinRAGFinal/ # Financial RAG (Git Submodule)
│ ├── src/
│ │ └── finrag/
│ │ ├── vectorstore/ # Pathway Vector Store
│ │ ├── retrieval/ # Multi-Source Retrieval
│ │ ├── scoring/ # Ensemble Stock Scoring
│ │ └── orchestrator/ # Query Orchestration
│ ├── api.py # FastAPI Endpoints
│ ├── mcp_server.py # MCP Server
│ ├── mcp_api.py # MCP API
│ ├── requirements.txt
│ ├── Dockerfile
│ └── Dockerfile.mcp
│
├── KYC/ # KYC & Risk Scoring
│ ├── combined_api.py # Document Verification API
│ ├── payload_api.py # Payload Storage API
│ ├── admin_api.py # Admin Operations API
│ ├── investor_risk_scorer.py # ML Risk Scoring
│ ├── kyc_mcp_server.py # MCP Server
│ ├── kyc_alerts.py # Alert Generation
│ ├── kyc_master_store.py # Master JSON Storage
│ ├── encryption_utils.py # Data Encryption
│ ├── Dockerfile
│ └── ENDPOINT_FLOW_EXPLANATION.md
│
├── streaming/ # Real-time Data Pipeline
│ ├── config.py # Configuration
│ ├── run_all.py # Main Entry Point
│ ├── producer/
│ │ ├── stock_producer.py # Stock OHLCV Producer
│ │ ├── user_producer.py # User Data Producer
│ │ └── run_producers.py # Producer Manager
│ ├── consumer/
│ │ ├── stock_consumer.py # Stock Data Consumer
│ │ ├── user_consumer.py # User Data Consumer
│ │ └── run_consumers.py # Consumer Manager
│ ├── shared/
│ │ ├── locks.py # Thread-safe Locks
│ │ ├── state.py # Shared State
│ │ └── utils.py # Utilities
│ └── docker/
│ └── docker-compose.kafka.yml
│
├── docker-compose.yml # Full Stack Deployment
├── .env # Environment Variables
├── .gitmodules # Submodule Configuration
└── README.md # This File