A professional voice assistant system powered by Google's Gemini 2.5 Flash model, featuring both standalone voice interaction and real-time telephony integration with Asterisk ARI and Gemini Live API.
- 🔄 Migrated from OpenAI to Gemini 2.5 Flash: More efficient and cost-effective AI responses
- 🎆 NEW: Real-time Gemini Live API Integration: Direct voice-to-voice conversation with ultra-low latency
- 📡 NEW: Asterisk ARI with externalMedia: Bidirectional audio streaming for telephony integration
- 🎤 NEW: Voice Activity Detection: Intelligent interruption handling for natural conversations
- 🔊 NEW: slin16 Audio Format: Optimized for Asterisk with 16-bit signed linear PCM at 16kHz
- 🏢 Professional Architecture: Complete restructure with modular design
- 📦 Package Structure: Proper Python package with clear separation of concerns
- 🔧 Enhanced Configuration: Pydantic-based settings management
- 📈 Better Logging: Comprehensive logging and error handling
- 🧪 Test Coverage: Unit tests and testing framework
- 📚 Documentation: Complete documentation and setup guides
# 1. Activate virtual environment
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure API key
cp .env.example .env
# Edit .env and add your Google API key
# 4. Run the voice assistant
python src/main.py

voice_assistant_ari_llm/
├── src/ # 🎯 Source code
│ ├── voice_assistant/ # 📦 Main package
│ │ ├── core/ # 🧠 Core assistant logic
│ │ │ ├── assistant.py # Main VoiceAssistant class
│ │ │ └── conversation.py # Conversation management
│ │ ├── ai/ # 🤖 AI integration
│ │ │ ├── gemini_client.py # Gemini 2.5 Flash client
│ │ │ └── prompts.py # System prompts
│ │ ├── audio/ # 🎵 Audio processing
│ │ │ ├── speech_recognition.py # Speech-to-text
│ │ │ ├── text_to_speech.py # Text-to-speech
│ │ │ └── audio_utils.py # Audio utilities
│ │ ├── telephony/ # 📞 Telephony integration
│ │ │ ├── ari_handler.py # Asterisk ARI handler
│ │ │ └── call_manager.py # Call management
│ │ └── utils/ # 🛠️ Utilities
│ │ ├── logger.py # Logging configuration
│ │ └── exceptions.py # Custom exceptions
│ └── main.py # 🚀 Main entry point
├── config/ # ⚙️ Configuration
│ ├── settings.py # Pydantic settings
│ └── environment.py # Environment management
├── tests/ # 🧪 Test suite
│ ├── test_ai/ # AI component tests
│ ├── test_audio/ # Audio component tests
│ └── test_core/ # Core logic tests
├── docs/ # 📚 Documentation
│ ├── README.md # Detailed documentation
│ ├── API.md # API reference
│ └── SETUP.md # Setup instructions
├── scripts/ # 📜 Utility scripts
│ ├── run_assistant.py # Simple run script
│ └── setup.py # Setup utilities
├── asterisk-config/ # 📞 Asterisk configuration
├── sounds/ # 🔊 Audio files
├── requirements.txt # 📋 Dependencies
├── .env.example # 📝 Environment template
└── README.md # 📖 This file
- Latest Model: Uses Google's Gemini 2.5 Flash for intelligent responses
- Cost Efficient: More affordable than the previous OpenAI integration
- Fast Responses: Optimized for real-time conversation
- Fallback System: Graceful handling of API failures
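The fallback behavior above can be sketched as a thin wrapper around the model call. This is illustrative only, with hypothetical names; the real client lives in `src/voice_assistant/ai/gemini_client.py` and its exact API may differ:

```python
# Sketch of graceful API-failure handling (names are assumptions, not the
# project's actual class).
class GeminiClientSketch:
    FALLBACK_REPLY = "Sorry, I'm having trouble right now. Please try again."

    def __init__(self, generate_fn):
        # generate_fn: callable that sends the prompt to Gemini and may raise
        self._generate = generate_fn

    def generate_response(self, text: str) -> str:
        try:
            reply = self._generate(text)
            # Treat an empty reply the same as a failure
            return reply if reply else self.FALLBACK_REPLY
        except Exception:
            # Never crash the conversation loop on an API error
            return self.FALLBACK_REPLY
```

The point of the pattern is that the voice loop always gets *some* utterance back, so a transient API outage degrades to an apology instead of a dropped call.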
- Speech Recognition: Google Speech Recognition for accurate voice input
- Text-to-Speech: Google TTS with standard voice for clear output
- Audio Utils: Comprehensive audio processing utilities
- Real-time Processing: Low-latency audio handling
- Asterisk ARI: Full integration with Asterisk PBX
- Call Management: Handle incoming/outgoing calls
- Real-time Audio: Process phone conversations in real-time
- Multi-channel: Support multiple concurrent calls
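Supporting multiple concurrent calls mostly comes down to keeping per-channel state keyed by the Asterisk channel ID. A minimal sketch of that bookkeeping (hypothetical; the real logic is in `src/voice_assistant/telephony/call_manager.py` and may track more state):

```python
# Per-channel call tracking keyed by Asterisk channel ID (illustrative).
class CallManagerSketch:
    def __init__(self):
        self._calls = {}  # channel_id -> per-call state

    def on_call_start(self, channel_id: str, caller: str) -> None:
        self._calls[channel_id] = {"caller": caller, "turns": 0}

    def on_call_end(self, channel_id: str) -> None:
        # pop() with a default so a duplicate hangup event is harmless
        self._calls.pop(channel_id, None)

    def active_calls(self) -> int:
        return len(self._calls)
```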
- Modular Design: Clean separation of concerns
- Type Safety: Full type hints throughout
- Error Handling: Comprehensive exception management
- Logging: Structured logging with configurable levels
- Testing: Unit tests and testing framework
- Python 3.8+
- Google API key (free tier available)
- Microphone and speakers
- (Optional) Asterisk PBX for telephony
1. Environment Setup:
   # Ensure virtual environment is active
   .venv\Scripts\activate
   # Verify Python version (should be 3.8+)
   python --version
2. Install Dependencies:
   pip install -r requirements.txt
3. Get Google API Key:
   - Visit Google AI Studio
   - Sign in and create a new API key
   - Copy the key for configuration
4. Configure Environment:
   cp .env.example .env
   # Edit .env and set GOOGLE_API_KEY=your-key-here
5. Test Installation:
   python src/main.py
   # Run all test cases
   pytest -v
The flagship feature - real-time conversational AI through phone calls:
# Quick start with real-time integration
./start_realtime.sh
# Or manually
python src/run_realtime_server.py

Real-time Features:
- 📡 Bidirectional Audio Streaming: Direct WebSocket audio with Asterisk externalMedia
- 🎤 Voice Activity Detection: Intelligent speech start/stop detection
- ⚡ Ultra-low Latency: Direct Gemini Live API integration
- 🔄 Interruption Handling: Natural conversation flow with mid-response interruptions
- 🔊 slin16 Format: Optimized 16-bit signed linear PCM at 16kHz
- 📈 Session Management: Complete conversation state tracking
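The slin16 framing above works out to very small, frequent chunks, which is what keeps latency low. Assuming the configured `AUDIO_CHUNK_SIZE=320` is measured in bytes (if it is samples, double the duration), the arithmetic is:

```python
# slin16 frame math: 16-bit (2-byte) signed linear PCM at 16 kHz.
BYTES_PER_SAMPLE = 2
SAMPLE_RATE = 16_000
CHUNK_BYTES = 320  # AUDIO_CHUNK_SIZE from the configuration, assumed bytes

samples_per_chunk = CHUNK_BYTES // BYTES_PER_SAMPLE   # 160 samples
chunk_ms = samples_per_chunk * 1000 / SAMPLE_RATE     # 10.0 ms per chunk
print(samples_per_chunk, chunk_ms)
```

At 10 ms per chunk, voice activity and interruptions can be detected within a few frames rather than after a full utterance.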
Test Extensions:
- 1000: Main Gemini Voice Assistant (full real-time integration)
- 1001: External Media Test (direct WebSocket audio)
- 1002: Basic Audio Test (echo and playback)
python src/main.py

Features:
- 🎤 Voice input with timeout handling
- 🧠 AI processing with Gemini 2.5 Flash
- 🗣️ Speech output with Google TTS
- 📊 Real-time status updates
- 📈 Session statistics
- Normal conversation: Speak naturally after "🎤 Listening"
- Exit: Say "quit", "exit", "goodbye", or "bye"
- Force quit: Press Ctrl+C
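The exit-phrase check above can be sketched in a few lines. This is illustrative; the actual loop lives in `src/main.py` and may normalize input differently:

```python
# Detect the exit phrases listed above, tolerating case and punctuation.
EXIT_PHRASES = {"quit", "exit", "goodbye", "bye"}

def is_exit_command(utterance: str) -> bool:
    words = (w.strip(".,!?") for w in utterance.lower().split())
    return any(w in EXIT_PHRASES for w in words)
```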
For basic phone-based interactions:
# Start basic ARI handler
uvicorn src.voice_assistant.telephony.ari_handler:create_ari_app --host 0.0.0.0 --port 8000
# Configure Asterisk to send events to your handler
# See asterisk-config/ for configuration examples

For the real-time Gemini Live API integration:
# Run automated setup
python scripts/setup_realtime.py
# This will:
# - Check environment requirements
# - Validate configuration
# - Create required directories
# - Test connections
# - Generate startup scripts

Quick Setup:
1. Copy .env.example to .env
2. Set your GOOGLE_API_KEY
3. Configure Asterisk (copy asterisk-config/*)
4. Run ./start_realtime.sh
📚 Detailed Setup Guide: See docs/REALTIME_SETUP.md
# Required
GOOGLE_API_KEY=your-google-api-key-here
# AI Settings
GEMINI_MODEL=gemini-2.5-flash
GEMINI_LIVE_MODEL=gemini-2.0-flash-exp
GEMINI_VOICE=Puck
MAX_TOKENS=150
TEMPERATURE=0.7
# Real-time Audio Settings
AUDIO_FORMAT=slin16
AUDIO_SAMPLE_RATE=16000
AUDIO_CHUNK_SIZE=320
AUDIO_BUFFER_SIZE=1600
# Voice Activity Detection
VAD_ENERGY_THRESHOLD=300
VAD_SILENCE_THRESHOLD=0.5
VAD_SPEECH_THRESHOLD=0.1
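A common way to use `VAD_ENERGY_THRESHOLD` is to compare it against the RMS energy of each PCM frame. A minimal sketch, assuming 16-bit little-endian samples and an RMS-based detector (the project's actual VAD may use a different energy measure):

```python
# Energy-based voice activity check for slin16 frames (illustrative).
import struct

VAD_ENERGY_THRESHOLD = 300  # matches the config value above

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def is_speech(frame: bytes) -> bool:
    return rms(frame) > VAD_ENERGY_THRESHOLD
```

`VAD_SILENCE_THRESHOLD` and `VAD_SPEECH_THRESHOLD` would then control how many consecutive seconds of silence or speech flip the detector's state.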
# Assistant Settings
ASSISTANT_NAME=ARI
VOICE_LANGUAGE=en
LISTEN_TIMEOUT=20.0
PHRASE_TIME_LIMIT=15.0
# Audio Settings
VOICE_VOLUME=0.9
# Logging
LOG_LEVEL=INFO
# LOG_FILE=logs/assistant.log # Optional file logging
# Asterisk ARI Configuration
ARI_BASE_URL=http://localhost:8088/ari
ARI_USERNAME=asterisk
ARI_PASSWORD=1234
STASIS_APP=gemini-voice-assistant
# External Media Configuration
EXTERNAL_MEDIA_HOST=localhost
EXTERNAL_MEDIA_PORT=8090
# Real-time Processing
ENABLE_INTERRUPTION_HANDLING=true
MAX_CALL_DURATION=3600
AUTO_ANSWER_CALLS=true

The system follows a clean, modular architecture:
- Core Layer: Main assistant logic and conversation management
- AI Layer: Gemini integration and response generation
- Audio Layer: Speech recognition and text-to-speech
- Telephony Layer: Asterisk ARI integration
- Utils Layer: Logging, exceptions, and utilities
- Config Layer: Settings and environment management
The modular design makes it easy to extend:
# Add new AI provider
from voice_assistant.ai.base_client import BaseAIClient

class NewAIClient(BaseAIClient):
    def generate_response(self, text: str) -> str:
        # Your implementation
        pass

# Add new audio processor
from voice_assistant.audio.base_processor import BaseAudioProcessor

class NewAudioProcessor(BaseAudioProcessor):
    def process_audio(self, audio_data: bytes) -> str:
        # Your implementation
        pass

# Run all tests
pytest tests/
# Run with coverage
pytest tests/ --cov=src/voice_assistant --cov-report=html
# Run specific test file
pytest tests/test_gemini_client.py -v

The real-time integration provides comprehensive monitoring:
API Endpoints:
- System Status: GET http://localhost:8000/status
- Active Calls: GET http://localhost:8000/calls
- Call Details: GET http://localhost:8000/calls/{channel_id}
- Health Check: GET http://localhost:8000/health
- API Documentation: GET http://localhost:8000/docs
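The endpoints above can be polled with the standard library alone. A small sketch, assuming the server from the quick start is running on port 8000 (the JSON field names returned are not documented here, so treat the parsed dict's contents as an assumption):

```python
# Poll the monitoring endpoints with urllib (no extra dependencies).
import json
from urllib.request import urlopen

BASE = "http://localhost:8000"

def fetch_status(base: str = BASE) -> dict:
    # Returns the parsed JSON body of GET /status
    with urlopen(f"{base}/status", timeout=5) as resp:
        return json.load(resp)

def call_detail_url(channel_id: str, base: str = BASE) -> str:
    # Build the per-call detail URL for GET /calls/{channel_id}
    return f"{base}/calls/{channel_id}"
```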
Real-time Metrics:
- Audio Processing: Latency, buffer sizes, packet counts
- Session Management: Active sessions, conversation turns, duration
- Voice Activity: Speech detection accuracy, interruption handling
- Gemini Live API: Connection status, response times, error rates
- External Media: WebSocket connections, audio quality metrics
The assistant provides comprehensive monitoring:
- Real-time Status: State changes and processing updates
- Conversation Metrics: Success rates and response times
- Error Tracking: Detailed error logs and fallback handling
- Performance Stats: Session duration and interaction counts
Example output:
🤖 Voice Assistant with Gemini 2.5 Flash
============================================================
✅ System Information:
Assistant Name: ARI
AI Model: gemini-2.5-flash
Voice Language: en
Listen Timeout: 20.0s
✅ Virtual environment: Active
✅ Configuration: .env file found
✅ Google API Key: Configured
[💤 Ready - Waiting for input]
[🎤 Listening - Speak now]
👤 You: Hello, how are you?
[🧠 Processing - Thinking...]
[🗣️ Speaking - Response ready]
🤖 Assistant: Hello! I'm doing great, thank you for asking. I'm ARI, your voice assistant powered by Gemini 2.5 Flash. How can I help you today?
1. "Google API key is required":
   - Check that the .env file exists and contains GOOGLE_API_KEY
   - Verify the API key is valid and has proper permissions
2. Microphone not detected:
   - Check microphone permissions in system settings
   - Try pip install pyaudio for better microphone support
   - Test with different microphone devices
3. Audio playback issues:
   - Verify speakers/headphones are connected
   - Check system audio settings
   - Try different audio output devices
4. Import errors:
   - Ensure the virtual environment is activated
   - Run pip install -r requirements.txt
   - Check the Python version (3.8+ required)
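For the microphone issue above, a quick way to see which input devices the system exposes is via the SpeechRecognition package (assumed to be in requirements.txt along with PyAudio; the helper below is a diagnostic sketch, not part of the project):

```python
# List available microphones; degrades gracefully if the audio stack
# (SpeechRecognition / PyAudio) is not installed.
def list_microphones():
    try:
        import speech_recognition as sr
        return sr.Microphone.list_microphone_names()
    except Exception as exc:  # ImportError, missing PyAudio, etc.
        return [f"unavailable: {exc}"]

if __name__ == "__main__":
    for index, name in enumerate(list_microphones()):
        print(index, name)
```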
Enable detailed logging:
# Set in .env file
LOG_LEVEL=DEBUG
# Or set environment variable
export LOG_LEVEL=DEBUG # Linux/Mac
set LOG_LEVEL=DEBUG    # Windows

If upgrading from the old OpenAI-based version:
- Backup your data: Save any important configurations
- Update dependencies: pip install -r requirements.txt
- Update environment: Replace OPENAI_API_KEY with GOOGLE_API_KEY
- Test functionality: Run python src/main.py to verify
- ❌ Removed: OpenAI dependency and API key
- ✅ Added: Google Generative AI (Gemini 2.5 Flash)
- 🔄 Updated: Professional project structure
- 📈 Improved: Error handling and logging
- 🧪 Added: Test suite and documentation
This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes with proper tests
- Commit: git commit -m 'Add amazing feature'
- Push: git push origin feature/amazing-feature
- Open a Pull Request
- 📚 Documentation: Check docs/README.md for detailed guides
- 🐛 Issues: Report bugs on GitHub Issues
- 💡 Features: Request features on GitHub Discussions
- 📧 Contact: Open an issue for support questions
🎉 Ready to start talking to your AI assistant!
Run python src/main.py and start your conversation with Gemini 2.5 Flash! 🚀