This example demonstrates how to build a real-time voice conversation agent using AgentScope's RealtimeAgent. The agent supports bidirectional voice streaming, enabling natural voice conversations with low latency and real-time audio transcription.
- Python 3.10 or higher
- Your DashScope API key, set in the DASHSCOPE_API_KEY environment variable
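A quick way to confirm the prerequisite above before starting the server is to check the environment from Python. This is a minimal sketch, not part of the example itself; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable name taken from the prerequisites above.
REQUIRED_VARS = ["DASHSCOPE_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variable(s):", ", ".join(missing))
else:
    print("Environment looks ready.")
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.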
Install the required packages:
uv pip install agentscope fastapi uvicorn websockets
# or
# pip install agentscope
Run the FastAPI server:
cd examples/agent/realtime_voice_agent
python run_server.py
The server will start on http://localhost:8000 by default.
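If the page does not load, it can help to check from a second terminal whether anything is answering on the default address. The sketch below uses only the Python standard library; the function name `is_server_up` is hypothetical, and it only tests that some HTTP server responds, not that it is this example's server.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:8000", timeout=2.0):
    """Return True if an HTTP request to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    print("Server reachable:", is_server_up())
```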
Open your web browser and navigate to:
http://localhost:8000
You will see a web interface with:
- Configuration panel (instructions and user name)
- Voice control buttons (Start Recording, Stop Recording, Disconnect)
- Video recording button (Start Video Recording)
- Text input field
- Message display area
- Video preview area (when video recording is active)
- Configure the Agent (optional):
  - Modify the "Instructions" to customize the agent's behavior
  - Enter your name in the "User Name" field
- Start Voice Recording:
  - Click the "🎤 Start Recording" button
  - Allow microphone access when prompted by your browser
  - Speak naturally to the agent
  - The agent will respond with voice and text
- Stop Recording:
  - Click "⏹️ Stop Recording" to pause voice input
- Video Recording (optional):
  - Click the "📹 Start Video Recording" button to start video recording
  - Allow camera access when prompted by your browser
  - The system will automatically capture and send video frames to the server at 1 frame per second (1 fps)
  - A video preview will be displayed while recording
  - Click "🔴 Stop Video Recording" to stop recording
  - Note: Video recording requires an active voice chat session. Please start voice chat first before starting video recording.
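The two video rules above (at most one frame per second, and only while a voice session is active) can be sketched as a small gate. This is an illustration of the logic only, not the example's actual client code, which runs in the browser; the class `FrameGate` and its attributes are hypothetical.

```python
import time


class FrameGate:
    """Decide whether a captured video frame should be sent to the server."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between frames; 1.0 means 1 fps
        self._clock = clock               # injectable so the gate can be tested
        self._last_sent = None            # time of the last frame that was sent
        self.voice_active = False         # set to True once voice chat has started

    def should_send(self):
        # Video requires an active voice chat session.
        if not self.voice_active:
            return False
        now = self._clock()
        # Drop frames that arrive sooner than min_interval after the last sent one.
        if self._last_sent is not None and now - self._last_sent < self.min_interval:
            return False
        self._last_sent = now
        return True
```

A capture loop would call `should_send()` for each frame and skip the upload whenever it returns False.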
AgentScope supports multiple realtime voice models. By default, this example uses DashScope's qwen3-omni-flash-realtime model, but you can switch to either of the following:
- GeminiRealtimeModel
- OpenAIRealtimeModel
Edit run_server.py and replace the model initialization code:
For OpenAI:
import os
from agentscope.realtime import OpenAIRealtimeModel
agent = RealtimeAgent(
    name="Friday",
    sys_prompt=sys_prompt,
    model=OpenAIRealtimeModel(
        model_name="gpt-4o-realtime-preview",
        api_key=os.getenv("OPENAI_API_KEY"),
        voice="alloy",  # Options: "alloy", "echo", "marin", "cedar"
    ),
)
For Gemini:
import os
from agentscope.realtime import GeminiRealtimeModel
agent = RealtimeAgent(
    name="Friday",
    sys_prompt=sys_prompt,
    model=GeminiRealtimeModel(
        model_name="gemini-2.5-flash-native-audio-preview-09-2025",
        api_key=os.getenv("GEMINI_API_KEY"),
        voice="Puck",  # Options: "Puck", "Charon", "Kore", "Fenrir"
    ),
)
Don't forget to set the corresponding API key environment variable before starting the server!
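One way to avoid starting the server with the wrong key is to keep the provider settings in a small table and fail early when the matching variable is missing. The sketch below is a hypothetical helper, not part of AgentScope: `PROVIDERS` and `resolve_provider` are illustrative names, and the values are copied from this README. The returned keys mirror the keyword arguments shown above for the OpenAI and Gemini models; whether the DashScope model accepts the same arguments is an assumption to check against run_server.py.

```python
import os

# Model names, voices, and variable names are taken from this README.
PROVIDERS = {
    "dashscope": {
        "env_var": "DASHSCOPE_API_KEY",
        "model_name": "qwen3-omni-flash-realtime",
    },
    "openai": {
        "env_var": "OPENAI_API_KEY",
        "model_name": "gpt-4o-realtime-preview",
        "voice": "alloy",
    },
    "gemini": {
        "env_var": "GEMINI_API_KEY",
        "model_name": "gemini-2.5-flash-native-audio-preview-09-2025",
        "voice": "Puck",
    },
}


def resolve_provider(name, env=None):
    """Return model keyword arguments for `name`, or raise if its key is unset."""
    env = os.environ if env is None else env
    if name not in PROVIDERS:
        raise ValueError(f"Unknown provider: {name!r}")
    spec = PROVIDERS[name]
    api_key = env.get(spec["env_var"])
    if not api_key:
        raise RuntimeError(f"Set {spec['env_var']} before starting the server")
    # Everything except the variable name is passed on, plus the key itself.
    kwargs = {k: v for k, v in spec.items() if k != "env_var"}
    kwargs["api_key"] = api_key
    return kwargs
```

With this in place, the model could be built as, for example, `OpenAIRealtimeModel(**resolve_provider("openai"))`, so a missing key surfaces as a clear error at startup rather than as a failed connection later.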