Realtime Voice Agent Example

This example demonstrates how to build a real-time voice conversation agent using AgentScope's RealtimeAgent. The agent supports bidirectional voice streaming, enabling natural voice conversations with low latency and real-time audio transcription.

Prerequisites

Python 3.10 or higher
A DashScope API key, set in the DASHSCOPE_API_KEY environment variable
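
Both prerequisites can be checked with a short pre-flight script before launching anything. The following is a minimal sketch; the helper name `check_prerequisites` is hypothetical and not part of AgentScope.

```python
import os
import sys


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not an AgentScope API.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)[:2]
    problems = []
    if version < (3, 10):
        problems.append(
            f"Python 3.10+ is required, found {version[0]}.{version[1]}"
        )
    # An empty or whitespace-only value is treated as unset.
    if not env.get("DASHSCOPE_API_KEY", "").strip():
        problems.append("DASHSCOPE_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current environment; passing explicit values makes the check easy to test.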

Install the required packages:

uv pip install agentscope fastapi uvicorn websockets
# or
# pip install agentscope fastapi uvicorn websockets
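
To confirm that the installation succeeded, the packages named in the install command can be probed from Python. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the install command above.
REQUIRED = ["agentscope", "fastapi", "uvicorn", "websockets"]
print("Missing packages:", missing_packages(REQUIRED) or "none")
```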

Usage

1. Start the Server

Run the FastAPI server:

cd examples/agent/realtime_voice_agent
python run_server.py

The server will start on http://localhost:8000 by default.
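
If the page does not load, it can help to check whether anything is listening on the port at all. The sketch below only tests TCP reachability of the default address mentioned above; it does not verify that the process behind the port is this example's server. The function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False
```

For example, `is_port_open()` should return `True` once `python run_server.py` is running with the default settings.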

2. Open the Web Interface

Open your web browser and navigate to:

http://localhost:8000

You will see a web interface with:

Configuration panel (instructions and user name)
Voice control buttons (Start Recording, Stop Recording, Disconnect)
Video recording button (Start Video Recording)
Text input field
Message display area
Video preview area (when video recording is active)

3. Start Conversation

Configure the Agent (optional):
- Modify the "Instructions" to customize the agent's behavior
- Enter your name in the "User Name" field
Start Voice Recording:
- Click the "🎤 Start Recording" button
- Allow microphone access when prompted by your browser
- Speak naturally to the agent
- The agent will respond with voice and text
Stop Recording:
- Click "⏹️ Stop Recording" to pause voice input
Video Recording (optional):
- Click the "📹 Start Video Recording" button
- Allow camera access when prompted by your browser
- The system automatically captures video frames and sends them to the server at 1 frame per second (1 fps)
- A video preview is displayed while recording
- Click "🔴 Stop Video Recording" to stop recording
- Note: Video recording requires an active voice chat session, so start voice chat before starting video recording.
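
The two video rules above (send at most one frame per second, and only while a voice session is active) can be illustrated with a small throttle. This is a sketch of the sampling rule only, written in Python for illustration; it is not the example's actual implementation, and the class name `FrameThrottle` is hypothetical.

```python
class FrameThrottle:
    """Decide which frames to forward, given a target rate and session state."""

    def __init__(self, fps=1.0):
        self.min_interval = 1.0 / fps  # seconds between forwarded frames
        self.last_sent = None  # timestamp of the last forwarded frame

    def should_send(self, now, voice_active):
        # Video requires an active voice session.
        if not voice_active:
            return False
        # Drop frames that arrive sooner than the minimum interval.
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.last_sent = now
        return True


throttle = FrameThrottle(fps=1.0)
timestamps = [0.0, 0.4, 0.9, 1.0, 1.5, 2.1]  # capture times in seconds
sent = [t for t in timestamps if throttle.should_send(t, voice_active=True)]
print(sent)  # [0.0, 1.0, 2.1]
```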

Switching Models

AgentScope supports multiple realtime voice models. By default, this example uses DashScope's qwen3-omni-flash-realtime model, but you can easily switch to other providers.

Supported Models

GeminiRealtimeModel
OpenAIRealtimeModel

How to Switch Models

Edit run_server.py and replace the model initialization code:

For OpenAI:

import os

from agentscope.realtime import OpenAIRealtimeModel

# RealtimeAgent and sys_prompt are assumed to be already imported/defined in run_server.py
agent = RealtimeAgent(
    name="Friday",
    sys_prompt=sys_prompt,
    model=OpenAIRealtimeModel(
        model_name="gpt-4o-realtime-preview",
        api_key=os.getenv("OPENAI_API_KEY"),  # read the key from the environment
        voice="alloy",  # Options: "alloy", "echo", "marin", "cedar"
    ),
)

For Gemini:

import os

from agentscope.realtime import GeminiRealtimeModel

# RealtimeAgent and sys_prompt are assumed to be already imported/defined in run_server.py
agent = RealtimeAgent(
    name="Friday",
    sys_prompt=sys_prompt,
    model=GeminiRealtimeModel(
        model_name="gemini-2.5-flash-native-audio-preview-09-2025",
        api_key=os.getenv("GEMINI_API_KEY"),  # read the key from the environment
        voice="Puck",  # Options: "Puck", "Charon", "Kore", "Fenrir"
    ),
)

Set the corresponding API key environment variable (OPENAI_API_KEY or GEMINI_API_KEY) before starting the server.
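
Because `os.getenv` silently returns `None` for an unset variable, a missing key may only surface later as an authentication error. One way to fail early is sketched below. The environment variable names come from this README; the provider labels and the helper `require_api_key` are hypothetical and not part of AgentScope.

```python
import os

# Environment variable names as used in this README.
API_KEY_ENV = {
    "dashscope": "DASHSCOPE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def require_api_key(provider, env=None):
    """Return the API key for `provider`, or raise with a clear message."""
    env = os.environ if env is None else env
    try:
        var = API_KEY_ENV[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} before starting the server")
    return key
```

The result could then be passed as `api_key=require_api_key("openai")` in place of the `os.getenv(...)` call.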
