This example demonstrates how to use AgentScope's ChatRoom class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.
- 🗣️ Real-time Voice Interaction: Two agents communicate through voice in real-time
- 🤖 Autonomous Conversation: Agents converse with each other without user intervention
- ⚙️ Customizable Configuration: Configure agent names and instructions through the web interface
- 🎨 Modern UI: Clean, shadcn-inspired interface for easy interaction
- 📊 Live Transcript: See the conversation transcripts in real-time
The example uses:
- Backend: FastAPI server with WebSocket support
- Frontend: HTML5 with Web Audio API for audio playback
- AgentScope Components:
  - ChatRoom: Manages multiple RealtimeAgent instances
  - RealtimeAgent: Handles real-time voice interaction with AI models
  - DashScopeRealtimeModel: DashScope's Qwen3-Omni realtime model
- Python Dependencies:
    pip install agentscope[dashscope]
    pip install fastapi uvicorn
- DashScope API Key:
  - Set your DashScope API key as an environment variable:
    export DASHSCOPE_API_KEY="your-api-key-here"
- Start the Server:
    python run_server.py
- Open the Web Interface:
  - Navigate to http://localhost:8000 in your web browser
- Configure Agents:
  - Set names and instructions for both Agent 1 and Agent 2
  - Example configurations:
    - Agent 1 (Alice): "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
    - Agent 2 (Bob): "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."
- Start the Conversation:
  - Click the "▶️ Start Conversation" button
  - The agents will begin conversing autonomously
  - You'll see transcripts and system messages in the message panel
  - Audio playback will stream in real-time
- Stop the Conversation:
  - Click the "⏹️ Stop Conversation" button when you want to end the session
- WebSocket Connection: Client connects via WebSocket to /ws/{user_id}/{session_id}
- Session Creation:
  - Client sends a client_session_create event with agent configurations
  - Server creates two RealtimeAgent instances with the specified names and instructions
  - Server creates a ChatRoom with both agents
  - Server starts the chat room and returns a session_created event
- Message Broadcasting:
  - ChatRoom automatically broadcasts messages between agents
  - All events (audio, transcripts, etc.) are forwarded to the frontend
- Session End: Client sends a client_session_end event to stop the conversation
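
The session flow above can be sketched as a small event handler. This is an illustrative stand-in using only the standard library, not the code in run_server.py: the `Session` class, the `"agents"` field, and the `"error"` event are hypothetical names chosen for the sketch, while the route and the `client_session_create`, `session_created`, and `client_session_end` event names come from the description above.

```python
from dataclasses import dataclass, field


def ws_path(user_id: str, session_id: str) -> str:
    """Build the WebSocket route the client connects to."""
    return f"/ws/{user_id}/{session_id}"


@dataclass
class Session:
    """Stand-in for server-side state; the real server holds a ChatRoom."""
    agent_names: list[str] = field(default_factory=list)
    running: bool = False


def handle_client_event(session: Session, event: dict) -> list[dict]:
    """Apply one client event and return the server events to send back."""
    kind = event.get("type")
    if kind == "client_session_create":
        configs = event.get("agents", [])  # hypothetical field name
        if session.running or len(configs) != 2:
            # Hypothetical error event; the real server may respond differently.
            return [{"type": "error", "reason": "invalid session request"}]
        # The real server would create two RealtimeAgent objects and a ChatRoom here.
        session.agent_names = [cfg["name"] for cfg in configs]
        session.running = True
        return [{"type": "session_created", "agents": session.agent_names}]
    if kind == "client_session_end":
        # The real server would stop the chat room here.
        session.running = False
        return []
    return [{"type": "error", "reason": f"unknown event: {kind}"}]
```

Keeping the handler free of network code makes the protocol easy to test on its own; a WebSocket endpoint would call it once per received message and send back whatever events it returns.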
- WebSocket Setup: Establishes connection and waits for server events
- Session Management: Sends configuration and manages conversation state
- Audio Playback:
  - Receives base64-encoded PCM16 audio chunks
  - Decodes and queues audio data
  - Uses Web Audio API ScriptProcessorNode for streaming playback at 24kHz
- Transcript Display: Shows real-time transcripts from both agents
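
The decoding step can be illustrated outside the browser. The sketch below is in Python rather than the frontend's JavaScript, and it assumes the chunks are little-endian, mono, signed 16-bit samples; only the base64 encoding, the PCM16 format, and the 24kHz rate come from the description above. The function names are hypothetical.

```python
import base64
import struct

SAMPLE_RATE_HZ = 24_000  # playback rate stated above


def decode_pcm16_chunk(b64_chunk: str) -> list[float]:
    """Decode a base64 PCM16 chunk into floats in the range [-1.0, 1.0)."""
    raw = base64.b64decode(b64_chunk)
    if len(raw) % 2 != 0:
        raise ValueError("PCM16 data must contain an even number of bytes")
    count = len(raw) // 2
    # Assumption: little-endian signed 16-bit samples, one channel.
    samples = struct.unpack(f"<{count}h", raw)
    return [sample / 32768.0 for sample in samples]


def chunk_duration_seconds(samples: list[float]) -> float:
    """Return how long a decoded chunk plays at the fixed sample rate."""
    return len(samples) / SAMPLE_RATE_HZ
```

Dividing by 32768 maps the 16-bit integer range onto the floating-point range that Web Audio buffers expect, which is the same conversion the frontend has to perform before queueing a chunk for playback.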
The ChatRoom class manages multiple RealtimeAgent instances:
- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)
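
The broadcasting behavior can be pictured with a toy model. The class below is not AgentScope's ChatRoom (which is asynchronous and carries audio events); `ToyRoom` and its attributes are hypothetical names, and the sketch only shows the fan-out rule: each message goes to every agent except its sender, and a copy goes to the frontend.

```python
class ToyRoom:
    """Synchronous stand-in that mimics the fan-out of a chat room."""

    def __init__(self, agent_names: list[str]) -> None:
        # One inbox per agent, plus a log of everything sent to the frontend.
        self.inboxes: dict[str, list[tuple[str, str]]] = {
            name: [] for name in agent_names
        }
        self.frontend_log: list[tuple[str, str]] = []

    def publish(self, sender: str, payload: str) -> None:
        """Deliver a message to all agents except the sender."""
        if sender not in self.inboxes:
            raise KeyError(f"unknown agent: {sender}")
        self.frontend_log.append((sender, payload))
        for name, inbox in self.inboxes.items():
            if name != sender:
                inbox.append((sender, payload))
```

Excluding the sender from delivery is what keeps an agent from responding to its own output, and the same rule extends unchanged from two agents to three or more.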
Each RealtimeAgent:
- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates
To use a different model, modify the DashScopeRealtimeModel configuration in run_server.py:

    model=DashScopeRealtimeModel(
        model_name="your-model-name",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    )

To add more agents, modify the agent creation section in run_server.py:

    agent3 = RealtimeAgent(
        name=agent3_name,
        sys_prompt=agent3_instructions,
        model=DashScopeRealtimeModel(
            model_name="qwen3-omni-flash-realtime",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
        ),
    )
    chat_room = ChatRoom(agents=[agent1, agent2, agent3])

Then update the frontend to include configuration fields for the additional agents.
- Ensure your browser supports Web Audio API
- Check browser console for audio-related errors
- Verify the audio format matches the expected PCM16 at 24kHz
- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by firewall
- Review server logs for error messages
- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors
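
Two of the checks above (the API key and the port) can be scripted. The helper below is a minimal sketch using only the standard library; the function names are hypothetical, and it only confirms that the variable is non-empty and that something accepts TCP connections on the port, not that the key is valid or that the listener is this example's server.

```python
import os
import socket


def api_key_is_set(env=None) -> bool:
    """Return True if DASHSCOPE_API_KEY is present and not blank."""
    env = os.environ if env is None else env
    return bool(env.get("DASHSCOPE_API_KEY", "").strip())


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

With the server running, `port_is_listening("127.0.0.1", 8000)` should return True; if it returns False while run_server.py is active, a firewall rule or a different bind address is a likely cause.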