
Multi-Agent Realtime Voice Interaction Example

This example demonstrates how to use AgentScope's ChatRoom class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.

Features

  • 🗣️ Real-time Voice Interaction: Two agents communicate through voice in real-time
  • 🤖 Autonomous Conversation: Agents converse with each other without user intervention
  • ⚙️ Customizable Configuration: Configure agent names and instructions through the web interface
  • 🎨 Modern UI: Clean, shadcn-inspired interface for easy interaction
  • 📊 Live Transcript: See the conversation transcripts in real-time

Architecture

The example uses:

  • Backend: FastAPI server with WebSocket support
  • Frontend: HTML5 with Web Audio API for audio playback
  • AgentScope Components:
    • ChatRoom: Manages multiple RealtimeAgent instances
    • RealtimeAgent: Handles real-time voice interaction with AI models
    • DashScopeRealtimeModel: DashScope's Qwen3-Omni realtime model

Prerequisites

  1. Python Dependencies:

    pip install "agentscope[dashscope]"
    pip install fastapi uvicorn
  2. DashScope API Key:

    • Set your DashScope API key as an environment variable:
      export DASHSCOPE_API_KEY="your-api-key-here"
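Before starting the server, it can help to confirm that the key is visible to Python. The check below is a minimal sketch using only the standard library; the helper name `dashscope_key_present` is hypothetical and not part of AgentScope.

```python
import os


def dashscope_key_present(env=None):
    """Return True if DASHSCOPE_API_KEY is set to a non-empty value.

    `env` defaults to the process environment; a plain dict can be passed
    instead, which makes the check easy to try out in isolation.
    """
    env = os.environ if env is None else env
    return bool(env.get("DASHSCOPE_API_KEY", "").strip())


if __name__ == "__main__":
    if dashscope_key_present():
        print("DASHSCOPE_API_KEY is set.")
    else:
        print("DASHSCOPE_API_KEY is missing; export it before starting the server.")
```

This only confirms that the variable exists in the current shell session. It does not verify that the key is valid for the DashScope API.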

Usage

  1. Start the Server:

    python run_server.py
  2. Open the Web Interface:

    • Navigate to http://localhost:8000 in your web browser
  3. Configure Agents:

    • Set names and instructions for both Agent 1 and Agent 2
    • Example configurations:
      • Agent 1 (Alice): "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
      • Agent 2 (Bob): "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."
  4. Start the Conversation:

    • Click the "▶️ Start Conversation" button
    • The agents will begin conversing autonomously
    • You'll see transcripts and system messages in the message panel
    • Audio playback will stream in real-time
  5. Stop the Conversation:

    • Click the "⏹️ Stop Conversation" button when you want to end the session

How It Works

Backend Flow

  1. WebSocket Connection: Client connects via WebSocket to /ws/{user_id}/{session_id}
  2. Session Creation:
    • Client sends a client_session_create event with the agent configurations
    • Server creates two RealtimeAgent instances with the specified names and instructions
    • Server creates a ChatRoom with both agents
    • Server starts the chat room and returns a session_created event
  3. Message Broadcasting:
    • ChatRoom automatically broadcasts messages between agents
    • All events (audio, transcripts, etc.) are forwarded to the frontend
  4. Session End: Client sends a client_session_end event to stop the conversation
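The client side of this exchange can be sketched as a few message builders. The event names and the WebSocket path come from the flow above, but the JSON envelope is an assumption: the `type`, `agents`, `name`, and `instructions` keys are hypothetical, so check run_server.py for the schema the server actually expects.

```python
import json


def ws_path(user_id, session_id):
    """Format the WebSocket path described in the backend flow."""
    return f"/ws/{user_id}/{session_id}"


def make_session_create(agents):
    """Build a client_session_create message as a JSON string.

    `agents` is a list of (name, instructions) pairs. The key names used
    here are hypothetical placeholders, not the confirmed server schema.
    """
    return json.dumps({
        "type": "client_session_create",
        "agents": [
            {"name": name, "instructions": instructions}
            for name, instructions in agents
        ],
    })


def make_session_end():
    """Build a client_session_end message (same hypothetical envelope)."""
    return json.dumps({"type": "client_session_end"})


if __name__ == "__main__":
    print(ws_path("user1", "session1"))
    print(make_session_create([
        ("Alice", "You are Alice. Keep your responses brief."),
        ("Bob", "You are Bob. Keep your responses brief."),
    ]))
    print(make_session_end())
```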

Frontend Flow

  1. WebSocket Setup: Establishes connection and waits for server events
  2. Session Management: Sends configuration and manages conversation state
  3. Audio Playback:
    • Receives base64-encoded PCM16 audio chunks
    • Decodes and queues audio data
    • Uses the Web Audio API ScriptProcessorNode for streaming playback at 24 kHz
  4. Transcript Display: Shows real-time transcripts from both agents
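The decoding step in the audio playback flow can be illustrated outside the browser. The sketch below shows, in Python, roughly what the frontend has to do with each chunk: decode the base64 payload, interpret the bytes as 16-bit samples, and scale them to floats. It assumes little-endian, mono audio, which the description above does not state explicitly; the function names are hypothetical.

```python
import base64
import struct

SAMPLE_RATE_HZ = 24_000  # playback rate stated in the frontend flow


def decode_pcm16_chunk(b64_chunk):
    """Decode a base64 PCM16 chunk into floats in the range [-1.0, 1.0).

    Assumes little-endian, mono samples.
    """
    raw = base64.b64decode(b64_chunk)
    if len(raw) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return [sample / 32768.0 for sample in samples]


def chunk_duration_seconds(samples):
    """Return how long a list of mono samples plays at 24 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

At 24 kHz mono, one second of audio is 24,000 samples, or 48,000 bytes of PCM16 before base64 encoding.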

Key Components

ChatRoom

The ChatRoom class manages multiple RealtimeAgent instances:

  • Establishes connections for all agents
  • Broadcasts messages between agents automatically
  • Forwards events to the frontend
  • Handles lifecycle management (start/stop)
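The broadcasting behavior can be pictured with a toy model. This is not AgentScope's implementation: `ToyAgent` and `ToyChatRoom` are hypothetical stand-ins, and the sketch assumes a message goes to every agent except its sender while a copy is forwarded to the frontend.

```python
class ToyAgent:
    """Stand-in for a realtime agent; it only records what it hears."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, sender_name, message):
        self.inbox.append((sender_name, message))


class ToyChatRoom:
    """Relays each message to the other agents and to the frontend."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.frontend_events = []

    def broadcast(self, sender, message):
        # Forward a copy to the frontend (here, just a list of events).
        self.frontend_events.append((sender.name, message))
        # Deliver to every agent except the one that produced the message.
        for agent in self.agents:
            if agent is not sender:
                agent.receive(sender.name, message)


if __name__ == "__main__":
    alice, bob = ToyAgent("Alice"), ToyAgent("Bob")
    room = ToyChatRoom([alice, bob])
    room.broadcast(alice, "Hi Bob!")
    print(bob.inbox)
```

The real ChatRoom works with streaming audio and realtime events rather than plain strings, so treat this only as a picture of the routing idea.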

RealtimeAgent

Each RealtimeAgent:

  • Connects to the DashScope realtime API
  • Processes audio input from other agents
  • Generates voice responses
  • Emits events for transcripts, audio, and status updates

Customization

Changing the Model

To use a different model, modify the DashScopeRealtimeModel configuration in run_server.py:

model=DashScopeRealtimeModel(
    model_name="your-model-name",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)

Adding More Agents

To add more agents, modify the agent creation section in run_server.py:

agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)

chat_room = ChatRoom(agents=[agent1, agent2, agent3])

Then update the frontend to include configuration fields for each additional agent.

Troubleshooting

No Audio Playback

  • Ensure your browser supports Web Audio API
  • Check browser console for audio-related errors
  • Verify the audio format matches the expected PCM16 at 24 kHz

Connection Issues

  • Verify your DashScope API key is set correctly
  • Check that port 8000 is not blocked by a firewall
  • Review server logs for error messages
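A quick way to narrow down connection problems is to test whether anything is accepting TCP connections on the server port. The helper below is a minimal sketch using the standard library; `port_accepts_connections` is a hypothetical name, and the defaults assume the server runs locally on port 8000 as described under Usage.

```python
import socket


def port_accepts_connections(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_accepts_connections():
        print("Something is listening on localhost:8000.")
    else:
        print("Nothing reachable on localhost:8000; is run_server.py running?")
```

A successful connection only shows that the port is open. It does not confirm that the WebSocket endpoint or the DashScope connection is working.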

Agents Not Responding

  • Ensure both agent configurations have valid instructions
  • Check that the instructions encourage conversational behavior
  • Review the console logs for API errors
