This repository provides a full-stack, reusable template for building real-time, multimodal conversational AI agents. It leverages the Agent Development Kit (ADK) on a FastAPI backend, paired with a Next.js (React) frontend.
This template handles all the complex WebSocket and media streaming "plumbing," allowing you to focus on what matters: building your agent's logic and UI.
| Feature | Description |
|---|---|
| Interaction Type | Conversational |
| Complexity | Medium |
| Agent Type | Single Agent |
| Components | Live API (Native Audio) |
| Vertical | Reusable across industry verticals |
- Real-Time, Bidirectional Audio: Streams user microphone input to the agent and streams the agent's synthesized voice back to the client.
- Live Video Feed: Supports streaming from a user's camera or screen share for the agent to analyze.
- Live Transcriptions: Displays a real-time transcript of both the user's speech (STT) and the agent's speech (TTS).
- Configurable Persona: The agent's identity and instructions are easily configured.
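On the live-transcription feature: streaming STT/TTS usually emits *partial* transcript events that are later superseded by *final* ones. As a rough illustration (the event fields here are invented, not this template's actual schema), a client can keep one mutable entry per speaker turn and overwrite it until the final event lands:

```python
def apply_transcript_event(transcript: list[dict], event: dict) -> None:
    """Update the displayed transcript in place.

    `event` is a hypothetical dict: {"speaker": "user"|"agent", "text": str, "final": bool}.
    Partial events overwrite the last non-final entry for the same speaker;
    a final event freezes it so the next event starts a new line.
    """
    last = transcript[-1] if transcript else None
    if last and last["speaker"] == event["speaker"] and not last["final"]:
        last["text"] = event["text"]
        last["final"] = event["final"]
    else:
        transcript.append(dict(event))


transcript: list[dict] = []
apply_transcript_event(transcript, {"speaker": "user", "text": "What is", "final": False})
apply_transcript_event(transcript, {"speaker": "user", "text": "What is 2+2?", "final": True})
apply_transcript_event(transcript, {"speaker": "agent", "text": "It's 4.", "final": True})
```

The result is two transcript lines — the user's partial text was replaced in place rather than rendered twice.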
This template is the foundation for any application that requires an AI to see, hear, and talk in real-time.
- Real-Time Tutors: An agent that can watch you solve a math problem (via screen share) and talk you through it. This is the default behavior of this example agent.
- Live Customer Support: An agent that can visually guide a user through a website or product setup.
- Accessibility Tools: A "be my eyes" agent that can describe a user's surroundings or the content of their screen.
- Interactive Assistant: An agent that pairs with you, watches you work, and provides real-time feedback or assistance.
- Python 3.11+
- Node.js v22+
- uv for dependency management and packaging. See the official uv website for installation:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
Use the Agent Starter Pack to scaffold a production-ready project and choose your deployment target (Vertex AI Agent Engine or Cloud Run), with CI/CD and other production features. The easiest way is with uv (one command, no venv or pip install needed):
```bash
uvx agent-starter-pack create my-realtime-agent -a adk@realtime-conversational-agent
```

If you don't have uv yet:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
The starter pack will prompt you to select deployment options and set up your Google Cloud project.
Alternative: Using pip and a virtual environment
```bash
# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install the starter pack and create your project
pip install --upgrade agent-starter-pack
agent-starter-pack create my-realtime-agent -a adk@realtime-conversational-agent
```

Alternative: Local development (run from this sample repo)
Follow these instructions to get the client and server running on your local machine.
- Node.js: v22 or later
- Python: 3.11 or later
- uv (Recommended): Python package manager. Install with `pip install uv`.
Platform setup

Choose a platform: either Google AI Studio or Google Cloud Vertex AI.

- Option 1: Google AI Studio (Default)
  - Get a `GOOGLE_API_KEY` from Google AI Studio. This is the simplest way to get started.
- Option 2: Google Cloud / Vertex AI
  - Create a Google Cloud Project and enable the Vertex AI API.
  - Install the Google Cloud CLI.
  - Authenticate your local environment by running:

    ```bash
    gcloud auth login
    ```
The server handles the AI agent logic and WebSocket connections.
1. Navigate to the server directory:

   ```bash
   cd server
   ```

2. Create and activate a virtual environment:

   ```bash
   uv venv
   source .venv/bin/activate
   ```

3. Install Python dependencies:

   ```bash
   uv pip install .
   ```

4. Set the SSL certificate file:

   ```bash
   export SSL_CERT_FILE=$(python -m certifi)
   ```

5. Create your environment file:

   Create a new file named `.env` in the `server/` directory. Use one of the two templates below based on your authentication method.

   `server/.env` (Option 1: Google AI Studio)

   ```bash
   # --- AI Studio ---
   GOOGLE_GENAI_USE_VERTEXAI=FALSE
   # Get this from Google AI Studio
   GOOGLE_API_KEY="PASTE_YOUR_ACTUAL_API_KEY_HERE"
   # Configuration for the agent's voice (Example: 'Puck' for gemini-live-2.5-flash)
   AGENT_VOICE="Puck"
   AGENT_LANGUAGE="en-US"
   ```

   `server/.env` (Option 2: Google Cloud / Vertex AI)

   ```bash
   # --- Vertex AI ---
   GOOGLE_GENAI_USE_VERTEXAI=TRUE
   # Your Google Cloud project ID
   GOOGLE_CLOUD_PROJECT="PASTE_YOUR_ACTUAL_PROJECT_ID"
   # Your Vertex AI location (e.g., us-central1)
   GOOGLE_CLOUD_LOCATION="us-central1"
   # Configuration for the agent's voice (Example: 'Puck' for gemini-live-2.5-flash)
   AGENT_VOICE="Puck"
   AGENT_LANGUAGE="en-US"
   ```

6. Run the server:

   ```bash
   uvicorn main:app --reload
   ```

   The server will be running at http://127.0.0.1:8000.
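The `GOOGLE_GENAI_USE_VERTEXAI` flag above is what switches the server between the two authentication modes at startup. A minimal sketch of that selection logic (illustrative only — the actual server may rely on python-dotenv or ADK's own configuration handling):

```python
def resolve_backend(env: dict[str, str]) -> dict:
    """Pick AI Studio vs Vertex AI based on GOOGLE_GENAI_USE_VERTEXAI.

    Illustrative sketch; the real server's config loading may differ.
    """
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "FALSE").upper() == "TRUE"
    if use_vertex:
        return {
            "backend": "vertexai",
            "project": env["GOOGLE_CLOUD_PROJECT"],
            "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
            "voice": env.get("AGENT_VOICE", "Puck"),
        }
    return {
        "backend": "ai-studio",
        "api_key": env["GOOGLE_API_KEY"],
        "voice": env.get("AGENT_VOICE", "Puck"),
    }


# Vertex AI mode: only the project ID is required; location falls back to a default
config = resolve_backend({"GOOGLE_GENAI_USE_VERTEXAI": "TRUE",
                          "GOOGLE_CLOUD_PROJECT": "my-project"})
```

Note that either an API key or a project ID is required depending on the mode — if both are missing, startup should fail fast rather than at the first request.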
The client is the Next.js application that the user interacts with.
1. Open a new terminal and navigate to the client directory:

   ```bash
   cd client
   ```

2. Install Node.js dependencies:

   ```bash
   npm install
   ```

3. Run the client:

   ```bash
   npm run dev
   ```

   The client will be running at http://localhost:3000.

4. Open the app:

   Open http://localhost:3000 in your browser. You can now click the microphone icon to start a session!
This repository is designed for easy reuse. You don't need to change any Python code to completely change your agent's persona.
Simply edit the AGENT_INSTRUCTION in your server/example_agent/prompts.py file.
To turn your "Math Tutor" agent into a "Generic Assistant", stop your server, replace the AGENT_INSTRUCTION in server/example_agent/prompts.py with the following, and restart the server.

```python
AGENT_INSTRUCTION = "You are a helpful and friendly AI assistant. Keep your responses concise."
```
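If you want to vary only part of the persona (its role and tone) while keeping shared ground rules, one common pattern is to compose AGENT_INSTRUCTION from pieces. The following is a hypothetical restructuring of prompts.py, not code from this repository:

```python
# Hypothetical structure for server/example_agent/prompts.py (illustration only)
BASE_RULES = (
    "You can see the user's screen or camera when they share it. "
    "Keep your responses concise and conversational."
)

PERSONAS = {
    "math_tutor": "You are a patient math tutor. Watch the user's work and guide them step by step.",
    "generic": "You are a helpful and friendly AI assistant.",
}


def build_instruction(persona: str) -> str:
    """Compose the final system instruction from a persona plus shared rules."""
    return f"{PERSONAS[persona]} {BASE_RULES}"


AGENT_INSTRUCTION = build_instruction("generic")
```

This keeps the rules that every persona should obey in one place, so switching personas can't accidentally drop them.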
This agent sample is provided for illustrative purposes only and is not intended for production use. It serves as a basic example of an agent and a foundational starting point for individuals or teams to develop their own agents.
This sample has not been rigorously tested, may contain bugs or limitations, and does not include features or optimizations typically required for a production environment (e.g., robust error handling, security measures, scalability, performance considerations, comprehensive logging, or advanced configuration options).
Users are solely responsible for any further development, testing, security hardening, and deployment of agents based on this sample. We recommend thorough review, testing, and the implementation of appropriate safeguards before using any derived agent in a live or critical system.
