livinhigh/claude-computer-use-agent

Author: Ivan Joseph Thomas

Computer Use Demo - FastAPI Edition

A FastAPI-based implementation of Anthropic's Computer Use Demo with multi-display VNC support for isolated browser sessions.

Demo Video

(https://www.youtube.com/watch?v=nhum3jhyFw4)

5-minute walkthrough of the multi-display architecture and features

Overview

This project provides a web-based interface for interacting with Claude AI agents that can control a computer through visual interfaces. It features:

  • Multi-Display Support: Run multiple isolated X11 displays simultaneously
  • Real-time WebSocket Updates: Stream task execution and screenshots
  • Display Isolation: Each browser tab connects to a separate X11 display
  • Session Management: Create, start, and manage agent sessions
  • Task Execution: Execute natural language tasks with visual feedback

Architecture

┌─────────────────┐
│   Frontend      │  React-based UI (3-panel layout)
│   (Browser)     │  - Left: Session/Task History
│                 │  - Middle: VNC Display
│                 │  - Right: Chat Interface
└────────┬────────┘
         │ HTTP/WebSocket
┌────────▼────────┐
│   FastAPI       │  Python backend
│   Server        │  - REST API endpoints
│   (Port 8000)   │  - WebSocket streaming
└────────┬────────┘
         │
┌────────▼────────┐
│   Services      │  Business logic layer
│                 │  - SessionService
│                 │  - AgentExecutionService
└────────┬────────┘
         │
┌────────▼────────┐
│   Claude API    │  Anthropic's Computer Use
│                 │  - Tool execution
│                 │  - Screenshot capture
│                 │  - Action execution
└────────┬────────┘
         │
┌────────▼────────┐
│   X11 Displays  │  Virtual displays (created on-demand)
│   :1, :2, :3... │  - Xvfb (virtual framebuffer)
│                 │  - mutter (window manager)
│                 │  - tint2 (taskbar)
│   VNC Servers   │  - x11vnc (VNC server)
│   Dynamic Ports │  - noVNC (web viewer)
└─────────────────┘

Features

1. Dynamic Multi-Display Architecture

The system creates X11 displays on-demand when sessions are created:

  • Display :1 → First session → VNC/noVNC ports allocated from configured range
  • Display :2 → Second session → Different VNC/noVNC ports
  • Display :3 → Third session → And so on...

Each display has:

  • Separate Xvfb instance (virtual framebuffer)
  • Independent mutter window manager
  • Isolated tint2 taskbar
  • Dedicated Firefox profile with -no-remote flag
  • Unique port pair (VNC + noVNC) within configured ranges
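
To make "one instance per display" concrete, the sketch below builds the per-display command lines in Python. It is only an illustration and not the project's code: the real setup lives in the shell scripts under image/. The DisplaySpec and display_commands names are hypothetical, and the 24-bit colour depth, the x11vnc flags, and the `novnc_proxy` executable name are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplaySpec:
    """Hypothetical description of one isolated display and its port pair."""
    display_num: int
    vnc_port: int
    novnc_port: int
    width: int = 1024
    height: int = 768


def display_commands(spec: DisplaySpec) -> dict[str, list[str]]:
    """Build (but do not run) the argv for each per-display process."""
    display = f":{spec.display_num}"
    return {
        # Virtual framebuffer for this display only (24-bit colour assumed).
        "xvfb": ["Xvfb", display, "-screen", "0", f"{spec.width}x{spec.height}x24"],
        # VNC server attached to the same display, listening on its own port.
        "x11vnc": [
            "x11vnc", "-display", display, "-rfbport", str(spec.vnc_port),
            "-forever", "-shared", "-nopw",
        ],
        # noVNC websocket proxy in front of that VNC port (executable name assumed).
        "novnc": [
            "novnc_proxy", "--vnc", f"localhost:{spec.vnc_port}",
            "--listen", str(spec.novnc_port),
        ],
    }
```

Each list could be handed to subprocess.Popen. mutter and tint2 would be started the same way, with DISPLAY set to the display's name, so that every process in the stack belongs to exactly one display.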

2. Dynamic Display Assignment

Each session gets its own dedicated display:

  • Session 1 → Display :1 (e.g., VNC 5900, noVNC 6080)
  • Session 2 → Display :2 (e.g., VNC 5901, noVNC 6081)
  • Session 3 → Display :3 (e.g., VNC 5902, noVNC 6082)

Displays are created dynamically when sessions are initialized:

  • Ports are randomly allocated from configured ranges
  • Database tracks active port assignments to prevent conflicts
  • Network availability is verified before allocation
  • Shell scripts create the display infrastructure on-the-fly
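
The allocation steps above (random choice within a range, a database check against active assignments, and a network availability check) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the standard-library sqlite3 module rather than the project's SQLAlchemy models, and the port_assignments table, allocate_port, and release_port are hypothetical names.

```python
import random
import socket
import sqlite3
from typing import Callable, Optional

# Hypothetical schema: one row per port currently held by a display.
SCHEMA = "CREATE TABLE IF NOT EXISTS port_assignments (port INTEGER PRIMARY KEY, kind TEXT NOT NULL)"


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Network check: True if the port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def allocate_port(
    db: sqlite3.Connection,
    kind: str,
    port_min: int,
    port_max: int,
    is_free: Callable[[int], bool] = port_is_free,
) -> Optional[int]:
    """Reserve a random unused port in [port_min, port_max]; None if the range is exhausted."""
    taken = {row[0] for row in db.execute("SELECT port FROM port_assignments")}
    candidates = [p for p in range(port_min, port_max + 1) if p not in taken]
    random.shuffle(candidates)  # try candidates in random order
    for port in candidates:
        if not is_free(port):
            continue  # something outside the database already uses this port
        try:
            with db:  # commits on success, rolls back on error
                db.execute("INSERT INTO port_assignments (port, kind) VALUES (?, ?)", (port, kind))
            return port
        except sqlite3.IntegrityError:
            continue  # claimed by another allocator first; the PRIMARY KEY rejected it
    return None


def release_port(db: sqlite3.Connection, port: int) -> None:
    """Free a port when its display is torn down."""
    with db:
        db.execute("DELETE FROM port_assignments WHERE port = ?", (port,))
```

Letting the PRIMARY KEY constraint reject duplicates means two allocators racing for the same port cannot both succeed. The injectable is_free check makes the logic testable without touching real sockets.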

3. Task Execution

Tasks execute with display-specific isolation:

  • Context variables (ContextVar) ensure async-safe display assignment
  • Tool operations (click, type, screenshot) target the correct X11 display
  • No cross-contamination between concurrent tasks
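
The reason a ContextVar keeps concurrent tasks apart is that every asyncio Task runs in its own copy of the context. The self-contained sketch below demonstrates this. FakeComputerTool is a hypothetical stand-in for the real tool; only the task_display_num declaration matches the project.

```python
import asyncio
from contextvars import ContextVar

# Same declaration as in agent_execution_service.py.
task_display_num: ContextVar[int] = ContextVar("task_display_num", default=1)


class FakeComputerTool:
    """Hypothetical stand-in for the computer tool: it only records its display."""

    def __init__(self) -> None:
        self.display_num = task_display_num.get()


async def run_task(display_num: int) -> int:
    task_display_num.set(display_num)  # visible only inside this Task's context
    await asyncio.sleep(0)  # yield, so the tasks genuinely interleave
    tool = FakeComputerTool()  # built after other tasks have set their own values
    return tool.display_num


async def main() -> list[int]:
    # gather() wraps each coroutine in its own Task, and each Task copies the context.
    return list(await asyncio.gather(run_task(1), run_task(2), run_task(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [1, 2, 3]
```

A plain module-level global would not work here: after the interleaving, every tool would see whichever value was written last.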

4. Real-time Updates

WebSocket streaming provides:

  • Assistant text messages
  • Base64-encoded screenshots
  • Tool execution logs
  • Task status transitions
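
One way to carry these four kinds of update over a single WebSocket is a small JSON envelope. The sketch below is an assumption: TaskEvent, its field names, and the event type strings are hypothetical and do not describe the project's actual wire format.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TaskEvent:
    """Hypothetical envelope for one streamed update."""
    type: str  # e.g. "text", "screenshot", "tool_log", "status" (names assumed)
    task_id: str
    data: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def text_event(task_id: str, text: str) -> TaskEvent:
    return TaskEvent("text", task_id, {"text": text})


def screenshot_event(task_id: str, png_bytes: bytes) -> TaskEvent:
    # Binary image data is base64-encoded so it can travel inside a JSON text frame.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return TaskEvent("screenshot", task_id, {"image_base64": encoded})


def tool_log_event(task_id: str, tool: str, output: str) -> TaskEvent:
    return TaskEvent("tool_log", task_id, {"tool": tool, "output": output})


def status_event(task_id: str, old: str, new: str) -> TaskEvent:
    return TaskEvent("status", task_id, {"from": old, "to": new})
```

Inside a FastAPI WebSocket handler, each event could then be sent with `await websocket.send_text(event.to_json())`, and the frontend could switch on the type field.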

Installation

Prerequisites

  • Docker and Docker Compose
  • Anthropic API key (get one at https://console.anthropic.com/)

Quick Start

Step 1: Get your Anthropic API key

Sign up at the Anthropic Console (https://console.anthropic.com/) and create an API key.

Step 2: Set the API key as an environment variable

# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-your-actual-api-key-here"

# Optional: Set custom port ranges
$env:VNC_PORT_MIN = "5900"
$env:VNC_PORT_MAX = "5910"
$env:NOVNC_PORT_MIN = "6080"
$env:NOVNC_PORT_MAX = "6090"

# Optional: Set display dimensions
$env:WIDTH = "1024"
$env:HEIGHT = "768"

# Linux/Mac
export ANTHROPIC_API_KEY="sk-your-actual-api-key-here"

# Optional: Set custom port ranges
export VNC_PORT_MIN=5900
export VNC_PORT_MAX=5910
export NOVNC_PORT_MIN=6080
export NOVNC_PORT_MAX=6090

# Optional: Set display dimensions
export WIDTH=1024
export HEIGHT=768

Step 3: Build the Docker image

docker build -f Dockerfile.fastapi -t computer-use-demo:local .

Step 4: Run the container

# Windows PowerShell
docker run `
  -e ANTHROPIC_API_KEY=$env:ANTHROPIC_API_KEY `
  -v //var/run/docker.sock:/var/run/docker.sock `
  -v "$(Get-Location)\computer_use_demo:/home/computeruse/computer_use_demo" `
  -v "$HOME\.anthropic:/home/computeruse/.anthropic" `
  -p 8000:8000 `
  -p 6000-6100:6000-6100 `
  -it computer-use-demo:local

# Linux/Mac
docker run \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$(pwd)/computer_use_demo:/home/computeruse/computer_use_demo" \
  -v "$HOME/.anthropic:/home/computeruse/.anthropic" \
  -p 8000:8000 \
  -p 6000-6100:6000-6100 \
  -it computer-use-demo:local

Step 5: Access the application

Open your browser and navigate to: http://localhost:8000

Environment Variables Reference

Variable             Default                             Description
ANTHROPIC_API_KEY    (required)                          Your Anthropic API key from https://console.anthropic.com/
VNC_PORT_MIN         5900                                Minimum VNC server port
VNC_PORT_MAX         5910                                Maximum VNC server port
NOVNC_PORT_MIN       6080                                Minimum noVNC web viewer port
NOVNC_PORT_MAX       6090                                Maximum noVNC web viewer port
WIDTH                1024                                Display width in pixels
HEIGHT               768                                 Display height in pixels
DATABASE_URL         sqlite:///./computer_use_demo.db    Database connection URL
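
A minimal sketch of reading these variables with the documented defaults is shown below. Settings and load_settings are hypothetical names, and the project may load its configuration differently. The check that the VNC and noVNC ranges do not overlap is an added assumption, based on each display needing one distinct port from each range.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    vnc_port_min: int
    vnc_port_max: int
    novnc_port_min: int
    novnc_port_max: int
    width: int
    height: int
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    def int_var(name: str, default: int) -> int:
        return int(env.get(name, str(default)))

    s = Settings(
        anthropic_api_key=api_key,
        vnc_port_min=int_var("VNC_PORT_MIN", 5900),
        vnc_port_max=int_var("VNC_PORT_MAX", 5910),
        novnc_port_min=int_var("NOVNC_PORT_MIN", 6080),
        novnc_port_max=int_var("NOVNC_PORT_MAX", 6090),
        width=int_var("WIDTH", 1024),
        height=int_var("HEIGHT", 768),
        database_url=env.get("DATABASE_URL", "sqlite:///./computer_use_demo.db"),
    )
    # Each range must be non-empty, and the two ranges must not overlap.
    if s.vnc_port_min > s.vnc_port_max:
        raise ValueError("VNC_PORT_MIN must be <= VNC_PORT_MAX")
    if s.novnc_port_min > s.novnc_port_max:
        raise ValueError("NOVNC_PORT_MIN must be <= NOVNC_PORT_MAX")
    if not (s.vnc_port_max < s.novnc_port_min or s.novnc_port_max < s.vnc_port_min):
        raise ValueError("VNC and noVNC port ranges overlap")
    return s
```

Failing fast at startup on a bad range is usually easier to debug than a display that silently cannot get a port later.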

Docker Port Mapping

Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key (required)
  • WIDTH: Display width in pixels (default: 1024)
  • HEIGHT: Display height in pixels (default: 768)
  • DATABASE_URL: SQLite database path (default: sqlite:///./computer_use_demo.db)

Usage

Web Interface

  1. Open browser: http://localhost:8000
  2. Click "Start New Task" to create a session
  3. Enter task description (e.g., "Open Firefox and search for Python")
  4. Watch real-time execution in VNC display
  5. Create additional sessions (for example, from another browser tab); each one runs on its own isolated display (:2, :3, and so on)

API Access

Interactive documentation available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

See API_DOCUMENTATION.md for detailed endpoint reference.

VNC Access

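Ports are allocated per display from the configured ranges, so the addresses below are examples.

noVNC web viewer (in a browser):

  • Display 1: http://localhost:6080
  • Display 2: http://localhost:6081

Or use a native VNC client:

  • Display 1: localhost:5900
  • Display 2: localhost:5901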

Project Structure

computer_use_demo/
├── database/
│   ├── models.py           # SQLAlchemy database models
│   └── __init__.py         # Database initialization
├── routes/
│   ├── sessions.py         # Session management endpoints
│   ├── tasks.py            # Task execution endpoints
│   ├── health.py           # Health check endpoint
│   └── __init__.py
├── services/
│   ├── session_service.py  # Session business logic
│   └── agent_execution_service.py  # Task execution logic
├── tools/
│   ├── computer.py         # Computer use tool (click, type, screenshot)
│   ├── bash.py             # Bash command execution
│   ├── edit.py             # Text file editing
│   └── ...
├── static/
│   ├── index.html          # Frontend HTML
│   ├── app.js              # Frontend JavaScript
│   └── styles.css          # Frontend styles
├── loop.py                 # Core sampling loop
├── schemas.py              # Pydantic request/response models
├── main.py                 # FastAPI application entry point
└── requirements.txt        # Python dependencies

image/
├── .config/
│   └── tint2/
│       ├── tint2rc         # Taskbar configuration
│       └── applications/   # Desktop files
├── xvfb_startup.sh         # Start virtual displays
├── x11vnc_startup.sh       # Start VNC servers
├── novnc_startup.sh        # Start noVNC proxies
├── mutter_startup.sh       # Start window manager
├── tint2_startup.sh        # Start taskbar
├── firefox_launcher.sh     # Display-specific Firefox launcher
└── start_all.sh            # Master startup script

Key Components

Frontend (app.js)

  • Display Assignment: getOrAssignDisplay() manages display allocation
  • WebSocket Client: Real-time task updates
  • Markdown Rendering: Uses marked.js for message formatting
  • Session Management: Create/switch between sessions
  • Keyboard Handler: Enter to send, Shift+Enter for newline

Backend Services

SessionService:

  • CRUD operations for sessions
  • Session state management (IDLE → ACTIVE → COMPLETED)
  • Task creation and tracking
  • Log storage for WebSocket streaming
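
The session lifecycle can be expressed as an explicit transition table so that an illegal move fails loudly. This sketch assumes the strictly linear IDLE → ACTIVE → COMPLETED flow shown above. The SessionState, ALLOWED_TRANSITIONS, and InvalidTransition names are hypothetical.

```python
from enum import Enum


class SessionState(str, Enum):
    IDLE = "IDLE"
    ACTIVE = "ACTIVE"
    COMPLETED = "COMPLETED"


# Assumed to be strictly linear, as the arrows above suggest.
ALLOWED_TRANSITIONS: dict[SessionState, frozenset[SessionState]] = {
    SessionState.IDLE: frozenset({SessionState.ACTIVE}),
    SessionState.ACTIVE: frozenset({SessionState.COMPLETED}),
    SessionState.COMPLETED: frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a session is asked to move to a state it cannot reach."""


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"cannot move session from {current.value} to {target.value}")
    return target
```

If sessions can actually return to IDLE between tasks, that would be a one-line change to the table.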

AgentExecutionService:

  • Task execution via sampling loop
  • Display isolation with context variables
  • Screenshot and tool result streaming
  • Error handling and cancellation
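
The error handling and cancellation duties can be sketched as a wrapper around the sampling loop that records how each task ended. In this hypothetical sketch, execute_task, TaskStatus, and the statuses dict are illustrative names, and run_loop stands in for the real sampling loop. A real service would also log the exception and push a status update to WebSocket clients.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class TaskStatus(str, Enum):
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


async def execute_task(
    task_id: str,
    run_loop: Callable[[], Awaitable[None]],
    statuses: dict[str, TaskStatus],
) -> None:
    """Run one task's sampling loop and record how it ended."""
    statuses[task_id] = TaskStatus.RUNNING
    try:
        await run_loop()
    except asyncio.CancelledError:
        statuses[task_id] = TaskStatus.CANCELLED
        raise  # re-raise so asyncio marks the Task itself as cancelled
    except Exception:
        statuses[task_id] = TaskStatus.FAILED  # a tool or API error ends this task only
    else:
        statuses[task_id] = TaskStatus.COMPLETED
```

Cancelling the asyncio.Task that runs execute_task (for example from a "stop" endpoint) raises CancelledError at the loop's current await point, so the status is updated before the task unwinds.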

Display Isolation

Context Variables (agent_execution_service.py):

from contextvars import ContextVar
task_display_num: ContextVar[int] = ContextVar('task_display_num', default=1)

Tool Access (computer.py):

from computer_use_demo.services.agent_execution_service import task_display_num
self.display_num = task_display_num.get()
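
Once the tool knows its display number, one way to target that display is to run each X11 command with DISPLAY set in the child process environment. This sketch is an assumption about the mechanism, not a description of computer.py. DisplayBoundTool is hypothetical, and display_num is passed in directly (rather than read from task_display_num) so the sketch runs on its own.

```python
import os
import subprocess
from typing import Mapping


class DisplayBoundTool:
    """Hypothetical stand-in for the computer tool, pinned to one X11 display."""

    def __init__(self, display_num: int) -> None:
        # In the project this value comes from task_display_num.get().
        self.display_num = display_num

    def env(self, base: Mapping[str, str] = os.environ) -> dict[str, str]:
        """Copy of the environment with DISPLAY pointed at this tool's display."""
        return {**base, "DISPLAY": f":{self.display_num}"}

    @staticmethod
    def click_argv(x: int, y: int) -> list[str]:
        # xdotool chains commands: move the pointer (waiting until it arrives), then left-click.
        return ["xdotool", "mousemove", "--sync", str(x), str(y), "click", "1"]

    def click(self, x: int, y: int) -> subprocess.CompletedProcess:
        """Click on this display only (needs xdotool and a running X server)."""
        return subprocess.run(self.click_argv(x, y), env=self.env(), check=True, capture_output=True)
```

Passing an explicit env avoids mutating os.environ, which is shared by every concurrent task in the process.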

Firefox Profiles (firefox_launcher.sh):

PROFILE_DIR="/tmp/firefox-profile-display${DISPLAY_NUM}"
firefox-esr -no-remote -profile "$PROFILE_DIR" "$@"
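
For reference, the launcher's logic translates to Python roughly as follows. This is a hypothetical equivalent, not a file in the project. Setting DISPLAY in the child environment is an assumption about how the shell script targets its display. The -no-remote flag stops the new Firefox from handing off to an instance already running on another display, which is what the private profile directory relies on.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def firefox_argv(display_num: int, url: Optional[str] = None, profile_root: str = "/tmp") -> list[str]:
    """Mirror firefox_launcher.sh: one private profile per display, no remoting."""
    profile_dir = Path(profile_root) / f"firefox-profile-display{display_num}"
    argv = ["firefox-esr", "-no-remote", "-profile", str(profile_dir)]
    if url:
        argv.append(url)
    return argv


def launch_firefox(
    display_num: int,
    url: Optional[str] = None,
    base_env: Mapping[str, str] = os.environ,
) -> subprocess.Popen:
    """Start Firefox on one display (needs firefox-esr and a running X server)."""
    argv = firefox_argv(display_num, url)
    Path(argv[3]).mkdir(parents=True, exist_ok=True)  # ensure the profile directory exists
    env = {**base_env, "DISPLAY": f":{display_num}"}
    return subprocess.Popen(argv, env=env)
```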

Configuration

Tint2 Taskbar (image/.config/tint2/tint2rc)

# T=Taskbar, L=Launcher
panel_items = TL
launcher_item_app = /usr/share/applications/firefox-esr.desktop
launcher_item_app = /usr/share/applications/libreoffice-calc.desktop
# ... more applications

Supported Models

  • claude-sonnet-4-5-20250929 (default)
  • claude-opus-4-5-20251101
  • claude-sonnet-4-20250514
  • claude-opus-4-20250514

Troubleshooting

Firefox "Already Running" Error

Cause: Missing -no-remote flag or shared profile
Solution: Verify firefox_launcher.sh uses -no-remote and display-specific profiles

Icons Not Showing in Taskbar

Cause: Desktop files have wrong permissions or format
Solution: Ensure desktop files are mode 644 (readable, not executable)

Tasks Executing on Wrong Display

Cause: Context variable not set
Solution: Verify task_display_num.set(display_num) called before execution

WebSocket Connection Fails

Cause: CORS or connection issue
Solution: Check browser console for errors, verify WebSocket URL

Development

Running Locally (without Docker)

# Install dependencies
pip install -r computer_use_demo/requirements.txt

# Set environment variables
export ANTHROPIC_API_KEY=your_key_here
export WIDTH=1024
export HEIGHT=768

# Start X11 displays
./image/xvfb_startup.sh
./image/start_all.sh

# Start FastAPI server
uvicorn computer_use_demo.main:app --host 0.0.0.0 --port 8000

Adding New Endpoints

  1. Create route in computer_use_demo/routes/
  2. Add schemas to computer_use_demo/schemas.py
  3. Register router in computer_use_demo/main.py

Database Migrations

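Currently using SQLite with SQLAlchemy; no migration tool is configured. To create any missing tables (note that create_all() does not alter tables that already exist):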

from computer_use_demo.database import Base, engine
Base.metadata.create_all(bind=engine)

Security Considerations

⚠️ This is a development/demo project. Before deploying to production:

  • Add authentication (API keys, OAuth)
  • Implement rate limiting
  • Configure CORS properly
  • Use PostgreSQL instead of SQLite
  • Add input validation and sanitization
  • Implement request logging
  • Set up SSL/TLS
  • Restrict VNC access (use VPN or SSH tunnel)

Performance

  • Concurrent Sessions: Bounded by the configured VNC and noVNC port ranges, since each display needs one free port from each range
  • Task Queue: Background task execution with asyncio
  • Database: SQLite (sufficient for demo, use PostgreSQL for production)
  • Memory: ~2GB per display (Xvfb + mutter + Firefox)

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

License

This project builds upon Anthropic's Computer Use Demo. See LICENSE file for details.

Acknowledgments

  • Anthropic for the Computer Use API and original demo
  • noVNC project for web-based VNC viewer
  • FastAPI for the excellent web framework

Support

For issues and questions:

About

A Python-based FastAPI app that uses AI to perform computer-use agent actions inside a containerized Docker application, viewed through noVNC. The project is based on Anthropic's computer use codebase.
