Author: Ivan Joseph Thomas
A FastAPI-based implementation of Anthropic's Computer Use Demo with multi-display VNC support for isolated browser sessions.
Demo video: 5-minute walkthrough of the multi-display architecture and features (https://www.youtube.com/watch?v=nhum3jhyFw4)
This project provides a web-based interface for interacting with Claude AI agents that can control a computer through visual interfaces. It features:
- Multi-Display Support: Run multiple isolated X11 displays simultaneously
- Real-time WebSocket Updates: Stream task execution and screenshots
- Display Isolation: Each browser tab connects to a separate X11 display
- Session Management: Create, start, and manage agent sessions
- Task Execution: Execute natural language tasks with visual feedback
┌─────────────────┐
│    Frontend     │  React-based UI (3-panel layout)
│    (Browser)    │  - Left: Session/Task History
│                 │  - Middle: VNC Display
│                 │  - Right: Chat Interface
└────────┬────────┘
         │ HTTP/WebSocket
┌────────▼────────┐
│     FastAPI     │  Python backend
│     Server      │  - REST API endpoints
│   (Port 8000)   │  - WebSocket streaming
└────────┬────────┘
         │
┌────────▼────────┐
│    Services     │  Business logic layer
│                 │  - SessionService
│                 │  - AgentExecutionService
└────────┬────────┘
         │
┌────────▼────────┐
│   Claude API    │  Anthropic's Computer Use
│                 │  - Tool execution
│                 │  - Screenshot capture
│                 │  - Action execution
└────────┬────────┘
         │
┌────────▼────────┐
│  X11 Displays   │  Virtual displays (created on-demand)
│  :1, :2, :3...  │  - Xvfb (virtual framebuffer)
│                 │  - mutter (window manager)
│                 │  - tint2 (taskbar)
│   VNC Servers   │  - x11vnc (VNC server)
│  Dynamic Ports  │  - noVNC (web viewer)
└─────────────────┘
The system creates X11 displays on-demand when sessions are created:
- Display :1 → First session → VNC/noVNC ports allocated from configured range
- Display :2 → Second session → Different VNC/noVNC ports
- Display :3 → Third session → And so on...
Each display has:
- Separate Xvfb instance (virtual framebuffer)
- Independent mutter window manager
- Isolated tint2 taskbar
- Dedicated Firefox profile with the -no-remote flag
- Unique port pair (VNC + noVNC) within configured ranges
Each session gets its own dedicated display:
- Session 1 → Display :1 (e.g., VNC 5900, noVNC 6080)
- Session 2 → Display :2 (e.g., VNC 5901, noVNC 6081)
- Session 3 → Display :3 (e.g., VNC 5902, noVNC 6082)
Displays are created dynamically when sessions are initialized:
- Ports are randomly allocated from configured ranges
- Database tracks active port assignments to prevent conflicts
- Network availability is verified before allocation
- Shell scripts create the display infrastructure on-the-fly
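The allocation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names are hypothetical, an in-memory set stands in for the database table of active assignments, and the ranges are the documented defaults, treated here as inclusive.

```python
import random
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def allocate_port(low: int, high: int, in_use: set[int]) -> int:
    """Pick a random free port in [low, high] that is not already assigned.

    `in_use` is a stand-in for the database of active port assignments.
    """
    candidates = [p for p in range(low, high + 1) if p not in in_use]
    random.shuffle(candidates)
    for port in candidates:
        # Verify network availability before recording the assignment
        if port_is_free(port):
            in_use.add(port)
            return port
    raise RuntimeError(f"no free port in range {low}-{high}")


def allocate_port_pair(in_use: set[int]) -> tuple[int, int]:
    """Allocate one VNC port and one noVNC port for a new display."""
    vnc_port = allocate_port(5900, 5910, in_use)
    novnc_port = allocate_port(6080, 6090, in_use)
    return vnc_port, novnc_port
```

Checking the bind at allocation time narrows the window for conflicts but does not close it entirely; another process could still take the port before the VNC server starts, so a real implementation would likely retry on startup failure.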
Tasks execute with display-specific isolation:
- Context variables (ContextVar) ensure async-safe display assignment
- Tool operations (click, type, screenshot) target the correct X11 display
- No cross-contamination between concurrent tasks
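To illustrate why a ContextVar keeps concurrent tasks apart, here is a small self-contained sketch. Only the name task_display_num comes from this project; run_task and take_screenshot are hypothetical stand-ins for the real execution and tool code.

```python
import asyncio
from contextvars import ContextVar

task_display_num: ContextVar[int] = ContextVar("task_display_num", default=1)


async def take_screenshot() -> str:
    """Hypothetical tool call that reads the display bound to the current task."""
    await asyncio.sleep(0)  # yield control so the tasks interleave
    return f"DISPLAY=:{task_display_num.get()}"


async def run_task(display_num: int) -> str:
    # The value set here is visible only within this task's own context
    task_display_num.set(display_num)
    return await take_screenshot()


async def main() -> list[str]:
    # gather() wraps each coroutine in a Task, and each Task runs in a
    # copy of the current context, so the set() calls do not collide
    return await asyncio.gather(run_task(1), run_task(2), run_task(3))


results = asyncio.run(main())
print(results)  # ['DISPLAY=:1', 'DISPLAY=:2', 'DISPLAY=:3']
```

A plain module-level variable would be overwritten by whichever task ran last, which is the cross-contamination the context variable avoids.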
WebSocket streaming provides:
- Assistant text messages
- Base64-encoded screenshots
- Tool execution logs
- Task status transitions
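As a rough sketch of how such messages might be framed, the snippet below wraps each event in a JSON envelope and Base64-encodes screenshot bytes. The field names ("type", "payload", "image_base64") are assumptions made for illustration; the actual message schema is defined by the project and described in API_DOCUMENTATION.md.

```python
import base64
import json


def make_event(event_type: str, payload: dict) -> str:
    """Serialize one streaming event as a JSON text frame (hypothetical schema)."""
    return json.dumps({"type": event_type, "payload": payload})


def screenshot_event(png_bytes: bytes) -> str:
    """Base64-encode raw image bytes so they can travel inside JSON."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return make_event("screenshot", {"image_base64": encoded})


def decode_event(raw: str) -> tuple[str, dict]:
    """Parse a frame back into (event type, payload) on the receiving side."""
    message = json.loads(raw)
    return message["type"], message["payload"]
```

A client would decode the frame, switch on the event type, and for screenshots call base64.b64decode on the image field before rendering.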
- Docker and Docker Compose
- Anthropic API key (create one at https://console.anthropic.com/)
Step 1: Get your Anthropic API key
Sign up at Anthropic Console and create an API key.
Step 2: Set the API key as an environment variable
# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-your-actual-api-key-here"
# Optional: Set custom port ranges
$env:VNC_PORT_MIN = "5900"
$env:VNC_PORT_MAX = "5910"
$env:NOVNC_PORT_MIN = "6080"
$env:NOVNC_PORT_MAX = "6090"
# Optional: Set display dimensions
$env:WIDTH = "1024"
$env:HEIGHT = "768"

# Linux/Mac
export ANTHROPIC_API_KEY="sk-your-actual-api-key-here"
# Optional: Set custom port ranges
export VNC_PORT_MIN=5900
export VNC_PORT_MAX=5910
export NOVNC_PORT_MIN=6080
export NOVNC_PORT_MAX=6090
# Optional: Set display dimensions
export WIDTH=1024
export HEIGHT=768

Step 3: Build the Docker image
docker build -f Dockerfile.fastapi -t computer-use-demo:local .

Step 4: Run the container
# Windows PowerShell
docker run `
-e ANTHROPIC_API_KEY=$env:ANTHROPIC_API_KEY `
-v //var/run/docker.sock:/var/run/docker.sock `
-v "$(Get-Location)\computer_use_demo:/home/computeruse/computer_use_demo" `
-v "$HOME\.anthropic:/home/computeruse/.anthropic" `
-p 8000:8000 `
-p 6000-6100:6000-6100 `
-it computer-use-demo:local

# Linux/Mac
docker run \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$(pwd)/computer_use_demo:/home/computeruse/computer_use_demo" \
-v "$HOME/.anthropic:/home/computeruse/.anthropic" \
-p 8000:8000 \
-p 6000-6100:6000-6100 \
-it computer-use-demo:local

Step 5: Access the application
Open your browser and navigate to: http://localhost:8000
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | (required) | Your Anthropic API key from https://console.anthropic.com/ |
| VNC_PORT_MIN | 5900 | Minimum VNC server port |
| VNC_PORT_MAX | 5910 | Maximum VNC server port |
| NOVNC_PORT_MIN | 6080 | Minimum noVNC web viewer port |
| NOVNC_PORT_MAX | 6090 | Maximum noVNC web viewer port |
| WIDTH | 1024 | Display width in pixels |
| HEIGHT | 768 | Display height in pixels |
| DATABASE_URL | sqlite:///./computer_use_demo.db | Database connection URL |
- ANTHROPIC_API_KEY: Your Anthropic API key (required)
- WIDTH: Display width in pixels (default: 1024)
- HEIGHT: Display height in pixels (default: 768)
- DATABASE_URL: SQLite database path (default: sqlite:///./computer_use_demo.db)
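For reference, the variables above could be read into a typed settings object along the following lines. This is only a sketch: the Settings class and load_settings function are hypothetical names, and the project may load its configuration differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    vnc_port_min: int
    vnc_port_max: int
    novnc_port_min: int
    novnc_port_max: int
    width: int
    height: int
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    settings = Settings(
        anthropic_api_key=api_key,
        vnc_port_min=int(env.get("VNC_PORT_MIN", "5900")),
        vnc_port_max=int(env.get("VNC_PORT_MAX", "5910")),
        novnc_port_min=int(env.get("NOVNC_PORT_MIN", "6080")),
        novnc_port_max=int(env.get("NOVNC_PORT_MAX", "6090")),
        width=int(env.get("WIDTH", "1024")),
        height=int(env.get("HEIGHT", "768")),
        database_url=env.get("DATABASE_URL", "sqlite:///./computer_use_demo.db"),
    )
    # Reject inverted ranges early instead of failing during port allocation
    if settings.vnc_port_min > settings.vnc_port_max:
        raise ValueError("VNC_PORT_MIN must not exceed VNC_PORT_MAX")
    if settings.novnc_port_min > settings.novnc_port_max:
        raise ValueError("NOVNC_PORT_MIN must not exceed NOVNC_PORT_MAX")
    return settings
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the process environment.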
- Open browser: http://localhost:8000
- Click "Start New Task" to create a session
- Enter task description (e.g., "Open Firefox and search for Python")
- Watch real-time execution in VNC display
- Open additional tabs for isolated sessions on Display 2
Interactive documentation available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
See API_DOCUMENTATION.md for detailed endpoint reference.
noVNC web viewer (open in a browser; the ports shown assume the default ranges):
- Display 1: http://localhost:6080
- Display 2: http://localhost:6081
Or use a native VNC client:
- Display 1: localhost:5900
- Display 2: localhost:5901
computer_use_demo/
├── database/
│ ├── models.py # SQLAlchemy database models
│ └── __init__.py # Database initialization
├── routes/
│ ├── sessions.py # Session management endpoints
│ ├── tasks.py # Task execution endpoints
│ ├── health.py # Health check endpoint
│ └── __init__.py
├── services/
│ ├── session_service.py # Session business logic
│ └── agent_execution_service.py # Task execution logic
├── tools/
│ ├── computer.py # Computer use tool (click, type, screenshot)
│ ├── bash.py # Bash command execution
│ ├── edit.py # Text file editing
│ └── ...
├── static/
│ ├── index.html # Frontend HTML
│ ├── app.js # Frontend JavaScript
│ └── styles.css # Frontend styles
├── loop.py # Core sampling loop
├── schemas.py # Pydantic request/response models
├── main.py # FastAPI application entry point
└── requirements.txt # Python dependencies
image/
├── .config/
│ └── tint2/
│ ├── tint2rc # Taskbar configuration
│ └── applications/ # Desktop files
├── xvfb_startup.sh # Start virtual displays
├── x11vnc_startup.sh # Start VNC servers
├── novnc_startup.sh # Start noVNC proxies
├── mutter_startup.sh # Start window manager
├── tint2_startup.sh # Start taskbar
├── firefox_launcher.sh # Display-specific Firefox launcher
└── start_all.sh # Master startup script
- Display Assignment: getOrAssignDisplay() manages display allocation
- WebSocket Client: Real-time task updates
- Markdown Rendering: Uses marked.js for message formatting
- Session Management: Create/switch between sessions
- Keyboard Handler: Enter to send, Shift+Enter for newline
SessionService:
- CRUD operations for sessions
- Session state management (IDLE → ACTIVE → COMPLETED)
- Task creation and tracking
- Log storage for WebSocket streaming
AgentExecutionService:
- Task execution via sampling loop
- Display isolation with context variables
- Screenshot and tool result streaming
- Error handling and cancellation
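The session lifecycle that SessionService manages (IDLE → ACTIVE → COMPLETED) can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the enum, the transition table, and the transition function are hypothetical, and the real service may permit additional transitions such as error or cancellation paths.

```python
from enum import Enum


class SessionState(str, Enum):
    IDLE = "idle"
    ACTIVE = "active"
    COMPLETED = "completed"


# Only the forward transitions named in this README are modeled here
ALLOWED_TRANSITIONS: dict[SessionState, set[SessionState]] = {
    SessionState.IDLE: {SessionState.ACTIVE},
    SessionState.ACTIVE: {SessionState.COMPLETED},
    SessionState.COMPLETED: set(),
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed moves in one table makes it straightforward to reject, for example, restarting a completed session by accident.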
Context Variables (agent_execution_service.py):
from contextvars import ContextVar

task_display_num: ContextVar[int] = ContextVar('task_display_num', default=1)

Tool Access (computer.py):
from computer_use_demo.services.agent_execution_service import task_display_num

# Read the display number bound to the current task's context
self.display_num = task_display_num.get()

Firefox Profiles (firefox_launcher.sh):
PROFILE_DIR="/tmp/firefox-profile-display${DISPLAY_NUM}"
firefox-esr -no-remote -profile "$PROFILE_DIR" "$@"

Taskbar Configuration (tint2rc):
panel_items = TL # T=Taskbar, L=Launcher
launcher_item_app = /usr/share/applications/firefox-esr.desktop
launcher_item_app = /usr/share/applications/libreoffice-calc.desktop
# ... more applications

Available models:
- claude-sonnet-4-5-20250929 (default)
- claude-opus-4-5-20250929
- claude-sonnet-4-20250514
- claude-opus-4-20250514
Cause: Missing -no-remote flag or shared profile
Solution: Verify firefox_launcher.sh uses -no-remote and display-specific profiles
Cause: Desktop files have wrong permissions or format
Solution: Ensure desktop files are mode 644 (readable, not executable)
Cause: Context variable not set
Solution: Verify task_display_num.set(display_num) called before execution
Cause: CORS or connection issue
Solution: Check browser console for errors, verify WebSocket URL
# Install dependencies
pip install -r computer_use_demo/requirements.txt
# Set environment variables
export ANTHROPIC_API_KEY=your_key_here
export WIDTH=1024
export HEIGHT=768
# Start X11 displays
./image/xvfb_startup.sh
./image/start_all.sh
# Start FastAPI server
uvicorn computer_use_demo.main:app --host 0.0.0.0 --port 8000

To add a new API endpoint:
- Create route in computer_use_demo/routes/
- Add schemas to computer_use_demo/schemas.py
- Register router in computer_use_demo/main.py
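The project currently uses SQLite with SQLAlchemy. To set up the schema, note that create_all() creates missing tables but does not migrate existing ones:
from computer_use_demo.database import Base, engine

# Creates any tables that do not exist yet; existing tables are left unchanged
Base.metadata.create_all(bind=engine)

- Add authentication (API keys, OAuth)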
- Implement rate limiting
- Configure CORS properly
- Use PostgreSQL instead of SQLite
- Add input validation and sanitization
- Implement request logging
- Set up SSL/TLS
- Restrict VNC access (use VPN or SSH tunnel)
- Concurrent Sessions: Supports 2 displays by default (expandable)
- Task Queue: Background task execution with asyncio
- Database: SQLite (sufficient for demo, use PostgreSQL for production)
- Memory: ~2GB per display (Xvfb + mutter + Firefox)
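As a back-of-the-envelope aid, the figures above can be combined into a rough capacity estimate. The function below is a hypothetical sizing helper, not part of the project; it assumes the port ranges are inclusive, uses the default ranges, and takes the approximate 2 GB per display at face value.

```python
def max_concurrent_displays(
    memory_gb: float,
    vnc_range: tuple[int, int] = (5900, 5910),
    novnc_range: tuple[int, int] = (6080, 6090),
    gb_per_display: float = 2.0,
) -> int:
    """Estimate how many displays fit, limited by ports and by memory."""
    vnc_ports = vnc_range[1] - vnc_range[0] + 1
    novnc_ports = novnc_range[1] - novnc_range[0] + 1
    by_memory = int(memory_gb // gb_per_display)
    # Each display needs one VNC port, one noVNC port, and its memory share
    return max(0, min(vnc_ports, novnc_ports, by_memory))


print(max_concurrent_displays(16))  # 8: memory is the limit
print(max_concurrent_displays(64))  # 11: the port ranges are the limit
```

The estimate ignores memory used by the FastAPI server and the host itself, so real headroom will be somewhat lower.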
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
This project builds upon Anthropic's Computer Use Demo. See LICENSE file for details.
- Anthropic for the Computer Use API and original demo
- noVNC project for web-based VNC viewer
- FastAPI for the excellent web framework
For issues and questions:
- GitHub Issues: [repository issues page]
- API Documentation: API_DOCUMENTATION.md
- Anthropic Docs: https://docs.anthropic.com/