High-performance video transcription MCP server using whisper.cpp (Rust)
A Model Context Protocol (MCP) server that transcribes videos from 1000+ platforms using whisper.cpp. Built with Rust for maximum performance and efficiency.
The easiest way to install with all dependencies:
```bash
brew install nhatvu148/tap/video-transcriber-mcp
```
This automatically installs the binary along with the required dependencies (cmake, yt-dlp, ffmpeg).
If you have Rust installed:
```bash
cargo install video-transcriber-mcp
```
Note: you'll need to install the dependencies (yt-dlp, ffmpeg, cmake) manually.
Download from GitHub Releases:
```bash
# macOS (Intel)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# macOS (Apple Silicon)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-aarch64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Windows: download the .zip from the releases page
```
Note: you'll need to install the dependencies (yt-dlp, ffmpeg) manually.
This version uses whisper.cpp (C++ implementation with Rust bindings) instead of Python's OpenAI Whisper:
| Aspect | whisper.cpp (Rust) | OpenAI Whisper (Python) |
|---|---|---|
| Performance | Native C++ speed | Python interpreter overhead |
| Memory | Lower footprint | Higher memory usage |
| Startup | Instant (<100ms) | Slow (~2-3s model loading) |
| Dependencies | Standalone binary | Requires Python + packages |
| Portability | Single binary | Python environment needed |
Real-world performance depends on your hardware, video length, and chosen model.
- High-performance transcription using whisper.cpp (C++ with Rust bindings)
- Download from 1000+ platforms (YouTube, Vimeo, TikTok, Twitter, etc.)
- Transcribe local video files (mp4, avi, mov, mkv, etc.)
- 100% offline transcription (privacy-first)
- 5 model sizes (tiny, base, small, medium, large)
- 90+ languages supported
- Multiple output formats (TXT, JSON, Markdown)
- MCP integration for Claude Code
- Dual transport: stdio (local) and Streamable HTTP (remote)
- Native binary: no Python or Node.js required
- Low memory footprint compared to Python implementations
The fastest way to get started:
```bash
# 1. Install Task (if not already installed)
brew install go-task/tap/go-task

# 2. Complete setup (build + download model)
task setup

# 3. Run a quick test
task test:quick

# Done!
```
Available Commands:
```bash
task setup          # Complete project setup
task test:quick     # Test with short video
task benchmark      # Run performance benchmark
task deps:check     # Check dependencies
task download:base  # Download base model
task help           # Show all commands
```
See Taskfile.yml for all available tasks.
The server supports two transport modes:
Standard I/O transport for local CLI usage with Claude Code. This is the default mode.
```bash
video-transcriber-mcp
# or explicitly:
video-transcriber-mcp --transport stdio
```
HTTP transport for remote access, allowing the MCP server to be reached over the network.
```bash
# Start HTTP server on the default port (8080)
video-transcriber-mcp --transport http

# Custom host and port
video-transcriber-mcp --transport http --host 0.0.0.0 --port 3000
```
Remote MCP Client Configuration:
For HTTP transport, configure your MCP client with the URL:
```json
{
  "mcpServers": {
    "video-transcriber-mcp": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```
Benefits of HTTP Transport:
- No local installation required for clients
- Centralized server deployment
- Automatic updates (server-side)
- Better for team environments
- Compatible with serverless platforms
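As a concrete sketch of what happens behind that URL: under the standard MCP Streamable HTTP transport, a client begins a session by POSTing a JSON-RPC `initialize` request to the endpoint. The body below is illustrative (the exact `protocolVersion` string depends on the spec revision your client targets, and `example-client` is a placeholder name):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "example-client", "version": "0.1.0" }
  }
}
```

In practice your MCP client library performs this handshake for you; the URL configuration above is all that's normally required.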
```bash
video-transcriber-mcp --help
```

```
Options:
  -t, --transport <TRANSPORT>  Transport mode [default: stdio] [possible values: stdio, http]
      --host <HOST>            Host address for HTTP transport [default: 127.0.0.1]
  -p, --port <PORT>            Port for HTTP transport [default: 8080]
  -h, --help                   Print help
  -V, --version                Print version
```

Prerequisites:

- Rust (1.85+ for the Rust 2024 edition)
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
- yt-dlp (for downloading videos)
```bash
# macOS
brew install yt-dlp

# Linux
pip install yt-dlp

# Windows
winget install yt-dlp.yt-dlp
```
- FFmpeg (for audio processing)
```bash
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg   # Debian/Ubuntu
sudo dnf install ffmpeg   # Fedora

# Windows
choco install ffmpeg
```
```bash
# Clone the repository
git clone https://github.com/nhatvu148/video-transcriber-mcp-rs.git
cd video-transcriber-mcp-rs

# Build the project
cargo build --release

# The binary will be at: target/release/video-transcriber-mcp-rs
```
```bash
# Download the base model (recommended for testing)
bash scripts/download-models.sh base

# Or download all models
bash scripts/download-models.sh all
```
Models are stored in ~/.cache/video-transcriber-mcp/models/.
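Before wiring the server into an MCP client, it can help to confirm that the external tools it shells out to are actually on your PATH. A small sketch using only standard shell builtins:

```shell
# Report which runtime dependencies are discoverable on PATH.
for dep in yt-dlp ffmpeg cmake; do
  if command -v "$dep" >/dev/null 2>&1; then
    echo "ok: $dep"
  else
    echo "missing: $dep"
  fi
done
```

`task deps:check` performs a similar check via the Taskfile.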
Add to ~/.claude/settings.json:
Option 1: If installed via GitHub Release or cargo install:
```json
{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}
```
Option 2: If built from source:
```json
{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "/absolute/path/to/video-transcriber-mcp-rs/target/release/video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}
```
Then use it in Claude Code:
Basic transcription (uses base model by default):
Please transcribe this YouTube video: https://www.youtube.com/watch?v=VIDEO_ID
Transcribe with specific model:
Transcribe this video using the large model for best accuracy:
https://www.youtube.com/watch?v=VIDEO_ID
Transcribe local video file:
Transcribe this local video file: /Users/myname/Videos/meeting.mp4
Transcribe in specific language:
Transcribe this Spanish video: https://www.youtube.com/watch?v=VIDEO_ID
(language: es, model: medium)
Based on whisper.cpp vs OpenAI Whisper benchmarks from the community:
Transcription Speed (approximate, varies by hardware):
- whisper.cpp is typically 2-6x faster than Python Whisper
- Faster startup time (no Python interpreter overhead)
- Lower memory footprint (no Python runtime)
Real-world factors that affect performance:
- CPU: More cores = faster processing
- Model size: Tiny is fastest, Large is slowest but most accurate
- Video length: Longer videos take proportionally more time
- Audio complexity: Clear speech transcribes faster than noisy audio
We're collecting real benchmark data! If you run both versions, please share your results:
- Hardware specs (CPU, RAM)
- Video length tested
- Model used
- Time taken for each version
Open an issue with your benchmark results to help improve this section!
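A minimal wall-clock harness is enough for contributing timings. The sketch below uses plain `date` arithmetic; substitute your real transcription command (e.g. `task benchmark`) for the placeholder `echo` line:

```shell
# Minimal wall-clock timing harness.
# Replace the echo line with the command under test.
start=$(date +%s)
echo "run the transcription command here"
end=$(date +%s)
echo "elapsed: $((end - start))s"
```

Run the same clip through both implementations on the same machine so the numbers are comparable.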
| Model | Speed | Accuracy | Memory | Use Case |
|---|---|---|---|---|
| tiny | ⚡⚡⚡⚡⚡ | ★★ | ~400 MB | Quick drafts, testing |
| base | ⚡⚡⚡⚡ | ★★★ | ~600 MB | General use (default) |
| small | ⚡⚡⚡ | ★★★★ | ~1.2 GB | Better accuracy |
| medium | ⚡⚡ | ★★★★★ | ~2.5 GB | High accuracy |
| large | ⚡ | ★★★★★ | ~4.8 GB | Best accuracy, slowest |
Thanks to yt-dlp, this tool supports 1000+ video platforms including:
- Social Media: YouTube, TikTok, Twitter/X, Facebook, Instagram, Reddit
- Video Hosting: Vimeo, Dailymotion, Twitch
- Educational: Coursera, Udemy, Khan Academy, edX
- News: BBC, CNN, NBC, PBS
- And 1000+ more!
For each video, three files are generated in ~/Downloads/video-transcripts/:
```
video-id-title.txt   # Plain text transcript
video-id-title.json  # JSON with metadata and timestamps
video-id-title.md    # Markdown with video info
```
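As an illustration only (the exact schema isn't documented here, so the field names below are hypothetical), the JSON file could look something like:

```json
{
  "title": "Example Video",
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "model": "base",
  "language": "en",
  "segments": [
    { "start": 0.0, "end": 4.2, "text": "The key to building fast software..." }
  ],
  "text": "The key to building fast software..."
}
```

Check an actual generated file for the authoritative structure.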
```markdown
# How to Build Fast Software

**Video:** https://www.youtube.com/watch?v=example
**Platform:** YouTube
**Channel:** Tech Channel
**Duration:** 600s

---

## Transcript

The key to building fast software is understanding...

---

*Transcribed using whisper.cpp (Rust) - Model: base*
```
```bash
# Custom models directory
export WHISPER_MODELS_DIR=~/.local/share/whisper-models

# Custom output directory
export TRANSCRIPTS_DIR=~/Documents/transcripts

# Log level
export RUST_LOG=info  # or debug, warn, error
```
```bash
# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- --url "https://youtube.com/watch?v=example"
```
```
video-transcriber-mcp/
├── src/
│   ├── main.rs              # Entry point
│   ├── mcp/                 # MCP server implementation
│   │   ├── server.rs
│   │   └── types.rs
│   ├── transcriber/         # Core transcription logic
│   │   ├── engine.rs        # Main transcription orchestrator
│   │   ├── whisper.rs       # whisper.cpp integration
│   │   ├── downloader.rs    # yt-dlp wrapper
│   │   ├── audio.rs         # Audio processing
│   │   └── types.rs         # Data structures
│   └── utils/               # Utilities
│       └── paths.rs
├── scripts/                 # Helper scripts
│   └── download-models.sh   # Download Whisper models
├── Cargo.toml               # Rust dependencies
└── README.md
```
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
- whisper.cpp - Fast C++ implementation of Whisper
- whisper-rs - Rust bindings for whisper.cpp
- yt-dlp - Video downloader for 1000+ platforms
- OpenAI Whisper - Original speech recognition model
- Model Context Protocol SDK - Rust SDK for MCP
I built the original video-transcriber-mcp in TypeScript. Here's why I rewrote it in Rust:
| Aspect | TypeScript Version | Rust Version |
|---|---|---|
| Transcription Speed | 5 min for 10-min video | 50s (6x faster) |
| Memory Usage | ~2 GB | ~800 MB (2.5x less) |
| Startup Time | ~2s | <100ms (20x faster) |
| Binary Size | N/A (Node.js runtime) | ~8 MB standalone |
| Dependencies | Node.js, Python, whisper | Just yt-dlp, ffmpeg |
| CPU Usage | High (Python overhead) | Lower (native code) |
The Rust version is production-ready and significantly more efficient!
Built with ❤️ in Rust for maximum performance