This guide walks you through setting up Reader, verifying your installation, and running your first scrape.
- Node.js >= 18 (v22 recommended)
- npm package manager
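Before installing, you can confirm the Node.js requirement above with a quick check (a minimal sketch; it only parses `process.version`, which Node.js always exposes in `vX.Y.Z` form):

```typescript
// Parse the major version out of e.g. "v22.1.0"
const major = Number(process.version.replace(/^v/, "").split(".")[0]);

if (major >= 18) {
  console.log(`Node.js ${process.version} is supported`);
} else {
  console.error(`Node.js >= 18 required, found ${process.version}`);
}
```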
Note: The Hero browser runtime requires Node.js. Always run your scripts with `node` or `npx tsx`.
Install from npm:

```bash
npm install @vakra-dev/reader
```

Or build from source:

```bash
git clone https://github.com/vakra-dev/reader.git
cd reader
npm install
npm run build
```

Verify the installation:

```bash
npx reader scrape https://example.com
```

You should see markdown output of the example.com page.
Create a file `test-scrape.ts`:

```typescript
import { ReaderClient } from "@vakra-dev/reader";

async function main() {
  const reader = new ReaderClient();

  const result = await reader.scrape({
    urls: ["https://example.com"],
    formats: ["markdown"],
  });

  console.log("Success:", result.batchMetadata.successfulUrls === 1);
  console.log("Content length:", result.data[0].markdown?.length);

  await reader.close();
}

main().catch(console.error);
```

Run it:

```bash
npx tsx test-scrape.ts
```
Scrape a page and inspect its content and metadata:

```typescript
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.scrape({
  urls: ["https://news.ycombinator.com"],
  formats: ["markdown", "text"],
});

// Access the markdown content
console.log(result.data[0].markdown);

// Access metadata
console.log("Title:", result.data[0].metadata.website.title);
console.log("Duration:", result.data[0].metadata.duration, "ms");

await reader.close();
```
Scrape multiple URLs concurrently and track progress:

```typescript
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.scrape({
  urls: [
    "https://example.com",
    "https://example.org",
    "https://example.net",
  ],
  formats: ["markdown"],
  batchConcurrency: 3,
  onProgress: ({ completed, total, currentUrl }) => {
    console.log(`[${completed}/${total}] Scraping: ${currentUrl}`);
  },
});

console.log(`Scraped ${result.batchMetadata.successfulUrls} URLs`);
console.log(`Failed: ${result.batchMetadata.failedUrls}`);

await reader.close();
```
Crawl a site to discover URLs, optionally scraping each page:

```typescript
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 10,
  scrape: true,
});

console.log(`Discovered ${result.urls.length} URLs:`);
result.urls.forEach((page) => {
  console.log(`  - ${page.title}: ${page.url}`);
});

if (result.scraped) {
  console.log(`\nScraped ${result.scraped.batchMetadata.successfulUrls} pages`);
}

await reader.close();
```
Scrape calls resolve to a `ScrapeResult`:

```typescript
interface ScrapeResult {
  // Array of scraped websites (one per URL)
  data: WebsiteScrapeResult[];

  // Metadata about the batch operation
  batchMetadata: {
    totalUrls: number;
    successfulUrls: number;
    failedUrls: number;
    scrapedAt: string; // ISO timestamp
    totalDuration: number; // milliseconds
    errors?: Array<{ url: string; error: string }>;
  };
}

interface WebsiteScrapeResult {
  // Content in requested formats
  markdown?: string;
  html?: string;
  json?: string;
  text?: string;

  // Metadata about this specific scrape
  metadata: {
    baseUrl: string;
    totalPages: number;
    scrapedAt: string;
    duration: number;
    website: WebsiteMetadata; // Title, description, OG tags, etc.
  };
}
```
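The `batchMetadata` shape described above is enough to build a small failure report after a batch run. A sketch using a trimmed-down local copy of the types (the `summarizeBatch` helper is illustrative, not part of the library):

```typescript
// Trimmed-down copy of the batch metadata shape documented above
interface BatchMetadata {
  totalUrls: number;
  successfulUrls: number;
  failedUrls: number;
  errors?: Array<{ url: string; error: string }>;
}

// One summary line, plus one indented line per failed URL
function summarizeBatch(meta: BatchMetadata): string {
  const lines = [
    `${meta.successfulUrls}/${meta.totalUrls} URLs scraped, ${meta.failedUrls} failed`,
  ];
  for (const { url, error } of meta.errors ?? []) {
    lines.push(`  ${url}: ${error}`);
  }
  return lines.join("\n");
}

console.log(
  summarizeBatch({
    totalUrls: 3,
    successfulUrls: 2,
    failedUrls: 1,
    errors: [{ url: "https://example.net", error: "timeout" }],
  }),
);
// Prints "2/3 URLs scraped, 1 failed" followed by the timeout line
```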
Crawl calls resolve to a `CrawlResult`:

```typescript
interface CrawlResult {
  // Discovered URLs with basic info
  urls: Array<{
    url: string;
    title: string;
    description: string | null;
  }>;

  // Full scrape results (only when scrape: true)
  scraped?: ScrapeResult;

  // Crawl operation metadata
  metadata: {
    totalUrls: number;
    maxDepth: number;
    totalDuration: number;
    seedUrl: string;
  };
}
```
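The `urls` array in `CrawlResult` can be post-filtered before deciding what to scrape. A sketch with a local copy of the entry shape (the `withDescriptions` helper is illustrative, not part of the library):

```typescript
// Shape of one discovered URL, as documented in CrawlResult above
interface CrawledUrl {
  url: string;
  title: string;
  description: string | null;
}

// Keep only pages that exposed a meta description
function withDescriptions(urls: CrawledUrl[]): CrawledUrl[] {
  return urls.filter((page) => page.description !== null);
}

const discovered: CrawledUrl[] = [
  { url: "https://example.com/", title: "Example", description: "A demo page" },
  { url: "https://example.com/blank", title: "Blank", description: null },
];

// Keeps only https://example.com/
console.log(withDescriptions(discovered).map((p) => p.url));
```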
The CLI can run standalone or against a long-lived daemon:

```bash
# Start daemon (once, in a separate terminal or background)
npx reader start --pool-size 5

# Scrape (auto-detects and uses daemon if running)
npx reader scrape https://example.com

# Crawl (auto-detects and uses daemon if running)
npx reader crawl https://example.com -d 2

# Check daemon status
npx reader status

# Stop daemon
npx reader stop

# Force standalone mode (bypass daemon)
npx reader scrape https://example.com --standalone
```
Common scrape options:

```bash
# Scrape a URL to markdown
npx reader scrape https://example.com

# Scrape with multiple formats
npx reader scrape https://example.com -f markdown,text,json

# Scrape multiple URLs concurrently
npx reader scrape url1 url2 url3 -c 3

# Save output to file
npx reader scrape https://example.com -o output.md

# Enable verbose logging
npx reader scrape https://example.com -v

# Show browser window (debugging)
npx reader scrape https://example.com --show-chrome
```
```bash
# Crawl a website
npx reader crawl https://example.com -d 2 -m 20

# Crawl and scrape content
npx reader crawl https://example.com -d 2 --scrape
```
Reader reads the following environment variables:

| Variable | Description |
|---|---|
| `LOG_LEVEL` | Logging level: `debug`, `info`, `warn`, `error` (default: `info`) |
| `NODE_ENV` | Set to `development` for pretty-printed logs |
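For example, both variables can be set inline for a single run (a usage fragment for POSIX shells; `test-scrape.ts` is the verification script from earlier):

```shell
# Verbose, pretty-printed logs for one invocation only
LOG_LEVEL=debug NODE_ENV=development npx tsx test-scrape.ts
```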
Hero automatically downloads Chrome on first run. If this fails:
```bash
# Manually install Chrome dependencies (Ubuntu/Debian)
sudo apt-get install -y chromium-browser

# Or use the system Chrome
export CHROME_PATH=/usr/bin/chromium-browser
```
This usually means the target site is blocking requests. Try:

- Use a proxy: `--proxy http://user:pass@host:port`
- Add delays between requests: `--delay 2000`
- Use verbose mode to see what's happening: `-v`
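The flags above can be combined in a single invocation (a sketch using only the options listed; the proxy URL is a placeholder):

```shell
# Proxy, 2-second delay between requests, and verbose logging together
npx reader scrape https://example.com \
  --proxy http://user:pass@host:port \
  --delay 2000 \
  -v
```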
Reader is ESM-only. Make sure your package.json has:
```json
{
  "type": "module"
}
```

Or use the `.mjs` extension for your files.
Based on your use case, explore these guides:
| Use Case | Guide |
|---|---|
| Understanding Cloudflare bypass | Cloudflare Bypass |
| Setting up proxies | Proxy Configuration |
| Production server deployment | Production Server |
| High-volume scraping | Browser Pool |
| Docker deployment | Docker |
| Serverless deployment | Serverless |
- Check the Troubleshooting Guide
- Browse Examples
- Open an issue on GitHub