getting-started.md

Getting Started

This guide walks you through setting up Reader, verifying your installation, and running your first scrape.

Prerequisites

Node.js >= 18 (v22 recommended)
npm package manager

Note: The Hero browser runtime requires Node.js. Always run your scripts with node or npx tsx.

Installation

From npm

npm install @vakra-dev/reader

From source

git clone https://github.com/vakra-dev/reader.git
cd reader
npm install
npm run build

Verify Installation

Test the CLI

npx reader scrape https://example.com

You should see markdown output of the example.com page.

Test the API

Create a file test-scrape.ts:

import { ReaderClient } from "@vakra-dev/reader";

async function main() {
  const reader = new ReaderClient();

  const result = await reader.scrape({
    urls: ["https://example.com"],
    formats: ["markdown"],
  });

  console.log("Success:", result.batchMetadata.successfulUrls === 1);
  console.log("Content length:", result.data[0].markdown?.length);

  await reader.close();
}

main().catch(console.error);

Run it:

npx tsx test-scrape.ts

Your First Scrape

Single URL

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.scrape({
  urls: ["https://news.ycombinator.com"],
  formats: ["markdown", "text"],
});

// Access the markdown content
console.log(result.data[0].markdown);

// Access metadata
console.log("Title:", result.data[0].metadata.website.title);
console.log("Duration:", result.data[0].metadata.duration, "ms");

await reader.close();

Multiple URLs

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.scrape({
  urls: [
    "https://example.com",
    "https://example.org",
    "https://example.net",
  ],
  formats: ["markdown"],
  batchConcurrency: 3,
  onProgress: ({ completed, total, currentUrl }) => {
    console.log(`[${completed}/${total}] Scraping: ${currentUrl}`);
  },
});

console.log(`Scraped ${result.batchMetadata.successfulUrls} URLs`);
console.log(`Failed: ${result.batchMetadata.failedUrls}`);

await reader.close();

Crawl a Website

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 10,
  scrape: true,
});

console.log(`Discovered ${result.urls.length} URLs:`);
result.urls.forEach((page) => {
  console.log(`  - ${page.title}: ${page.url}`);
});

if (result.scraped) {
  console.log(`\nScraped ${result.scraped.batchMetadata.successfulUrls} pages`);
}

await reader.close();

Understanding the Output

ScrapeResult Structure

interface ScrapeResult {
  // Array of scraped websites (one per URL)
  data: WebsiteScrapeResult[];

  // Metadata about the batch operation
  batchMetadata: {
    totalUrls: number;
    successfulUrls: number;
    failedUrls: number;
    scrapedAt: string;      // ISO timestamp
    totalDuration: number;  // milliseconds
    errors?: Array<{ url: string; error: string }>;
  };
}

interface WebsiteScrapeResult {
  // Content in requested formats
  markdown?: string;
  html?: string;
  json?: string;
  text?: string;

  // Metadata about this specific scrape
  metadata: {
    baseUrl: string;
    totalPages: number;
    scrapedAt: string;
    duration: number;
    website: WebsiteMetadata;  // Title, description, OG tags, etc.
  };
}

CrawlResult Structure

interface CrawlResult {
  // Discovered URLs with basic info
  urls: Array<{
    url: string;
    title: string;
    description: string | null;
  }>;

  // Full scrape results (only when scrape: true)
  scraped?: ScrapeResult;

  // Crawl operation metadata
  metadata: {
    totalUrls: number;
    maxDepth: number;
    totalDuration: number;
    seedUrl: string;
  };
}

CLI Quick Reference

Daemon Mode (Recommended for Multiple Requests)

# Start daemon (once, in a separate terminal or background)
npx reader start --pool-size 5

# Scrape (auto-detects and uses daemon if running)
npx reader scrape https://example.com

# Crawl (auto-detects and uses daemon if running)
npx reader crawl https://example.com -d 2

# Check daemon status
npx reader status

# Stop daemon
npx reader stop

# Force standalone mode (bypass daemon)
npx reader scrape https://example.com --standalone

Scraping

# Scrape a URL to markdown
npx reader scrape https://example.com

# Scrape with multiple formats
npx reader scrape https://example.com -f markdown,text,json

# Scrape multiple URLs concurrently
npx reader scrape url1 url2 url3 -c 3

# Save output to file
npx reader scrape https://example.com -o output.md

# Enable verbose logging
npx reader scrape https://example.com -v

# Show browser window (debugging)
npx reader scrape https://example.com --show-chrome

Crawling

# Crawl a website
npx reader crawl https://example.com -d 2 -m 20

# Crawl and scrape content
npx reader crawl https://example.com -d 2 --scrape

Environment Variables

Variable	Description
`LOG_LEVEL`	Logging level: `debug`, `info`, `warn`, `error` (default: `info`)
`NODE_ENV`	Set to `development` for pretty-printed logs

Common Issues

"Chrome/Chromium not found"

Hero automatically downloads Chrome on first run. If this fails:

# Manually install Chrome dependencies (Ubuntu/Debian)
sudo apt-get install -y chromium-browser

# Or use the system Chrome
export CHROME_PATH=/usr/bin/chromium-browser

"ECONNREFUSED" errors

This usually means the target site is blocking requests. Try:

Use a proxy: --proxy http://user:pass@host:port
Add delays between requests: --delay 2000
Use verbose mode to see what's happening: -v

ESM/CommonJS issues

Reader is ESM-only. Make sure your package.json has:

{
  "type": "module"
}

Or use the .mjs extension for your files.

Next Steps

Based on your use case, explore these guides:

Use Case	Guide
Understanding Cloudflare bypass	Cloudflare Bypass
Setting up proxies	Proxy Configuration
Production server deployment	Production Server
High-volume scraping	Browser Pool
Docker deployment	Docker
Serverless deployment	Serverless

Need Help?

Check the Troubleshooting Guide
Browse Examples
Open an issue on GitHub

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting Started

Prerequisites

Installation

From npm

From source

Verify Installation

Test the CLI

Test the API

Your First Scrape

Single URL

Multiple URLs

Crawl a Website

Understanding the Output

ScrapeResult Structure

CrawlResult Structure

CLI Quick Reference

Daemon Mode (Recommended for Multiple Requests)

Scraping

Crawling

Environment Variables

Common Issues

"Chrome/Chromium not found"

"ECONNREFUSED" errors

ESM/CommonJS issues

Next Steps

Need Help?

FilesExpand file tree

getting-started.md

Latest commit

History

getting-started.md

File metadata and controls

Getting Started

Prerequisites

Installation

From npm

From source

Verify Installation

Test the CLI

Test the API

Your First Scrape

Single URL

Multiple URLs

Crawl a Website

Understanding the Output

ScrapeResult Structure

CrawlResult Structure

CLI Quick Reference

Daemon Mode (Recommended for Multiple Requests)

Scraping

Crawling

Environment Variables

Common Issues

"Chrome/Chromium not found"

"ECONNREFUSED" errors

ESM/CommonJS issues

Next Steps

Need Help?