Scripts for fetching, transforming, and seeding skill data into the SkillX database.
| Script | Purpose |
|---|---|
fetch-skillsmp-skills.mjs |
Fetch skills from SkillsMP API (single query, up to 5K) |
fetch-skillsmp-all.mjs |
Fetch ALL skills via multi-query strategy (130K+) |
build-seed-data.mjs |
Add manually curated skills to seed data |
seed-skills.mjs |
Seed skills to the SkillX API in batches (resumable) |
seed-data.json |
Combined seed data (~133K skills) |
seed-batches/ |
Split batch files (1K skills each, 134 files) |
SkillsMP API ──► fetch-skillsmp-skills.mjs ──► seed-data.json ◄── build-seed-data.mjs (manual)
──► fetch-skillsmp-all.mjs ──┘ │
├──► split into seed-batches/*.json (1K each)
▼
seed-skills.mjs ──► /api/admin/seed ──► D1 + Vectorize
SKILLSMP_API_KEY=sk_live_... node scripts/fetch-skillsmp-skills.mjs
SKILLSMP_API_KEY=sk_live_... node scripts/fetch-skillsmp-skills.mjs --reset # start fresh- Fetches up to 5,000 skills (100/page, 50 pages max)
- Resumable: saves progress to
.fetch-progress.jsonafter each page - Merges new skills into
seed-data.json(existing slugs preserved) - Maps GitHub stars →
github_stars(notinstall_count)
Edit build-seed-data.mjs to add skills, then run:
node scripts/build-seed-data.mjsSKILLSMP_API_KEY=sk_live_... node scripts/fetch-skillsmp-all.mjs
SKILLSMP_API_KEY=sk_live_... node scripts/fetch-skillsmp-all.mjs --reset- Uses ~100+ keyword queries to bypass the 5K-per-query API cap
- Deduplicates across queries by skill ID
- Resumable: saves progress to
.fetch-all-progress.json - Fetched 133K+ unique skills in last run
# From seed-data.json (default)
ADMIN_SECRET=xxx API_URL=https://skillx.sh node scripts/seed-skills.mjs
# From batch files (recommended for large datasets)
ADMIN_SECRET=xxx API_URL=https://skillx.sh node scripts/seed-skills.mjs --from-batches
ADMIN_SECRET=xxx API_URL=https://skillx.sh node scripts/seed-skills.mjs --file=5 # single batch
ADMIN_SECRET=xxx API_URL=https://skillx.sh node scripts/seed-skills.mjs --range=6-134 # range of batches
# Options
--reset Clear progress, seed all from scratch
--batch=N Skills per batch sent to API (default: 50)
--skip-vectors Skip Vectorize embedding (faster, backfill later)
--from-batches Read from seed-batches/*.json instead of seed-data.json
--file=N Seed only batch file N (e.g. --file=5 → batch-005.json)
--range=A-B Seed batch files A through B (e.g. --range=6-134)- Resumable: saves progress to
.seed-progress.json - Re-run without
--resetto continue from last checkpoint - Batch files: 134 files × 1K skills each in
seed-batches/
seed-skills.mjsreadsseed-data.jsonand sends batches toPOST /api/admin/seed- API validates
X-Admin-Secretheader - For each skill:
- Upserts into D1 via Drizzle ORM (conflict on
slug) - Chunks content into ~512-token segments (10% overlap)
- Generates 768-dim embeddings via Workers AI (bge-base-en-v1.5)
- Upserts vectors to Vectorize index
- Upserts into D1 via Drizzle ORM (conflict on
- Returns
{ skills: count, vectors: count }
{
"name": "skill-name",
"slug": "author-skill-name",
"description": "Short description",
"content": "Markdown content with features, usage, etc.",
"author": "author-name",
"source_url": "https://github.com/...",
"category": "implementation|testing|planning|devops|security|documentation|marketing|database",
"install_command": "npx skills add author/repo/skill",
"version": "1.0.0",
"is_paid": false,
"price_cents": 0,
"github_stars": 1234,
"install_count": 0,
"avg_rating": 7.2,
"rating_count": 25
}Note: github_stars and install_count are separate metrics. Stars come from GitHub; installs are tracked by actual usage.
- D1 upserts on slug conflict (updates existing skills)
- Vectorize upserts replace existing vectors by ID
- Safe to run multiple times