Extract, enrich, and score UCC filings from all 50 US state Secretary of State portals. Turns raw public records into prioritized, outreach-ready leads for the Merchant Cash Advance industry.
- Collects UCC-1 filing data from 50 state portals via 60+ autonomous agents (handles CAPTCHAs, rate limits, session management, and fallback strategies per state)
- Enriches each filing with data from SEC EDGAR, OSHA, USPTO, Census Bureau, SAM.gov, and optional commercial sources (D&B, Clearbit, ZoomInfo)
- Scores every prospect 0--100 on financing likelihood, assigns a health grade (A--F), and flags growth signals (hiring, permits, equipment purchases, expansion)
- Delivers results through a React web dashboard, REST API, or CLI tool
{
"company": "Pacific Coast Distributors LLC",
"state": "CA",
"ucc_filings": [
{
"filing_number": "2024-0847291",
"secured_party": "National Funding Inc",
"filing_date": "2024-03-15",
"type": "UCC-1"
}
],
"enrichment": {
"revenue_estimate": "$2.4M",
"employee_count": 34,
"growth_signals": ["hiring_detected", "new_permits", "equipment_purchase"],
"health_grade": "B+",
"priority_score": 82,
"industry": "Wholesale Distribution"
},
"recommendation": "HIGH PRIORITY - Active financing, strong growth signals, clean compliance record"
}git clone https://github.com/organvm-iii-ergon/public-record-data-scrapper.git
cd public-record-data-scrapper
npm install --legacy-peer-depsdocker-compose up -d db redis # Start PostgreSQL + Redis
npm run db:migrate && npm run seed # Initialize database
npm run dev:full # Start frontend + API + workerFrontend: http://localhost:5000 | API: http://localhost:3000
# Scrape UCC filings for a company
npm run scrape -- scrape-ucc -c "Company Name" -s CA -o results.json
# Enrich from public sources
npm run scrape -- enrich -c "Company Name" -s CA -o enriched.json
# Batch process from CSV
npm run scrape -- batch -i companies.csv -o ./results
# List all 50 state agents
npm run scrape -- list-statesβββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β React 19 + Vite (Vercel CDN) β
β Dashboard Β· Deal Pipeline Β· Compliance Β· Inbox β
ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ
β Express API + BullMQ Workers β
β 27 domain services Β· OpenAPI 3.0 Β· JWT auth β
βββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββββββββββ
β β
βββββββββΌβββββββ βββββββΌββββββββ ββββββββββββββββββββββ
β PostgreSQL β β Redis 7 β β Agent Orchestrator β
β 9 migrations β β cache/queue β β 60+ collectors β
β multitenancy β β β β circuit breakers β
ββββββββββββββββ βββββββββββββββ βββββββ¬βββββββββββββββ
β
ββββββββββββββββββββΌβββββββββββββββββ
β 50 State SOS Agents β
β CA Β· TX Β· NY Β· FL Β· IL Β· ... β
β + SEC Β· OSHA Β· USPTO Β· Census β
ββββββββββββββββββββββββββββββββββββββ
public-record-data-scrapper/
βββ apps/
β βββ web/ # React 19 dashboard (Radix UI, Tailwind)
β βββ desktop/ # Tauri desktop app
β βββ mobile/ # Mobile app target
βββ server/
β βββ services/ # 27 domain services (scoring, enrichment, compliance, ...)
β βββ integrations/ # Twilio, SendGrid, Plaid, ACH, AWS S3
β βββ routes/ # Express route handlers
β βββ openapi.yaml # API specification
βββ packages/
β βββ core/ # Shared types, DB client, utilities
β βββ ui/ # Shared component library
βββ database/
β βββ schema.sql # Full PostgreSQL schema
β βββ migrations/ # 9 versioned migrations with rollbacks
βββ terraform/ # AWS infrastructure (VPC, RDS, ElastiCache, S3)
βββ k8s/ # Kubernetes manifests
βββ monitoring/ # Prometheus + CloudWatch alert rules
βββ tests/ # Integration + E2E (Playwright)
The Express server exposes a RESTful API documented at /api/docs when running.
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/prospects |
List prospects with filtering and pagination |
GET |
/api/prospects/:id |
Prospect detail with enrichment data |
POST |
/api/prospects/:id/claim |
Claim a prospect for outreach |
POST |
/api/prospects/:id/score |
Trigger re-scoring |
GET |
/api/deals |
List deals with pipeline stage filter |
PATCH |
/api/deals/:id/stage |
Move deal to next pipeline stage |
POST |
/api/communications/send |
Send email or SMS |
GET |
/api/compliance/report |
Generate compliance report |
Full endpoint list: server/openapi.yaml
| Tier | Sources | Cost |
|---|---|---|
| Free / OSS | SEC EDGAR, OSHA, USPTO, Census, SAM.gov | $0 |
| Paid | + D&B, Clearbit, Experian, ZoomInfo, Google Places, NewsAPI | Subscription |
- 50-state UCC collection -- autonomous agents for every Secretary of State portal, with per-state fallback strategies (API, bulk download, vendor feed, scrape)
- ML-based lead scoring -- priority score (0--100), health grade, growth signal detection, revenue estimation, competitive position analysis
- Compliance built in -- CA SB 1235 and NY CFDL disclosure calculators, TCPA consent tracking, suppression list management, immutable audit trail
- Full broker workflow -- prospect dashboard, deal pipeline (Kanban), contact CRM, unified communications inbox (email/SMS/voice), bank statement underwriting (Plaid)
- Production infrastructure -- Terraform-provisioned AWS (VPC, RDS, ElastiCache, S3), Vercel frontend deployment, Docker Compose for local dev, Kubernetes manifests for container orchestration
2,055 tests across 91 files. Zero failures.
npm test # Full suite
npm run test:coverage # V8 coverage report
npm run test:server # Server-side only
npm run test:e2e # Playwright end-to-end| Category | Tests | Scope |
|---|---|---|
| Frontend components | ~500 | React dashboard, pipeline, inbox, forms |
| Server services | ~400 | All 27 domain services |
| State agents | ~250 | 50 state-specific collectors + fallback strategies |
| Agentic system | ~200 | Agent engine, orchestration, council |
| Data pipeline | ~150 | Quality checks, enrichment, stale data detection |
| Server routes | ~150 | API endpoints, webhook verification |
| Integration + E2E | ~100 | Cross-service workflows |
| Security | ~55 | XSS prevention, input sanitization |
docker-compose --profile development up -d # Full stack
docker-compose ps # Verify healthcd terraform
cp terraform.tfvars.example terraform.tfvars # Configure
terraform init && terraform plan # Review
terraform apply # DeployProvisions: VPC with multi-AZ subnets, RDS PostgreSQL (encrypted, Multi-AZ), ElastiCache Redis (encrypted), S3 with lifecycle policies, CloudWatch + SNS alerting, IAM with least-privilege policies.
- Fork and create a feature branch:
git checkout -b feature/your-feature - Install:
npm install --legacy-peer-deps - Develop:
npm run dev:full - Test:
npm test(all 2,055 tests must pass) - Lint:
npm run lint - Commit:
git commit -m "feat: description"(Conventional Commits) - Open a Pull Request
- State agent implementations -- live implementations needed for NY, IL, OH, GA, PA
- Enrichment sources -- state business registries, county assessor records
- Compliance expansion -- additional state disclosure requirements
- Performance -- query optimization for large prospect datasets (10K+)
See CONTRIBUTING.md for the full guide. Report security issues via SECURITY.md.
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript 5.9, Vite, Radix UI, Tailwind CSS |
| Backend | Express, Node.js, BullMQ, Zod |
| Database | PostgreSQL 15, Redis 7 |
| Scraping | Puppeteer (headless browser automation) |
| Integrations | Twilio, SendGrid, Plaid, ACH, AWS S3 |
| Testing | Vitest, Testing Library, Playwright |
| Infrastructure | Terraform (AWS), Docker Compose, Kubernetes |
| CI/CD | GitHub Actions |
| Deployment | Vercel (frontend), AWS (backend) |