feat(hosted-keys): add Hunter.io and People Data Labs hosted key support #4742

Open
TheodoreSpeaks wants to merge 4 commits into staging from feat/hosted-key-enrichments

Conversation

@TheodoreSpeaks
Collaborator

Summary

  • Add hosted key support to all Hunter.io and People Data Labs tools — Sim supplies a key from a round-robin env pool when the user hasn't set one via BYOK
  • Register `hunter` and `peopledatalabs` as BYOK providers (type union, contract enum, settings UI)
  • Meter usage via per-tool `pricing.getCost`: Hunter search $0.015 / verify $0.0075, PDL $0.28/credit; free endpoints (Hunter discover/email-count/companies-find, PDL cleaners + autocomplete) billed $0
  • Per-workspace rate limits; PDL search/bulk add a 600 credits/min cap dimension
  • Hide the API Key field on hosted Sim; document `HUNTER_API_KEY_` / `PEOPLEDATALABS_API_KEY_` env vars
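
The pricing bullet above can be illustrated with a minimal sketch. The constant names echo those the review lists for the tools' `types.ts` files, but the function names and shapes are hypothetical, and charging exactly one search credit per non-empty response is an assumption; only the dollar amounts come from the summary.

```typescript
// Dollar amounts are taken from the PR summary; everything else is illustrative.
const HUNTER_SEARCH_CREDIT_USD = 0.015
const HUNTER_VERIFICATION_CREDIT_USD = 0.0075
const PDL_CREDIT_USD = 0.28

// Hunter search-style tools: assumed to cost one search credit,
// and only when the response actually contains results.
function hunterSearchCost(output: { emails: unknown[] }): number {
  return output.emails.length > 0 ? HUNTER_SEARCH_CREDIT_USD : 0
}

// Hunter verifier: flat charge per request, regardless of the result.
function hunterVerifierCost(): number {
  return HUNTER_VERIFICATION_CREDIT_USD
}

// PDL search/bulk tools: one credit per returned or matched record.
function pdlRecordCost(recordCount: number): number {
  return recordCount * PDL_CREDIT_USD
}

// Free endpoints (discover, email count, cleaners, autocomplete, ...) bill nothing.
function freeEndpointCost(): number {
  return 0
}
```

Keeping each rule in its own small function makes it easy to check a tool's charge against the provider's published billing model.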

Type of Change

  • New feature

Testing

Tested manually. `bun run lint`, `bun run check:api-validation:strict`, `tsc --noEmit`, and the 92 tool/rate-limiter tests all pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | May 26, 2026 7:47pm |

Request Review

@cursor

cursor Bot commented May 26, 2026

PR Summary

Medium Risk
Introduces paid third-party usage metering and rate limits; misconfigured getCost logic could under- or over-charge hosted usage, though the pattern matches existing hosted tools.

Overview
Adds hosted API key support for Hunter.io and People Data Labs so Sim can supply keys from env pools when workspaces do not use BYOK.
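
A round-robin env pool of this kind could look roughly like the sketch below. It is not the PR's implementation: `collectKeys` and `createRoundRobin` are hypothetical names, and the in-memory counter is an assumption (a multi-instance deployment would need shared state).

```typescript
// Collect the values of all env vars whose name starts with the prefix,
// in a stable (sorted by name) order. Empty or unset values are skipped.
function collectKeys(env: Record<string, string | undefined>, prefix: string): string[] {
  return Object.keys(env)
    .filter((name) => name.startsWith(prefix) && Boolean(env[name]))
    .sort()
    .map((name) => env[name] as string)
}

// Return a function that hands out the keys in rotation.
function createRoundRobin(keys: string[]): () => string {
  if (keys.length === 0) {
    throw new Error('no hosted keys configured')
  }
  let next = 0
  return () => {
    const key = keys[next] as string
    next = (next + 1) % keys.length
    return key
  }
}
```

In a Node process this would be wired up as something like `createRoundRobin(collectKeys(process.env, 'HUNTER_API_KEY_'))`.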

Hunter: all six tools get hosting config (HUNTER_API_KEY_* round-robin, byokProviderId: hunter). Billing uses response-derived credits: search tools charge when results exist ($0.015/credit), the verifier always charges ($0.0075), and discover, email count, and companies find are $0.

PDL: all eleven tools are wired similarly (PEOPLEDATALABS_API_KEY_*, ~$0.28/credit). Enrich/identify charge on match; search/bulk charge per returned or matched record; cleaners and autocomplete are $0. Search and bulk add a 600 credits/min rate-limit dimension via a new countBulkMatched helper.
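
To make the 600 credits/min dimension concrete, here is a minimal fixed-window sketch of a per-workspace credit cap. The PR's rate limiter may well use a different algorithm (for example a token bucket); `createCreditLimiter` and its methods are hypothetical names, and the injectable clock exists only to keep the example deterministic.

```typescript
interface CreditLimiter {
  canProceed(workspaceId: string): boolean
  report(workspaceId: string, credits: number): void
}

// Fixed-window limiter: each workspace may consume up to `limit` credits
// per `windowMs` milliseconds; the counter resets when the window expires.
function createCreditLimiter(
  limit: number,
  windowMs: number,
  now: () => number = () => Date.now()
): CreditLimiter {
  const windows = new Map<string, { start: number; used: number }>()

  const current = (workspaceId: string) => {
    const t = now()
    let w = windows.get(workspaceId)
    if (!w || t - w.start >= windowMs) {
      w = { start: t, used: 0 }
      windows.set(workspaceId, w)
    }
    return w
  }

  return {
    // Checked before the request: is there any budget left in this window?
    canProceed: (workspaceId) => current(workspaceId).used < limit,
    // Called after the response, once the credits actually used are known.
    report: (workspaceId, credits) => {
      current(workspaceId).used += credits
    },
  }
}
```

Because credits are only known after the response, a single large request can overshoot the cap; the sketch accepts that and simply blocks subsequent requests until the window rolls over.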

Product surface: hunter and peopledatalabs join BYOK types, API contract, and workspace settings UI. Hunter and PDL blocks hide the API key field when hosted (hideWhenHosted). .env.example documents the new key pools.

Reviewed by Cursor Bugbot for commit ecd9ef0. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR adds Sim-hosted API key support for all Hunter.io and People Data Labs tools, registering both as BYOK providers and wiring per-tool pricing and rate-limit configs that meter usage when Sim supplies the key.

  • Hunter tools: domain_search, email_finder, and email_verifier get hosted-key configs with custom getCost functions that reflect Hunter's credit model (search credit charged only on non-empty results; verifier always charges half a credit); discover, email-count, and companies-find are marked free.
  • PDL tools: All 9 tools get hosted-key configs; bulk enrich uses a shared countBulkMatched utility (graceful on missing data), while search/identify tools charge per returned record with a 600-credits/min dimension cap; cleaners and autocomplete are free.
  • Infrastructure: The migration CI step switches from bunx drizzle-kit migrate to a custom bun run ./scripts/migrate.ts that sets statement_timeout = 0 before running, allowing long-running schema changes without hitting server-level query timeouts.

Confidence Score: 4/5

The core hosted-key plumbing is consistent with existing providers, BYOK contract and type union are correctly extended, and the pricing logic matches documented Hunter/PDL billing models. The getCost/extractUsage inconsistency is safe today but fragile if transformResponse is ever refactored.

The implementation is well-structured and follows established patterns. The getCost functions that throw on missing array fields (person_search, company_search, domain_search, person_identify) are inconsistent with their paired extractUsage lambdas that return 0 gracefully — if output fields are ever absent, a successful API call would surface an error to the user. The bulk enrich tools handle this correctly via countBulkMatched.

apps/sim/tools/peopledatalabs/person_search.ts, company_search.ts, person_identify.ts, and apps/sim/tools/hunter/domain_search.ts — the inconsistency between getCost throwing and extractUsage returning 0 is worth aligning before this pattern spreads to more tools.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/tools/hunter/domain_search.ts | Adds hosting/pricing/rate-limit config for Hunter domain search; getCost throws on missing emails array (defensive but inconsistent with graceful extractUsage patterns). |
| apps/sim/tools/hunter/email_finder.ts | Adds hosting config; pricing checks output.email string presence and correctly charges 1 credit only when an email is returned. |
| apps/sim/tools/hunter/email_verifier.ts | Adds flat per-request pricing (always charges one verification credit regardless of result, matching Hunter's billing model). |
| apps/sim/tools/peopledatalabs/person_search.ts | Adds hosting config with custom rate-limit dimensions (600 credits/min); getCost throws on missing results while extractUsage returns 0, which is inconsistent error handling. |
| apps/sim/tools/peopledatalabs/company_search.ts | Same hosting pattern as person_search; same getCost/extractUsage inconsistency for missing results. |
| apps/sim/tools/peopledatalabs/bulk_person_enrich.ts | Uses countBulkMatched utility in both getCost and extractUsage, giving consistent graceful handling when results are missing. |
| apps/sim/tools/peopledatalabs/bulk_company_enrich.ts | Same pattern as bulk_person_enrich; correctly uses countBulkMatched for both cost and dimension tracking. |
| apps/sim/tools/peopledatalabs/person_identify.ts | Adds hosting config; getCost throws if matches is not an array, though transformResponse always returns an array ([] on 404). |
| apps/sim/tools/peopledatalabs/utils.ts | Adds countBulkMatched helper that returns 0 gracefully for non-array results; good defensive design used by bulk tools. |
| packages/db/scripts/migrate.ts | Adds SET statement_timeout = 0 before migrations to override any server-level query timeout; removes the per-statement safety net but is a standard pattern for long-running migrations. |
| apps/sim/lib/api/contracts/byok-keys.ts | Correctly adds 'hunter' and 'peopledatalabs' to the byokProviderIdSchema enum. |
| apps/sim/tools/types.ts | Extends BYOKProviderId union type with 'hunter' and 'peopledatalabs', in sync with the contract enum. |
| apps/sim/app/workspace/[workspaceId]/settings/components/byok/byok.tsx | Adds Hunter and People Data Labs entries to the PROVIDERS config with correct icons, descriptions, and placeholder text. |
| .github/workflows/migrations.yml | Migration command updated from bunx drizzle-kit migrate to bun run ./scripts/migrate.ts, which uses the custom script that disables statement_timeout. |
| apps/sim/tools/hunter/types.ts | Adds HUNTER_API_KEY_PREFIX, HUNTER_SEARCH_CREDIT_USD, and HUNTER_VERIFICATION_CREDIT_USD constants with clear pricing rationale comments. |
| apps/sim/tools/peopledatalabs/types.ts | Adds PEOPLEDATALABS_API_KEY_PREFIX and PDL_CREDIT_USD constants with well-documented pricing source. |
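
The graceful behavior attributed to `countBulkMatched` can be sketched as follows. This is a guess at the shape of such a helper, not the PR's code: the function name is deliberately different, and treating an entry with `status === 200` as a match is an assumption about the bulk response format.

```typescript
// Count matched entries in a bulk response without ever throwing.
// Anything that is not an array (undefined, null, an error object) counts as 0.
function countBulkMatchedSketch(results: unknown): number {
  if (!Array.isArray(results)) {
    return 0
  }
  return results.filter(
    (entry) =>
      typeof entry === 'object' &&
      entry !== null &&
      // Assumption: a matched record carries status 200.
      (entry as { status?: unknown }).status === 200
  ).length
}
```

Because the same function can feed both the cost calculation and the rate-limit dimension, the two numbers cannot drift apart when the response shape is unexpected.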

Sequence Diagram

sequenceDiagram
    participant User
    participant ToolExecutor as Tool Executor (index.ts)
    participant HKRL as HostedKeyRateLimiter
    participant API as Hunter.io / PDL API

    User->>ToolExecutor: Execute tool (no BYOK key)
    ToolExecutor->>HKRL: acquireKey(provider, envKeyPrefix, config, workspaceId)
    HKRL-->>ToolExecutor: key (round-robin selected)
    ToolExecutor->>API: HTTP request (hosted key injected)
    API-->>ToolExecutor: Response
    ToolExecutor->>ToolExecutor: transformResponse → output
    ToolExecutor->>ToolExecutor: applyHostedKeyCostToResult()
    ToolExecutor->>ToolExecutor: pricing.getCost(params, output) → cost
    ToolExecutor->>HKRL: reportUsage (custom dimensions only)
    ToolExecutor-->>User: output + cost metadata

Reviews (1): Last reviewed commit: "Merge branch 'staging' into feat/hosted-..."

Comment on lines +26 to +28
if (!Array.isArray(results)) {
  throw new Error('PDL person search response missing results, cannot determine cost')
}
Contributor


P2 Asymmetric error handling between getCost and extractUsage

getCost throws when results is not an array, but the paired extractUsage lambda in the same rateLimit.dimensions block returns 0 gracefully for the exact same condition. If output.results were ever missing (e.g., after a future transformResponse refactor), extractUsage would silently record 0 credits while getCost would throw — causing the tool to return an error to the caller even though the upstream API call already succeeded. The same inconsistency appears in company_search.ts and, in slightly different form, in domain_search.ts (emails) and person_identify.ts (matches). countBulkMatched in the bulk tools avoids this by returning 0 rather than throwing.
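
One way to remove the asymmetry is to route both callbacks through a single counting helper, sketched below. The names `countResults`, `getCostSketch`, and `extractUsageSketch` are hypothetical and the signatures are simplified relative to whatever the tool config actually expects; the $0.28 figure is the per-credit price from the PR summary.

```typescript
const PDL_CREDIT_USD = 0.28

interface SearchOutput {
  results?: unknown
}

// Single source of truth for "how many records were returned".
// A missing or malformed results field counts as zero records.
function countResults(output: SearchOutput): number {
  return Array.isArray(output.results) ? output.results.length : 0
}

// Cost and rate-limit usage derive from the same count,
// so they cannot disagree about a malformed response.
const getCostSketch = (output: SearchOutput): number => countResults(output) * PDL_CREDIT_USD
const extractUsageSketch = (output: SearchOutput): number => countResults(output)
```

The trade-off is that a malformed response is billed as $0 rather than surfacing an error. If failing loudly is preferred, the alternative is to have both callbacks throw, which keeps them consistent in the other direction.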
