Merged
16 commits
b8f0dd5
feat(connectors): add 7 knowledge base connectors (Google Forms, Type…
waleedlatif1 Jun 4, 2026
4d84cb6
fix(connectors): tighten listingCapped semantics per review (WIQL cap…
waleedlatif1 Jun 4, 2026
597d408
fix(connectors): google-forms listingCapped must fire on slice regard…
waleedlatif1 Jun 4, 2026
f87d05d
fix(connectors): s3 streaming size cap for chunked responses without …
waleedlatif1 Jun 4, 2026
6b66855
fix(connectors): ado byte-exact file content fetch, google-forms hash…
waleedlatif1 Jun 4, 2026
8355b80
fix(connectors): ado auth-failure deletion guard, jsm last-page slice…
waleedlatif1 Jun 4, 2026
3823396
fix(connectors): shared streaming size-cap reader for ado file hydrat…
waleedlatif1 Jun 4, 2026
b24b223
fix(knowledge): flag incomplete listings at engine level when paginat…
waleedlatif1 Jun 4, 2026
5f4516b
fix(connectors): ado flags listing incomplete when a non-empty repo h…
waleedlatif1 Jun 4, 2026
1f1f1af
fix(knowledge): engine truncation flag is an absolute deletion block …
waleedlatif1 Jun 4, 2026
847a499
improvement(knowledge): extract shouldReconcileDeletions gate as test…
waleedlatif1 Jun 4, 2026
fa66527
test(connectors): mapTags coverage for the 7 new connectors
waleedlatif1 Jun 4, 2026
9bcd613
fix(connectors): ado probes past the wiql 20k cap before flagging; do…
waleedlatif1 Jun 4, 2026
4766fb3
fix(connectors): ado flags partial repo trees when items listing emit…
waleedlatif1 Jun 4, 2026
3bfec6e
fix(connectors): ado discards foreign-phase cursors; google-forms sca…
waleedlatif1 Jun 4, 2026
c16e94a
fix(connectors): audit fixes across new connectors
waleedlatif1 Jun 4, 2026
19 changes: 19 additions & 0 deletions .claude/commands/add-connector.md
@@ -463,6 +463,24 @@ const response = await fetchWithRetry(url, { ... }, VALIDATE_RETRY_OPTIONS)

If `ExternalDocument.sourceUrl` is set, the sync engine stores it on the document record. Always construct the full URL (not a relative path).
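
As a minimal sketch of the full-URL requirement, the helper below resolves an API-relative path against a base URL. `toSourceUrl` and the example hosts are hypothetical and are not part of the connector framework.

```typescript
// Hypothetical helper (an assumption for illustration): turn a path returned
// by an API into an absolute URL suitable for `ExternalDocument.sourceUrl`.
function toSourceUrl(baseUrl: string, pathOrUrl: string): string {
  // The WHATWG URL constructor keeps absolute inputs as they are and resolves
  // relative inputs against the base.
  return new URL(pathOrUrl, baseUrl).toString()
}

// A relative path is resolved against the (made-up) instance URL.
const relative = toSourceUrl('https://example.atlassian.net', '/wiki/spaces/ENG/pages/123')
// An already-absolute URL passes through unchanged.
const absolute = toSourceUrl('https://example.atlassian.net', 'https://other.example.com/doc/1')
```

Resolving through `URL` rather than string concatenation avoids doubled or missing slashes when the API mixes relative and absolute links.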

## Capped or Incomplete Listings — `syncContext.listingCapped` (REQUIRED)

If `listDocuments` can ever return **less than the full source set** on a non-incremental sync — a `maxItems`/`maxDocuments`-style cap, or a transient per-item error that drops a still-existing document from the listing — it MUST set `syncContext.listingCapped = true` when that happens.

The sync engine reconciles deletions by comparing the full listing against stored documents: anything not seen is **hard-deleted** (sync-engine.ts, gated on `!syncContext?.listingCapped`). A truncated listing without this flag deletes every real document beyond the cap. This was the single most common bug found when auditing connectors — do not omit it.

```typescript
if (hitLimit && syncContext) {
  syncContext.listingCapped = true
}
```

Rules:
- Set it when a user-configured cap truncates the listing while more documents exist
- Set it when a thrown error caused a still-present document to be skipped during listing
- Do NOT set it when the source is genuinely exhausted (deleted documents must still reconcile)
- Do NOT set it for intentional scope filters (e.g. a date cutoff) — out-of-scope documents should be reconciled normally
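
The rules above can be sketched as a single page-building step. This is an illustration under stated assumptions: `buildPage`, `DocStub`, and the trimmed-down `SyncContext` are hypothetical stand-ins, and a real `listDocuments` would be async and connector-specific.

```typescript
// Simplified stand-ins for the real connector types (assumptions for this sketch).
interface SyncContext {
  listingCapped?: boolean
}
interface DocStub {
  externalId: string
}

// Hypothetical page builder. `toStub` may throw on a transient per-item error.
function buildPage(
  rawItems: string[],
  sourceHasMore: boolean,
  alreadyListed: number,
  maxItems: number,
  toStub: (raw: string) => DocStub,
  syncContext?: SyncContext
): DocStub[] {
  const remaining = Math.max(0, maxItems - alreadyListed)
  const stubs: DocStub[] = []
  let skippedOnError = false

  for (const raw of rawItems.slice(0, remaining)) {
    try {
      stubs.push(toStub(raw))
    } catch {
      // The document still exists upstream; it only failed to list this time.
      skippedOnError = true
    }
  }

  // Truncated only when the cap cut off documents that actually exist:
  // either this page was sliced, or the cap was reached with more pages pending.
  const hitLimit =
    rawItems.length > remaining || (rawItems.length === remaining && sourceHasMore)

  if ((hitLimit || skippedOnError) && syncContext) {
    syncContext.listingCapped = true
  }
  return stubs
}
```

When the source is exhausted below the cap and nothing was skipped, the flag stays unset, so deleted documents still reconcile.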

## Sync Engine Behavior (Do Not Modify)

The sync engine (`lib/knowledge/connectors/sync-engine.ts`) is connector-agnostic. It:
@@ -515,6 +533,7 @@ export const CONNECTOR_REGISTRY: ConnectorRegistry = {
- `dependsOn` references selector field IDs (not `canonicalParamId`)
- Dependency `canonicalParamId` values exist in `SELECTOR_CONTEXT_FIELDS`
- [ ] `listDocuments` handles pagination with metadata-based content hashes
- [ ] `syncContext.listingCapped = true` set whenever the listing is truncated (max-items cap or transient per-item error) — required to prevent the engine's deletion reconciliation from removing unseen documents
- [ ] `contentDeferred: true` used if content requires per-doc API calls (file download, export, blocks fetch)
- [ ] `contentHash` is metadata-based (not content-based) and identical between stub and `getDocument`
- [ ] `sourceUrl` set on each ExternalDocument (full URL, not relative)
7 changes: 7 additions & 0 deletions .claude/commands/validate-connector.md
@@ -135,6 +135,13 @@ For each API endpoint the connector calls:
- [ ] No off-by-one errors in pagination tracking
- [ ] The connector does NOT hit known API pagination limits silently (e.g., HubSpot search 10k cap)

### Deletion-Reconciliation Safety (`listingCapped`) — CRITICAL
The sync engine hard-deletes any stored document absent from a full listing. Audit every path where `listDocuments` can return less than the full source set:
- [ ] `syncContext.listingCapped = true` is set when a `maxItems`-style cap truncates the listing while more documents exist
- [ ] `listingCapped` is set when a transient per-item error drops a still-existing document from the listing
- [ ] `listingCapped` is NOT set when the source is genuinely exhausted (deleted documents must reconcile) or for intentional scope filters (date cutoffs)
This is the most common connector bug class — verify it explicitly against `sync-engine.ts`'s reconciliation gate.
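
To make the audit target concrete, here is a hedged sketch of a reconciliation gate of the kind this checklist refers to. The names `canReconcileDeletions` and `findDeletions` and their signatures are assumptions for illustration; the actual gate in `sync-engine.ts` is not shown in this diff and may differ.

```typescript
// Trimmed-down context type (an assumption; the real type has more fields).
interface SyncContext {
  listingCapped?: boolean
}

// Hypothetical gate: deletions reconcile only on a full (non-incremental)
// sync whose listing was not flagged as truncated.
function canReconcileDeletions(isFullSync: boolean, syncContext?: SyncContext): boolean {
  return isFullSync && !syncContext?.listingCapped
}

// Hypothetical reconciliation step: stored documents that were not seen in
// the listing are deletion candidates, but only when the gate allows it.
function findDeletions(
  storedIds: string[],
  seenIds: Set<string>,
  isFullSync: boolean,
  syncContext?: SyncContext
): string[] {
  if (!canReconcileDeletions(isFullSync, syncContext)) return []
  return storedIds.filter((id) => !seenIds.has(id))
}

// A complete listing: 'b' and 'c' are gone upstream and get reconciled.
const complete = findDeletions(['a', 'b', 'c'], new Set(['a']), true, {})
// A capped listing: nothing is deleted, even though 'b' and 'c' were not seen.
const capped = findDeletions(['a', 'b', 'c'], new Set(['a']), true, { listingCapped: true })
```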

### Pagination State Across Pages
- [ ] `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
- [ ] Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages
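
One way to picture the `syncContext` caching items above is a lazy cache that fetches once on the first page and reuses the result on later pages. This is only a sketch: `getUserNames`, the `userNames` key, and the loosely typed `SyncContext` are assumptions, and a real connector would load the data with an async API call.

```typescript
// Loosely typed context bag (an assumption for this sketch).
type SyncContext = Record<string, unknown>

// Hypothetical accessor: initialize the cache on the first page, reuse it after.
function getUserNames(
  syncContext: SyncContext,
  fetchAll: () => Map<string, string>
): Map<string, string> {
  const cached = syncContext.userNames
  if (cached instanceof Map) {
    return cached as Map<string, string>
  }
  const fresh = fetchAll()
  syncContext.userNames = fresh
  return fresh
}

// Count how often the (stand-in) fetch runs across two pages.
let fetchCalls = 0
const loadUsers = (): Map<string, string> => {
  fetchCalls += 1
  return new Map([['u1', 'Ada']])
}

const ctx: SyncContext = {}
const firstPage = getUserNames(ctx, loadUsers)
const secondPage = getUserNames(ctx, loadUsers)
```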
7 changes: 7 additions & 0 deletions apps/docs/app/global.css
@@ -510,6 +510,13 @@ figure[data-rehype-pretty-code-figure],
  max-width: 480px !important;
}

/* Search dialog overlay + panel must cover the sticky navbar — both default to z-50,
   and the navbar wins the tie by DOM order, leaving it unblurred above the overlay */
.bg-fd-overlay,
[role="dialog"][data-state] {
  z-index: 60 !important;
}

pre {
  font-size: 0.875rem;
  line-height: 1.7;
29 changes: 20 additions & 9 deletions apps/docs/content/docs/en/knowledgebase/connectors.mdx
@@ -14,21 +14,23 @@ Connectors continuously sync documents from external services into your knowledg

<Image src="/static/connectors/connectors-sources.png" alt="Connect Source picker showing a searchable list of available connectors including Airtable, Asana, Confluence, Discord, Dropbox, Evernote, Fireflies, GitHub, and Gmail" width={800} height={500} />

-Sim ships with 30 built-in connectors:
+Sim ships with 49 built-in connectors:

| Category | Connectors |
|----------|-----------|
-| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Google Calendar, Google Sheets |
-| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint |
-| **Documents** | Google Docs, WordPress, Webflow |
-| **Development** | GitHub |
-| **Communication** | Slack, Discord, Microsoft Teams, Reddit |
+| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Jira Service Management, Monday, Google Calendar, Google Sheets, Google Forms, Typeform |
+| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint, Amazon S3 |
+| **Documents** | Google Docs, WordPress, Webflow, DocuSign |
+| **Development** | GitHub, GitLab, Azure DevOps, Sentry |
+| **Communication** | Slack, Discord, Microsoft Teams, Reddit, YouTube |
| **Email** | Gmail, Outlook |
| **CRM** | HubSpot, Salesforce |
| **Support** | Intercom, ServiceNow, Zendesk |
+| **Incident Management** | incident.io, Rootly |
| **Data** | Airtable |
| **Note-taking** | Evernote, Obsidian |
-| **Meetings** | Fireflies |
+| **Meetings** | Zoom, Gong, Grain, Granola, Fathom, Fireflies |
+| **Recruiting** | Greenhouse, Ashby |

## Adding a Connector

@@ -41,13 +43,18 @@ From inside a knowledge base, click **+ New connector** in the top right to open

Most connectors use **OAuth** — select an existing credential from the dropdown or click **Connect new account** to authorize through the service. Tokens are refreshed automatically.

-A few connectors use **API keys** instead:
+Other connectors use **API keys** or **personal access tokens** instead. The setup modal tells you which credential each connector expects — for example:

| Connector | Where to get the key |
|-----------|---------------------|
| **Evernote** | Developer Token (starts with `S=`) from your Evernote account settings |
| **Obsidian** | Install the [Local REST API](https://github.com/coddingtonbear/obsidian-local-rest-api) plugin, then copy the key from its settings |
| **Fireflies** | Generate from the Integrations page in your Fireflies account |
+| **Typeform** | Personal access token from your Typeform account settings |
+| **Azure DevOps** | Personal access token with Wiki (Read), Work Items (Read), and Code (Read) scopes |
+| **YouTube** | YouTube Data API key from the Google Cloud Console |
+| **Amazon S3** | Secret Access Key (the Access Key ID, region, and bucket are entered as config fields) |
+| **Sentry** | Auth token with `project:read` and `event:read` scopes |

<Callout type="info">
If you rotate an API key in the external service, update it in Sim as well — OAuth tokens refresh automatically, but API keys do not.
@@ -63,6 +70,10 @@ Each connector has source-specific fields that control what gets synced. Example
- **Notion** — sync an entire workspace, a specific database, or a single page tree
- **GitHub** — specify a repository, branch, and optional file extension filter
- **Confluence** — enter your Atlassian domain and optionally filter by space key or content type
+- **Azure DevOps** — choose what to sync (wiki pages, work items, repository files, or all), with optional work item type/state filters, a custom WIQL query, and repository/branch/path filters
+- **Amazon S3** — point at a bucket with an optional key prefix and a customizable file extension allowlist; S3-compatible stores (Cloudflare R2, MinIO) are supported via a custom endpoint
+- **YouTube** — sync a channel (by `@handle` or ID) or playlist, with an optional published-after date filter and the option to exclude Shorts
+- **Sentry** — filter issues by search query (e.g. `is:unresolved`), environment, and time window; self-hosted Sentry is supported via a custom host
- **Obsidian** — provide your vault URL (`https://127.0.0.1:27124` by default) and optionally restrict to a folder path
- **Fireflies** — optionally filter by host email or cap the number of transcripts synced

@@ -188,5 +199,5 @@ You can add as many connectors as you need to a single knowledge base. Each mana
{ question: "What happens when I delete a connector?", answer: "The connector is removed and future syncs stop. You're given the option to also delete all documents that were synced by that connector. If you don't check that option, they stay in the knowledge base as-is." },
{ question: "What does the Disabled status mean?", answer: "After 10 consecutive full-sync failures, the connector is automatically disabled to stop retrying. Reconnect the OAuth account or click Resume to re-enable it." },
{ question: "Do metadata tags count against a limit?", answer: "Yes. Tag slots are shared across all documents in a knowledge base — 17 slots total. Multiple connectors draw from the same pool, so plan accordingly if several connectors each auto-populate tags." },
-{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key connectors (Evernote, Obsidian, Fireflies) need manual updates if you rotate the key in the external service." },
+{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key and personal access token connectors need manual updates if you rotate the credential in the external service." },
]} />
2 changes: 1 addition & 1 deletion apps/docs/content/docs/en/mothership/knowledge.mdx
@@ -49,7 +49,7 @@ For knowledge bases that should stay current automatically, connectors sync cont

Connectors are configured through the knowledge base settings, not through Mothership chat. Once connected, all synced content is immediately searchable by Mothership and by any Agent block with the knowledge base attached.

-Sim ships with 30 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.
+Sim ships with 49 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.

Examples of what you can sync:

27 changes: 26 additions & 1 deletion apps/sim/connectors/ashby/ashby.ts
@@ -298,7 +298,32 @@ function renderFeedbackValue(value: unknown): string {

/**
 * Stable, metadata-based content hash for a candidate document. Identical between the
- * listing stub and the fully-fetched document so unchanged candidates are skipped.
+ * listing stub and the fully-fetched document so unchanged candidates are skipped,
+ * which keeps the `getDocument` re-hydration (notes + feedback fetches) cheap: the
+ * sync engine only re-hydrates a deferred stub when this hash differs from the stored
+ * document's hash (see `lib/knowledge/connectors/sync-engine.ts`).
+ *
+ * Known limitation — notes/feedback freshness depends on `candidate.updatedAt`.
+ * Candidate notes (`candidate.listNotes`) and interview feedback
+ * (`applicationFeedback.list`) are separate Ashby objects, not candidate fields. This
+ * hash is derived solely from the candidate's own `updatedAt`, so a new note or newly
+ * submitted feedback is only re-synced if Ashby advances `candidate.updatedAt` as a
+ * side effect of that write.
+ *
+ * As of this writing Ashby's public API docs do not specify what counts as a
+ * "modification" for `candidate.updatedAt` or for `candidate.list` syncToken
+ * incremental sync, and no third-party ATS-integration vendor (Merge, Nango, Knit)
+ * documents it either — so this behavior is unverified. If Ashby does NOT touch
+ * `candidate.updatedAt` on note/feedback writes, those additions will not be picked up
+ * until some other candidate field changes; a forced full sync re-hydrates everything
+ * regardless. No cheaper listing-time signal exists to fold into this hash: the
+ * `candidate.list` object exposes no note/feedback count, and syncToken carries the
+ * same unspecified change semantics as `updatedAt`.
+ *
+ * Refs:
+ * - https://developers.ashbyhq.com/reference/candidatelist
+ * - https://developers.ashbyhq.com/reference/candidatecreatenote
+ * - https://developers.ashbyhq.com/docs/pagination-and-incremental-sync
 */
function buildContentHash(id: string, updatedAt: string | null): string {
  return `ashby:${id}:${updatedAt ?? ''}`
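
The limitation described in the doc comment can be illustrated with a small sketch. `buildContentHash` is copied from the hunk above; `needsRehydration` is a hypothetical stand-in for the engine's hash comparison, whose real implementation is not shown in this diff, and the candidate ID and timestamps are made up.

```typescript
// Copied from the diff: the hash depends only on the candidate id and updatedAt.
function buildContentHash(id: string, updatedAt: string | null): string {
  return `ashby:${id}:${updatedAt ?? ''}`
}

// Hypothetical stand-in for the engine check: re-hydrate a deferred stub only
// when no hash is stored yet or the stub's hash differs from the stored one.
function needsRehydration(stubHash: string, storedHash: string | undefined): boolean {
  return storedHash === undefined || stubHash !== storedHash
}

const stored = buildContentHash('cand_1', '2026-06-01T00:00:00Z')

// Same updatedAt: the stub is skipped, even if a note was added upstream
// without Ashby advancing candidate.updatedAt.
const unchanged = needsRehydration(buildContentHash('cand_1', '2026-06-01T00:00:00Z'), stored)

// Advanced updatedAt: the hash differs, so notes and feedback are re-fetched.
const changed = needsRehydration(buildContentHash('cand_1', '2026-06-02T00:00:00Z'), stored)
```

Under these assumptions `unchanged` is `false` and `changed` is `true`, which is why note and feedback freshness hinges on whether Ashby touches `candidate.updatedAt`.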