feat(logs): redact PII from workflow logs via configurable rules#5136
Conversation
Enterprise PII redaction for workflow execution logs, configured under Data Retention as org-scoped rules (each rule picks entity types + which workspaces it applies to). Reuses the guardrails Presidio engine in mask mode at the log-persist choke point, with a check-digit-validated VIN recognizer. Also adds per-workspace data-retention-hours overrides.
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview The Data Retention settings page gains a PII Redaction section—modals with a searchable Presidio entity picker, immediate saves for rules, and unsaved-changes handling—while retention-hour saves stay separate. Form hydration is keyed to the active org so switches do not cross-save. At log completion, Supporting pieces: shared client-safe PII entity catalog, Reviewed by Cursor Bugbot for commit ad7d1fa. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
@greptile review |
…rkflow-log # Conflicts: # packages/db/migrations/meta/0241_snapshot.json # packages/db/migrations/meta/_journal.json # scripts/check-api-validation-contracts.ts
Greptile SummaryThis PR adds enterprise-grade PII redaction for workflow execution logs, applied at the single
Confidence Score: 4/5Safe to merge after addressing the orphaned-override visibility bug in the settings UI. The redaction engine, Presidio integration, VIN recognizer, and API contract changes are all solid. The one concrete defect is in the settings UI: deleting the default (all-workspaces) rule while per-workspace override rules exist removes those overrides from view entirely — they stay active in the database and keep redacting PII, but an admin has no way to see, edit, or delete them without first re-adding a default rule. This creates a control-plane blind spot for enterprise privacy configuration that should be fixed before shipping. apps/sim/ee/data-retention/components/data-retention-settings.tsx — the Workspace overrides section gating logic needs attention. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant E as ExecutionLogger
participant DB as Database
participant R as redactPIIFromExecution
participant P as maskPIIBatch (validate_pii.ts)
participant PY as validate_pii.py (Presidio)
E->>DB: SELECT orgSettings FROM workspace LEFT JOIN organization
DB-->>E: orgSettings (DataRetentionSettings)
E->>E: resolveEffectivePiiRedaction(orgSettings, workspaceId)
alt "enabled = false"
E-->>E: return payload unchanged
else "enabled = true"
E->>R: redactPIIFromExecution(payload, entityTypes)
R->>R: collect all eligible strings (two-pass walk)
alt "totalBytes > 16 MB"
R-->>R: scrub all strings with REDACTION_FAILED
else within budget
R->>P: maskPIIBatch(texts, entityTypes) chunked 256 KB each
loop for each chunk
P->>PY: spawn subprocess, write JSON to stdin
PY-->>P: __SIM_RESULT__ passed results
end
P-->>R: masked string array
end
R->>R: substitute masked strings back into payload structure
R-->>E: redacted RedactablePayload
end
E->>DB: INSERT/UPDATE workflowExecutionLogs (redacted data)
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant E as ExecutionLogger
participant DB as Database
participant R as redactPIIFromExecution
participant P as maskPIIBatch (validate_pii.ts)
participant PY as validate_pii.py (Presidio)
E->>DB: SELECT orgSettings FROM workspace LEFT JOIN organization
DB-->>E: orgSettings (DataRetentionSettings)
E->>E: resolveEffectivePiiRedaction(orgSettings, workspaceId)
alt "enabled = false"
E-->>E: return payload unchanged
else "enabled = true"
E->>R: redactPIIFromExecution(payload, entityTypes)
R->>R: collect all eligible strings (two-pass walk)
alt "totalBytes > 16 MB"
R-->>R: scrub all strings with REDACTION_FAILED
else within budget
R->>P: maskPIIBatch(texts, entityTypes) chunked 256 KB each
loop for each chunk
P->>PY: spawn subprocess, write JSON to stdin
PY-->>P: __SIM_RESULT__ passed results
end
P-->>R: masked string array
end
R->>R: substitute masked strings back into payload structure
R-->>E: redacted RedactablePayload
end
E->>DB: INSERT/UPDATE workflowExecutionLogs (redacted data)
Reviews (11): Last reviewed commit: "fix(logs): re-hydrate data-retention for..." | Re-trigger Greptile |
Greptile SummaryThis PR adds enterprise-grade PII redaction for workflow execution logs, a new per-workspace data-retention-hours override, and a check-digit-validated VIN recognizer. The core redaction pipeline is well-engineered: a deterministic two-pass collect/substitute approach batches all strings from a single execution into one chunked Presidio call, and a fail-safe ensures PII is never persisted when masking fails.
Confidence Score: 3/5The redaction pipeline and workspace-override logic are structurally sound, but three copies of a wrong comment on a security-sensitive field create a genuine risk of operators deploying empty-entity-type rules under the false impression that they cover all PII. The core machinery — two-pass collect/substitute, byte-chunked Presidio calls, fail-safe scrubbing, and the workspace ?? org ?? plan-default retention resolution — is implemented correctly. Three conflicting entityTypes comments in schema.ts, primitives.ts, and pii-redaction.ts say the opposite of what retention.ts implements. An enterprise admin who follows those comments and creates a rule with no entity types selected will have no PII masked at all — a silent failure in a privacy-protection feature. packages/db/schema.ts, apps/sim/lib/api/contracts/primitives.ts, and apps/sim/lib/logs/execution/pii-redaction.ts all carry the wrong entityTypes comment that contradicts retention.ts. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant E as Executor
participant L as ExecutionLogger
participant R as retention.ts
participant P as pii-redaction.ts
participant V as validate_pii.ts
participant Py as validate_pii.py
participant DB as Postgres
E->>L: completeWorkflowExecution
L->>L: filterForDisplay + redactApiKeys
L->>DB: SELECT org.dataRetentionSettings
DB-->>L: orgSettings
L->>R: resolveEffectivePiiRedaction
R-->>L: enabled + entityTypes
alt PII enabled
L->>DB: isWorkspaceOnEnterprisePlan
DB-->>L: true
L->>P: redactPIIFromExecution
P->>P: collect string leaves
loop Per 256KB chunk
P->>V: maskPIIBatch
V->>Py: spawn subprocess
Py-->>V: masked strings
V-->>P: masked strings
end
P-->>L: redacted payload
end
L->>DB: UPDATE workflowExecutionLogs
L-->>E: WorkflowExecutionLog
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant E as Executor
participant L as ExecutionLogger
participant R as retention.ts
participant P as pii-redaction.ts
participant V as validate_pii.ts
participant Py as validate_pii.py
participant DB as Postgres
E->>L: completeWorkflowExecution
L->>L: filterForDisplay + redactApiKeys
L->>DB: SELECT org.dataRetentionSettings
DB-->>L: orgSettings
L->>R: resolveEffectivePiiRedaction
R-->>L: enabled + entityTypes
alt PII enabled
L->>DB: isWorkspaceOnEnterprisePlan
DB-->>L: true
L->>P: redactPIIFromExecution
P->>P: collect string leaves
loop Per 256KB chunk
P->>V: maskPIIBatch
V->>Py: spawn subprocess
Py-->>V: masked strings
V-->>P: masked strings
end
P-->>L: redacted payload
end
L->>DB: UPDATE workflowExecutionLogs
L-->>E: WorkflowExecutionLog
Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile |
|
@greptile review |
…t lazy - Extend PII redaction to span error/errorMessage/toolCalls and top-level error/completionFailure/trigger/executionState (Bugbot: PII in execution metadata). executionState is safe to redact — resume reads from the separate pausedExecutions table, not the log copy. - Lazy-import validate_pii in pii-redaction so the Python/child_process guardrails module stays out of the static middleware/RSC graph. - Type the org retention mutation to the contract body (optional, non-null).
…stays org-scoped - Remove the unused per-workspace data-retention-hours override (no UI; superseded by workspace-scoped PII rules). Reverts cleanup-dispatcher to org-only retention, drops resolveEffectiveRetentionHours, the workspace.dataRetentionSettings column + migration, and the workspace data-retention route/contract/hooks. Fixes Bugbot's null-as-unset finding by removing the buggy path entirely; org retention behavior is unchanged. - Stop re-checking isWorkspaceOnEnterprisePlan at persist time (it returns false on transient errors, which would fail-open and leak PII). Enabled rules already imply entitlement; redact whenever rules apply (fail-safe).
|
@greptile review |
- Drop the per-string size cap in PII redaction: oversized strings were left unmasked (leak). Nothing is skipped now; large payloads still fail-safe via the total-bytes ceiling + per-chunk timeout (scrub, never leak). - Add executionData.environment (incl. variables) to the redaction set.
|
@greptile review |
…tion Each rule now targets one scope — all workspaces (workspaceId: null) or a single workspace — with workspaceId unique across rules. Resolution is most-specific-wins (a workspace's own rule overrides the all rule), not union; an empty specific rule exempts that workspace. Matches Access Control's resolveWorkspaceGroup precedence. UI 'Applies to' becomes a single-select; Add rule disables when all scopes are taken.
|
@greptile review |
Reshape the PII redaction settings into a 'Default (all workspaces)' block plus a 'Workspace overrides' list, making the most-specific-wins precedence explicit (overrides replace the default; unlisted workspaces use it). Same data model (workspaceId null = default), UI only.
Drop the uppercase section labels and the overrides description; gate the Workspace overrides section behind a configured default; use a single Delete action; 'Add redaction' creates the all-workspaces default and disappears once set.
|
@greptile review |
Attach an 'error' listener to the child's stdin in both runPythonScript (the batch masking hot path) and executePythonPIIDetection. A 256KB chunk can exceed the OS pipe buffer, so if the Python process exits mid-read (OOM/kill) the EPIPE emitted on stdin was unhandled and would crash the Node process. Funnel it into the promise rejection so the fail-safe scrub path handles it gracefully.
The top-level correlation field is copied from pre-redaction trigger data, so webhook/schedule correlation values could persist unredacted. Add it to the redaction set alongside trigger/environment.
|
@greptile review |
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 7313e18. Configure here.
The contract accepted multiple rules with the same workspaceId (or several null all-rules); resolution is first-match, so duplicates could disagree with the UI. Add a schema refine rejecting duplicate scopes.
|
@greptile review |
2 similar comments
|
@greptile review |
|
@greptile review |
|
@BugBot review |
The form hydrated once via a boolean ref, so switching the active org left stale retention days + PII rules and saves targeted the new org with old config. Key hydration on orgId so it re-loads per org.
|
@greptile review |
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit ad7d1fa. Configure here.

Summary
maskmode, applied at the single log-persist choke point (completeWorkflowExecution), covering both the inline and externalized write paths.structuredClone), and fail-safe (scrubs rather than leaks on error).workspace ?? org ?? plan default) via a new nullableworkspace.data_retention_settingscolumn.ChipModal(grouped checkbox grid for entity types, workspace multiselect), persisting on save with an unsaved-changes guard.Type of Change
Testing
bun run lint,bun run check:api-validation:strict, andbun run check:migrations origin/stagingall pass. Migration is a single additive nullable column (expand-phase, backward-compatible).Checklist