fix(zoom): iteratively strip tags in transcript parser to close incomplete-sanitization gap #4745
PR Summary
Medium Risk
Overview: Exports
Reviewed by Cursor Bugbot for commit 79c057e.
Greptile Summary
This PR hardens the WebVTT transcript parser in the Zoom connector against a CodeQL "Incomplete multi-character sanitization" finding by replacing a single-pass tag-stripping regex with a do-while loop that re-strips until the string reaches a stable fixed point. Regression tests are added to cover the hardened path.
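A minimal sketch of the re-strip loop described in this summary, assuming the tag pattern quoted in the PR description. The helper name `stripTagsUntilStable` and the `passes` counter are hypothetical, added only to make the termination argument visible; they are not taken from `zoom.ts`.

```typescript
// Tag pattern as quoted in the PR description.
const TAG_PATTERN = /<\/?[^>]+>/g;

// Hypothetical helper: re-strips until a pass leaves the string unchanged.
// `passes` is illustrative only and counts how many passes were run.
function stripTagsUntilStable(input: string): { text: string; passes: number } {
  let text = input;
  let previous = text;
  let passes = 0;
  do {
    previous = text;
    // Replacing matches with "" can only shorten or preserve the string.
    text = text.replace(TAG_PATTERN, "");
    passes += 1;
  } while (text !== previous);
  return { text, passes };
}
```

Because every pass that changes the string removes at least one character, the loop cannot run more than `input.length + 1` passes, so it always terminates.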
Confidence Score: 5/5
Safe to merge: the change is tightly scoped to a single pure function with no side effects, and the new tests verify both normal operation and the adversarial inputs the fix targets. The do-while loop terminates correctly because each pass can only shorten or preserve the string, guaranteeing convergence. The regex
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Raw cue text lines joined] --> B[Apply voice-tag regex\n replace v Speaker text /v → Speaker: text]
B --> C[withoutTags = withSpeakers]
C --> D{Loop iteration:\nprevious = withoutTags\nwithoutTags = strip tags}
D --> E{withoutTags === previous?}
E -- No, tags were found --> D
E -- Yes, stable fixed-point --> F[Collapse whitespace + trim]
F --> G{Non-empty?}
G -- Yes --> H[Push to segments]
G -- No --> I[Skip]
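The flowchart's steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the code in `zoom.ts`: the voice-tag pattern assumes WebVTT voice spans of the form `<v Speaker>text</v>` with an explicit closing tag, and the function names `cleanCueText` and `collectSegments` are hypothetical. The local names `withSpeakers`, `withoutTags`, and `previous` follow the flowchart.

```typescript
// Assumed voice-span pattern: <v Speaker>text</v> becomes "Speaker: text".
const VOICE_TAG = /<v\s+([^>]+)>([\s\S]*?)<\/v>/g;
// Tag pattern as quoted in the PR description.
const ANY_TAG = /<\/?[^>]+>/g;

// Hypothetical helper covering the flowchart's steps A through F for one cue.
function cleanCueText(lines: string[]): string {
  const withSpeakers = lines.join(" ").replace(VOICE_TAG, "$1: $2");
  let withoutTags = withSpeakers;
  let previous = withoutTags;
  do {
    previous = withoutTags;
    withoutTags = withoutTags.replace(ANY_TAG, "");
  } while (withoutTags !== previous); // stop at the stable fixed point
  // Collapse runs of whitespace and trim the ends.
  return withoutTags.replace(/\s+/g, " ").trim();
}

// Hypothetical helper covering steps G through I: keep only non-empty cues.
function collectSegments(cues: string[][]): string[] {
  const segments: string[] = [];
  for (const lines of cues) {
    const text = cleanCueText(lines);
    if (text) segments.push(text);
  }
  return segments;
}
```

Under these assumptions, `collectSegments([["<v Bob>Hi</v>"], ["<b></b>", "  "]])` keeps the first cue as `Bob: Hi` and skips the second, which is empty once its tags and whitespace are removed.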
Reviews (2): Last reviewed commit: "test(zoom): cover iterative sanitization..."
@greptile
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 79c057e.
Summary
Addresses the CodeQL "Incomplete multi-character sanitization" finding in apps/sim/connectors/zoom/zoom.ts (referenced in the v0.6.91 release PR, "v0.6.91: file zoom, Zoom KB connector, error classifications, LiteLLM support, executor code cleanup" #4743).
Replaces the single-pass .replace(/<\/?[^>]+>/g, '') with a stable-fixed-point loop that re-strips until no further tags can be found.
Prevents crafted speaker-name inputs (e.g. nested or overlapping tags) from surviving sanitization.
Why
Speaker labels in WebVTT transcripts originate from Zoom meeting attendees' display names — user-controllable content. A single-pass tag regex can be bypassed by inputs that, once partially stripped, reconstruct a tag from the surrounding fragments. Iterating until the string stabilises guarantees no nested tag layers remain.
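The class of bypass described here is easiest to see with a narrower pattern than the one in this PR. The sketch below uses a hypothetical literal `<script>` pattern chosen purely for illustration; it is not the pattern in `zoom.ts`, and the function names are assumptions.

```typescript
// Hypothetical narrow pattern, for illustration only.
const SCRIPT_OPEN = /<script>/g;

// Single pass: removes every match found in the original string, once.
function stripOnce(input: string): string {
  return input.replace(SCRIPT_OPEN, "");
}

// Repeats the single pass until the string stops changing.
function stripUntilStable(input: string): string {
  let current = input;
  let previous = current;
  do {
    previous = current;
    current = stripOnce(current);
  } while (current !== previous);
  return current;
}

// Removing the inner match joins "<scr" and "ipt>" into a new tag.
const crafted = "<scr<script>ipt>alert(1)";
console.log(stripOnce(crafted));        // "<script>alert(1)"
console.log(stripUntilStable(crafted)); // "alert(1)"
```

Iterating to a fixed point makes the "no match remains" property hold by construction, rather than depending on how a particular pattern happens to behave on a particular input.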
Type of Change
Testing
Checklist