[CUS-12104] extracting content from file based on no of spaces/tabs etc. #382
ManojTestsigma wants to merge 1 commit into dev
📝 Walkthrough
This PR introduces file handling and text extraction capabilities for a Testsigma addon by adding Maven configuration and four new Java classes: utility helpers for file operations and text extraction, plus two Windows Advanced actions that use these utilities to extract text from files, either between two specified words or at a delimiter-based position.
Sequence Diagram
sequenceDiagram
participant TestEngine
participant Action as ExtractText Action
participant TextHelper as TextExtractionHelper
participant FileHelper as FileHelper
participant FileSystem as File/URL
TestEngine->>Action: execute(startWord, endWord, filePath, varName)
activate Action
Action->>Action: Validate inputs (non-empty)
alt Validation fails
Action->>TestEngine: FAILED
else Validation passes
Action->>TextHelper: extractTextBetweenWords(filePath, startWord, endWord)
activate TextHelper
TextHelper->>FileHelper: urlToFileConverter(filePath)
activate FileHelper
FileHelper->>FileSystem: Download/resolve file
FileSystem-->>FileHelper: File reference
deactivate FileHelper
TextHelper->>TextHelper: Read file content
TextHelper->>TextHelper: Strip HTML tags (if HTML file)
TextHelper->>TextHelper: Search between startWord and endWord
TextHelper-->>Action: Extracted text (or null)
deactivate TextHelper
alt Extraction returns null
Action->>TestEngine: FAILED
else Extraction succeeds
Action->>Action: Store in runtime variables
Action->>TestEngine: SUCCESS
end
end
deactivate Action
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Warning: Review ran into problems
🔥 Problems
Git: Failed to clone repository.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@file_actions/src/main/java/com/testsigma/addons/utils/FileHelper.java`:
- Around line 25-29: The temp file created in FileHelper (tempFile created via
File.createTempFile and populated with FileUtils.copyURLToFile) is never
deleted, causing disk leaks; change the API to make ownership explicit by
replacing the current return of File tempFile with an AutoCloseable wrapper
(e.g., TempFileResource) that holds the File and implements close() to delete
the file, update FileHelper to return that TempFileResource instead of File
(move File.createTempFile and copy into the wrapper), and update callers to use
try-with-resources on TempFileResource (or, as a simpler alternative, call
tempFile.deleteOnExit() and clearly document that callers must delete the
file)—ensure references to FileHelper, tempFile, File.createTempFile,
FileUtils.copyURLToFile and the method signature are updated accordingly.
- Around line 15-26: The code in FileHelper.java currently accepts arbitrary
http(s) URLs and downloads them without validation, timeouts, or size checks;
update the URL-handling branch that creates URL/temporary File (the block using
URL urlObject, baseName/extension, File.createTempFile and
FileUtils.copyURLToFile) to: 1) validate the URL scheme and host against an
allowlist/denylist to prevent SSRF, 2) perform a HEAD request or open a
connection to check Content-Length and reject large files above a configured
MAX_BYTES, 3) use the FileUtils.copyURLToFile overload with connection and read
timeout values (or wrap URLConnection with setConnectTimeout/setReadTimeout),
and 4) stream the download with a guarded byte-count (abort and delete tempFile
if the cap is exceeded) so the file is never fully buffered in memory or allowed
to exceed disk limits. Ensure errors close streams and clean up tempFile on
failure.
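The two FileHelper findings above can be combined in one place. The following is a minimal sketch, not the addon's actual code: `TempFileResource` is the name the review proposes, while `copyCapped` and its parameters are hypothetical. It takes a plain `InputStream` so it stays independent of how the connection is opened; for a URL, the caller would presumably validate the host first, set `setConnectTimeout`/`setReadTimeout` on the `URLConnection`, and pass its `getInputStream()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper: owns a temp file and deletes it when closed.
final class TempFileResource implements AutoCloseable {
    private final Path path;

    private TempFileResource(Path path) {
        this.path = path;
    }

    Path path() {
        return path;
    }

    // Streams 'in' into a fresh temp file, counting bytes as it goes.
    // If more than maxBytes arrive, the copy is aborted and the temp file removed.
    static TempFileResource copyCapped(InputStream in, String prefix, String suffix, long maxBytes)
            throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
                if (total > maxBytes) {
                    throw new IOException("Download exceeds cap of " + maxBytes + " bytes");
                }
                out.write(buffer, 0, read);
            }
        } catch (IOException | RuntimeException e) {
            // The output stream is already closed here, so the file can be deleted safely.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return new TempFileResource(tmp);
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(path);
    }
}
```

Callers would wrap the result in try-with-resources so the file is removed once extraction finishes, even when extraction throws.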
In
`@file_actions/src/main/java/com/testsigma/addons/utils/TextExtractionHelper.java`:
- Around line 117-119: The switch on delimiterType in TextExtractionHelper
currently calls delimiterType.trim() before matching, which makes the literal
tab case ("\\t" / "\t") unreachable; change the logic to inspect the
raw/untrimmed delimiter string first (check for the literal "\t" case) before
calling trim()—mirror how literal-space is handled—and apply the same fix to the
other occurrence later in the file (the second switch around lines 127-134) so
literal tab input is recognized.
- Around line 47-49: The logger currently emits sensitive file contents and
extracted text (variables like content and extracted) inside
TextExtractionHelper; remove or redact these detailed payloads and instead log
only non-sensitive metadata (e.g., original content length, extracted substring
length, delimiter/type used, and success/failure) from the methods that perform
substring extraction and file reading (look for uses of variables named content,
extracted and the logger.info calls in TextExtractionHelper). Replace calls that
print the actual content with messages such as "extraction succeeded:
extractedLength=X, sourceLength=Y, delimiter=Z" or "extraction failed: reason",
ensuring no raw file data or PII is logged.
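For the delimiter finding above, one possible shape is to test the raw input for a literal tab or space before trimming, then match aliases on the trimmed token. This is a sketch under assumptions: `DelimiterResolver` is a hypothetical stand-in for the addon's `resolveDelimiter`, the returned string is assumed to be a regex passed to `String.split`, and treating "multispace" as two or more whitespace characters is a guess at the intended behavior.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical resolver: maps user-supplied delimiter names to a split regex.
final class DelimiterResolver {

    static String resolve(String raw) {
        if (raw == null || raw.isEmpty()) {
            throw new IllegalArgumentException("Delimiter type is required");
        }
        // Inspect the untrimmed input first; trim() would erase these two cases.
        if (raw.equals("\t")) {
            return "\t";
        }
        if (raw.equals(" ")) {
            return " ";
        }
        String token = raw.trim().toLowerCase(Locale.ROOT);
        switch (token) {
            case "\\t": // the two typed characters: backslash followed by 't'
            case "tab":
                return "\t";
            case "space":
                return " ";
            case "multispace":
            case "multi-space":
                return "\\s{2,}"; // assumption: two or more whitespace characters
            case ",":
            case "comma":
                return ",";
            case ".":
            case "period":
                return Pattern.quote("."); // '.' is a regex metacharacter
            default:
                throw new IllegalArgumentException("Unsupported delimiter type: " + token);
        }
    }
}
```

Keeping the raw checks ahead of `trim()` is what makes the literal tab reachable; everything after that point can safely work on the normalized token.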
In
`@file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractTextInBetweenWordsFromFile.java`:
- Around line 68-71: The current logger.info call in the
ExtractTextInBetweenWordsFromFile action logs the raw extracted text
(extractedText) which may be sensitive; update the logging to avoid printing the
value and instead log the variable name (variableNameStr) and the length of the
extractedText (e.g., extractedText.length()) before storing it in runTimeData
via runTimeData.setKey(...) and runTimeData.setValue(...); locate the
logger.info usage that references variableNameStr and extractedText and replace
the message accordingly while keeping runTimeData interactions unchanged.
- Around line 41-52: The code in ExtractTextInBetweenWordsFromFile dereferences
startWord, endWord, filePath and variableName before validation and only checks
isEmpty(), which allows whitespace-only values and causes NPEs for nulls; change
the flow to first normalize each input by checking for null, calling toString()
safely, then trim() to remove surrounding whitespace (e.g., compute
normalizedStart = startWord==null? "": startWord.getValue()==null? "":
startWord.getValue().toString().trim()), perform validation against the
trimmed/normalized strings, and only after validation log safe metadata (avoid
logging raw values) using the normalized variables; update uses of
startWordStr/endWordStr/filePathStr/variableNameStr accordingly.
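The null-safe normalization described above could look roughly like the following. `InputNormalizer` and both method names are hypothetical, and the sketch takes plain `Object` values (what `getValue()` would return) so it does not depend on the Testsigma SDK types.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: normalizes raw action inputs before validation.
final class InputNormalizer {

    // Null becomes "", anything else is stringified and trimmed.
    static String normalize(Object value) {
        return value == null ? "" : String.valueOf(value).trim();
    }

    // Returns the name of the first input that is null, empty, or whitespace-only.
    // Iteration order follows the map, so pass a LinkedHashMap for stable messages.
    static Optional<String> firstMissing(Map<String, Object> inputs) {
        for (Map.Entry<String, Object> entry : inputs.entrySet()) {
            if (normalize(entry.getValue()).isEmpty()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

An action would then fail fast with a message naming only the missing input, and log the normalized values' lengths rather than the values themselves.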
In
`@file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractWordByDelimiterPositionFromFile.java`:
- Around line 25-27: The TestData annotation on the field delimiterType has
allowedValues that don't match what resolveDelimiter expects, preventing UI
aliases from reaching the resolver; update the allowedValues for delimiterType
(and the other analogous TestData occurrence around lines 68-72) to include both
backend tokens and common aliases — e.g. include "," and "comma", "period" and
".", "\t" and "tab", "multispace" and "multi-space"/"space" — and adjust the
description string to list the same aliases so the UI can send values that
resolveDelimiter can handle.
- Around line 87-91: The current code stores the extracted token in runTimeData
(runTimeData.setKey / setValue) but then prints the raw value via logger.info
and setSuccessMessage; remove the sensitive value from logs and messages by
changing logger.info to only mention the variable name and action (e.g., "Stored
extracted word in variable 'variableNameStr'") and adjust setSuccessMessage
likewise to omit extractedWord (e.g., "Successfully extracted word at position X
using delimiter 'Y' and stored in variable 'variableNameStr'"); keep storing
extractedWord in runTimeData but never echo extractedWord in logger.info or
setSuccessMessage to avoid exposing sensitive data.
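A small sketch of the metadata-only message suggested above. `SafeMessages` and its method are hypothetical names; the wording follows the example in the review comment, with the extracted value replaced by its length.

```java
// Hypothetical helper: builds a success message that never echoes the extracted value.
final class SafeMessages {

    static String extractionSuccess(String variableName, int position,
                                    String delimiterLabel, String extracted) {
        int length = extracted == null ? 0 : extracted.length();
        return String.format(
                "Successfully extracted word at position %d using delimiter '%s' "
                        + "and stored it in variable '%s' (length=%d)",
                position, delimiterLabel, variableName, length);
    }
}
```

The same string can be passed to both `logger.info` and `setSuccessMessage`, so the two cannot drift apart and neither exposes the stored value.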
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8053317c-f510-48af-ad49-08a078d40698
📒 Files selected for processing (5)
file_actions/dependency-reduced-pom.xml
file_actions/src/main/java/com/testsigma/addons/utils/FileHelper.java
file_actions/src/main/java/com/testsigma/addons/utils/TextExtractionHelper.java
file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractTextInBetweenWordsFromFile.java
file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractWordByDelimiterPositionFromFile.java
Please review this addon and publish it as PUBLIC.
Addon name: file actions
Addon account: https://jarvis.testsigma.com/ui/tenants/3072/addons
Jira: https://testsigma.atlassian.net/browse/CUS-12104
fix
Added 2 NLPs
Summary by CodeRabbit
Release Notes