[CUS-12104] extracting content from file based on no of spaces/tabs etc. #382

Open
ManojTestsigma wants to merge 1 commit into dev from CUS-12104

@ManojTestsigma ManojTestsigma commented Apr 20, 2026

Please review this addon and publish it as PUBLIC.

Addon name: file actions
Addon account: https://jarvis.testsigma.com/ui/tenants/3072/addons
Jira: https://testsigma.atlassian.net/browse/CUS-12104

Fix

Added 2 NLPs:

  1. Extracts text in between two words.
  2. Extracts text by skipping a given delimiter.

Summary by CodeRabbit

Release Notes

  • New Features
    • Extract text between specified start and end words from files
    • Extract words at specific positions using delimiter-based splitting (comma, period, tab, space, multi-space separators)
    • Added support for both local and remote file processing
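
The delimiter-based extraction summarized above can be sketched roughly as follows. This is a minimal illustration, not the addon's code: the class name `DelimiterPositionSketch`, the alias names, the 1-based position, and the null return for an out-of-range position are all assumptions.

```java
import java.util.regex.Pattern;

public class DelimiterPositionSketch {

    // Hypothetical alias-to-regex mapping; the addon's real resolver may differ.
    static String delimiterRegex(String delimiterType) {
        switch (delimiterType) {
            case "comma":
                return Pattern.quote(",");
            case "period":
                return Pattern.quote(".");
            case "tab":
                return "\t";
            case "space":
                return " ";
            case "multispace":
                return " {2,}"; // two or more consecutive spaces
            default:
                throw new IllegalArgumentException("Unsupported delimiter: " + delimiterType);
        }
    }

    // Returns the token at a 1-based position (an assumption), or null when out of range.
    static String wordAt(String content, String delimiterType, int position) {
        String[] tokens = content.split(delimiterRegex(delimiterType));
        if (position < 1 || position > tokens.length) {
            return null;
        }
        return tokens[position - 1].trim();
    }

    public static void main(String[] args) {
        System.out.println(wordAt("id,name,status", "comma", 2));           // name
        System.out.println(wordAt("alpha   beta  gamma", "multispace", 3)); // gamma
    }
}
```

Quoting the comma and period with `Pattern.quote` matters because `String.split` takes a regular expression, and an unquoted `.` would match every character.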

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

This PR introduces file handling and text extraction capabilities for a Testsigma addon by adding Maven configuration and four new Java classes: utility helpers for file operations and text extraction, plus two Windows Advanced actions that leverage these utilities to extract text between specified words or at delimiter-based positions from files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Maven Configuration**<br>`file_actions/dependency-reduced-pom.xml` | Declares Maven project coordinates, Java 11 target, build plugins (shade, source), and test-scoped JUnit Jupiter dependency with transitive exclusions. |
| **File & Text Extraction Utilities**<br>`file_actions/src/main/java/com/testsigma/addons/utils/FileHelper.java`, `file_actions/src/main/java/com/testsigma/addons/utils/TextExtractionHelper.java` | Adds utility methods for URL-to-file conversion, HTML file detection, HTML tag stripping, and text extraction between words or at delimiter-specified positions; includes delimiter resolution and validation helpers. |
| **Windows Advanced Actions**<br>`file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractTextInBetweenWordsFromFile.java`, `file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractWordByDelimiterPositionFromFile.java` | Implements two new test actions with input validation, file/URL resolution, text extraction calls, runtime variable storage, and comprehensive error handling and logging. |

Sequence Diagram

sequenceDiagram
    participant TestEngine
    participant Action as ExtractText Action
    participant TextHelper as TextExtractionHelper
    participant FileHelper as FileHelper
    participant FileSystem as File/URL

    TestEngine->>Action: execute(startWord, endWord, filePath, varName)
    activate Action
    Action->>Action: Validate inputs (non-empty)
    alt Validation fails
        Action->>TestEngine: FAILED
    else Validation passes
        Action->>TextHelper: extractTextBetweenWords(filePath, startWord, endWord)
        activate TextHelper
        TextHelper->>FileHelper: urlToFileConverter(filePath)
        activate FileHelper
        FileHelper->>FileSystem: Download/resolve file
        FileSystem-->>FileHelper: File reference
        deactivate FileHelper
        TextHelper->>TextHelper: Read file content
        TextHelper->>TextHelper: Strip HTML tags (if HTML file)
        TextHelper->>TextHelper: Search between startWord and endWord
        TextHelper-->>Action: Extracted text (or null)
        deactivate TextHelper
        alt Extraction returns null
            Action->>TestEngine: FAILED
        else Extraction succeeds
            Action->>Action: Store in runtime variables
            Action->>TestEngine: SUCCESS
        end
    end
    deactivate Action
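
The extraction steps in the diagram (strip HTML tags, search between the start and end words, return null on a miss) might look something like the sketch below. The class `BetweenWordsSketch` and both method names are hypothetical, and the regex-based tag stripping is a deliberately naive stand-in for whatever `TextExtractionHelper` actually does.

```java
public class BetweenWordsSketch {

    // Naive tag removal for illustration only; real HTML needs a proper parser.
    static String stripHtmlTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Returns the trimmed text between the first startWord and the next endWord,
    // or null when either marker is missing (mirroring the FAILED branch above).
    static String extractBetween(String content, String startWord, String endWord) {
        int start = content.indexOf(startWord);
        if (start < 0) {
            return null;
        }
        int from = start + startWord.length();
        int end = content.indexOf(endWord, from);
        if (end < 0) {
            return null;
        }
        return content.substring(from, end).trim();
    }

    public static void main(String[] args) {
        String text = stripHtmlTags("<p>Total: <b>42</b> USD</p>");
        System.out.println(extractBetween(text, "Total:", "USD"));   // 42
        System.out.println(extractBetween(text, "Total:", "EUR"));   // null
    }
}
```

Searching for the end word only after the start word avoids a negative-length substring when the end word also appears earlier in the file.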

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers

  • Ganesh-Testsigma
  • vigneshtestsigma

Poem

🐰 Hops of joy through files and text!
New extraction magic comes next,
From URLs to delimiters fine,
Between the words, a perfect line!
Words extracted, variables blessed,
Actions shine, put to the test!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly references the main change: adding text extraction capabilities based on delimiters (spaces/tabs etc.), which aligns with the new NLPs for extracting content from files by delimiter position. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@file_actions/src/main/java/com/testsigma/addons/utils/FileHelper.java`:
- Around line 25-29: The temp file created in FileHelper (tempFile created via
File.createTempFile and populated with FileUtils.copyURLToFile) is never
deleted, causing disk leaks; change the API to make ownership explicit by
replacing the current return of File tempFile with an AutoCloseable wrapper
(e.g., TempFileResource) that holds the File and implements close() to delete
the file, update FileHelper to return that TempFileResource instead of File
(move File.createTempFile and copy into the wrapper), and update callers to use
try-with-resources on TempFileResource (or, as a simpler alternative, call
tempFile.deleteOnExit() and clearly document that callers must delete the
file)—ensure references to FileHelper, tempFile, File.createTempFile,
FileUtils.copyURLToFile and the method signature are updated accordingly.
- Around line 15-26: The code in FileHelper.java currently accepts arbitrary
http(s) URLs and downloads them without validation, timeouts, or size checks;
update the URL-handling branch that creates URL/temporary File (the block using
URL urlObject, baseName/extension, File.createTempFile and
FileUtils.copyURLToFile) to: 1) validate the URL scheme and host against an
allowlist/denylist to prevent SSRF, 2) perform a HEAD request or open a
connection to check Content-Length and reject large files above a configured
MAX_BYTES, 3) use the FileUtils.copyURLToFile overload with connection and read
timeout values (or wrap URLConnection with setConnectTimeout/setReadTimeout),
and 4) stream the download with a guarded byte-count (abort and delete tempFile
if the cap is exceeded) so the file is never fully buffered in memory or allowed
to exceed disk limits. Ensure errors close streams and clean up tempFile on
failure.
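
One possible shape for the `TempFileResource` wrapper suggested in the first FileHelper comment is sketched below. The constructor signature and `getFile()` accessor are assumptions; the download step is replaced by a local write so the example runs without network access.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class TempFileResource implements AutoCloseable {

    private final File file;

    // Hypothetical constructor; the prefix must be at least three characters long.
    public TempFileResource(String prefix, String suffix) throws IOException {
        this.file = File.createTempFile(prefix, suffix);
    }

    public File getFile() {
        return file;
    }

    // Deletes the temp file when the try-with-resources block exits.
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        File tracked;
        try (TempFileResource resource = new TempFileResource("download", ".txt")) {
            tracked = resource.getFile();
            // Stand-in for the download step, which in FileHelper would copy from the URL.
            Files.write(tracked.toPath(), "sample".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(tracked.exists()); // false
    }
}
```

With this ownership model the caller decides the file's lifetime, so cleanup also happens when reading or extraction throws.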

In
`@file_actions/src/main/java/com/testsigma/addons/utils/TextExtractionHelper.java`:
- Around line 117-119: The switch on delimiterType in TextExtractionHelper
currently calls delimiterType.trim() before matching, which makes the literal
tab case ("\\t" / "\t") unreachable; change the logic to inspect the
raw/untrimmed delimiter string first (check for the literal "\t" case) before
calling trim()—mirror how literal-space is handled—and apply the same fix to the
other occurrence later in the file (the second switch around lines 127-134) so
literal tab input is recognized.
- Around line 47-49: The logger currently emits sensitive file contents and
extracted text (variables like content and extracted) inside
TextExtractionHelper; remove or redact these detailed payloads and instead log
only non-sensitive metadata (e.g., original content length, extracted substring
length, delimiter/type used, and success/failure) from the methods that perform
substring extraction and file reading (look for uses of variables named content,
extracted and the logger.info calls in TextExtractionHelper). Replace calls that
print the actual content with messages such as "extraction succeeded:
extractedLength=X, sourceLength=Y, delimiter=Z" or "extraction failed: reason",
ensuring no raw file data or PII is logged.
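
The tab-delimiter fix described in the first TextExtractionHelper comment could be sketched as below: the raw input is compared before `trim()` runs, because trimming reduces a literal tab or space to an empty string. The method name matches the `resolveDelimiter` mentioned in the review, but the body and the accepted aliases are assumptions.

```java
import java.util.Locale;

public class DelimiterResolveSketch {

    static String resolveDelimiter(String delimiterType) {
        if (delimiterType == null || delimiterType.isEmpty()) {
            throw new IllegalArgumentException("Delimiter must not be empty");
        }
        // Check the untrimmed value first: trim() would turn "\t" or " " into "".
        if (delimiterType.equals("\t")) {
            return "\t";
        }
        if (delimiterType.equals(" ")) {
            return " ";
        }
        switch (delimiterType.trim().toLowerCase(Locale.ROOT)) {
            case "\\t": // the two-character text form: backslash followed by t
            case "tab":
                return "\t";
            case "space":
                return " ";
            case ",":
            case "comma":
                return ",";
            case ".":
            case "period":
                return ".";
            default:
                throw new IllegalArgumentException("Unsupported delimiter: " + delimiterType);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveDelimiter("\t").equals("\t"));    // true
        System.out.println(resolveDelimiter(" tab ").equals("\t")); // true
    }
}
```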

In
`@file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractTextInBetweenWordsFromFile.java`:
- Around line 68-71: The current logger.info call in the
ExtractTextInBetweenWordsFromFile action logs the raw extracted text
(extractedText) which may be sensitive; update the logging to avoid printing the
value and instead log the variable name (variableNameStr) and the length of the
extractedText (e.g., extractedText.length()) before storing it in runTimeData
via runTimeData.setKey(...) and runTimeData.setValue(...); locate the
logger.info usage that references variableNameStr and extractedText and replace
the message accordingly while keeping runTimeData interactions unchanged.
- Around line 41-52: The code in ExtractTextInBetweenWordsFromFile dereferences
startWord, endWord, filePath and variableName before validation and only checks
isEmpty(), which allows whitespace-only values and causes NPEs for nulls; change
the flow to first normalize each input by checking for null, calling toString()
safely, then trim() to remove surrounding whitespace (e.g., compute
normalizedStart = startWord==null? "": startWord.getValue()==null? "":
startWord.getValue().toString().trim()), perform validation against the
trimmed/normalized strings, and only after validation log safe metadata (avoid
logging raw values) using the normalized variables; update uses of
startWordStr/endWordStr/filePathStr/variableNameStr accordingly.
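
A null-safe normalization helper along the lines of the last comment might look like this. Since the Testsigma SDK types are not available here, a `Supplier<Object>` stands in for the input object and its `getValue()` call; that substitution and the class name `InputNormalizationSketch` are assumptions.

```java
import java.util.function.Supplier;

public class InputNormalizationSketch {

    // Returns "" for a null holder or null value, otherwise the trimmed string form.
    static String normalize(Supplier<Object> holder) {
        if (holder == null) {
            return "";
        }
        Object value = holder.get();
        return value == null ? "" : value.toString().trim();
    }

    public static void main(String[] args) {
        String startWordStr = normalize(() -> "  Invoice ");
        String endWordStr = normalize(() -> "   "); // whitespace-only input

        // Validate the normalized values, and log lengths rather than raw content.
        if (startWordStr.isEmpty() || endWordStr.isEmpty()) {
            System.out.println("Validation failed: startLength=" + startWordStr.length()
                    + ", endLength=" + endWordStr.length());
        }
    }
}
```

Normalizing before validation means a whitespace-only value fails the same emptiness check as a missing one, and no dereference happens on a null input.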

In
`@file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractWordByDelimiterPositionFromFile.java`:
- Around line 25-27: The TestData annotation on the field delimiterType has
allowedValues that don't match what resolveDelimiter expects, preventing UI
aliases from reaching the resolver; update the allowedValues for delimiterType
(and the other analogous TestData occurrence around lines 68-72) to include both
backend tokens and common aliases — e.g. include "," and "comma", "period" and
".", "\t" and "tab", "multispace" and "multi-space"/"space" — and adjust the
description string to list the same aliases so the UI can send values that
resolveDelimiter can handle.
- Around line 87-91: The current code stores the extracted token in runTimeData
(runTimeData.setKey / setValue) but then prints the raw value via logger.info
and setSuccessMessage; remove the sensitive value from logs and messages by
changing logger.info to only mention the variable name and action (e.g., "Stored
extracted word in variable 'variableNameStr'") and adjust setSuccessMessage
likewise to omit extractedWord (e.g., "Successfully extracted word at position X
using delimiter 'Y' and stored in variable 'variableNameStr'"); keep storing
extractedWord in runTimeData but never echo extractedWord in logger.info or
setSuccessMessage to avoid exposing sensitive data.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8053317c-f510-48af-ad49-08a078d40698

📥 Commits

Reviewing files that changed from the base of the PR and between 3500768 and 422a383.

📒 Files selected for processing (5)
  • file_actions/dependency-reduced-pom.xml
  • file_actions/src/main/java/com/testsigma/addons/utils/FileHelper.java
  • file_actions/src/main/java/com/testsigma/addons/utils/TextExtractionHelper.java
  • file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractTextInBetweenWordsFromFile.java
  • file_actions/src/main/java/com/testsigma/addons/windowsAdvanced/ExtractWordByDelimiterPositionFromFile.java
