[CUS-11975] created an add-on for click on image using ai. #378

Open
ManojTestsigma wants to merge 1 commit into dev from CUS-11975

Conversation

@ManojTestsigma ManojTestsigma commented Apr 16, 2026

Please review this add-on and publish it as PUBLIC.

Addon name: image_based_actions
Addon account: https://jarvis.testsigma.com/ui/tenants/2817/addons
Jira: https://testsigma.atlassian.net/browse/CUS-11975

fix

Created an NLP to click on an image using AI.

Summary by CodeRabbit

New Features

  • Added AI-powered Windows action that automatically detects and clicks screen elements based on natural language descriptions, with visual highlighting of detected locations and comprehensive error handling.

coderabbitai Bot commented Apr 16, 2026

Walkthrough

A new Windows action class ClickOnImageUsingAi was added that captures the full desktop, sends it to an AI service for element location detection, parses the AI-generated JSON response, computes click coordinates through coordinate space transformations, uploads an annotated screenshot to S3, and performs the click operation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Windows AI Image Clicking Action<br/>`image_based_actions/src/main/java/com/testsigma/addons/windows/ClickOnImageUsingAi.java` | New public class implementing desktop screenshot capture, AI-powered element localization via JSON parsing, multi-stage coordinate scaling (image-space → physical pixels → logical DIP coordinates), bounding box visualization and annotation, S3 screenshot upload, and java.awt.Robot-based clicking with comprehensive error handling and defensive validation. |
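
The multi-stage scaling mentioned in the summary can be illustrated with a minimal sketch. This is not the add-on's actual code: the class `CoordinateScalingSketch`, its method names, and the assumptions that the AI sees a resized copy of the screenshot and that the display scale factor (for example 1.5 at 150% scaling) is already known are all hypothetical.

```java
/** Hypothetical sketch of image-space -> physical pixels -> logical coordinates. */
final class CoordinateScalingSketch {

    /** Stage 1: map a coordinate in the image sent to the AI back to physical screen pixels. */
    static double imageToPhysical(double imageCoord, int imageSize, int physicalSize) {
        if (imageSize <= 0 || physicalSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return imageCoord * physicalSize / imageSize;
    }

    /** Stage 2: map physical pixels to logical (DPI-independent) coordinates. */
    static double physicalToLogical(double physicalCoord, double displayScale) {
        if (displayScale <= 0) {
            throw new IllegalArgumentException("displayScale must be positive");
        }
        return physicalCoord / displayScale;
    }

    /**
     * Computes the click point as the center of the bounding box (given in image space),
     * mapped through both stages. Returns {x, y} in logical coordinates.
     */
    static int[] clickPoint(int x1, int y1, int x2, int y2,
                            int imageW, int imageH,
                            int physicalW, int physicalH,
                            double displayScale) {
        double centerX = (x1 + x2) / 2.0;
        double centerY = (y1 + y2) / 2.0;
        double logicalX = physicalToLogical(imageToPhysical(centerX, imageW, physicalW), displayScale);
        double logicalY = physicalToLogical(imageToPhysical(centerY, imageH, physicalH), displayScale);
        return new int[] {(int) Math.round(logicalX), (int) Math.round(logicalY)};
    }
}
```

For a 960x540 image of a 1920x1080 screen at 150% scaling, a box from (100, 50) to (300, 150) has its center at (200, 100) in image space, (400, 200) in physical pixels, and roughly (267, 133) in logical coordinates.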

Sequence Diagram

sequenceDiagram
    actor User
    participant Action as ClickOnImageUsingAi
    participant Desktop as Desktop/Robot
    participant AI as AI Service
    participant Parser as JSON Parser
    participant S3 as AWS S3
    
    User->>Action: Execute action with prompt
    Action->>Desktop: Capture full desktop screenshot
    Desktop-->>Action: Screenshot image data
    Action->>Action: Format image (PNG/JPEG)
    Action->>AI: Send screenshot + location prompt
    AI-->>Action: Return JSON response<br/>(bounding box coordinates)
    Action->>Parser: Parse AI JSON response
    Parser-->>Action: Validated coordinates
    Action->>Action: Compute click position<br/>(2-stage coordinate scaling)
    Action->>Action: Annotate image with<br/>bounding box & click point
    Action->>S3: Upload annotated screenshot
    S3-->>Action: Upload confirmation
    Action->>Desktop: Execute click at<br/>computed coordinates
    Desktop->>User: Click performed
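
The "Validated coordinates" step in the diagram could look roughly like the sketch below. It is an assumption about what such validation might check (a non-empty box that lies inside the screenshot); the class `BoundingBoxValidationSketch`, the `Box` type, and the corner-based parameters are hypothetical and not taken from the add-on.

```java
import java.util.Optional;

/** Hypothetical defensive check for a bounding box returned by the AI service. */
final class BoundingBoxValidationSketch {

    /** Immutable bounding box in image-space pixels (top-left and bottom-right corners). */
    static final class Box {
        final int x1;
        final int y1;
        final int x2;
        final int y2;

        Box(int x1, int y1, int x2, int y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    /**
     * Returns the box only if its corners are correctly ordered and it lies
     * entirely within an image of the given size; otherwise returns empty.
     */
    static Optional<Box> validate(int x1, int y1, int x2, int y2, int imageW, int imageH) {
        boolean ordered = x1 < x2 && y1 < y2;
        boolean inside = x1 >= 0 && y1 >= 0 && x2 <= imageW && y2 <= imageH;
        return (ordered && inside) ? Optional.of(new Box(x1, y1, x2, y2)) : Optional.empty();
    }
}
```

Returning `Optional.empty()` rather than throwing lets the caller decide how to report the failure, for example through the action's error message.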

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • Ganesh-Testsigma
  • vigneshtestsigma

Poem

🐰✨ A rabbit clicks with AI's keen sight,
Scanning pixels, finding targets bright,
Screenshots to wisdom, then down goes the paw,
Coordinates scaled with mathematical awe! 🖱️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding a new Windows action for clicking on images using AI. It is specific, concise, and directly related to the primary purpose of the changeset. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@image_based_actions/src/main/java/com/testsigma/addons/windows/ClickOnImageUsingAi.java`:
- Around line 293-294: The inline comment above the color setting in
ClickOnImageUsingAi (the line using g.setColor(Color.MAGENTA)) incorrectly says
"Green bounding rectangle"; update either the comment or the color so they
match: either change the comment to "Magenta bounding rectangle" or change
Color.MAGENTA to Color.GREEN (or Color.GREEN.darker()/brighter as appropriate)
in the paint/draw method where g.setColor is called to ensure the comment and
the visual color are consistent.
- Line 156: Update the error message passed to setErrorMessage in
ClickOnImageUsingAi (the call currently setting "Failed to get the image
response from ai (contact to support)") to correct the grammar to "Failed to get
the image response from AI (contact support)"; locate the setErrorMessage(...)
invocation in the ClickOnImageUsingAi class/method and replace the string
accordingly.
- Around line 144-147: The model identifier passed to AIRequest currently
includes the "anthropic." prefix; update the call where AIRequest is created
(the AIRequest instance and its setModel usage) to use "claude-opus-4-6" instead
of "anthropic.claude-opus-4-6" (the block that sets fullPrompt, calls
aiRequest.setModel(...), and attaches files like screenshotFile should be
modified accordingly).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7115b6e4-9405-4a08-9db4-fcdd4d027d7e

📥 Commits

Reviewing files that changed from the base of the PR and between 54a1d53 and 6438925.

📒 Files selected for processing (1)
  • image_based_actions/src/main/java/com/testsigma/addons/windows/ClickOnImageUsingAi.java

Comment on lines +144 to +147
AIRequest aiRequest = new AIRequest();
aiRequest.setPrompt(fullPrompt);
aiRequest.setModel("anthropic.claude-opus-4-6");
aiRequest.setFiles(List.of(screenshotFile));

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What are the valid model names for Claude models on Vertex AI in 2026?

💡 Result:

The valid model names (API model IDs) for Claude models on Vertex AI as of April 2026, based on official documentation from Google Cloud and Anthropic, are:

Current/latest models (recommended):

  • claude-opus-4-6
  • claude-sonnet-4-6
  • claude-haiku-4-5@20251001
  • claude-sonnet-4-5@20250929
  • claude-opus-4-5@20251101
  • claude-opus-4-1@20250805

Legacy/older models (still available but marked deprecated in some docs):

  • claude-sonnet-4@20250514
  • claude-opus-4@20250514
  • claude-3-5-haiku@20241022
  • claude-3-haiku@20240307
  • claude-3-7-sonnet@20250219

Note: Availability may vary by region. Check Vertex AI Model Garden for the most up-to-date list and retirement dates (e.g., some retire no sooner than 2027). No confirmed ID found for newer models like Sonnet 5 "Fennec" on Vertex AI.


Remove the "anthropic." prefix from the model identifier.

The model ID should be "claude-opus-4-6" not "anthropic.claude-opus-4-6". The Vertex AI API model IDs for Claude do not include the "anthropic." prefix.
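
One way to apply this without hard-coding the string twice is a small normalizing helper, sketched below. The class `ModelIdSketch` and its method are hypothetical and not part of the Testsigma SDK; whether the backend behind `AIRequest` expects prefixed or unprefixed identifiers is an assumption that should be checked against the service the add-on actually calls.

```java
/** Hypothetical helper that strips a provider prefix from a model identifier. */
final class ModelIdSketch {

    /** Prefix flagged in the review comment. */
    private static final String PROVIDER_PREFIX = "anthropic.";

    /** Returns the identifier without the provider prefix; other identifiers pass through unchanged. */
    static String stripProviderPrefix(String modelId) {
        if (modelId == null) {
            throw new IllegalArgumentException("modelId must not be null");
        }
        String trimmed = modelId.trim();
        return trimmed.startsWith(PROVIDER_PREFIX)
                ? trimmed.substring(PROVIDER_PREFIX.length())
                : trimmed;
    }
}
```

With this helper, `stripProviderPrefix("anthropic.claude-opus-4-6")` yields `claude-opus-4-6`, and an already unprefixed identifier is returned as-is.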


// ── Step 5: Parse the AI response ──
JsonNode responseNode = parseJson(aiResponse);
if (responseNode == null) {
setErrorMessage("Failed to get the image response from ai (contact to support)" );

⚠️ Potential issue | 🟡 Minor

Minor grammatical issue in error message.

The error message contains "contact to support" which should be "contact support".

Proposed fix
-                setErrorMessage("Failed to get the image response from ai (contact to support)" );
+                setErrorMessage("Failed to get the image response from AI (contact support)");

Comment on lines +293 to +294
// Green bounding rectangle
g.setColor(Color.MAGENTA);

⚠️ Potential issue | 🟡 Minor

Comment does not match implementation.

The comment says "Green bounding rectangle" but the code uses Color.MAGENTA. Either update the comment or the color to match.

Proposed fix (update comment)
-        // Green bounding rectangle
+        // Magenta bounding rectangle
         g.setColor(Color.MAGENTA);
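
A further option, beyond the proposed comment fix, is to name the color once so that comment and code cannot drift apart again. The sketch below is illustrative only: the class `AnnotationSketch`, the constant `BOUNDING_BOX_COLOR`, and the `annotate` signature are hypothetical and do not reflect the add-on's actual drawing method.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical sketch of drawing a bounding rectangle on a copy of a screenshot. */
final class AnnotationSketch {

    /** Single source of truth for the rectangle color, so no comment needs to restate it. */
    static final Color BOUNDING_BOX_COLOR = Color.MAGENTA;

    /** Returns a copy of the source image with the bounding rectangle outlined. */
    static BufferedImage annotate(BufferedImage source, int x, int y, int width, int height) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(source, 0, 0, null);
            g.setColor(BOUNDING_BOX_COLOR);
            g.drawRect(x, y, width, height);
        } finally {
            // Release the graphics context even if drawing fails.
            g.dispose();
        }
        return copy;
    }
}
```

Drawing on a copy leaves the original screenshot untouched, which may matter if the same image is also sent to the AI service.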
