Skip to content

fix(core): add clickAndHold screenshot output parity #2015

Open
BABTUNA wants to merge 1 commit into browserbase:main from BABTUNA:fix-clickandhold-tool-output

BABTUNA commented Apr 19, 2026

why

clickAndHold was inconsistent with other coordinate-based vision tools
(click, type, dragAndDrop):

  • it did not capture a post-action screenshot
  • it did not implement toModelOutput

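Without a post-action screenshot, the model had no visual confirmation of
what the hold actually did, which made its feedback less reliable and broke
parity with existing vision tool behavior.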

what changed

  • Updated clickAndHold tool to:
    • capture a screenshot after action execution
    • return a richer success payload (describe, duration, coordinates,
      screenshotBase64)
    • implement toModelOutput with text + media output (matching other
      vision tools)
  • Added a dedicated ClickAndHoldToolResult type in public agent types.
  • Added clickAndHold to the vision-action compression list so that older
    screenshots from this tool are pruned the same way as those from other
    vision tools.
  • Added unit coverage for message processing to assert clickAndHold is
    treated as a vision action for screenshot compression.
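The result shape and the text + media split described above can be illustrated with a minimal sketch. This is not the code from the PR: the field types, the millisecond unit for `duration`, the `image/png` media type, and the `ModelOutput` shape (loosely modeled on tool-result content parts in the Vercel AI SDK) are all assumptions made for illustration.

```typescript
// Hypothetical shapes modeled on the PR description; not Stagehand's actual types.
interface ClickAndHoldToolResult {
  success: boolean;
  describe?: string;
  duration?: number; // hold duration, assumed to be in milliseconds
  coordinates?: { x: number; y: number };
  screenshotBase64?: string;
  error?: string;
}

// Assumed model-output parts: a text part and a media (image) part.
type ModelOutputPart =
  | { type: "text"; text: string }
  | { type: "media"; data: string; mediaType: string };

type ModelOutput =
  | { type: "content"; value: ModelOutputPart[] }
  | { type: "text"; value: string };

function toModelOutput(result: ClickAndHoldToolResult): ModelOutput {
  const screenshot = result.screenshotBase64;
  if (result.success && screenshot) {
    // Text part carries the summary WITHOUT the base64 payload;
    // the screenshot travels separately as a media part.
    const summary = {
      success: true,
      describe: result.describe,
      duration: result.duration,
      coordinates: result.coordinates,
    };
    return {
      type: "content",
      value: [
        { type: "text", text: JSON.stringify(summary) },
        { type: "media", data: screenshot, mediaType: "image/png" },
      ],
    };
  }
  // Failure (or no screenshot captured): fall back to a text-only JSON payload.
  return { type: "text", value: JSON.stringify(result) };
}
```

Keeping the base64 string out of the text part avoids sending the image twice, once as escaped text and once as media.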

test plan

  • prettier --check on touched files
  • eslint on touched files
  • tsc -p packages/core/tsconfig.json --noEmit
  • Added unit test:
    • packages/core/tests/unit/message-processing.test.ts
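The compression behavior that the new unit test targets can be illustrated with a small, self-contained sketch. The message shape, the `VISION_ACTION_TOOLS` set, `compressScreenshots`, and its `keepLatest` retention policy are hypothetical stand-ins for Stagehand's actual message-processing code, which may differ in structure and in how many screenshots it keeps.

```typescript
// Hypothetical message shape; Stagehand's real message types differ.
interface ToolResultMessage {
  toolName: string;
  output: { screenshotBase64?: string; [key: string]: unknown };
}

// Tools whose results carry screenshots; clickAndHold is the addition in this PR.
const VISION_ACTION_TOOLS: ReadonlySet<string> = new Set([
  "click",
  "type",
  "dragAndDrop",
  "clickAndHold",
]);

function isVisionAction(toolName: string): boolean {
  return VISION_ACTION_TOOLS.has(toolName);
}

// Strip screenshotBase64 from all but the `keepLatest` most recent
// vision-action results. Text fields are kept; the input is not mutated.
function compressScreenshots(
  messages: ToolResultMessage[],
  keepLatest = 1,
): ToolResultMessage[] {
  const result = messages.slice(); // shallow copy; stripped entries are replaced
  let kept = 0;
  // Walk newest to oldest so the most recent screenshots are the ones kept.
  for (let i = result.length - 1; i >= 0; i--) {
    const msg = result[i];
    if (!isVisionAction(msg.toolName) || msg.output.screenshotBase64 === undefined) {
      continue;
    }
    if (kept < keepLatest) {
      kept++;
      continue;
    }
    const output = { ...msg.output };
    delete output.screenshotBase64;
    result[i] = { ...msg, output };
  }
  return result;
}
```

With `keepLatest = 1`, a history ending in a clickAndHold result followed by a click result would lose the clickAndHold screenshot and keep the click one; before this change, a clickAndHold screenshot would have been skipped by such a check entirely.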

Summary by cubic

Brings clickAndHold to parity with other vision tools by capturing a post-action screenshot and emitting proper model output. Improves feedback quality and enables consistent screenshot compression.

  • Bug Fixes
    • Capture a post-action screenshot and include it in the result/model output.
    • Implement toModelOutput (text + optional image), matching click, type, and dragAndDrop.
    • Return a richer payload: describe, duration, coordinates, screenshotBase64.
    • Treat clickAndHold as a vision action for screenshot compression; added a unit test.
    • Add public ClickAndHoldToolResult type.

Written for commit 85fb256. Summary will update on new commits.

changeset-bot bot commented Apr 19, 2026

⚠️ No Changeset found

Latest commit: 85fb256

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.

github-actions bot commented

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions bot added the external-contributor (tracks PRs mirrored from external contributor forks) and external-contributor:awaiting-approval (waiting for a stagehand team member to approve the latest external commit) labels on Apr 19, 2026.

cubic-dev-ai bot left a comment

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant LLM as AI Model/Agent
    participant Tool as ClickAndHoldTool
    participant Page as Browser Page
    participant SH as ScreenshotHandler
    participant MP as MessageProcessor

    Note over LLM, MP: Tool Execution Flow
    LLM->>Tool: execute(coordinates, duration)
    Tool->>Page: processCoordinates()
    Tool->>Page: mouse.down() / mouse.up()
    
    activate Tool
    Tool->>SH: NEW: waitAndCaptureScreenshot(page)
    SH-->>Tool: screenshotBase64
    deactivate Tool
    
    Tool-->>LLM: CHANGED: ClickAndHoldToolResult (inc. screenshot)

    Note over LLM, MP: Model Response Formatting
    LLM->>Tool: toModelOutput(result)
    alt NEW: Result has screenshot
        Tool-->>LLM: Return [Text Content, Media Content]
    else Error
        Tool-->>LLM: Return [Text Error JSON]
    end

    Note over LLM, MP: Context Window Management (Cleanup)
    LLM->>MP: processMessages(history)
    loop For each Vision Action Message
        MP->>MP: Identify click/type/dragAndDrop
        MP->>MP: CHANGED: Identify clickAndHold
        opt Message is old/redundant
            MP->>MP: Prune screenshotBase64 (keep text only)
        end
    end
    MP-->>LLM: Compressed message history
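The execution flow in the diagram above (press, hold for `duration`, release, then capture a screenshot) can be sketched as follows. `PageLike`, `MouseLike`, `screenshotBase64()`, and `executeClickAndHold` are hypothetical; they are not Stagehand's or Playwright's API, and the sketch omits the coordinate processing and screenshot-wait steps that the diagram names as `processCoordinates()` and `waitAndCaptureScreenshot(page)`.

```typescript
// Minimal page abstraction assumed for this sketch (NOT a real library API).
interface MouseLike {
  move(x: number, y: number): Promise<void>;
  down(): Promise<void>;
  up(): Promise<void>;
}
interface PageLike {
  mouse: MouseLike;
  screenshotBase64(): Promise<string>;
}

interface ClickAndHoldArgs {
  describe: string;
  coordinates: [number, number];
  duration: number; // hold time in milliseconds (assumed unit)
}

interface ClickAndHoldResult {
  success: boolean;
  describe: string;
  duration: number;
  coordinates: { x: number; y: number };
  screenshotBase64?: string;
  error?: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function executeClickAndHold(
  page: PageLike,
  args: ClickAndHoldArgs,
): Promise<ClickAndHoldResult> {
  const [x, y] = args.coordinates;
  const base = { describe: args.describe, duration: args.duration, coordinates: { x, y } };
  try {
    await page.mouse.move(x, y);
    await page.mouse.down();
    await sleep(args.duration); // keep the button pressed for the requested time
    await page.mouse.up();
    // Post-action screenshot: the step this PR adds so the model sees the outcome.
    const screenshotBase64 = await page.screenshotBase64();
    return { success: true, ...base, screenshotBase64 };
  } catch (err) {
    // Error path: no screenshot, just a message for a text-only model output.
    return { success: false, ...base, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Moving `mouse.up()` into a `finally` block would be a reasonable hardening step so that a failure mid-hold does not leave the button pressed; the sketch keeps the linear order to mirror the diagram.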
