fix(core): add clickAndHold screenshot output parity by BABTUNA · Pull Request #2015 · browserbase/stagehand

BABTUNA · 2026-04-19T22:17:08Z

why

clickAndHold was inconsistent with other coordinate-based vision tools
(click, type, dragAndDrop):

- it did not capture a post-action screenshot
- it did not implement toModelOutput

This made model feedback less reliable and broke parity with existing
vision tool behavior.

what changed

Updated clickAndHold tool to:
- capture a screenshot after action execution
- return richer success payload (describe, duration, coordinates,
  screenshotBase64)
- implement toModelOutput with text + media output (matching other
  vision tools)
Added a dedicated ClickAndHoldToolResult type in public agent types.
Added clickAndHold to vision-action compression list so older
screenshots from this tool are pruned consistently with other vision tools.
Added unit coverage for message processing to assert clickAndHold is
treated as a vision action for screenshot compression.
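For reviewers, the richer payload and the text + media output can be sketched roughly as below. This is a minimal illustration, not the code from the diff: the field optionality, the millisecond unit for `duration`, the `image/png` media type, and the `ModelContentPart` / `ToolModelOutput` shapes are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; the real types live in the PR diff.
interface ClickAndHoldToolResult {
  success: boolean;
  describe?: string;
  duration?: number; // hold time, assumed to be in milliseconds
  coordinates?: [number, number];
  screenshotBase64?: string;
  error?: string;
}

// Assumed content-part shapes (text or media), defined locally for the sketch.
type ModelContentPart =
  | { type: "text"; text: string }
  | { type: "media"; mediaType: string; data: string };

interface ToolModelOutput {
  type: "content";
  value: ModelContentPart[];
}

function toModelOutput(result: ClickAndHoldToolResult): ToolModelOutput {
  // Keep the base64 payload out of the text part so it is only sent once.
  const { screenshotBase64, ...rest } = result;
  const parts: ModelContentPart[] = [
    { type: "text", text: JSON.stringify(rest) },
  ];

  // Attach the screenshot only when the action succeeded and one was captured.
  if (result.success && screenshotBase64) {
    parts.push({ type: "media", mediaType: "image/png", data: screenshotBase64 });
  }
  return { type: "content", value: parts };
}
```

On failure this returns a single text part carrying the error JSON, which is the branch the other vision tools are described as following.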

test plan

prettier --check on touched files
eslint on touched files
tsc -p packages/core/tsconfig.json --noEmit
Added unit test:
- packages/core/tests/unit/message-processing.test.ts

Summary by cubic

Brings clickAndHold to parity with other vision tools by capturing a post-action screenshot and emitting proper model output. Improves feedback quality and enables consistent screenshot compression.

Bug Fixes
- Capture a post-action screenshot and include it in the result/model output.
- Implement toModelOutput (text + optional image), matching click, type, and dragAndDrop.
- Return a richer payload: describe, duration, coordinates, screenshotBase64.
- Treat clickAndHold as a vision action for screenshot compression; added a unit test.
- Add public ClickAndHoldToolResult type.

Written for commit 85fb256. Summary will update on new commits.

changeset-bot · 2026-04-19T22:17:12Z

⚠️ No Changeset found

Latest commit: 85fb256

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

github-actions · 2026-04-19T22:17:17Z

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

cubic-dev-ai

No issues found across 4 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant LLM as AI Model/Agent
    participant Tool as ClickAndHoldTool
    participant Page as Browser Page
    participant SH as ScreenshotHandler
    participant MP as MessageProcessor

    Note over LLM, MP: Tool Execution Flow
    LLM->>Tool: execute(coordinates, duration)
    Tool->>Page: processCoordinates()
    Tool->>Page: mouse.down() / mouse.up()
    
    activate Tool
    Tool->>SH: NEW: waitAndCaptureScreenshot(page)
    SH-->>Tool: screenshotBase64
    deactivate Tool
    
    Tool-->>LLM: CHANGED: ClickAndHoldToolResult (inc. screenshot)

    Note over LLM, MP: Model Response Formatting
    LLM->>Tool: toModelOutput(result)
    alt NEW: Result has screenshot
        Tool-->>LLM: Return [Text Content, Media Content]
    else Error
        Tool-->>LLM: Return [Text Error JSON]
    end

    Note over LLM, MP: Context Window Management (Cleanup)
    LLM->>MP: processMessages(history)
    loop For each Vision Action Message
        MP->>MP: Identify click/type/dragAndDrop
        MP->>MP: CHANGED: Identify clickAndHold
        opt Message is old/redundant
            MP->>MP: Prune screenshotBase64 (keep text only)
        end
    end
    MP-->>LLM: Compressed message history
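
The cleanup step at the end of the diagram can be sketched as follows. This is a simplified illustration under stated assumptions, not the message processor in the repository: the `ToolResultMessage` shape, the `compressVisionScreenshots` name, and the "keep only the newest screenshot" policy are hypothetical.

```typescript
// Hypothetical names and message shape; not Stagehand's actual internals.
const VISION_ACTION_TOOLS = new Set<string>([
  "click",
  "type",
  "dragAndDrop",
  "clickAndHold", // the tool this PR adds to the list
]);

interface ToolResultMessage {
  toolName: string;
  text: string;
  screenshotBase64?: string;
}

function compressVisionScreenshots(
  history: ToolResultMessage[],
  keepLatest = 1, // assumed policy: keep only the newest screenshot(s)
): ToolResultMessage[] {
  let kept = 0;
  const newestFirst: ToolResultMessage[] = [];

  // Walk newest to oldest so the most recent screenshots survive.
  for (const msg of [...history].reverse()) {
    const hasScreenshot =
      VISION_ACTION_TOOLS.has(msg.toolName) && msg.screenshotBase64 !== undefined;

    if (hasScreenshot && kept >= keepLatest) {
      // Older vision result: drop the image, keep the text.
      newestFirst.push({ toolName: msg.toolName, text: msg.text });
    } else {
      if (hasScreenshot) kept++;
      newestFirst.push(msg);
    }
  }
  // Restore chronological order.
  return newestFirst.reverse();
}
```

Without the `clickAndHold` entry in the set, screenshots from that tool would never be pruned under this scheme and would accumulate in the context window.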

fix(core): add clickAndHold screenshot model output

85fb256

github-actions bot added labels on Apr 19, 2026:
- external-contributor: Tracks PRs mirrored from external contributor forks.
- external-contributor:awaiting-approval: Waiting for a stagehand team member to approve the latest external commit.

cubic-dev-ai bot reviewed Apr 19, 2026

BABTUNA wants to merge 1 commit into browserbase:main from BABTUNA:fix-clickandhold-tool-output
