Skip to content

feat(mcp): add patch_dashboard, get_dashboard_tile, search_dashboards tools (HDX-4139)#2343

Open
brandon-pereira wants to merge 13 commits into
mainfrom
brandon/mcp-dashboard-improvements
Open

feat(mcp): add patch_dashboard, get_dashboard_tile, search_dashboards tools (HDX-4139)#2343
brandon-pereira wants to merge 13 commits into
mainfrom
brandon/mcp-dashboard-improvements

Conversation

@brandon-pereira
Copy link
Copy Markdown
Member

@brandon-pereira brandon-pereira commented May 26, 2026

What

Add three new MCP dashboard tools for more granular, efficient operations alongside the existing full create/update path:

  • hyperdx_get_dashboard_tile — Retrieve a single tile by tileId without loading the full dashboard
  • hyperdx_patch_dashboard — Update dashboard name/tags and/or replace a single tile by tileId in one call. Unmentioned tiles and fields are preserved. Layout fields fall back to existing values when omitted.
  • hyperdx_search_dashboards — Search dashboards by name (case-insensitive partial match) and/or tags

Why

The current MCP dashboard tooling requires passing the full dashboard object on every update — making it slow and token-heavy for targeted edits like changing one tile's query or renaming a dashboard. These tools let the LLM do targeted edits without resubmitting everything.

Closes HDX-4139

Additional improvements

  • Add cross-referencessave_dashboard description now mentions patch_dashboard as the preferred tool for single-tile updates.
  • Document Lucene substring matching — Prominently documented that field:value is ilike (substring match, not equality) and field:val* is prefix-within-substring (not true prefix). Added to tool-level WHERE_DESCRIPTION, query guide prompt, and common mistakes section.
  • Split test file — Broke dashboards.test.ts (3040 lines) into 8 focused files under __tests__/dashboards/ with a shared setup helper.

Testing

  • 96 integration tests across 8 test files, all passing
  • TypeScript type check clean
  • ESLint 0 errors

… tools (HDX-4139)

Add three new MCP dashboard tools for more granular operations:

- hyperdx_get_dashboard_tile: retrieve a single tile by tileId without
  loading the full dashboard
- hyperdx_patch_dashboard: update dashboard name/tags and/or replace a
  single tile by tileId in one call, preserving all other tiles and
  falling back to existing layout when omitted
- hyperdx_search_dashboards: search dashboards by name (case-insensitive)
  and/or tags

Additional improvements:
- Fix empty parameter schema on patch_dashboard and search_dashboards
  caused by Zod .refine() wrapping (moved cross-field validation to
  handler body so inputSchema stays a plain z.object)
- Add 'prefer patch_dashboard' hint to save_dashboard description
- Document Lucene substring matching limitation prominently in tool
  descriptions and query guide prompt (field:value is ilike not equality,
  field:val* is prefix-within-substring not true prefix)
- Split monolithic dashboards.test.ts (3040 lines) into 8 focused test
  files under __tests__/dashboards/
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 26, 2026

🦋 Changeset detected

Latest commit: 317ecbb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 29, 2026 3:37pm
hyperdx-storybook Ready Ready Preview, Comment May 29, 2026 3:37pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

E2E Test Results

All tests passed • 192 passed • 3 skipped • 1282s

Status Count
✅ Passed 192
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@brandon-pereira brandon-pereira marked this pull request as ready for review May 26, 2026 21:58
@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (2):
    • packages/api/src/routers/external-api/v2/dashboards.ts
    • packages/api/src/routers/external-api/v2/utils/dashboards.ts

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 11
  • Production lines changed: 1283 (+ 5981 in test files, excluded from tier calculation)
  • Branch: brandon/mcp-dashboard-improvements
  • Author: brandon-pereira

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Deep Review

🟡 P2 -- recommended

  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:158-172 -- The merge step substitutes the handler's read-time x/y/w/h/containerId/tabId whenever incoming omits them, but the positional tiles.$ update only matches on tiles.id, so a concurrent save_dashboard that moved the tile or replaced its container is silently clobbered, leaving stale layout values or an orphan containerId/tabId reference.

    • Fix: Require callers to send layout fields explicitly when patching (drop the read-time fallback), or add a version/updatedAt field to queryFilter so a stale read causes the update to miss and surface a conflict error.
    • adversarial, reliability, correctness
  • packages/api/src/mcp/tools/dashboards/schemas.ts:601-651 -- mcpTileSchema and mcpPatchTileSchema are plain z.union over 9 arms with builder arms before the SQL arm and no discriminator, so a config that mixes builder fields (sourceId, select) with SQL fields (configType: 'sql', connectionId, sqlTemplate) parses as the first matching builder arm and silently strips the SQL-only keys.

    • Fix: Use z.discriminatedUnion keyed on config.configType/config.displayType, or move the SQL arm above the builder arms and reject payloads where both shapes appear.
  • packages/api/src/mcp/tools/dashboards/schemas.ts:311-312 -- The new name: z.string().min(1) requirement rejects empty strings on save, but convertTileToExternalChart in utils/dashboards.ts:465 still emits name: tile.config.name ?? '' on read, so a GET → modify → hyperdx_save_dashboard round-trip on a legacy dashboard with an empty-named tile now fails validation with no fault of the caller.

    • Fix: Coerce empty names to a placeholder when emitting from convertTileToExternalChart, or relax .min(1) to a server-side normalization step that defaults blanks rather than rejecting them.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:158-172 -- mergedTile.name falls back to existingInternalTile.config?.name ?? '', so the patch path can persist an empty tile name for a legacy tile even though the sibling hyperdx_save_dashboard now rejects empty names — the new .min(1) invariant is silently subverted on the patch path.

    • Fix: Re-parse the merged tile through mcpTileLayoutSchema (or its non-defaulted twin) before the findOneAndUpdate so the name invariant holds across both write paths.
  • .changeset/mcp-dashboard-improvements.md:1 -- Frontmatter declares the bump as '@hyperdx/api': patch, but the body explicitly says **Breaking (minor):** Tile name on hyperdx_save_dashboard now requires at least 1 character; downstream consumers parsing the changeset for breaking-change signal will not be alerted.

    • Fix: Change the frontmatter bump from patch to minor (or larger) so the version policy matches the body's own classification.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:246-257 -- The new alert-cleanup behavior on the patch path is entirely uncovered by __tests__/dashboards/patchDashboard.test.ts; no test asserts that patching an alert-capable tile to a non-alert displayType deletes that tile's alerts, nor that alerts on sibling tiles are preserved by the single-tile cleanup scope.

    • Fix: Add a test that attaches an alert to a line tile, patches the tile to pie or heatmap, and asserts that alert is removed while a sibling alert on a second tile is untouched.
    • testing, correctness
  • packages/api/src/mcp/tools/dashboards/searchDashboards.ts:60-95 -- The truncation branch (limit(SEARCH_RESULTS_LIMIT + 1), slice to 100, response object with truncated: true and hint) has no test; a regression that flipped the threshold by one or returned the wrong response shape would not surface.

    • Fix: Seed 101 dashboards and assert the response parses as { results: [...], truncated: true, hint: <string> } with results.length === 100; seed 100 and assert it remains a bare array.
  • packages/api/src/mcp/__tests__/dashboards/patchDashboard.test.ts:398-449 -- The "concurrent delete" test removes the tile before the patch tool is invoked, so the handler returns at the read-time miss path on line 124; the positional-$ write-time miss branch (lines 218-234, with the distinct "was not found at write time" message and the "name/tags changes were not applied" rejection contract) is never exercised.

    • Fix: Spy on Dashboard.findOne to delete the tile inside its resolve (between the read and the findOneAndUpdate), and assert the write-time miss error string and the partial-update rejection.
    • testing, adversarial
🔵 P3 nitpicks (16)
  • packages/api/src/routers/external-api/v2/dashboards.ts:2106-2113 -- PUT now short-circuits with res.sendStatus(404) before any tile/source validation, so a request to a wrong-team or non-existent dashboard with an otherwise-invalid body no longer returns 400 with a structured message; clients that previously distinguished the two will see 404 where they expected 400.

    • Fix: Document the precedence change in the OpenAPI block above the route and add a regression test asserting 404 for a non-existent dashboard whose body would otherwise fail validation.
    • adversarial, api-contract
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:94-98 -- setPayload: Record<string, unknown> and queryFilter: Record<string, unknown> discard the type safety that sibling saveDashboard.ts:325 and the REST v2 PUT just gained by switching to Partial<IDashboard>; a typo like setPayload.tag = ... would compile.

    • Fix: Type them as UpdateQuery<DashboardDocument>['$set'] and FilterQuery<DashboardDocument>, which model the dotted-path keys tiles.$ and tiles.id natively.
    • kieran-typescript, maintainability
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:116, 137-146 -- (existingDashboard.tiles as { id: string }[]) ?? [] widens away the canonical Tile[] typing only for line 137 to re-cast a richer hand-rolled shape; a future change to the canonical Tile type would not be caught at either call site.

    • Fix: Type the local as Tile[] (or pass existingDashboard.tiles through unchanged) and drop both as casts.
  • packages/api/src/mcp/tools/dashboards/schemas.ts:633-651 -- mcpPatchTileSchema is built as nine separate mcpPatchTileLayoutSchema.extend({ config: <arm>.shape.config }) calls, but the handler at patchDashboard.ts:154 immediately erases the inferred type with inputTile as Partial<ExternalDashboardTileWithId>, signalling the inferred Zod type is not usable as-is.

    • Fix: Export z.infer<typeof mcpPatchTileSchema> and type the merge against that union, or reuse mcpTileSchema.shape.config arms inside a single extend({ config: ... }) so the inferred type matches ExternalDashboardTile.
  • packages/api/src/mcp/tools/dashboards/schemas.ts:311-364, 616-628 -- mcpTileLayoutSchema (with defaults) and mcpPatchTileLayoutSchema (without defaults) define the same six layout fields, and the patch union re-extends every one of the nine tile config types; adding a tile type now requires updating two unions and two layout schemas in lock-step.

    • Fix: Factor a mcpTileLayoutCoreSchema with no defaults, apply .default(...) once via an extension for the save path, and build mcpPatchTileSchema as .partial() over the save union.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:80-83, 212 -- Dashboard.findOne and findOneAndUpdate are unwrapped, so a transient Mongo rejection propagates through withToolTracing as an opaque JSON-RPC transport error; this is inconsistent with searchDashboards.ts:97-111, which catches rejections and returns structured isError content the LLM can reason about.

    • Fix: Pick one pattern across all dashboard MCP tools — either catch in each handler, or move the catch into withToolTracing so every tool emits structured isError on rejection.
  • packages/api/src/routers/external-api/v2/utils/dashboards.ts:321-356 -- The heatmap self-heal returns a tile carrying valueExpression: '', which the input schema itself rejects via min(1); the new hyperdx_get_dashboard_tile and hyperdx_search_dashboards paths now expose that placeholder to LLM callers without a marker to detect it.

    • Fix: Drop legacy malformed heatmap tiles on read (matching how PromQL tiles are handled three lines up) and log, or attach an explicit marker so callers can detect the placeholder without parsing the empty string.
  • packages/api/src/routers/external-api/v2/utils/dashboards.ts:938, 1146-1150 -- validateDashboardTiles fetches sources once at the top of its Promise.all, then getInvalidOnClickSearchSources calls getSources(team) again inside the same Promise.all, doubling the Mongo round-trip on every save/patch validation.

    • Fix: Refactor getInvalidOnClickSearchSources to accept a pre-fetched sources argument, mirroring getMissingSources and getHeatmapTilesWithIncompatibleSources.
    • maintainability, reliability
  • packages/api/src/routers/external-api/v2/utils/dashboards.ts:780-784 -- fetchSourcesForValidation is a one-line, non-exported wrapper whose docstring says it exists "so callers outside controllers/sources don't need a second import" — but the only caller is one line above in the same file.

    • Fix: Inline the single call at line 1146 and drop the wrapper plus the SourceForValidation alias that exists only to name its awaited return type.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:42-71 -- The two cross-field validation branches (Provide at least one of: name, tags, or tileId+tile and tileId and tile must both be provided or both omitted) have no tests in patchDashboard.test.ts.

    • Fix: Add three small tests — { dashboardId } only, { dashboardId, tileId } without tile, and { dashboardId, tile } without tileId — asserting each error message.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:73-78 -- The Invalid dashboard ID branch is uncovered for the patch tool (getDashboardTile.test.ts:74-82 covers it for the sibling tool but patchDashboard.test.ts does not).

    • Fix: Add a one-line test calling patchDashboard with dashboardId: 'not-valid' and assert the error string.
  • packages/api/src/routers/external-api/v2/utils/dashboards.ts:1083-1106, 1153-1158 -- The sourceConnectionMismatches check moved from REST-only into the shared validator, so it now also applies to MCP save/patch, but no MCP test exercises it — an LLM mixing sourceId and connectionId from different sources is exactly the failure mode this guards.

    • Fix: Add an integration test creating source S1 on connection C1 and asserting hyperdx_save_dashboard rejects a raw-SQL tile carrying { sourceId: S1, connectionId: C2 }.
  • packages/api/src/mcp/tools/dashboards/schemas.ts:312 -- The breaking .min(1) on tile name has no regression test asserting that hyperdx_save_dashboard rejects tiles: [{ name: '', config: {…} }].

    • Fix: Add a single test asserting isError with a message that mentions name.
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:165-169 -- The 'containerId' in incoming / 'tabId' in incoming semantics are the central patch behavior for ref preservation vs clearing, but no test exercises "container preserved on omit", "container switched on explicit value", or legacy empty-string self-heal.

    • Fix: Add tests covering each branch — omit, explicit value, legacy empty-string containerId — so the 'in' contract is locked.
  • packages/api/src/mcp/__tests__/dashboards/setup.ts:28-31, packages/api/src/mcp/__tests__/dashboards/patchDashboard.test.ts:321, 342 -- The split test files add six new as any casts (four null as any initializers in setup.ts; two (beforePatch as any).tiles / (afterPatch as any).tiles in the patch test) where the pre-existing 3040-line dashboards.test.ts had zero.

    • Fix: Type the setup context via Awaited<ReturnType<typeof getLoggedInAgent>> plus the Source/Connection document types, and replace (beforePatch as any).tiles with beforePatch?.tiles since .lean() already exposes that field.
    • project-standards, kieran-typescript
  • packages/api/src/mcp/tools/dashboards/patchDashboard.ts:172 -- The mergedTile literal ends with } as ExternalDashboardTileWithId;, which silently widens; code_style.md prefers satisfies over as for exactly this case.

    • Fix: Replace the trailing as with satisfies ExternalDashboardTileWithId and spread containerId/tabId only when defined rather than assigning undefined.

Reviewers (9): correctness, testing, maintainability, project-standards, security, adversarial, api-contract, kieran-typescript, reliability.

Testing gaps:

  • No REST v2 POST/PUT regression test confirms the migrated 400-message strings stayed byte-identical post-validateDashboardTiles refactor; the existing v2 suite is thin.
  • No GET → save round-trip test on a dashboard whose internal config.name is null/empty exists to confirm .min(1) behavior on real legacy data shapes.
  • mcpTileSchema / mcpPatchTileSchema arm precedence has no unit test confirming whether a mixed builder+SQL config is rejected or routed correctly.

searchDashboards:
- Escape regex metacharacters in query before passing to MongoDB $regex
  to prevent injection (e.g. '[' throwing BadValue, '.foo' matching
  unintended substrings, catastrophic backtracking on patterns like
  '(a+)+$')
- Cap query length at 200 chars via Zod schema
- Wrap Dashboard.find in try/catch returning isError on failure

patchDashboard:
- Work directly with persisted internal tiles array instead of
  round-tripping all tiles through convertToExternalDashboard (which
  strips orphaned container refs from unrelated tiles via self-heal
  logic at utils/dashboards.ts:407-424)
- Convert only the single patched tile to internal format via
  convertToInternalTileConfig
- Use positional $set (tiles.<index>) instead of full-array $set so
  concurrent patches on different tiles don't clobber each other
searchDashboards:
- Regex metacharacters treated as literals (parentheses, brackets, dots)
- Invalid regex chars like '[' don't throw

patchDashboard:
- Positional update verified via raw DB read: untouched tiles are
  byte-identical before and after patching a sibling tile
P0 — patchDashboard concurrent write race:
- Replace `tiles.${tileIndex}` positional set with `tiles.id` in the
  query filter + `tiles.$` positional operator. A concurrent
  save_dashboard that replaces the tiles array can no longer cause us
  to overwrite an unrelated tile at a stale numeric index.
- null result from findOneAndUpdate now returns a specific error
  message indicating the tile may have been removed concurrently.

P2 — patchDashboard missing alert cleanup:
- Restore cleanupDashboardAlerts call after tile update, scoped to
  the single patched tile. Catches displayType changes to configs
  that don't support alerts.

P2 — patch tile name required when it should be optional:
- Make `name` optional on mcpPatchTileLayoutSchema so the LLM can
  send a config-only patch without re-specifying the tile title.
  The merge fallback (incoming.name ?? existing name) is now reachable.
- Add .min(1) to both mcpTileLayoutSchema.name and
  mcpPatchTileLayoutSchema.name to reject blank tile titles at the
  schema boundary.

P2 — searchDashboards empty query/tags bypass:
- Replace `query === undefined` guard with length checks so
  `query: ''` and `tags: []` are rejected instead of silently
  returning all dashboards.
@brandon-pereira
Copy link
Copy Markdown
Member Author

Note for reviewer: most of the diff is refactoring the tests to be multiple files. Would suggest focusing on the src

searchDashboards:
- Use lodash escapeRegExp instead of reimplemented helper
- Inline type narrowing to eliminate query! non-null assertion
- Tighten tags schema to z.string().min(1) to reject tags: ['']
- Add .limit(100) with truncation hint when results exceed cap
- Log errors before returning user-facing message

patchDashboard:
- Coerce legacy empty-string containerId/tabId to undefined in merge
  (mirrors convertTileToExternalChart self-heal for legacy docs)
- Expand concurrent-miss error to state name/tags were also discarded

schemas:
- Drop unused id field from mcpPatchTileLayoutSchema (tileId is the
  authoritative selector; inner id was silently overridden)

tests:
- Switch JSON.stringify byte-equality to toEqual (order-insensitive)
- Add tests: config-only patch preserves name, concurrent tile
  removal detected, empty query/tags/both rejected

changeset:
- Note .min(1) contract tightening on tile name
Extract shared validateDashboardTiles() helper in utils/dashboards.ts
that consolidates the ~95-line tile validation block previously
duplicated across 5 call sites (REST v2 POST/PUT, MCP save create/
update, MCP patch). Net -355 lines.

The helper runs all 7 checks in order: container refs, missing sources,
missing connections, source/connection mismatches (was REST-only, now
shared with MCP), heatmap source-kind gate, onClick dashboard targets,
onClick search source validation.

Also move getSourceConnectionMismatches from a private function in the
REST router to a shared export in utils — MCP callers now get the same
source/connection mismatch validation that was previously REST-only.

Fix cleanupDashboardAlerts to also catch builder tiles with
alert-incompatible display types (Pie, Table, Heatmap, Search,
Markdown). Previously only raw SQL tiles with incompatible types were
cleaned up, leaving stale alert documents on builder tiles that changed
to an unsupported displayType.
@github-actions github-actions Bot added review/tier-4 Critical — deep review + domain expert sign-off and removed review/tier-3 Standard — full human review required labels May 27, 2026
@brandon-pereira brandon-pereira requested review from a team and knudtty and removed request for a team May 27, 2026 22:08
knudtty
knudtty previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@knudtty knudtty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking

Comment thread packages/api/src/mcp/tools/dashboards/patchDashboard.ts Outdated
Comment thread packages/api/src/mcp/tools/dashboards/patchDashboard.ts Outdated
Comment thread packages/api/src/mcp/tools/dashboards/patchDashboard.ts
Comment thread packages/api/src/mcp/tools/dashboards/searchDashboards.ts Outdated
…hboard tools

Move inline inputSchema definitions from patchDashboard.ts and
searchDashboards.ts into schemas.ts as mcpPatchDashboardSchema and
mcpSearchDashboardsSchema. Also replace unicode escape \u2014 with
literal em dash in patch description.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants