feat(mcp): improve MCP tool quality — error hints, shared helpers, better messages by brandon-pereira · Pull Request #2337 · hyperdxio/hyperdx

brandon-pereira · 2026-05-25T16:03:29Z

What

Targeted improvements to MCP tool quality, identified and validated through performance evals against real agent workflows. These changes reduce tool-call failures, improve agent self-correction, and make tool outputs more actionable.

Query tools: top-level `where` + orderBy resolution

Table & Timeseries now accept a top-level where / whereLanguage that scopes the entire query. A shared mergeWhereIntoSelectItems() helper injects it into each select item's aggCondition (ANDed with any per-item where). No more duplicated merge logic.

Table gets an orderBy resolver: agents frequently write orderBy: "count", which generates ORDER BY count ASC — but ClickHouse can't resolve bare count since it's a function. The resolver maps aggFn names to aliases or synthesized expressions (count(), avg(Duration), etc.).

Timeseries now detects single-bucket collapse (1 row returned with explicit granularity) and adds a hint to adjust the time range or granularity.

Event patterns: topN, trendBuckets, restructured output

topN (1-100, default 20) — control how many patterns are returned
trendBuckets (0-60, default 24) — set to 0 to disable trend sparklines (smaller response)
Output restructured: top-level summary + patterns array + usage guide
Each pattern now includes rank and shareOfTotal for easier analysis

Schema & description improvements

Richer tool descriptions with concrete examples, common mistakes, and expression tips
MCP_AGG_FN_OPTIONS extracted as shared constant (single source of truth for aggFn names)
whereLanguage description now includes syntax matching guidance and common mistakes
groupBy documents multi-column comma-delimited input and arbitrary ClickHouse expressions
valueExpression description shows useful patterns (duration conversion, attribute casting, boolean→numeric)

Infrastructure

Error hints: Duplicated ClickHouse catch blocks extracted into clickHouseErrorResult() with pattern-matched hints for DateTime64 casting, AS alias quoting, and response size overflow
trimToolResponse: Now returns { data, isTrimmed } so callers detect trimming without re-serializing
Dashboard schemas: granularity added to line/stacked_bar external dashboard zod schemas
describeSource: Switched to compact JSON output (saves context tokens)
Better not-found messages: Source/connection not-found errors suggest calling hyperdx_list_sources
Regex fix: errorHint AS-alias regex uses \bAS\b word boundary to avoid false matches on words like DATABASE, ALIAS

Why

These changes came out of running MCP performance evals against real agent tasks. The combined effect:

Error hints cut agent confusion loops on ClickHouse-specific errors
orderBy resolution eliminates the most common ClickHouse ORDER BY failure mode
Top-level where reduces multi-step filter workarounds
Restructured event pattern output makes pattern analysis workflows significantly more reliable

Coming soon: We're building out a structured eval framework for MCP tool performance. Stay tuned — we'll be sharing the harness, benchmark suite, and results publicly.

Files changed (12)

File	Change
`helpers.ts`	`mergeWhereIntoSelectItems()`, `clickHouseErrorResult()`, error hints, better not-found messages
`table.ts`	Top-level where, orderBy resolver, updated descriptions
`timeseries.ts`	Top-level where, single-bucket collapse detection
`schemas.ts`	`MCP_AGG_FN_OPTIONS`, richer descriptions for all fields
`eventPatterns.ts`	`topN`, `trendBuckets` parameters
`runEventPatterns.ts`	Restructured output format, shareOfTotal, rank
`describeSource.ts`	Compact JSON, trimToolResponse API update
`dashboards.ts`	`granularity` in `convertToInternalTileConfig`
`trimToolResponse.ts`	Returns `{ data, isTrimmed }`
`trimToolResponse.test.ts`	Updated tests for new return shape
`queryTool.test.ts`	Test for `topN` property
`zod.ts`	`SQLIntervalSchema` + `granularity` on time chart schemas

changeset-bot · 2026-05-25T16:03:33Z

🦋 Changeset detected

Latest commit: cc1279a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/api	Patch
@hyperdx/app	Patch
@hyperdx/otel-collector	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel · 2026-05-25T16:03:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	May 27, 2026 10:40pm

github-actions · 2026-05-25T16:03:56Z

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

Diff size: 642 production lines changed (Tier 2 max: < 250)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats

Production files changed: 9
Production lines changed: 642 (+ 352 in test files, excluded from tier calculation)
Branch: brandon/mcp-tool-quality-improvements
Author: brandon-pereira

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-25T16:05:09Z

E2E Test Results

✅ All tests passed • 188 passed • 3 skipped • 1336s

Status	Count
✅ Passed	188
❌ Failed	0
⚠️ Flaky	7
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

github-actions · 2026-05-25T16:14:53Z

Deep Review

✅ No critical issues found. Several P2 items below are worth addressing before merge, and a number of P3 nits and testing gaps round out the report.

🟡 P2 -- recommended

packages/api/src/mcp/tools/query/eventDeltas.ts:434 -- the catch block wrapping the sample-row ClickHouse query still emits a bespoke Failed to sample rows: message instead of routing through the new clickHouseErrorResult helper, so DateTime64/AS/V8-string hints never reach agents on this failure path.
- Fix: Replace the hand-rolled error wrapping at this catch site with return clickHouseErrorResult(e); after the abort call.
- _reliability
packages/api/src/mcp/tools/query/table.ts:111 -- resolveOrderBy returns matched select-item aliases verbatim, so an alias containing whitespace or special characters (e.g. Total Count) generates ORDER BY Total Count which ClickHouse rejects, even though renderChartConfig quotes the matching alias on the SELECT side.
- Fix: Wrap the returned alias in double quotes when it contains whitespace or non-identifier characters (or always quote, mirroring the SELECT-side rendering).
- _adversarial
packages/api/src/mcp/tools/query/table.ts:107 -- the alias lookup is case-insensitive and runs before the aggFn-name synthesis, so orderBy: "count" against a select list like [{aggFn:'avg', alias:'count'}, {aggFn:'count'}] resolves to the bare string count, which ClickHouse parses as the function name rather than an identifier.
- Fix: Either prefer aggFn-name synthesis over alias collisions on reserved function tokens, or require alias matches that collide with MCP_AGG_FN_OPTIONS to round-trip through a quoted identifier.
- _adversarial
packages/api/src/mcp/tools/query/runEventPatterns.ts:303 -- the auto-suggested whereSnippet is built as ${bodyExpression}:"phrase", but SAFE_BODY_EXPR_CHARS allows bracket-accessed map keys like SpanAttributes['http.url']; an agent that follows the usage instruction to feed whereSnippet into hyperdx_search (Lucene) will hit a parse error the PR's own description warns about.
- Fix: Detect bracket access in bodyExpression and either emit a SQL-form whereSnippet with a whereLanguage hint, normalize to dot notation, or omit the snippet for unsupported shapes.
- _adversarial
packages/api/src/mcp/tools/query/runEventPatterns.ts:332 -- the response shape moves totalCount/sampledRows/etc. from result.* to a new top-level summary, renames sampledRows to sampledCount, and gates the per-pattern trend field behind trendBuckets > 0; any prompt or downstream caller learned against the prior shape will silently get undefined, and the changeset entry does not flag the contract break.
- Fix: Either bump the tool name (e.g. _v2), ship temporary backwards-compat aliases on the response, or upgrade the changeset to call out the shape change as a breaking output contract.
- _api-contract
packages/api/src/mcp/tools/query/table.ts:225 -- when runConfigTile returns an error result, content[0].text is plain text, the warning-injection JSON.parse throws, and the bare catch {} silently drops the mergeWarnings (including the critical "top-level where was NOT applied" notice); the same pattern repeats in timeseries.ts:126.
- Fix: When the result is isError or JSON parsing fails, append warnings as an extra text content part instead of swallowing them, and extract the parse-mutate-restringify pattern into a shared helper to avoid drifting between the two tools.
- _{correctness, reliability, maintainability, kieran-typescript}
packages/api/src/mcp/tools/query/runEventPatterns.ts:312 -- the new trendBuckets > 0 ? ... : undefined ternary changes the response contract (the per-pattern trend field disappears) but no test covers trendBuckets: 0, so the disabled-trend branch and its impact on the usage string drift unverified.
- Fix: Add a unit/integration test that calls runEventPatterns with trendBuckets: 0 and asserts the trend field is absent on every pattern and the usage string is consistent.
- _testing
packages/api/src/mcp/tools/query/timeseries.ts:142 -- the single-bucket collapse detection (parse JSON content, inspect parsed.result.data.length === 1, inject a hint) has no direct test coverage on either the positive case or any of the early-return guards.
- Fix: Add an integration test that triggers a one-row timeseries response with explicit granularity and asserts the rendered hint appears; cover the negative case (no granularity, multi-row data) as well.
- _testing
packages/api/src/mcp/__tests__/queryTool.test.ts -- the new mergeWarnings end-to-end injection in table.ts/timeseries.ts (parse, append warnings array, restringify with null, 2) is exercised only at the helper level; no test asserts the warnings reach the final response JSON.
- Fix: Add tests that trigger a language mismatch (top-level whereLanguage:'lucene' plus an item with whereLanguage:'sql') and assert the parsed content[0].text contains the expected warnings array.
- _testing

🔵 P3 nitpicks (13)

packages/api/src/mcp/tools/query/helpers.ts:236 -- the builder-config Connection not found branch was missed when the other not-found messages were upgraded to suggest hyperdx_list_sources.
- Fix: Append the same hyperdx_list_sources suggestion to this branch (or extract a shared notFoundError(kind, id) helper).
- _{correctness, maintainability}
packages/api/src/mcp/tools/sources/describeSource.ts:325 -- the final JSON.stringify(finalOutput) dropped the prior null, 2 indent arguments, diverging from every other MCP tool output in this PR.
- Fix: Restore JSON.stringify(finalOutput, null, 2) to keep formatting consistent across tools.
- _{maintainability, api-contract, project-standards}
packages/api/src/mcp/tools/query/helpers.ts:406 -- the AS-alias regex omits the /i flag while the other two hint patterns include it, so lowercase as in any ClickHouse syntax-error context never triggers the hint.
- Fix: Either add /i with a tighter context anchor (e.g. require a preceding unexpected/near), or document the intentional case-sensitivity inline.
- _{correctness, reliability}
packages/api/src/mcp/tools/query/runEventPatterns.ts:317 -- the new field name shareOfTotal is computed against sampled rows, not the full population, which is the opposite of how the name will read to an agent.
- Fix: Rename to shareOfSample or expand the usage string to make the sampling caveat unmissable.
- _correctness
packages/api/src/mcp/tools/query/table.ts:91 -- when two select items share a case-insensitive alias, resolveOrderBy silently picks the first match.
- Fix: Detect duplicate case-insensitive alias matches and emit a mergeWarnings-style note so the agent knows which item won.
- _correctness
packages/api/src/mcp/tools/query/table.ts:127 -- an orderBy: 'count_distinct' with a select item that omits valueExpression falls through both synthesis branches and returns the bare identifier, producing an invalid ClickHouse query.
- Fix: Either require valueExpression for count_distinct at the schema level or emit a typed error/hint when the aggregate has no expression to synthesize from.
- _{adversarial, kieran-typescript}
packages/api/src/mcp/tools/query/runEventPatterns.ts:230 -- with trendBuckets: 0 the pattern miner still runs trend computation; Math.ceil(diffSeconds / 0) yields Infinity, falls through to the default granularity, and produces a one-bucket trend that the formatting layer then discards.
- Fix: Short-circuit trend computation inside minePatterns when trendBuckets <= 0 so the wasted work and the divide-by-zero path are both avoided.
- _correctness
packages/api/src/mcp/tools/query/helpers.ts:415 -- the V8-String overflow hint pattern targets length exceeds the maximum allowed size of V8 String; verify that this matches the actual error string the ClickHouse client or Node emits, otherwise the hint silently never fires.
- Fix: Reproduce the overflow once against the real client/runtime and either confirm the existing pattern or broaden it to also match Cannot create a string longer than / Invalid string length.
- _reliability
packages/api/src/utils/trimToolResponse.ts:32 -- the trimmed-branch return uses as T even though trimObject may inject a __hdx_trimmed sentinel and truncate string values, so the spread-into-output pattern in eventDeltas.ts, describeSource.ts, and runEventPatterns.ts leaks that sentinel into the agent-visible response.
- Fix: Either omit the sentinel before returning, type the trimmed branch as Partial<T> & { __hdx_trimmed?: true }, or destructure the sentinel out before spreading at each call site.
- _{kieran-typescript, adversarial}
packages/api/src/mcp/tools/query/table.ts:86 -- resolveOrderBy lives in table.ts but is imported from timeseries.ts, creating a sibling-tool dependency that obscures the module graph and pins shared logic to a tool registration file.
- Fix: Move resolveOrderBy and AGG_FN_NAMES into helpers.ts (or a dedicated orderBy.ts).
- _{maintainability, kieran-typescript}
packages/api/src/mcp/tools/query/schemas.ts:165 -- the orderBy zod description still says only "Column to sort results by"; agents won't discover the new aggFn-name resolution unless it's surfaced in the schema text.
- Fix: Append a one-line example showing that bare aggFn names like count or avg are auto-resolved.
- _{maintainability}
packages/api/src/mcp/tools/query/helpers.ts:423 -- the file now sits at ~423 lines, over the project's documented 300-line file-size guideline.
- Fix: Extract the new error-hint helpers (clickHouseErrorResult, errorHint) into a sibling errors.ts and consider lifting mergeWhereIntoSelectItems into its own module.
- _{project-standards}
packages/api/src/mcp/tools/query/runEventPatterns.ts:291 -- the new default topN: 20 silently truncates clusterCount patterns; only attentive agents that compare summary.clusterCount vs summary.patternsReturned will notice.
- Fix: Add an explicit truncated: true flag on the response when truncation occurs and call this out in the changeset.
- _api-contract

Reviewers (11): correctness, testing, maintainability, project-standards, kieran-typescript, api-contract, reliability, performance, adversarial, agent-native, learnings.

Testing gaps:

No coverage for the trendBuckets: 0 output-shape branch in runEventPatterns.ts.
No coverage for the single-bucket collapse hint in timeseries.ts (positive or guard cases).
No coverage for mergeWarnings reaching the agent end-to-end (or being dropped when the underlying query errors).
No coverage for the isTrimmed: true note-injection branches in eventDeltas.ts, describeSource.ts, or runEventPatterns.ts after the {data, isTrimmed} migration.
No coverage for resolveOrderBy with aliases containing spaces, with case-insensitive collisions, or with count_distinct/quantile shapes missing valueExpression.
The agent-friendly-cli-principles.md learning under ce-plugin/docs/solutions/ describes related ground (bounded output, actionable error messages) and is worth a glance, though it's CLI-framed.

Changes identified and validated through MCP performance evals: Query tool improvements: - Add top-level 'where' + 'whereLanguage' to table and timeseries tools with mergeWhereIntoSelectItems() shared helper (no inline duplication) - Add orderBy resolution for bare aggFn names (e.g. 'count' → 'count()') - Add single-bucket collapse detection for timeseries with granularity hint - Pass post-merge selectItems to resolveOrderBy (consistency fix) Event patterns improvements: - Add topN parameter (1-100, default 20) to control pattern count - Add trendBuckets parameter (0-60, default 24) with disable option - Restructure output: top-level 'summary' + 'patterns' + 'usage' guide - Add rank, shareOfTotal to each pattern for easier analysis Schema/description improvements: - Richer tool descriptions with examples, common mistakes, expression tips - Extract MCP_AGG_FN_OPTIONS as shared constant - Expanded whereLanguage description with syntax matching guidance - groupBy now documents multi-column and expression support Infrastructure: - Extract duplicated ClickHouse catch blocks into clickHouseErrorResult() with pattern-matched error hints (DateTime64, AS alias, response size) - trimToolResponse now returns { data, isTrimmed } for cheaper detection - Add granularity to external dashboard zod schemas (line/stacked_bar) - describeSource uses compact JSON output - Improve source/connection-not-found messages with recovery suggestions

P1: mergeWhereIntoSelectItems now returns { items, warnings } instead of a bare array. When a top-level where is silently dropped due to language mismatch, a warning is emitted and surfaced in the response JSON so the agent can recover. P2: Add 21 focused unit tests covering every branch of: - mergeWhereIntoSelectItems (empty, inject, AND-combine, mismatch warning, mixed) - errorHint (DateTime64, AS with word boundary, V8 overflow, no-match) - resolveOrderBy (undefined, passthrough, alias match, count/avg/quantile synthesis, case insensitivity, 'none' passthrough) P3 nitpicks: - Filter 'none' out of AGG_FN_NAMES so resolveOrderBy doesn't synthesize an invalid none(expr) ORDER BY clause - Use null, 2 in the single-bucket hint JSON.stringify for consistent pretty-print formatting with formatQueryResult - Narrow eslint-disable comments in trimToolResponse.ts to cover only the as T assertion, not the entire return statement

P0 fixes: - queryTool.test.ts: update event_patterns assertions from old output.result.{patterns,totalCount,sampledRows} shape to new output.{summary,patterns} envelope with sampledCount - dashboards.ts: add granularity to raw-SQL convertToInternalTileConfig pick() list (was silently dropped on PUT) - dashboards.ts: add granularity to convertToExternalTileChartConfig for Line/StackedBar (both builder and raw-SQL) so GET→PUT round-trips preserve the field P2 fixes: - timeseries.ts: apply resolveOrderBy so 'orderBy: count' works on timeseries too, not just table - resolveOrderBy: parse and re-append trailing ASC/DESC direction (e.g. 'count DESC' → 'count() DESC') instead of falling through - resolveOrderBy: skip synthesis for quantile without level to avoid generating invalid quantile(Duration) SQL P3 fixes: - runEventPatterns.ts: set sampleMultiplier: 1 in empty-data branch (semantically correct for no-data case) Tests: 2 new unit tests for ASC/DESC handling and quantile-without-level (30 total passing)

The granularity round-tripping on the external API is a separate concern from MCP tool quality — split it out to keep this PR focused and avoid touching the critical-path dashboards.ts file.

… hints, trim note - resolveOrderBy: synthesize count(DISTINCT expr) for count_distinct aggFn - resolveOrderBy: return canonical alias case on case-insensitive match - runEventPatterns: use shared clickHouseErrorResult for error hints - runEventPatterns: add 'Call hyperdx_list_sources' hint to source/connection errors - errorHint: drop /i flag on AS regex to avoid false positives on lowercase 'as' - describeSource: surface isTrimmed note when response is trimmed - Add unit tests for all new behaviors

… shape The event_deltas tool was merged before trimToolResponse was refactored to return { data, isTrimmed } instead of the raw value. Without this fix, every successful response would serialize the wrapper object ({data: ..., isTrimmed: false}) instead of the inner payload, breaking the documented output contract.

github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 25, 2026

brandon-pereira force-pushed the brandon/mcp-tool-quality-improvements branch from 0e02cd2 to ae01aef Compare May 25, 2026 16:09

github-actions Bot added review/tier-4 Critical — deep review + domain expert sign-off and removed review/tier-2 Low risk — AI review + quick human skim labels May 25, 2026

vercel Bot deployed to Preview May 25, 2026 16:12 View deployment

vercel Bot deployed to Preview May 25, 2026 16:35 View deployment

brandon-pereira added 2 commits May 25, 2026 11:07

brandon-pereira force-pushed the brandon/mcp-tool-quality-improvements branch from ec1c65b to c5d2232 Compare May 25, 2026 17:07

vercel Bot deployed to Preview May 25, 2026 17:10 View deployment

brandon-pereira added 2 commits May 25, 2026 11:10

revert: remove dashboards.ts and zod.ts changes (separate concern)

0255f48

The granularity round-tripping on the external API is a separate concern from MCP tool quality — split it out to keep this PR focused and avoid touching the critical-path dashboards.ts file.

github-actions Bot added review/tier-3 Standard — full human review required and removed review/tier-4 Critical — deep review + domain expert sign-off labels May 25, 2026

vercel Bot deployed to Preview May 25, 2026 17:16 View deployment

vercel Bot deployed to Preview May 25, 2026 17:48 View deployment

brandon-pereira requested review from a team and wrn14897 and removed request for a team May 25, 2026 20:28

wrn14897 previously approved these changes May 27, 2026

View reviewed changes

brandon-pereira added the automerge label May 27, 2026

Merge branch 'main' into brandon/mcp-tool-quality-improvements

95fe704

vercel Bot deployed to Preview May 27, 2026 22:02 View deployment

brandon-pereira dismissed wrn14897’s stale review via cc1279a May 27, 2026 22:37

vercel Bot deployed to Preview May 27, 2026 22:40 View deployment

wrn14897 approved these changes May 27, 2026

View reviewed changes

kodiakhq Bot merged commit a4b9fa8 into main May 27, 2026
19 checks passed

kodiakhq Bot deleted the brandon/mcp-tool-quality-improvements branch May 27, 2026 23:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): improve MCP tool quality — error hints, shared helpers, better messages#2337

feat(mcp): improve MCP tool quality — error hints, shared helpers, better messages#2337
kodiakhq[bot] merged 7 commits into
mainfrom
brandon/mcp-tool-quality-improvements

brandon-pereira commented May 25, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 25, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brandon-pereira commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Query tools: top-level where + orderBy resolution

Event patterns: topN, trendBuckets, restructured output

Schema & description improvements

Infrastructure

Why

Files changed (12)

Uh oh!

changeset-bot Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

vercel Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Tier 3 — Standard

Uh oh!

github-actions Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

github-actions Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deep Review

🟡 P2 -- recommended

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

brandon-pereira commented May 25, 2026 •

edited

Loading

Query tools: top-level `where` + orderBy resolution

changeset-bot Bot commented May 25, 2026 •

edited

Loading

vercel Bot commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026 •

edited

Loading