feat(mcp): improve MCP tool quality — error hints, shared helpers, better messages#2337
Conversation
🦋 Changeset detectedLatest commit: cc1279a The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🟡 Tier 3 — StandardIntroduces new logic, modifies core functionality, or touches areas with non-trivial risk. Why this tier:
Review process: Full human review — logic, architecture, edge cases. Stats
|
E2E Test Results✅ All tests passed • 188 passed • 3 skipped • 1336s
Tests ran across 4 shards in parallel. |
0e02cd2 to
ae01aef
Compare
Deep Review✅ No critical issues found. Several P2 items below are worth addressing before merge, and a number of P3 nits and testing gaps round out the report. 🟡 P2 -- recommended
🔵 P3 nitpicks (13)
Reviewers (11): correctness, testing, maintainability, project-standards, kieran-typescript, api-contract, reliability, performance, adversarial, agent-native, learnings. Testing gaps:
|
Changes identified and validated through MCP performance evals:
Query tool improvements:
- Add top-level 'where' + 'whereLanguage' to table and timeseries tools
with mergeWhereIntoSelectItems() shared helper (no inline duplication)
- Add orderBy resolution for bare aggFn names (e.g. 'count' → 'count()')
- Add single-bucket collapse detection for timeseries with granularity hint
- Pass post-merge selectItems to resolveOrderBy (consistency fix)
Event patterns improvements:
- Add topN parameter (1-100, default 20) to control pattern count
- Add trendBuckets parameter (0-60, default 24) with disable option
- Restructure output: top-level 'summary' + 'patterns' + 'usage' guide
- Add rank, shareOfTotal to each pattern for easier analysis
Schema/description improvements:
- Richer tool descriptions with examples, common mistakes, expression tips
- Extract MCP_AGG_FN_OPTIONS as shared constant
- Expanded whereLanguage description with syntax matching guidance
- groupBy now documents multi-column and expression support
Infrastructure:
- Extract duplicated ClickHouse catch blocks into clickHouseErrorResult()
with pattern-matched error hints (DateTime64, AS alias, response size)
- trimToolResponse now returns { data, isTrimmed } for cheaper detection
- Add granularity to external dashboard zod schemas (line/stacked_bar)
- describeSource uses compact JSON output
- Improve source/connection-not-found messages with recovery suggestions
P1: mergeWhereIntoSelectItems now returns { items, warnings } instead of
a bare array. When a top-level where is silently dropped due to language
mismatch, a warning is emitted and surfaced in the response JSON so the
agent can recover.
P2: Add 21 focused unit tests covering every branch of:
- mergeWhereIntoSelectItems (empty, inject, AND-combine, mismatch warning, mixed)
- errorHint (DateTime64, AS with word boundary, V8 overflow, no-match)
- resolveOrderBy (undefined, passthrough, alias match, count/avg/quantile
synthesis, case insensitivity, 'none' passthrough)
P3 nitpicks:
- Filter 'none' out of AGG_FN_NAMES so resolveOrderBy doesn't synthesize
an invalid none(expr) ORDER BY clause
- Use null, 2 in the single-bucket hint JSON.stringify for consistent
pretty-print formatting with formatQueryResult
- Narrow eslint-disable comments in trimToolResponse.ts to cover only the
as T assertion, not the entire return statement
ec1c65b to
c5d2232
Compare
P0 fixes:
- queryTool.test.ts: update event_patterns assertions from old
output.result.{patterns,totalCount,sampledRows} shape to new
output.{summary,patterns} envelope with sampledCount
- dashboards.ts: add granularity to raw-SQL convertToInternalTileConfig
pick() list (was silently dropped on PUT)
- dashboards.ts: add granularity to convertToExternalTileChartConfig for
Line/StackedBar (both builder and raw-SQL) so GET→PUT round-trips
preserve the field
P2 fixes:
- timeseries.ts: apply resolveOrderBy so 'orderBy: count' works on
timeseries too, not just table
- resolveOrderBy: parse and re-append trailing ASC/DESC direction
(e.g. 'count DESC' → 'count() DESC') instead of falling through
- resolveOrderBy: skip synthesis for quantile without level to avoid
generating invalid quantile(Duration) SQL
P3 fixes:
- runEventPatterns.ts: set sampleMultiplier: 1 in empty-data branch
(semantically correct for no-data case)
Tests: 2 new unit tests for ASC/DESC handling and quantile-without-level
(30 total passing)
The granularity round-tripping on the external API is a separate concern from MCP tool quality — split it out to keep this PR focused and avoid touching the critical-path dashboards.ts file.
… hints, trim note - resolveOrderBy: synthesize count(DISTINCT expr) for count_distinct aggFn - resolveOrderBy: return canonical alias case on case-insensitive match - runEventPatterns: use shared clickHouseErrorResult for error hints - runEventPatterns: add 'Call hyperdx_list_sources' hint to source/connection errors - errorHint: drop /i flag on AS regex to avoid false positives on lowercase 'as' - describeSource: surface isTrimmed note when response is trimmed - Add unit tests for all new behaviors
… shape
The event_deltas tool was merged before trimToolResponse was refactored
to return { data, isTrimmed } instead of the raw value. Without this
fix, every successful response would serialize the wrapper object
({data: ..., isTrimmed: false}) instead of the inner payload, breaking
the documented output contract.
What
Targeted improvements to MCP tool quality, identified and validated through performance evals against real agent workflows. These changes reduce tool-call failures, improve agent self-correction, and make tool outputs more actionable.
Query tools: top-level
where+ orderBy resolutionTable & Timeseries now accept a top-level
where/whereLanguagethat scopes the entire query. A sharedmergeWhereIntoSelectItems()helper injects it into each select item's aggCondition (ANDed with any per-item where). No more duplicated merge logic.Table gets an
orderByresolver: agents frequently writeorderBy: "count", which generatesORDER BY count ASC— but ClickHouse can't resolve barecountsince it's a function. The resolver maps aggFn names to aliases or synthesized expressions (count(),avg(Duration), etc.).Timeseries now detects single-bucket collapse (1 row returned with explicit granularity) and adds a hint to adjust the time range or granularity.
Event patterns: topN, trendBuckets, restructured output
topN(1-100, default 20) — control how many patterns are returnedtrendBuckets(0-60, default 24) — set to 0 to disable trend sparklines (smaller response)summary+patternsarray +usageguiderankandshareOfTotalfor easier analysisSchema & description improvements
MCP_AGG_FN_OPTIONSextracted as shared constant (single source of truth for aggFn names)whereLanguagedescription now includes syntax matching guidance and common mistakesgroupBydocuments multi-column comma-delimited input and arbitrary ClickHouse expressionsvalueExpressiondescription shows useful patterns (duration conversion, attribute casting, boolean→numeric)Infrastructure
clickHouseErrorResult()with pattern-matched hints for DateTime64 casting, AS alias quoting, and response size overflowtrimToolResponse: Now returns{ data, isTrimmed }so callers detect trimming without re-serializinggranularityadded to line/stacked_bar external dashboard zod schemasdescribeSource: Switched to compact JSON output (saves context tokens)hyperdx_list_sourceserrorHintAS-alias regex uses\bAS\bword boundary to avoid false matches on words like DATABASE, ALIASWhy
These changes came out of running MCP performance evals against real agent tasks. The combined effect:
Files changed (12)
helpers.tsmergeWhereIntoSelectItems(),clickHouseErrorResult(), error hints, better not-found messagestable.tstimeseries.tsschemas.tsMCP_AGG_FN_OPTIONS, richer descriptions for all fieldseventPatterns.tstopN,trendBucketsparametersrunEventPatterns.tsdescribeSource.tsdashboards.tsgranularityinconvertToInternalTileConfigtrimToolResponse.ts{ data, isTrimmed }trimToolResponse.test.tsqueryTool.test.tstopNpropertyzod.tsSQLIntervalSchema+granularityon time chart schemas