Add cascade latency levers and streaming robustness by sauhardjain · Pull Request #140 · ServiceNow/eva

sauhardjain · 2026-06-08T21:09:52Z

Summary

This PR adds opt-in latency controls for cascade (STT -> LLM -> TTS) systems and fixes audit/history edge cases that can distort benchmark results. The default cascade configuration remains unchanged unless these flags are enabled.

Motivation

The cascade harness can add avoidable latency when it waits for a full LLM response before sending text to TTS, or when tool calls leave the caller in silence. It also does not expose the parallel tool-call setting used by the ElevenAgents cascade assistant configuration, which makes cross-harness comparisons harder to interpret.

Interrupted and tool-heavy turns need careful handling as well. An interrupted transfer can leave an assistant tool call without a matching tool result, and streamed speech can reach TTS before the LLM call is cancelled or fails. This PR keeps the next model request valid and keeps the audit log aligned with emitted audio.

Changes

Adds EVA_MODEL__PRE_TOOL_SPEECH to let the model produce a brief lead-in before tool calls. The lead-in is model-generated, not templated filler.
Adds EVA_MODEL__LLM_STREAMING for sentence-level Chat Completions streaming to TTS. Responses API deployments warn and use the existing non-streaming path.
Adds EVA_MODEL__PARALLEL_TOOL_CALLS to forward the provider setting when tools are present. Leaving it unset preserves provider defaults; setting it to false matches the ElevenAgents assistant config.
Repairs orphaned assistant tool calls before replaying history to the model, while ignoring malformed non-string tool result IDs.
Records any streamed text that was already emitted if a stream is cancelled or fails mid-turn.
Updates transcript processing so fully spoken streamed responses can span multiple TTS segments.
Documents the new flags in .env.example and the experiment setup guide.

Reviewing

Run one cascade record with the new flags unset and confirm that default behavior is unchanged.

Then run a tool-using cascade record with EVA_MODEL__PRE_TOOL_SPEECH=auto, EVA_MODEL__LLM_STREAMING=true, and EVA_MODEL__PARALLEL_TOOL_CALLS=false. Check that speech can begin before the final LLM response is complete, that the audit log records the assistant response, and that interrupted streamed speech is preserved instead of replaced by a generic error.

- 'cartesia' now maps to Cartesia's latest ink-2 (CartesiaTurnsSTTService): server-driven endpointing, so ModelConfig auto-forces external turn strategies + VAD off. The older ink-whisper is preserved as 'cartesia-multilingual' (standard VAD / smart-turn). - Cartesia STT declares the 16 kHz pipeline input rate (STT_INPUT_SAMPLE_RATE), not SAMPLE_RATE (24 kHz, the TTS output rate). The base STTService doesn't resample, so 24 kHz mislabels 16 kHz audio (~1.5x fast/pitched) and garbles spelled letters / confirmation codes. Other STT providers are unchanged. - pipecat_server logs ink-2 eager-end / resume / committed-end diagnostics; only committed turn boundaries drive aggregation.

…ming robustness CASCADE-only; every lever defaults off, so the canonical config is unchanged unless set. - pre_tool_speech {off, auto}: 'auto' adds a write-aware lead-in directive (disclose cost + confirm before a write action); no deterministic fillers. - llm_streaming: complete_stream() streams Chat-Completions tokens to TTS sentence-by-sentence; Responses-API deployments fall back to non-streaming (one warning). - parallel_tool_calls: tri-state ModelConfig knob, forwarded only when tools are present. - _pair_orphaned_tool_calls: pair an assistant tool_call left unanswered by transfer_to_agent or a barge-in, so Responses-API models don't 400 ("No tool output found") on the next turn. - _record_partial_streamed_output: record already-spoken streamed text on interruption/failure rather than dropping it or speaking a generic error over it. - truncate_to_spoken: match across TTS segments so streaming doesn't truncate scored transcripts.

sauhardjain · 2026-06-09T01:24:27Z

cc @fanny-riols as discussed last week!

sauhardjain added 2 commits June 8, 2026 10:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cascade latency levers and streaming robustness#140

Add cascade latency levers and streaming robustness#140
sauhardjain wants to merge 2 commits into
ServiceNow:mainfrom
sauhardjain:pr/cascade-orchestration

sauhardjain commented Jun 8, 2026 •

edited

Loading

Uh oh!

sauhardjain commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sauhardjain commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Changes

Reviewing

Uh oh!

sauhardjain commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sauhardjain commented Jun 8, 2026 •

edited

Loading