fix(langfuse): replace noisy ToolInvoker input with focused tool call arguments #3520
When content tracing is enabled, ToolInvoker TOOL-type observations were writing the full ChatMessage history as input, making individual tool observations in the Langfuse UI noisy and hard to read. This change makes handle() override the input with just the tool call arguments (tool_name + arguments dict) and populate output with the structured tool results (tool_name, result, error) for each ToolCallResult. The span name update (component_name - [tool_names]) is unchanged and continues to work regardless of content tracing state. Fixes deepset-ai#3435
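The input and output shapes described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `ToolCall` and `ToolCallResult` dataclasses are hypothetical stand-ins that model only the fields named in the description, and the helper names `focused_input` and `focused_output` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for Haystack's ToolCall and ToolCallResult.
# Only the fields mentioned in this PR are modeled.
@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]


@dataclass
class ToolCallResult:
    result: str
    origin: ToolCall
    error: bool = False


def focused_input(tool_calls: list[ToolCall]) -> list[dict[str, Any]]:
    """One entry per tool call: just the tool name and its arguments."""
    return [{"tool_name": tc.tool_name, "arguments": tc.arguments} for tc in tool_calls]


def focused_output(results: list[ToolCallResult]) -> list[dict[str, Any]]:
    """One entry per tool call result: tool name, result, and error flag."""
    return [
        {"tool_name": r.origin.tool_name, "result": r.result, "error": r.error}
        for r in results
    ]


call = ToolCall("search_tool", {"query": "What is RAG?"})
result = ToolCallResult("RAG stands for Retrieval-Augmented Generation", call)

tool_calls_input = focused_input([call])
tool_results = focused_output([result])
```

In this sketch the observation would carry one short dictionary per tool call rather than the whole message history.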
Heads-up for maintainers This PR is from a fork and touches integrations whose integration tests require API keys. Affected integrations:
Please run the integration tests locally ( |
bogdankostic
left a comment
Thanks for raising this PR @chakshu-dhannawat! The input half works well, but the output rewrite needs to be adapted (see inline code comments). I verified this end-to-end against a real Langfuse trace from a ToolInvoker inside an Agent: the TOOL observation's input becomes the focused [{tool_name, arguments}], but its output is unchanged (still the full {"tool_messages": [...], "state": {...}} blob).
Also, the PR body says populating the Tool Call Name filter is out of scope because the SDK has no parameter for it. Half right: langfuse-sdk 4.12.0 indeed has no explicit field (span.update() has no tool-call parameter). But Langfuse computes toolCallNames server-side at ingestion by pattern-matching the observation's output for tool-call-like objects. A flat output item counts as a tool call iff it has name (not tool_name!) + arguments + a marker key (id/index/known type).
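The matching rule described above can be sketched as a small predicate. This is only an illustration of the rule as stated in this comment, not Langfuse's server-side code. The function name `looks_like_tool_call` is hypothetical, and because the comment does not list the "known type" values, the set below is a placeholder assumption.

```python
from typing import Any

# Placeholder assumption: the comment only says "known type" without listing
# the values, so these strings are illustrative, not Langfuse's actual list.
ASSUMED_KNOWN_TYPES = frozenset({"function", "tool_call"})


def looks_like_tool_call(item: Any, known_types: frozenset = ASSUMED_KNOWN_TYPES) -> bool:
    """Return True if a flat output item has name + arguments + a marker key."""
    if not isinstance(item, dict):
        return False
    # The key must be "name"; "tool_name" does not count.
    if "name" not in item or "arguments" not in item:
        return False
    item_type = item.get("type")
    has_known_type = isinstance(item_type, str) and item_type in known_types
    return "id" in item or "index" in item or has_known_type


# Output item shape currently produced by this PR.
pr_output_item = {"tool_name": "search_tool", "result": "...", "error": False}

# Shape with "name", "arguments", and an "id" marker key.
enriched_item = {
    "id": "call_1",
    "name": "search_tool",
    "arguments": {"query": "RAG"},
    "result": "...",
    "error": False,
}

pr_item_matches = looks_like_tool_call(pr_output_item)
enriched_item_matches = looks_like_tool_call(enriched_item)
```

Under these assumptions, the item keyed by `tool_name` is rejected while the item carrying `name`, `arguments`, and `id` is accepted.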
That means neither of this PR's shapes can ever populate the filter. {"tool_name", "result", "error"} has neither a recognized name key nor arguments. But a small rename/enrichment of the output items does:
    tool_results.append(
        {
            "id": tcr.origin.id or "",
            "name": tcr.origin.tool_name,
            "arguments": tcr.origin.arguments,
            "result": tcr.result,
            "error": tcr.error,
        }
    )

    # Replace the noisy full message history with just the tool call arguments
    span.raw_span().update(input=tool_calls_input)

    output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])
ToolInvoker.run emits its results under tool_messages, not messages. So .get("messages", []) is always [], tool_results stays empty, and span.update(output=...) is never called.
Suggested change:
-   output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])
+   output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("tool_messages", [])
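The effect of the wrong key can be reproduced with plain dictionaries. This is a simplified sketch: `span_data` stands in for the return value of `span.get_data()`, the tool message is replaced by a placeholder string, and the value of `_COMPONENT_OUTPUT_KEY` is assumed to be the `"haystack.component.output"` string used in this PR's test fixtures.

```python
# Assumed value, taken from the key used in this PR's test fixtures.
_COMPONENT_OUTPUT_KEY = "haystack.component.output"

# Plain dict standing in for span.get_data(); the output mirrors the
# {"tool_messages": [...], "state": {...}} shape seen in the real trace.
span_data = {
    _COMPONENT_OUTPUT_KEY: {
        "tool_messages": ["<tool message for search_tool>"],
        "state": {},
    }
}

component_output = span_data.get(_COMPONENT_OUTPUT_KEY, {})

# The lookup as written in the PR: the key is absent, so the default is used.
by_wrong_key = component_output.get("messages", [])

# The lookup with the key ToolInvoker actually emits.
by_right_key = component_output.get("tool_messages", [])
```

Because `dict.get` silently falls back to its default, the wrong key raises no error; it just yields an empty list, which is why the output update is skipped without any visible failure.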
        ]
    },
    "haystack.component.output": {
        "messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]
This is why the test passes despite the bug above: the fixture puts the tool response under a "messages" key, but a real ToolInvoker emits tool_messages. Please key it the way the component actually outputs so the test exercises the real path:
Suggested change:
-   "messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]
+   "tool_messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]
    "haystack.component.name": "tool_invoker",
    "haystack.component.type": "ToolInvoker",
    "haystack.component.input": {"messages": [ChatMessage.from_assistant(text="", tool_calls=[tool_call])]},
    "haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},
Same fixture-key issue here.
Suggested change:
-   "haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},
+   "haystack.component.output": {"tool_messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},
ToolInvoker.run emits results under 'tool_messages', not 'messages', so the span output was never populated. Correct the output key and update the test fixtures to key results as the component actually outputs them, so the test exercises the real path.
Good catch, thank you @bogdankostic, you're right. I've pushed a fix:
The |
Summary
Fixes #3435.
When a Haystack Agent runs with LangfuseConnector enabled, ToolInvoker TOOL-type observations were writing the entire ChatMessage history as their input blob, making individual tool observations noisy and hard to read in the Langfuse UI.
This PR fixes DefaultSpanHandler.handle() so that, when content tracing is enabled, the ToolInvoker observation's input and output are replaced with structured, focused data:
- input: list of {"tool_name": ..., "arguments": {...}}, one entry per tool call extracted from the assistant messages
- output: list of {"tool_name": ..., "result": ..., "error": bool}, one entry per ToolCallResult from the tool response messages
The existing span name update (tool_invoker - ['search_tool']) is unchanged and still runs regardless of content tracing state.
Changes
- integrations/langfuse/src/.../tracer.py: extend the ToolInvoker branch in DefaultSpanHandler.handle() to collect tool call arguments while building the name, then call span.raw_span().update(input=..., output=...) when content tracing is on
- integrations/langfuse/tests/test_tracer.py: two new tests:
  - test_handle_tool_invoker_input_output_with_content_tracing: verifies input/output are set to focused data
  - test_handle_tool_invoker_no_content_tracing: verifies no input/output update when content tracing is disabled
Test plan
- [{"tool_name": "search_tool", "arguments": {...}}] (not full message list)
- [{"tool_name": "search_tool", "result": "...", "error": False}]
- HAYSTACK_CONTENT_TRACING_ENABLED is false