fix(langfuse): replace noisy ToolInvoker input with focused tool call arguments#3520

Open
chakshu-dhannawat wants to merge 4 commits into
deepset-ai:mainfrom
chakshu-dhannawat:fix/langfuse-tool-invoker-content

Conversation

@chakshu-dhannawat

Contributor

Summary

Fixes #3435.

When a Haystack Agent runs with LangfuseConnector enabled, ToolInvoker TOOL-type observations were writing the entire ChatMessage history as their input blob, making individual tool observations noisy and hard to read in the Langfuse UI.

This PR fixes DefaultSpanHandler.handle() so that, when content tracing is enabled, the ToolInvoker observation's input and output are replaced with structured, focused data:

  • input: list of {"tool_name": ..., "arguments": {...}} — one entry per tool call extracted from the assistant messages
  • output: list of {"tool_name": ..., "result": ..., "error": bool} — one entry per ToolCallResult from the tool response messages
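The two shapes above can be sketched roughly as follows. This is an illustration, not the PR's code: `ToolCall` and `ToolCallResult` are simplified stand-ins defined here (assumed to carry only the fields named in the bullets), and the helper names `focused_input` and `focused_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


# Simplified stand-ins for the Haystack dataclasses (assumption: only the
# fields used below matter for this illustration).
@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]


@dataclass
class ToolCallResult:
    result: str
    origin: ToolCall
    error: bool = False


def focused_input(tool_calls: list[ToolCall]) -> list[dict[str, Any]]:
    """One entry per tool call: just the tool name and its arguments."""
    return [{"tool_name": tc.tool_name, "arguments": tc.arguments} for tc in tool_calls]


def focused_output(results: list[ToolCallResult]) -> list[dict[str, Any]]:
    """One entry per tool result: tool name, result text, and error flag."""
    return [
        {"tool_name": r.origin.tool_name, "result": r.result, "error": r.error}
        for r in results
    ]


call = ToolCall("search_tool", {"query": "What is RAG?"})
result = ToolCallResult("RAG stands for Retrieval-Augmented Generation", call)

print(focused_input([call]))
print(focused_output([result]))
```

Keeping one flat dict per call means each observation shows only what that tool received and returned, rather than the whole conversation.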

The existing span name update (tool_invoker - ['search_tool']) is unchanged and still runs regardless of content tracing state.

Note on toolCallNames: The Langfuse "Tool Call Name" filter reads from an internal toolCallNames field that the Langfuse SDK v4 does not expose via span.update(). That is a Langfuse SDK limitation and is out of scope for this fix.

Changes

  • integrations/langfuse/src/.../tracer.py: extend the ToolInvoker branch in DefaultSpanHandler.handle() to collect tool call arguments while building the name, then call span.raw_span().update(input=..., output=...) when content tracing is on
  • integrations/langfuse/tests/test_tracer.py: two new tests:
    • test_handle_tool_invoker_input_output_with_content_tracing — verifies input/output are set to focused data
    • test_handle_tool_invoker_no_content_tracing — verifies no input/output update when content tracing is disabled

Test plan

# 40 tests pass
python3 -m pytest integrations/langfuse/tests/test_tracer.py -x -q
  • ToolInvoker input is [{"tool_name": "search_tool", "arguments": {...}}] (not full message list)
  • ToolInvoker output is [{"tool_name": "search_tool", "result": "...", "error": False}]
  • No input/output update when HAYSTACK_CONTENT_TRACING_ENABLED is false
  • All existing tests still pass
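The gating behaviour in this checklist can be sketched with a fake span. Everything here is hypothetical scaffolding: `FakeRawSpan`, `apply_tool_invoker_update`, and the `content_tracing` flag stand in for the real span object and for however the tracer reads HAYSTACK_CONTENT_TRACING_ENABLED. Only the rule itself (name always updated, input/output only when content tracing is on) comes from the PR description.

```python
from typing import Any


class FakeRawSpan:
    """Records every update() call so the behaviour can be inspected."""

    def __init__(self) -> None:
        self.updates: list[dict[str, Any]] = []

    def update(self, **kwargs: Any) -> None:
        self.updates.append(kwargs)


def apply_tool_invoker_update(raw_span, name, tool_calls, tool_results, content_tracing):
    # The span name is updated regardless of content tracing.
    raw_span.update(name=name)
    # Input and output carry user content, so they are gated on the flag.
    if content_tracing:
        raw_span.update(input=tool_calls, output=tool_results)


calls = [{"tool_name": "search_tool", "arguments": {"query": "RAG"}}]
results = [{"tool_name": "search_tool", "result": "...", "error": False}]
span_name = "tool_invoker - ['search_tool']"

on, off = FakeRawSpan(), FakeRawSpan()
apply_tool_invoker_update(on, span_name, calls, results, content_tracing=True)
apply_tool_invoker_update(off, span_name, calls, results, content_tracing=False)

print(on.updates)
print(off.updates)
```

With the flag off, the only recorded update is the name change, which is the behaviour the second new test is meant to pin down.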

When content tracing is enabled, ToolInvoker TOOL-type observations were
writing the full ChatMessage history as input, making individual tool
observations in the Langfuse UI noisy and hard to read.

This change makes handle() override the input with just the tool call
arguments (tool_name + arguments dict) and populate output with the
structured tool results (tool_name, result, error) for each ToolCallResult.

The span name update (component_name - [tool_names]) is unchanged and
continues to work regardless of content tracing state.

Fixes deepset-ai#3435
@chakshu-dhannawat chakshu-dhannawat requested a review from a team as a code owner July 2, 2026 03:00
@chakshu-dhannawat chakshu-dhannawat requested review from bogdankostic and removed request for a team July 2, 2026 03:00
@CLAassistant

CLAassistant commented Jul 2, 2026


CLA assistant check
All committers have signed the CLA.

@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Heads-up for maintainers

This PR is from a fork and touches integrations whose integration tests require API keys.
Those tests are skipped in CI because fork PRs don't have access to repo secrets for security reasons.

Affected integrations:

  • langfuse

Please run the integration tests locally (hatch run test:integration inside each folder) before approving.

@github-actions github-actions Bot added the type:documentation Improvements or additions to documentation label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Coverage report (langfuse)

This report was generated by python-coverage-comment-action

@bogdankostic bogdankostic left a comment

Contributor

Thanks for raising this PR @chakshu-dhannawat! The input half works well, but the output rewrite needs to be adapted (see the inline code comments). I verified this end-to-end against a real Langfuse trace from a ToolInvoker inside an Agent: the TOOL observation's input becomes the focused [{tool_name, arguments}], but its output is unchanged (still the full {"tool_messages": [...], "state": {...}} blob).

Also, the PR body says populating the Tool Call Name filter is out of scope because the SDK has no parameter for it. That is half right: langfuse-sdk 4.12.0 indeed has no explicit field (span.update() has no tool-call parameter). But Langfuse computes toolCallNames server-side at ingestion by pattern-matching the observation's output for tool-call-like objects. A flat output item counts as a tool call if and only if it has name (not tool_name!), arguments, and a marker key (id, index, or a known type).

That means neither of this PR's shapes can ever populate the filter. {"tool_name", "result", "error"} has neither a recognized name key nor arguments. But a small rename/enrichment of the output items does:

tool_results.append(
    {
        "id": tcr.origin.id or "",
        "name": tcr.origin.tool_name,
        "arguments": tcr.origin.arguments,
        "result": tcr.result,
        "error": tcr.error,
    }
)
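To make the matching rule in this comment concrete, here is a rough predicate written from the description alone. It is not Langfuse's implementation: the function name `looks_like_tool_call` is made up, the recognised `type` values are left as a caller-supplied parameter because the comment does not list them, and treating an empty-string `id` as a valid marker is an assumption.

```python
from typing import Any, Collection


def looks_like_tool_call(item: Any, known_types: Collection[str] = ()) -> bool:
    """Return True if a flat output item has `name`, `arguments`, and a marker key.

    Marker keys per the review comment: `id`, `index`, or a known `type`.
    `known_types` is an assumption; pass whatever values actually apply.
    """
    if not isinstance(item, dict):
        return False
    if "name" not in item or "arguments" not in item:
        return False
    # Assumption: key presence is enough, even if the id is an empty string.
    return "id" in item or "index" in item or item.get("type") in known_types


# Output shape from this PR: no `name`, no `arguments`.
pr_shape = {"tool_name": "search_tool", "result": "...", "error": False}

# Enriched shape from the suggestion above (values are illustrative).
suggested_shape = {
    "id": "call_1",
    "name": "search_tool",
    "arguments": {"query": "RAG"},
    "result": "...",
    "error": False,
}

print(looks_like_tool_call(pr_shape))
print(looks_like_tool_call(suggested_shape))
```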

# Replace the noisy full message history with just the tool call arguments
span.raw_span().update(input=tool_calls_input)

output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])

Contributor

ToolInvoker.run emits its results under tool_messages, not messages. So .get("messages", []) is always [], tool_results stays empty, and span.update(output=...) is never called.

Suggested change
- output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])
+ output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("tool_messages", [])
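The failure mode is easy to reproduce with a plain dict. In this sketch the span data is a hand-written stand-in (the placeholder message string is an assumption, and the key string mirrors the test fixture in this PR). It only shows that a chained `.get()` with a default hides a wrong key instead of raising.

```python
# Stand-in for span.get_data(); the key mirrors the fixture in this PR's tests.
span_data = {
    "haystack.component.output": {
        "tool_messages": ["<tool message>"],  # what ToolInvoker.run emits
    }
}

output = span_data.get("haystack.component.output", {})

wrong = output.get("messages", [])       # key is never present: silently []
right = output.get("tool_messages", [])  # the key the component actually uses

print(wrong)
print(right)
```

Because the wrong lookup returns an empty list rather than failing, any loop over it simply does nothing, which is why the missing output went unnoticed.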

]
},
"haystack.component.output": {
"messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]

Contributor

This is why the test passes despite the bug above: the fixture puts the tool response under a "messages" key, but a real ToolInvoker emits tool_messages. Please key it the way the component actually outputs so the test exercises the real path:

Suggested change
- "messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]
+ "tool_messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]

"haystack.component.name": "tool_invoker",
"haystack.component.type": "ToolInvoker",
"haystack.component.input": {"messages": [ChatMessage.from_assistant(text="", tool_calls=[tool_call])]},
"haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},

Contributor

Same fixture-key issue here.

Suggested change
- "haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},
+ "haystack.component.output": {"tool_messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},

ToolInvoker.run emits results under 'tool_messages', not 'messages', so
the span output was never populated. Correct the output key and update the
test fixtures to key results as the component actually outputs them, so the
test exercises the real path.
@chakshu-dhannawat

chakshu-dhannawat commented Jul 3, 2026

Contributor Author

Good catch, thank you @bogdankostic, you're right. ToolInvoker.run emits its results under tool_messages (@component.output_types(tool_messages=...)), so the previous .get("messages", []) always returned [] and the output was never set. The test passed only because its fixture used the same wrong key.

I've pushed a fix:

  • tracer.py: read the output from tool_messages
  • test_tracer.py: key the fixtures as the component actually outputs, so the test now exercises the real path (verified it fails without the tracer fix)

The input half is unchanged. Ready for another look.

Labels

integration:langfuse type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(langfuse): Observations missing toolCalls/toolCallNames fields; ToolInvoker misses tool arguments

3 participants