fix(langfuse): replace noisy ToolInvoker input with focused tool call arguments by chakshu-dhannawat · Pull Request #3520 · deepset-ai/haystack-core-integrations

chakshu-dhannawat · 2026-07-02T03:00:19Z

Summary

When a Haystack Agent runs with LangfuseConnector enabled, ToolInvoker TOOL-type observations were writing the entire ChatMessage history as their input blob, making individual tool observations noisy and hard to read in the Langfuse UI.

This PR fixes DefaultSpanHandler.handle() so that, when content tracing is enabled, the ToolInvoker observation's input and output are replaced with structured, focused data:

input: list of {"tool_name": ..., "arguments": {...}} — one entry per tool call extracted from the assistant messages
output: list of {"tool_name": ..., "result": ..., "error": bool} — one entry per ToolCallResult from the tool response messages
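A minimal sketch of the two shapes listed above. The `ToolCall` and `ToolCallResult` dataclasses are simplified stand-ins for the Haystack classes of the same name (only the fields this PR reads are modelled), and `focused_input` / `focused_output` are hypothetical helper names, not functions from tracer.py.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    """Stand-in for Haystack's ToolCall (assumption: only these fields matter here)."""
    tool_name: str
    arguments: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


@dataclass
class ToolCallResult:
    """Stand-in for Haystack's ToolCallResult."""
    result: str
    origin: ToolCall
    error: bool = False


def focused_input(tool_calls: list[ToolCall]) -> list[dict[str, Any]]:
    # One entry per tool call: just the name and the arguments.
    return [{"tool_name": tc.tool_name, "arguments": tc.arguments} for tc in tool_calls]


def focused_output(results: list[ToolCallResult]) -> list[dict[str, Any]]:
    # One entry per tool result: name, result payload, and error flag.
    return [{"tool_name": r.origin.tool_name, "result": r.result, "error": r.error} for r in results]


call = ToolCall(tool_name="search_tool", arguments={"query": "What is RAG?"}, id="call_1")
result = ToolCallResult(result="RAG stands for Retrieval-Augmented Generation", origin=call)

print(focused_input([call]))
print(focused_output([result]))
```

Both helpers drop everything else in the message history, which is the point of the change: the observation carries only what the tool received and returned.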

The existing span name update (tool_invoker - ['search_tool']) is unchanged and still runs regardless of content tracing state.

Note on toolCallNames: The Langfuse "Tool Call Name" filter reads from an internal toolCallNames field that the Langfuse SDK v4 does not expose via span.update(). That is a Langfuse SDK limitation and is out of scope for this fix.

Changes

integrations/langfuse/src/.../tracer.py: extend the ToolInvoker branch in DefaultSpanHandler.handle() to collect tool call arguments while building the name, then call span.raw_span().update(input=..., output=...) when content tracing is on
integrations/langfuse/tests/test_tracer.py: two new tests:
- test_handle_tool_invoker_input_output_with_content_tracing — verifies input/output are set to focused data
- test_handle_tool_invoker_no_content_tracing — verifies no input/output update when content tracing is disabled
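The behaviour these two tests pin down can be sketched without Langfuse installed. `FakeRawSpan` and `update_tool_observation` are hypothetical names used for illustration only, and a boolean parameter stands in for however the real handler determines whether content tracing is on. Deriving the tool names from the focused input is also a simplification.

```python
from typing import Any


class FakeRawSpan:
    """Records update() calls so they can be inspected afterwards."""

    def __init__(self) -> None:
        self.updates: list[dict[str, Any]] = []

    def update(self, **kwargs: Any) -> None:
        self.updates.append(kwargs)


def update_tool_observation(raw_span, tool_calls_input, tool_results, content_tracing_enabled):
    # The span name update runs regardless of content tracing state.
    tool_names = [tc["tool_name"] for tc in tool_calls_input]
    raw_span.update(name=f"tool_invoker - {tool_names}")

    if not content_tracing_enabled:
        return

    # Content tracing is on: replace input, and output if any results were found.
    raw_span.update(input=tool_calls_input)
    if tool_results:
        raw_span.update(output=tool_results)


calls = [{"tool_name": "search_tool", "arguments": {"query": "What is RAG?"}}]
results = [{"tool_name": "search_tool", "result": "RAG stands for Retrieval-Augmented Generation", "error": False}]

traced = FakeRawSpan()
update_tool_observation(traced, calls, results, content_tracing_enabled=True)

untraced = FakeRawSpan()
update_tool_observation(untraced, calls, results, content_tracing_enabled=False)

print(traced.updates)    # name, then input, then output
print(untraced.updates)  # name only
```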

Test plan

# 40 tests pass
python3 -m pytest integrations/langfuse/tests/test_tracer.py -x -q

ToolInvoker input is [{"tool_name": "search_tool", "arguments": {...}}] (not full message list)
ToolInvoker output is [{"tool_name": "search_tool", "result": "...", "error": False}]
No input/output update when HAYSTACK_CONTENT_TRACING_ENABLED is false
All existing tests still pass

When content tracing is enabled, ToolInvoker TOOL-type observations were writing the full ChatMessage history as input, making individual tool observations in the Langfuse UI noisy and hard to read. This change makes handle() override the input with just the tool call arguments (tool_name + arguments dict) and populate output with the structured tool results (tool_name, result, error) for each ToolCallResult. The span name update (component_name - [tool_names]) is unchanged and continues to work regardless of content tracing state. Fixes deepset-ai#3435

CLAassistant · 2026-07-02T03:00:26Z

All committers have signed the CLA.

github-actions · 2026-07-02T03:01:12Z

Heads-up for maintainers

This PR is from a fork and touches integrations whose integration tests require API keys.
Those tests are skipped in CI because fork PRs don't have access to repo secrets for security reasons.

Affected integrations:

langfuse

Please run the integration tests locally (hatch run test:integration inside each folder) before approving.


github-actions · 2026-07-02T03:34:07Z

Coverage report (langfuse)

File: integrations/langfuse/src/haystack_integrations/tracing/langfuse/tracer.py

This report was generated by python-coverage-comment-action.

bogdankostic

Thanks for raising this PR, @chakshu-dhannawat! The input half works well, but the output rewrite needs to be adapted (see the inline code comments). I verified this end-to-end against a real Langfuse trace from a ToolInvoker inside an Agent: the TOOL observation's input becomes the focused [{tool_name, arguments}], but its output is unchanged (still the full {"tool_messages": [...], "state": {...}} blob).

Also, the PR body says populating the Tool Call Name filter is out of scope because the SDK has no parameter for it. Half right: langfuse-sdk 4.12.0 indeed has no explicit field (span.update() has no tool-call parameter). But Langfuse computes toolCallNames server-side at ingestion by pattern-matching the observation's output for tool-call-like objects. A flat output item counts as a tool call iff it has name (not tool_name!) + arguments + a marker key (id/index/known type).

That means neither of this PR's shapes can ever populate the filter. {"tool_name", "result", "error"} has neither a recognized name key nor arguments. But a small rename/enrichment of the output items does:

tool_results.append(
    {
        "id": tcr.origin.id or "",
        "name": tcr.origin.tool_name,
        "arguments": tcr.origin.arguments,
        "result": tcr.result,
        "error": tcr.error,
    }
)
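To make the rule concrete, here is a rough sketch of the matching condition exactly as it is described in this comment. It is not Langfuse's ingestion code: `looks_like_tool_call` is a hypothetical function, and the entries in `KNOWN_TYPES` are placeholders, since the comment does not list which type values count.

```python
from typing import Any

# Placeholder values (assumption): the comment only says "known type".
KNOWN_TYPES = {"function", "tool_call"}


def looks_like_tool_call(item: dict[str, Any]) -> bool:
    """Apply the rule as stated: name + arguments + a marker key."""
    has_name = "name" in item            # "tool_name" does not count
    has_arguments = "arguments" in item
    has_marker = "id" in item or "index" in item or item.get("type") in KNOWN_TYPES
    return has_name and has_arguments and has_marker


pr_input_item = {"tool_name": "search_tool", "arguments": {"query": "What is RAG?"}}
pr_output_item = {"tool_name": "search_tool", "result": "...", "error": False}
suggested_item = {
    "id": "call_1",
    "name": "search_tool",
    "arguments": {"query": "What is RAG?"},
    "result": "...",
    "error": False,
}

print(looks_like_tool_call(pr_input_item))   # False: no "name", no marker key
print(looks_like_tool_call(pr_output_item))  # False: no "name", no "arguments"
print(looks_like_tool_call(suggested_item))  # True
```

Under this reading, the extra `result` and `error` keys in the suggested shape do not interfere with matching; only the presence of the three required pieces matters.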

bogdankostic · 2026-07-02T14:58:42Z

+                # Replace the noisy full message history with just the tool call arguments
+                span.raw_span().update(input=tool_calls_input)
+
+                output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])


ToolInvoker.run emits its results under tool_messages, not messages. So .get("messages", []) is always [], tool_results stays empty, and span.update(output=...) is never called.

Suggested change

- output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])
+ output_messages = span.get_data().get(_COMPONENT_OUTPUT_KEY, {}).get("tool_messages", [])
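The failure mode is an ordinary dict lookup with a default. In this small sketch a plain dict stands in for the span data, the tool message is simplified to a string, and the value of `_COMPONENT_OUTPUT_KEY` is assumed to be the "haystack.component.output" key used in the test fixtures.

```python
# Assumed value, matching the key used in the test fixtures of this PR.
_COMPONENT_OUTPUT_KEY = "haystack.component.output"

# Simplified stand-in for span.get_data() on a ToolInvoker span.
span_data = {
    _COMPONENT_OUTPUT_KEY: {
        "tool_messages": ["<tool message>"],
        "state": {},
    }
}

# The original lookup: the key is absent, so the default [] comes back silently.
wrong = span_data.get(_COMPONENT_OUTPUT_KEY, {}).get("messages", [])

# The suggested lookup: reads the key the component actually emits.
right = span_data.get(_COMPONENT_OUTPUT_KEY, {}).get("tool_messages", [])

print(wrong)  # []
print(right)  # ['<tool message>']
```

Because `.get()` with a default never raises, the wrong key produces an empty list rather than an error, which is why the missing output went unnoticed.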

bogdankostic · 2026-07-02T15:00:50Z

+                ]
+            },
+            "haystack.component.output": {
+                "messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]


This is why the test passes despite the bug above: the fixture puts the tool response under a "messages" key, but a real ToolInvoker emits tool_messages. Please key it the way the component actually outputs so the test exercises the real path:

Suggested change

- "messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]
+ "tool_messages": [ChatMessage.from_tool("RAG stands for Retrieval-Augmented Generation", tool_call)]

bogdankostic · 2026-07-02T15:01:42Z

+            "haystack.component.name": "tool_invoker",
+            "haystack.component.type": "ToolInvoker",
+            "haystack.component.input": {"messages": [ChatMessage.from_assistant(text="", tool_calls=[tool_call])]},
+            "haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},


Same fixture-key issue here.

Suggested change

- "haystack.component.output": {"messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},
+ "haystack.component.output": {"tool_messages": [ChatMessage.from_tool("Sunny, 28°C", tool_call)]},

ToolInvoker.run emits results under 'tool_messages', not 'messages', so the span output was never populated. Correct the output key and update the test fixtures to key results as the component actually outputs them, so the test exercises the real path.

chakshu-dhannawat · 2026-07-03T03:36:15Z

Good catch, thank you @bogdankostic, you're right. ToolInvoker.run emits its results under tool_messages (@component.output_types(tool_messages=...)), so the previous .get("messages", []) always returned [] and the output was never set. The test passed only because its fixture used the same wrong key.

I've pushed a fix:

tracer.py: read the output from tool_messages
test_tracer.py: key the fixtures as the component actually outputs, so the test now exercises the real path (verified it fails without the tracer fix)
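Why the original test could not catch the bug can be shown in a few lines. `extract_results` is a hypothetical stand-in for the lookup in the handler, and the fixtures use plain strings in place of ChatMessage objects.

```python
def extract_results(component_output: dict, key: str) -> list:
    """Stand-in for the handler's lookup of tool response messages."""
    return component_output.get(key, [])


# Fixture as first written in the test: mirrors the wrong key.
fixture_as_written = {"messages": ["Sunny, 28°C"]}
# Fixture keyed the way the reviewer says ToolInvoker.run emits its output.
fixture_as_emitted = {"tool_messages": ["Sunny, 28°C"]}

# Wrong key + mirrored fixture: the lookup succeeds, so the test passes.
masked = extract_results(fixture_as_written, "messages")

# Wrong key + realistic fixture: nothing is extracted, so the test now fails.
exposed = extract_results(fixture_as_emitted, "messages")

# Fixed key + realistic fixture: results are found again.
fixed = extract_results(fixture_as_emitted, "tool_messages")

print(masked, exposed, fixed)
```

A fixture that copies an assumption from the code under test verifies only that the two agree with each other, not that either matches the component's real output.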

The input half is unchanged. Ready for another look.

chakshu-dhannawat requested a review from a team as a code owner July 2, 2026 03:00

chakshu-dhannawat requested review from bogdankostic and removed request for a team July 2, 2026 03:00

github-actions Bot added the integration:langfuse label Jul 2, 2026

github-actions Bot added the type:documentation Improvements or additions to documentation label Jul 2, 2026

chakshu-dhannawat added 2 commits July 2, 2026 12:08

fix(langfuse): move test imports to module level to satisfy ruff PLC0415/N817

fc05156

style(langfuse): apply ruff format to test_tracer.py

f98edd4

bogdankostic requested changes Jul 2, 2026

chakshu-dhannawat wants to merge 4 commits into deepset-ai:main from chakshu-dhannawat:fix/langfuse-tool-invoker-content

3 participants
