docs(designs): background tasks #783

gautamsirdeshmukh wants to merge 1 commit into strands-agents:main
Conversation
Documentation Preview Ready
Your documentation preview has been successfully deployed! Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-783/docs/user-guide/quickstart/overview/ Updated at: 2026-04-27T18:26:18.613Z
| Action | Example | Description |
|--------|---------|-------------|
| dispatch | `background({ action: "dispatch", tool: "search_web", args: { query: "..." } })` | Dispatch work, receive a task ID immediately |
| get | `background({ action: "get", taskId: "..." })` | Check task status (polling is never required — results are injected automatically) |
Doc feedback: It's a little unclear what "results are injected automatically" means at this point; it raises more questions.
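For concreteness, here is a sketch of the two call shapes from the table above as TypeScript values. The `BackgroundCall` type is purely illustrative, not part of the proposal:

```typescript
// Hypothetical input shape for the single background() tool (illustration only).
type BackgroundCall =
  | { action: "dispatch"; tool: string; args: Record<string, unknown> }
  | { action: "get"; taskId: string };

// Dispatch: returns a task ID immediately instead of blocking on the tool.
const dispatch: BackgroundCall = {
  action: "dispatch",
  tool: "search_web",
  args: { query: "..." },
};

// Get: an optional status check; per the table, results are also injected
// automatically, so the model never has to poll.
const get: BackgroundCall = { action: "get", taskId: "t-1" };
```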
| Dynamic agent (use_agent) | `background({ ..., tool: "use_agent", args: { prompt: "...", tools: [...] } })` |
| Any callable | `background({ ..., tool: "x", args: { ... } })` |

The model keeps reasoning and dispatching more work as needed. As results come in, they are delivered as tagged messages appended to the conversation, not as `tool_result` blocks. The dispatch returns a task ID as a normal `tool_result` (satisfying the provider API's synchronous pairing requirement), and the actual result arrives later as a separate message tagged with the originating tool and task ID. The model sees them at the start of its next turn. The tool definition instructs the model to continue working without fabricating results for dispatched tasks.
What will the implication of this be on context management features like tool result offloading?
Will these tool results be lost if context is compressed mid-task?
| Pros | Cons |
|------|------|
| Each tool has a focused schema — simpler for the model per-call | Three tools in the registry instead of one |
| Matches the MCP Tasks protocol shape (`tasks/get`, `tasks/cancel`) | Model must discover and learn three tools instead of one |
| No `action` parameter to misuse | More surface area to maintain |
More chance of conflicts too.
It might be good as a team to align on what we prefer/generally do; that also helps us shape future tools
| Strategy | Pros | Cons |
|----------|------|------|
| Tagged message injection (this proposal) | No history mutation; simple to implement; model sees results at clean turn boundaries | Model must learn from tool definition that results arrive as messages, not `tool_result` blocks |
Doc feedback: It would be good to detail what it is before going into the pros & cons
| Dynamic agent (use_agent) | `background({ ..., tool: "use_agent", args: { prompt: "...", tools: [...] } })` |
| Any callable | `background({ ..., tool: "x", args: { ... } })` |

The model keeps reasoning and dispatching more work as needed. As results come in, they are delivered as tagged messages appended to the conversation, not as `tool_result` blocks. The dispatch returns a task ID as a normal `tool_result` (satisfying the provider API's synchronous pairing requirement), and the actual result arrives later as a separate message tagged with the originating tool and task ID. The model sees them at the start of its next turn. The tool definition instructs the model to continue working without fabricating results for dispatched tasks.
> The model sees them at the start of its next turn

Need a bit more explanation here.
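To make "the model sees them at the start of its next turn" concrete, here is a sketch of the message sequence the proposal describes. Field names are illustrative, not a committed schema:

```typescript
// Turn N: the dispatch is paired synchronously, satisfying the provider
// API's tool_use/tool_result pairing requirement.
const turnN = [
  { role: "assistant", toolUse: { id: "call-1", name: "background",
      input: { action: "dispatch", tool: "search_web", args: { query: "..." } } } },
  { role: "user", toolResult: { toolUseId: "call-1",
      content: [{ text: "dispatched: taskId t-1" }] } },
];

// Later, when search_web completes, the result is appended as a tagged
// message rather than a retroactive tool_result, so history is never mutated.
const injected = {
  role: "user",
  content: [{ text: "[background result] tool: search_web, taskId: t-1, status: completed" }],
};
// The model reads `injected` at the start of whichever turn follows the
// append; nothing is spliced into turns it has already reasoned past.
```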
| Strategy | Pros | Cons |
|----------|------|------|
| Tagged message injection (this proposal) | No history mutation; simple to implement; model sees results at clean turn boundaries | Model must learn from tool definition that results arrive as messages, not `tool_result` blocks |
Any concern about prompt injection? E.g. we've had models reject data because it seemed like prompt injection - do we have any idea if that would/might happen here?
| Dynamic agent (use_agent) | `background({ ..., tool: "use_agent", args: { prompt: "...", tools: [...] } })` |
| Any callable | `background({ ..., tool: "x", args: { ... } })` |

The model keeps reasoning and dispatching more work as needed. As results come in, they are delivered as tagged messages appended to the conversation, not as `tool_result` blocks. The dispatch returns a task ID as a normal `tool_result` (satisfying the provider API's synchronous pairing requirement), and the actual result arrives later as a separate message tagged with the originating tool and task ID. The model sees them at the start of its next turn. The tool definition instructs the model to continue working without fabricating results for dispatched tasks.
What if we waited to append the tool use until we had the tool result? This is actually what we do in TS already. Then you don't need to send a "task id" tool result, and the actual tool result can be packed into a tool result message.
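A sketch of the alternative described in this comment, with invented names: the tool_use is buffered outside history until the real result exists, then both are appended as a standard pair:

```typescript
// Buffer the tool_use until the work finishes, then append tool_use and
// tool_result together as a normal pair (no task-ID placeholder result).
async function appendWhenComplete(
  history: unknown[],
  toolUse: { id: string; name: string; input: unknown },
  run: () => Promise<string>,
): Promise<void> {
  const text = await run(); // work completes before anything enters history
  history.push({ role: "assistant", toolUse });
  history.push({
    role: "user",
    toolResult: { toolUseId: toolUse.id, content: [{ text }] },
  });
}
```

The trade-off, relative to the proposal, is that the model cannot keep reasoning in the same thread while the work runs, since the pair only appears once the result is in.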
---

## Design Alternatives
Can we extend AgentTool so tools expose `tool.backgroundable` or something? So that it's not a wrapper but part of the tool, which can be turned to background? 🤔
| Strategy | Pros | Cons |
|----------|------|------|
| Tagged message injection (this proposal) | No history mutation; simple to implement; model sees results at clean turn boundaries | Model must learn from tool definition that results arrive as messages, not `tool_result` blocks |
| Retroactive `tool_result` (Mastra) | Natural for the model — standard tool pairing, no new patterns to learn | Mutates conversation history after the model has already reasoned past that point; if compaction removes the original `tool_use`, the retroactive result has nothing to pair with; provider validation risks |
| Explicit polling (LangChain) | Zero hallucination risk — model actively retrieves results | Token overhead from repeated status checks; wasted model calls when results aren't ready |
Will we support explicit polling as an option regardless, or no?
> Token overhead from repeated status checks; wasted model calls when results aren't ready

There's a world where we could auto-trim old messages, but I do agree this can be a waste.
> Will we support explicit polling as an option regardless, or no?

Oh, I see that Status + auto-notification is what I was suggesting?
### The model is never idle

Work runs in the background while the model keeps reasoning. A coordinator dispatches 4 researcher agents and immediately moves on to structuring its report outline; when results arrive, they slot into a framework the model has already prepared.
How do we handle tool event streaming? If the tool is running in the background, are we still able to stream back intermediate events? Is that something we want?
On the same note, how do we handle the agent stream when we re-invoke the agent from a background tool result? Is the agent invoked as part of the previous agent.stream_async call, or is this a separate async method? In that case we need a way to actually stream the agent response.
(I think the solution is a callback handler, but not sure if that's the pattern we like, I guess?)
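For reference, a sketch of the coordinator example from the quoted paragraph. Topics are invented; the `dispatch` shape follows the tables earlier in the doc:

```typescript
// The coordinator fires off four researchers and keeps working on the
// report outline; results later arrive as tagged messages.
const topics = ["market size", "competitors", "pricing", "regulation"];

const dispatches = topics.map((topic) => ({
  action: "dispatch" as const,
  tool: "use_agent",
  args: { prompt: `Research: ${topic}`, tools: ["search_web"] },
}));
// Each dispatch returns a task ID immediately; the model spends the
// intervening turns structuring the outline the results will slot into.
```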
| Strategy | Pros | Cons |
|----------|------|------|
| Tagged message injection (this proposal) | No history mutation; simple to implement; model sees results at clean turn boundaries | Model must learn from tool definition that results arrive as messages, not `tool_result` blocks |
| Retroactive `tool_result` (Mastra) | Natural for the model — standard tool pairing, no new patterns to learn | Mutates conversation history after the model has already reasoned past that point; if compaction removes the original `tool_use`, the retroactive result has nothing to pair with; provider validation risks |
| Explicit polling (LangChain) | Zero hallucination risk — model actively retrieves results | Token overhead from repeated status checks; wasted model calls when results aren't ready |
| Status + auto-notification (Claude Agent SDK) | Clear "async_launched" signal reduces hallucination; model can poll or wait | More complex — requires both notification infrastructure and polling tools |
Can you give a bit more context here? How do they handle notification? Why didn't we go with this option?
Assuming tasks (async calling) will need to align with MCP, I'd expect the Claude SDK to be the closest, good reference implementation.
> More complex — requires both notification infrastructure and polling tools

Is that a problem? Claude is writing the code anyways :p
I like this approach better compared to what's recommended, tbh. The main reason is that the agent might be in the middle of a task, and your tool result can be huge. Dumping it in can confuse the agent. In this approach, we leave a notification (instead of the whole tool result) and let the agent get to it when they can.
I'm also wondering why we didn't go with this option. "More complex" from a development/implementation perspective, or from a maintainability and customer-use perspective? If this produces better results, I'd rather go with this.
Might be worth running a few test scripts with each access method to get some stats on the success rates of each.
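To illustrate the notification-style delivery preferred in the last two comments, a sketch (shapes invented): the injected message is a small pointer, and the full result stays out of context until the model asks:

```typescript
// Notification instead of the whole tool result: a short pointer message.
const notification = {
  role: "user",
  content: [{
    text: "[task notification] taskId: t-1, tool: search_web, status: completed. " +
      'Call background({ action: "get", taskId: "t-1" }) to retrieve the result.',
  }],
};
// The possibly-huge result body is fetched only on request, trading one
// extra tool round-trip for protection against mid-task context dumps.
```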
| No risk of hallucinated results or polling loops | Only suited for single long-running tasks, not parallel workloads |
| Works with any existing tool without modification | Requires platform-level re-invocation infrastructure |

**This approach is complementary, not competing.** `background()` keeps the model productive while multiple tasks run. Model-unaware async releases compute when the model has nothing else to do. In a managed deployment, both compose: the model dispatches background tasks, and the platform releases compute while waiting for results.
@JackYPCOnline, @pgrayy this sounds like the invoke checkpoint stuff maybe?
It is more like a very initial version: dispatch the tool call [model call, tool calls in our example], and wrap the call in a durable provider service.
## Extension to Agent Harness

Background Tasks as proposed are ephemeral: in-process, tracked in memory, scoped to the agent's lifetime. If the agent process dies, the tasks die with it. This is acceptable for interactive sessions and short-lived agentic workflows, but does not scale for long-running production tasks.
So no integration with session-manager until we extend it further? I think that's fine, just trying to clarify
The model calls a tool normally. Behind the scenes, the framework intercepts the result, releases the agent process, and waits for the external work to complete. The agent is re-invoked when ready, and the result is injected as if the tool ran synchronously. The model never knows anything was async.

This is the approach taken by the [Strands Durability Plugin (SARK)](https://sark-docs.beta.harmony.a2z.com/sdo-strands-durability-plugin/developer-guide/async-execution). SARK's interrupt-based session release is more applicable to the durable task phase than to the ephemeral mechanism proposed here.
cc @JackYPCOnline are you aware of this, as part of durability work?
The model calls the same `background()` tool. What changes is the infrastructure underneath. Tasks run outside the agent process and survive process death. The agent can resume from where it left off, and task metadata persists across sessions.

Durable background tasks enable fire-and-forget: the agent can dispatch work, exit, and pick up results on its next invocation. They also open the door to ambient, long-running processes — background agents that persist across sessions, react to events, and operate alongside the main agent over extended timeframes.
I think there's a middleground here for "durable tools" of a sort? E.g. tools could opt into this by returning a token that represents the tool response, and that token is stored in session management; upon resume the token is given back to the tool to check its status, and it effectively resumes the task.
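A sketch of that middleground, with every name hypothetical: the tool opts in by returning an opaque token that session management persists and hands back on resume:

```typescript
// Hypothetical opt-in surface for "durable tools": the tool returns a
// resume token instead of a result; the token round-trips through the
// session store and is given back to the tool to resume the task.
interface DurableToolStart {
  resumeToken: string; // persisted by session management
}

interface DurableTool {
  start(args: Record<string, unknown>): Promise<DurableToolStart>;
  poll(token: string): Promise<{ done: boolean; result?: string }>;
}
```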
| Pros | Cons |
|------|------|
| Developer controls which tools are safe to background (stateful tools, side effects) | Developer decides what's backgroundable, not the model — no per-conversation flexibility |
For user-defined tools, that makes sense. What about SDK-vended tools?
## Proposal

Background Tasks give the model control over *when* work runs, not just *what*. A single `background()` tool lets the model dispatch work, keep reasoning, and react to results as they arrive:
How does the model decide when to call a tool in background mode?
+1. Similarly, if we were to introduce polling, when would the model poll?
> when would the model poll?

Depends on what you tell the model. If you don't tell it anything, it starts and keeps polling, causing context bloat.
← search_web completes →

[message] { tool: "search_web", taskId: "t-1", status: "completed",
nit: this is the start of Turn 4
### Per-tool config

Instead of a meta-tool or a separate list, the `background` property lives on each tool definition. The model still calls tools normally — the SDK checks the per-tool config and backgrounds the call automatically.
Does it make sense to allow tools to opt in to being background tasks only? E.g. I could imagine a scenario where a tool always returns an ephemeral/durable response, and so instead of allowing the agent to invoke it synchronously, it forces it to be background?
I think it makes sense. I'd expect the tool developer to have some control here. That said, would it make sense to start this under experimental, and once we settle, add more config options and move it to main? 🤔
```typescript
const agent = new Agent({
  tools: [
    quickLookup,
    { tool: searchWeb, background: true },
  ],
});
```
Do we need to define which tools are background-able?
Several implementations validate the core pattern of async dispatch for Strands agents:

**[async-agentic-tools](https://github.com/mikegc-aws/async-agentic-tools)** (mikegc-aws) — ~320 lines of Python. A `@tool_async` decorator wraps any Strands tool, dispatches it to a thread pool, and returns a task ID immediately. An `AsyncAgent` wrapper handles result delivery via callbacks. Proves the pattern works today with minimal code, though the developer decides which tools are async (decorator-based), not the model.
but like, does it work? just because someone wrote the code, doesn't mean it works 😅 (especially with all the AI slop), is it proven?
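The repo is Python, but the pattern it demonstrates is small enough to sketch in TypeScript (names invented): wrap a tool so that calling it starts the work and returns a task ID at once:

```typescript
// The @tool_async idea, sketched: dispatch the real tool and hand the
// model a task ID immediately.
const tasks = new Map<string, Promise<string>>();
let nextId = 0;

function asAsyncTool(run: (args: unknown) => Promise<string>) {
  return (args: unknown): string => {
    const id = `t-${++nextId}`;
    tasks.set(id, run(args)); // work proceeds off the agent loop
    return id; // the model sees only the ID for now
  };
}
```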
## Proposal

Background Tasks give the model control over *when* work runs, not just *what*. A single `background()` tool lets the model dispatch work, keep reasoning, and react to results as they arrive:
Let's say I have a tool that calls an API in background mode. Could this be set up so the model calls this tool as normal, but the tool returns with a response ID that should be passed back in when the response is available?
This fits in line with OpenAI's model background mode. When you call the model in background mode, you get a response ID that you pass back in a request to retrieve the response when it is done generating.
Through this approach, you wouldn't need a special background tool for the model. However, each tool implementation would then be responsible for exposing background functionality if it supports it.
https://developers.openai.com/api/docs/guides/background
^ Info on model background mode.
The model calls a tool normally. Behind the scenes, the framework intercepts the result, releases the agent process, and waits for the external work to complete. The agent is re-invoked when ready, and the result is injected as if the tool ran synchronously. The model never knows anything was async.

This is the approach taken by the [Strands Durability Plugin (SARK)](https://sark-docs.beta.harmony.a2z.com/sdo-strands-durability-plugin/developer-guide/async-execution). SARK's interrupt-based session release is more applicable to the durable task phase than to the ephemeral mechanism proposed here.
### The model has no control over concurrency

Every concurrency option in Strands requires developer configuration. If the model determines mid-reasoning that two tasks are independent, it has no way to dispatch them and keep reasoning while they run. ConcurrentToolExecutor helps — it drops latency from sum(tools) to max(tools) — but the developer decides what runs concurrently, not the model, and the agent still blocks until the batch completes.
Are we planning on limiting the maximum number of tasks that can run in the background concurrently? Mainly wondering how this scales: any ideas how many background tasks can/should be run at once, and when this becomes no longer beneficial?
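One generic way to bound the pool, if a cap is wanted, is a counting semaphore around dispatch. This is a sketch, not a proposed config surface; the limit of 8 is arbitrary:

```typescript
// Minimal counting semaphore to cap in-flight background tasks.
class Semaphore {
  private waiters: (() => void)[] = [];
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) { this.permits--; return; }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to a waiter
    else this.permits++;
  }
}

const pool = new Semaphore(8); // dispatch awaits pool.acquire(), releases on completion
```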
The model keeps reasoning and dispatching more work as needed. As results come in, they are delivered as tagged messages appended to the conversation, not as `tool_result` blocks. The dispatch returns a task ID as a normal `tool_result` (satisfying the provider API's synchronous pairing requirement), and the actual result arrives later as a separate message tagged with the originating tool and task ID. The model sees them at the start of its next turn. The tool definition instructs the model to continue working without fabricating results for dispatched tasks.

Background results are subject to the same context window limits and compaction behavior as any other conversation content. Proactive strategies — structured output on sub-agents to reduce result size, batching closely-completing results into a single injection, or queuing results when the context is near capacity — can mitigate this further.
What happens if we have context bloat or context summarization while waiting for a tool result? Are there plans to help mitigate confusing the model with a tool result that arrives late? How do we help the model not forget why it wanted the tool result in the first place?
Also, it could be helpful to think about what happens if users filter out tool uses/tool results immediately after the model responds to a tool result. Not sure if this happens in practice, but I do wonder if someone has tried it as a token-saving strategy, especially when caching isn't available. So the flow is:
- "User question"
- "Tool use"
- "Tool result"
- "Model response"

Under this flow, tool use and tool result can be treated as metadata and so can be filtered out. The critical context is now captured in "user question" and "model response". This means, though, that the model loses the "dispatch" context when running a tool in background mode.
Also worth thinking about what happens when running a model in stateful mode. We don't control message history under those circumstances.
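One of the quoted mitigations (queuing results when the context is near capacity), sketched with invented names and an arbitrary threshold:

```typescript
// Hold completed results instead of injecting them when the window is
// nearly full; flush the queue once compaction frees room.
const pendingResults: string[] = [];

function deliver(result: string, usedTokens: number, maxTokens: number): void {
  const nearCapacity = usedTokens > 0.9 * maxTokens; // illustrative threshold
  if (nearCapacity) pendingResults.push(result);
  else injectTaggedMessage(result);
}

function injectTaggedMessage(result: string): void {
  console.log("[background result]", result); // stand-in for the real append
}
```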
### Durable background tasks

The model calls the same `background()` tool. What changes is the infrastructure underneath. Tasks run outside the agent process and survive process death. The agent can resume from where it left off, and task metadata persists across sessions.
I am not sure, but this actually conflicts with the agent-controlled event loop idea..? I think we can add durability in the dispatch path as well.
### Unrelated tasks block each other

If task A finishes in 2 seconds and task B takes 30, the model cannot act on A's result until B also completes — even if A and B have nothing to do with each other. A completed result should be actionable the moment it arrives.
Aside: this makes me wonder whether there should be guidance to include estimated latencies in tool descriptions.
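The blocking problem in the quoted paragraph, shown in miniature: `Promise.all` holds A's 2-second result hostage to B's 30 seconds, while completion-order delivery makes each result actionable as it lands:

```typescript
const taskA = new Promise<string>((r) => setTimeout(() => r("A done"), 2_000));
const taskB = new Promise<string>((r) => setTimeout(() => r("B done"), 30_000));

// Blocking: nothing is usable until both settle.
// const [a, b] = await Promise.all([taskA, taskB]);

// Completion order: each result is delivered the moment it arrives.
for (const task of [taskA, taskB]) {
  task.then((result) => console.log("deliver:", result));
}
```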
| Action | Example | Description |
|--------|---------|-------------|
| dispatch | `background({ action: "dispatch", tool: "search_web", args: { query: "..." } })` | Dispatch work, receive a task ID immediately |
Strands SDK will implicitly create these 3 tools in the registry when the user creates a background tool?
Either shape works. We propose the single-tool approach because it keeps the registry minimal and the model only needs to learn one tool name.

### Result delivery strategy
Is this something the tool does manually? Or do we have a generic way of doing it? Or will we have one?
### The model is never idle

Work runs in the background while the model keeps reasoning. A coordinator dispatches 4 researcher agents and immediately moves on to structuring its report outline; when results arrive, they slot into a framework the model has already prepared.
I'm curious to understand if we have a sense of which domain use cases are the best fit for each background/concurrency approach.
I.e., if I can make agents as tools, or background agents as tools, or a graph with concurrent edges, etc., how can we guide users towards the best fit?
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.