Skip to content

Implement OpenAIResponsesGenerator#1806

Open
ABeltramo wants to merge 1 commit into
NVIDIA:feature/technique_intentfrom
trustyai-explainability:feature/openai-responses-generator
Open

Implement OpenAIResponsesGenerator#1806
ABeltramo wants to merge 1 commit into
NVIDIA:feature/technique_intentfrom
trustyai-explainability:feature/openai-responses-generator

Conversation

@ABeltramo
Copy link
Copy Markdown
Collaborator

Implements a new OpenAI compatible generator OpenAIResponsesGenerator that uses the responses API instead of ChatCompletion.

This started as way to run Agent Breaker against agents that support the responses API, but I think it's valuable as a standalone Generator for any probe.
I'm currently storing the returned tool calls as Message notes, mainly so that I can do manual inspection of the attempts in the report.jsonl file. Ideally, I think there should be a generic mechanism for storing those so that detectors can pick those up properly. Probably outside the scope for this PR, just wanted to know if you already have a direction for integrating tool calling into Garak.

Verification

The new Generator can be used like others OpenAI compatible generators, ex:

  generators:
    openai:
      OpenAIResponsesGenerator:
        uri: http://localhost:1234/v1/
        api_key: dummy 
        tools:
          - type: mcp
            server_label: mcp_server
            server_url: http://localhost:8888/sse
            require_approval: "never"

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
Copy link
Copy Markdown
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks pretty useful, the new formats are something we are seeing more middle layers starting to mimic.

Some overall asks:

  • Please rebase this on main and adjust the PR target.
  • Given the comments made here about OpenAICompatible this might be functionality that has value in finding a way to integrate there.

}


class OpenAIResponsesGenerator(Generator):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably extend OpenAICompatible or possibly be integrated into that class.

If kept separate, I would even go as far as suggesting this should be OpenAIReponsesCompatible as the intent described in the description suggests usage with any service that supports the Responses APi.

Comment on lines +454 to +457
# response.output item types to collect into the final text.
# "message" captures standard assistant text; "reasoning" captures
# the model's reasoning summary (ResponseReasoningItem).
"output_types": ["message"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs first class support, reasoning traces are additional data that garak needs to start gathering during the inference phase. As a first to support view I could see storing them as notes on the message but I don't think we should land them as an alternative value collected like this option suggests would be possible.

While is suspect the intent of making this a list is to gather both the option to not include messages becomes implied as valid, I am suggesting that should not be the case.

"uri": None,
"instructions": None,
"tools": [],
"max_output_tokens": 1024,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value should map to max_tokens. Since this generator is targeting the responses API this class should promote the garak standard generator option of max_tokens to be max_output_tokens during __init__ of this class instead of exposing max_output_tokens.

I understand this diverges from how OpenAIReasoningGenerator handled max_completion_tokens, however the evolution of how max_tokens is treated needs to standardize.

super().__init__(self.name, config_root=config_root)

@staticmethod
def _build_input(prompt: Union[Conversation, str]):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to support string here, all prompts must be of type Conversation at this time.

Suggested change
def _build_input(prompt: Union[Conversation, str]):
def _build_input(prompt: Conversation):

Comment on lines +577 to +584
if item_type == "message":
for part in item.content:
if getattr(part, "type", None) == "output_text":
text_parts.append(part.text)
elif item_type == "reasoning":
for summary in getattr(item, "summary", []):
if getattr(summary, "type", None) == "summary_text":
text_parts.append(summary.text)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging message text and reasoning text without may marker to delineate between them is not viable. Reasoning either needs to be stored separately or enclosed in something like the <think></think> tag from legacy APIs to ensure garak can properly exclude reasoning from the responses.

Comment on lines +528 to +533
instructions = self.instructions or self._extract_system_prompt(prompt)
create_args = {
"model": self.name,
"input": self._build_input(prompt),
"max_output_tokens": self.max_output_tokens,
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the format is different here, _extract_system_prompt and _build_input do not take into account additional input modalities supported by the Responses API that are already incorporated in OpenAICompatible._conversation_to_list(). This can be refactored to share that ETL process and enable this generator to support the additional input modalities.

Comment on lines +472 to +474
self.key_env_var = self.ENV_VAR
self._load_unsafe()
super().__init__(self.name, config_root=config_root)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Configurable class handles ENV_VAR support no need set here. Also _load_unsafe() should be called after all initialization and validation has completed.

Suggested change
self.key_env_var = self.ENV_VAR
self._load_unsafe()
super().__init__(self.name, config_root=config_root)
super().__init__(self.name, config_root=config_root)
self._load_unsafe()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants