add TAPIntent probe for intent-based tree-of-attacks testing #1775
Open
ABeltramo wants to merge 8 commits into
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
…m `self.prompts` after `_mint_attempt` had already set it to the correct `TextStub` via `_attempt_prestore_hook`. The type mismatch (string vs `TextStub`) caused the `EarlyStopHarness` stub comparison to silently return `False`, so TAP jailbreak results were discarded and all baseline attempts remained in the rejected list regardless of outcome.
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
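The silent failure described in this commit message can be reproduced in isolation. The sketch below is illustrative only: `TextStub` here is a hypothetical stand-in dataclass, not garak's class, and `stub_matches` is an assumed helper standing in for the harness comparison. It shows that comparing a plain string against a stub object raises no error and simply evaluates to `False`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStub:
    """Hypothetical stand-in for a stub type; not garak's implementation."""
    text: str


def stub_matches(recorded, expected):
    """Assumed helper: an equality check like one a harness might perform."""
    return recorded == expected


expected = TextStub("example intent text")

# The stub was overwritten with a plain string: same text, different type.
overwritten = "example intent text"
mismatch = stub_matches(overwritten, expected)

# The stub was left as the object that was originally stored.
preserved = TextStub("example intent text")
match = stub_matches(preserved, expected)

print(mismatch, match)
```

Because the comparison never raises, a mismatch of this kind only shows up as missing results downstream, which is consistent with the symptom the commit describes.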
… jailbreak, so the caller's detectors can evaluate them
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
When TAP's pruning algorithm removes all candidates (or TAP fails to generate any), `adv_prompt_list` is empty. As a result, `TAPIntent` logs no `Attempt` for that input stub, and the stub is missing from the output report.
See: RHOAIENG-54412
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
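One way to guarantee that every input stub appears in the report is to emit a fallback record when the candidate list is empty. The sketch below assumes that approach; the actual fix in this PR may differ. `AttemptRecord`, `attempts_for_stub`, and the `tap_candidates` note key are hypothetical names introduced for illustration and are not part of garak.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptRecord:
    """Hypothetical report record; not garak's Attempt class."""
    stub: str
    prompt: str
    notes: dict = field(default_factory=dict)


def attempts_for_stub(stub, adv_prompt_list):
    """Return at least one record per stub, even if no candidates survive."""
    if not adv_prompt_list:
        # Assumed fallback: log the stub's own text so it still reaches the report.
        return [AttemptRecord(stub=stub, prompt=stub, notes={"tap_candidates": 0})]
    count = len(adv_prompt_list)
    return [
        AttemptRecord(stub=stub, prompt=prompt, notes={"tap_candidates": count})
        for prompt in adv_prompt_list
    ]


pruned = attempts_for_stub("stub A", [])
found = attempts_for_stub("stub B", ["candidate 1", "candidate 2"])
print(len(pruned), len(found))
```

Recording the candidate count in the notes keeps a fallback record distinguishable from a genuine TAP candidate when the report is read later.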
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
- Use correct intentservice import path (`garak.services.intentservice`)
- Set `intent_spec` before calling `intentservice.load()`
- Add `serve_detectorless_intents` flag for T999 test intent
- Remove `notes[stub]` assertion which depends on unreleased `base.py` changes

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
jmartin-tech (Collaborator) requested changes on May 15, 2026 and left a comment:
Full review is not completed, comments reflect basic coding requirements for limited PR scope.
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
Adds `TAPIntent`, an `IntentProbe` subclass that drives the Tree of Attacks with Pruning (TAP) algorithm against intent stubs loaded from the intent service.

Also includes several fixes to the underlying TAP implementation:

- `_postprocess_attempt` not being called and attempts not being written to the report file

Cherry-picked from trustyai-explainability/garak:automated-red-teaming