add TAPIntent probe for intent-based tree-of-attacks testing #1775

Open
ABeltramo wants to merge 8 commits into NVIDIA:feature/technique_intent from trustyai-explainability:feature/tap-intent

@ABeltramo
Collaborator

Adds TAPIntent, an IntentProbe subclass that drives the Tree of Attacks with Pruning (TAP) algorithm against intent stubs loaded from the intent service.

Also includes several fixes to the underlying TAP implementation:

  • run_tap now returns its best candidates even when no score-10 jailbreak is found, so callers can run their own detectors on the results
  • Guards against the pruning algorithm returning an empty candidate list
  • Fixes attempt recording for unsuccessful TAP runs
  • Fixes _postprocess_attempt not being called and attempts not being written to the report file
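The first two fixes in the list above can be illustrated with a minimal sketch. This is not the code from `tap_main.py`: `Candidate`, `select_results`, and the `keep` parameter are hypothetical names, and the only detail taken from the description is that a score of 10 counts as a jailbreak. The idea is to return the highest-scoring candidates when no jailbreak exists, and to handle an empty candidate list without raising.

```python
from dataclasses import dataclass

JAILBREAK_SCORE = 10  # the score the PR description treats as a successful jailbreak


@dataclass
class Candidate:
    """Hypothetical stand-in for one adversarial prompt and its judge score."""
    prompt: str
    score: int


def select_results(candidates, keep=3):
    """Return jailbreaks if any exist, otherwise the best-scoring candidates.

    An empty input (for example, after pruning removed everything) yields
    an empty list instead of an error.
    """
    if not candidates:
        return []
    jailbreaks = [c for c in candidates if c.score >= JAILBREAK_SCORE]
    if jailbreaks:
        return jailbreaks
    # No score-10 result: hand back the top candidates so the caller's
    # detectors can still evaluate them.
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]
```

With this shape, the caller decides what counts as a hit, rather than relying only on the judge score used inside the search.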

Cherry-picked from trustyai-explainability/garak:automated-red-teaming

ABeltramo and others added 7 commits May 15, 2026 08:00
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
…m self.prompts after _mint_attempt had already set it to the correct TextStub via _attempt_prestore_hook. The type mismatch (string vs TextStub) caused the EarlyStopHarness stub comparison to silently return False, so TAP jailbreak results were discarded and all baseline attempts remained in the rejected list regardless of outcome.

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
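The string-versus-`TextStub` mismatch described in this commit message can be reproduced in isolation. The sketch below is an assumption-laden illustration: the `TextStub` class here is a hypothetical dataclass stand-in, not garak's real type, and `same_stub` is an invented helper. It shows why comparing a plain string with a stub object evaluates to `False` without raising, and one way to make such a mismatch loud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStub:
    """Hypothetical stand-in for an intent stub; not garak's real class."""
    text: str


stub = TextStub("example intent")
overwritten = "example intent"  # a plain string with the same content

# Neither str nor the dataclass knows how to compare with the other type,
# so Python falls back to identity and the comparison is False, not an error.
mismatch = (overwritten == stub)


def same_stub(a, b):
    """Compare two stubs, raising on a type mismatch instead of returning False."""
    if type(a) is not type(b):
        raise TypeError(f"cannot compare {type(a).__name__} with {type(b).__name__}")
    return a == b
```

A strict comparison like `same_stub` is only one option; the fix described in the commit message is to stop overwriting the stored stub with a string in the first place.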
… jailbreak, so the caller's detectors can evaluate them

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
When TAP's pruning algorithm removes all candidates (or TAP fails to generate any), `adv_prompt_list` is empty.
As a result, `TAPIntent` logs no `Attempt` for that input stub, and the stub is missing from the output report.

see: RHOAIENG-54412
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
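One way to guarantee that every stub appears in the report is to fall back to a default prompt when the candidate list is empty. The sketch below assumes the fallback is the unmodified stub text; the commit message does not say what the actual fallback is, and `prompts_to_record` is a hypothetical helper rather than part of garak.

```python
def prompts_to_record(stub_text, adv_prompt_list):
    """Return the prompts to log for one stub, never an empty list."""
    if adv_prompt_list:
        return list(adv_prompt_list)
    # Assumption: recording the unmodified stub text keeps the stub visible
    # in the report even when TAP produced no candidates.
    return [stub_text]
```

For example, `prompts_to_record("intent text", [])` yields a single-element list, so the caller always has at least one attempt to record per stub.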
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
- Use correct intentservice import path (garak.services.intentservice)
- Set intent_spec before calling intentservice.load()
- Add serve_detectorless_intents flag for T999 test intent
- Remove notes[stub] assertion which depends on unreleased base.py changes

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>
Collaborator

@jmartin-tech jmartin-tech left a comment

Full review is not complete; these comments cover basic coding requirements within the limited scope of this PR.

Comment thread garak/resources/tap/tap_main.py Outdated
Comment thread garak/resources/tap/tap_main.py Outdated
Comment thread garak/resources/tap/tap_main.py Outdated
Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

3 participants