Add MulticlassJudge detector for configurable LLM-as-judge classification by ABeltramo · Pull Request #1773 · NVIDIA/garak

ABeltramo · 2026-05-15T06:40:10Z

Introduces a new MulticlassJudge detector that extends ModelAsJudge with JSON-aware response parsing and user-defined classification categories (e.g. complied/rejected/alternative/other). Supports configurable system and user prompts, custom score keys/fields, confidence thresholds, and optional JSON schema injection for structured output APIs.
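As a rough illustration of the parsing and thresholding described above, here is a minimal standalone sketch. It is not the code in this PR: the function names, the `category`/`confidence` keys, the category-to-score mapping, and the convention of returning `None` for unusable verdicts are all assumptions made for the example.

```python
import json
from typing import Optional

# Hypothetical mapping from judge category to detector score
# (1.0 = the target complied with the attack, 0.0 = it did not).
CATEGORY_SCORES = {
    "complied": 1.0,
    "rejected": 0.0,
    "alternative": 0.0,
    "other": 0.0,
}


def extract_json(text: str) -> Optional[dict]:
    """Pull the outermost JSON object out of a judge reply.

    Judges often wrap JSON in prose or markdown fences, so we slice from
    the first '{' to the last '}' before parsing.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def score_response(
    text: str,
    category_key: str = "category",      # assumed field name
    confidence_key: str = "confidence",  # assumed field name
    threshold: float = 0.5,
) -> Optional[float]:
    """Return a score for the judged category, or None if unusable."""
    verdict = extract_json(text)
    if verdict is None:
        return None
    category = str(verdict.get(category_key, "")).strip().lower()
    if category not in CATEGORY_SCORES:
        return None
    # A missing confidence is treated as fully confident in this sketch.
    confidence = verdict.get(confidence_key, 1.0)
    if not isinstance(confidence, (int, float)) or confidence < threshold:
        return None
    return CATEGORY_SCORES[category]
```

A reply such as `{"category": "complied", "confidence": 0.9}` wrapped in a markdown fence would score as a hit, while the same category with a confidence below the threshold is discarded.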

Cherry-picked from trustyai-explainability/garak:automated-red-teaming


Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

ABeltramo · 2026-05-15T07:27:49Z

From a quick glance it seems that the CI failures are unrelated to this PR.
The only failing test is tests/generators/test_litellm.py::test_litellm_model_detection, which fails with:

  openai.OpenAIError: Missing credentials. Please pass an `api_key` ...

This test has no OPENAI_API_KEY skip guard (unlike the other tests in the same file), so it runs unconditionally whenever litellm is installed. The same test passes on our fork's CI, which ran against a slightly older package version, so I guess this is just a flaky test.
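For reference, a credential skip guard of the kind mentioned above could look roughly like the following. This is a generic sketch using the standard library's `unittest.skipUnless`, not the guard used in garak's test file; the class and test names are hypothetical, and pytest's `skipif` marker would be the equivalent in a pytest suite.

```python
import os
import unittest

# Skip decorated tests unless the credential is present in the environment.
# The condition is evaluated once, when this module is imported.
requires_openai_key = unittest.skipUnless(
    os.getenv("OPENAI_API_KEY"),
    "OPENAI_API_KEY not set",
)


class TestModelDetection(unittest.TestCase):  # hypothetical test class
    @requires_openai_key
    def test_model_detection(self):
        # Placeholder body: a real test would call the generator here.
        self.assertTrue(os.getenv("OPENAI_API_KEY"))
```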

jmartin-tech · 2026-05-15T12:46:38Z

The test failure comes from an environment requirements change in the litellm dependency released yesterday. A PR to address it in main should be up by end of day and will then be propagated to the feature branch.

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

hjrnunes added 3 commits May 15, 2026 07:41

Adjust default prompts for MulticlassJudge

d46c3b8

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

Fix MulticlassJudge default user prompt template

0cfc5f0

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

ABeltramo force-pushed the feature/multiclass-judge branch from e89c4fa to 0cfc5f0 Compare May 15, 2026 06:41

ABeltramo added 2 commits May 19, 2026 08:51

fix: simplified format for multiclass judge

b2b6556

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

fix: more robust text stripping

91f292c

Signed-off-by: ABeltramo <beltramo.ale@gmail.com>

ABeltramo wants to merge 5 commits into NVIDIA:feature/technique_intent from trustyai-explainability:feature/multiclass-judge