fix(detectors): treat empty string outputs as inconclusive in StringDetector #1804

Open
nuthalapativarun wants to merge 1 commit into NVIDIA:main from nuthalapativarun:fix/1114-empty-output-false-positive

@nuthalapativarun

Fixes #1114

Empty model outputs (`''` or whitespace-only strings) were scored as `0.0` by `StringDetector`. Inverted detectors like `MitigationBypass` and `Prefixes` then returned `1.0` (hit), producing false positives: an empty response is not a successful jailbreak.
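
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use garak's actual classes: `match_score` and `inverted_score` are hypothetical stand-ins for a substring detector and an inverted detector, and the phrase list is made up for the example.

```python
# Hypothetical stand-ins for illustration only; not garak's real API.
MITIGATION_PHRASES = ["I'm sorry", "I cannot"]  # illustrative phrases


def match_score(output_text: str) -> float:
    """Return 1.0 if any phrase appears in the output, else 0.0."""
    return 1.0 if any(p in output_text for p in MITIGATION_PHRASES) else 0.0


def inverted_score(output_text: str) -> float:
    """Inverted detector: a missing mitigation phrase counts as a hit."""
    return 1.0 - match_score(output_text)


refusal_score = inverted_score("I'm sorry, I can't help with that.")
empty_score = inverted_score("")
whitespace_score = inverted_score("   \n")
```

In this sketch both `empty_score` and `whitespace_score` come out as a hit, even though the model produced no usable text at all.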

The fix adds a whitespace-strip check after the existing `None` guard in `StringDetector.detect()`:

if output_text.strip() == "":
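
The guard line above is truncated in this description, so the following is only a sketch of how such a check might sit in a detection loop, assuming (per the commit message) that empty outputs yield `None`. The function `detect_strings`, its parameters, and the loop structure are hypothetical and are not copied from garak.

```python
from typing import List, Optional


def detect_strings(
    outputs: List[Optional[str]], substrings: List[str]
) -> List[Optional[float]]:
    """Score each output; None marks an inconclusive result."""
    results: List[Optional[float]] = []
    for output_text in outputs:
        # Existing guard: no output at all is inconclusive.
        if output_text is None:
            results.append(None)
            continue
        # New guard: empty or whitespace-only output is also inconclusive.
        if output_text.strip() == "":
            results.append(None)
            continue
        hit = any(s in output_text for s in substrings)
        results.append(1.0 if hit else 0.0)
    return results


scores = detect_strings(
    [None, "", "  \t\n", "I cannot help", "Sure!"], ["I cannot"]
)
```

With this shape, an inverting detector would presumably need to pass `None` through unchanged rather than computing `1 - score`, so that inconclusive results stay inconclusive.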

…etector (NVIDIA#1114)

Empty model outputs ('' or whitespace-only) were scored as 0.0 by
StringDetector, so inverted detectors like MitigationBypass returned
1.0 (hit). An empty response is not a successful jailbreak; return None
(inconclusive) instead.

Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>
@nuthalapativarun force-pushed the fix/1114-empty-output-false-positive branch from 089c458 to 4a09499 on May 29, 2026 04:59
@nuthalapativarun

DCO has been fixed: the commit now has a `Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>` trailer. The branch has also been cleaned up to contain only the single fix commit on top of current main, removing the extraneous upstream commits that were previously present. Please re-check the DCO status and re-review.

Development

Successfully merging this pull request may close these issues.

Empty Output Generating Hits