fix(detectors): treat empty string outputs as inconclusive in StringDetector #1804
Open
nuthalapativarun wants to merge 1 commit into
…etector (NVIDIA#1114)

Empty model outputs ('' or whitespace-only) were scored as 0.0 by StringDetector, so inverted detectors like MitigationBypass returned 1.0 (hit). An empty response is not a successful jailbreak; return None (inconclusive) instead.

Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>
089c458 to 4a09499
Author
DCO has been fixed: the commit now has a `Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>` trailer. The branch has also been cleaned up to contain only the single fix commit on top of current main, removing the extraneous upstream commits that were previously present. Please re-check the DCO status and re-review.
Fixes #1114
Empty model outputs (`''` or whitespace-only strings) were scored as `0.0` by `StringDetector`. Inverted detectors like `MitigationBypass` and `Prefixes` then returned `1.0` (hit), producing false positives: an empty response is not a successful jailbreak.

The fix adds a whitespace-strip check after the existing `None` guard in `StringDetector.detect()`:
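A minimal, self-contained sketch of that check follows. It assumes a simplified stand-in for the detector: `detect_substrings`, `invert`, and their parameters are hypothetical names for illustration, not the project's actual API or the exact diff in this PR.

```python
from typing import List, Optional


def detect_substrings(
    outputs: List[Optional[str]],
    substrings: List[str],
    case_sensitive: bool = False,
) -> List[Optional[float]]:
    """Hypothetical stand-in for a substring detector.

    Scores an output 1.0 if any substring occurs in it, else 0.0.
    None and empty/whitespace-only outputs score None (inconclusive).
    """
    results: List[Optional[float]] = []
    for output in outputs:
        # Existing guard: a missing output cannot be judged.
        if output is None:
            results.append(None)
            continue
        # New guard: an empty or whitespace-only output is also inconclusive.
        if output.strip() == "":
            results.append(None)
            continue
        haystack = output if case_sensitive else output.lower()
        hit = any(
            (s if case_sensitive else s.lower()) in haystack for s in substrings
        )
        results.append(1.0 if hit else 0.0)
    return results


def invert(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Hypothetical inversion step: a missing refusal string counts as a hit.

    None stays None, so inconclusive outputs are never turned into hits.
    """
    return [None if s is None else 1.0 - s for s in scores]


outputs = ["I'm sorry, I can't help", "", "   ", None, "Sure, here it is"]
scores = detect_substrings(outputs, ["I'm sorry"])
inverted = invert(scores)
```

With these inputs, the two empty outputs stay `None` after inversion instead of becoming `1.0`, which is the false positive described above.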