feat: add hyperparams probe for inference parameter sweep (CWE-1434)#1772
Open
JakeBx wants to merge 3 commits into
…eping

Adds garak/probes/hyperparams.py, a generic probe (HyperparamBasher) that sweeps generator inference parameters across the prompts of any existing probe, measuring how model behaviour changes under non-default settings. Implements CWE-1434 detection. Adds CWE-1434 entry to tags.misp.tsv.

Random sweep builds the full param_space Cartesian product upfront and draws without replacement via np.random.default_rng, avoiding the seen-set rejection loop that could silently under-sample dense spaces. Supports random_seed for reproducible sweeps without contaminating global random state.

Emits a prominent WARNING (logged and printed to terminal) if the generator's MRO does not include OpenAICompatible, catching the silent-failure mode where template-based generators store swept attributes but never interpolate them into requests.

Closes NVIDIA#1233.

Signed-off-by: JakeBx <jacob.j.lee@live.com>
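The random-sweep strategy described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: the function name `sample_combos` and its `n_samples` argument are hypothetical, and only `itertools.product` and `np.random.default_rng` are taken from the description.

```python
import itertools

import numpy as np


def sample_combos(param_space, n_samples, random_seed=None):
    """Draw up to n_samples distinct parameter combos from the full grid.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not the PR's API.
    """
    # Fix the parameter order so that a given seed reproduces the same draw.
    names = sorted(param_space)
    # Build the whole Cartesian product upfront.
    grid = list(itertools.product(*(param_space[name] for name in names)))
    # A local Generator keeps the global random state untouched.
    rng = np.random.default_rng(random_seed)
    k = min(n_samples, len(grid))
    # Sampling indices without replacement guarantees distinct combos,
    # so no seen-set rejection loop is needed.
    picked = rng.choice(len(grid), size=k, replace=False)
    return [dict(zip(names, grid[i])) for i in picked]


space = {"temperature": [0.0, 1.0, 2.0], "top_p": [0.5, 0.9]}
combos = sample_combos(space, n_samples=4, random_seed=0)
```

Materialising the grid costs memory proportional to the product of the value-list lengths, which is the price paid for never under-sampling a dense space.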
Remove _summarise_by_combo() from HyperparamBasher, which violated the probe/detector boundary by loading and running a detector inside probe(). This also doubled inference cost for non-cached detectors and relied on a timing workaround (writing to attempt.notes) to survive JSONL serialisation before harness detection runs.

Replace with garak/analyze/hyperparam_summary.py: standalone post-run script that reads detector_results from the harness-produced JSONL and prints the per-combo pass/fail table; invoke with `python -m garak.analyze.hyperparam_summary --report <path>`.

probe() now prints a copy-pasteable command with the report path after each run.

Signed-off-by: JakeBx <jacob.j.lee@live.com>
Signed-off-by: JakeBx <jacob.j.lee@live.com>
Adds `garak/probes/hyperparams.py`, a generic probe (`HyperparamBasher`) that sweeps generator inference parameters (temperature, top_p, top_k, etc.) across the prompts of any existing probe, measuring how model behaviour changes under non-default settings. Implements detection of CWE-1434: Insecure Setting of Generative AI/ML Model Inference Parameters. Also adds the CWE-1434 entry to `garak/data/tags.misp.tsv`.

Also adds `garak/analyze/hyperparam_summary.py`, a standalone post-run script that reads harness detector results from the report JSONL and prints a per-combo pass/fail table.

Closes #1233.
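To make "sweeping generator inference parameters" concrete, here is a minimal sketch of applying one parameter combo to a generator object by setting instance attributes. The helper `swept_params` and the `FakeGenerator` class are hypothetical, and restoring the previous values afterwards is an assumption of this sketch rather than something the description states.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for attributes the generator did not have


@contextmanager
def swept_params(generator, combo):
    """Temporarily apply one parameter combo to a generator (hypothetical helper)."""
    # Remember the current values so they can be put back afterwards.
    original = {name: getattr(generator, name, _MISSING) for name in combo}
    try:
        for name, value in combo.items():
            setattr(generator, name, value)
        yield generator
    finally:
        for name, value in original.items():
            if value is _MISSING:
                delattr(generator, name)  # attribute was created by the sweep
            else:
                setattr(generator, name, value)


class FakeGenerator:
    """Stand-in object; a real garak generator would be used instead."""

    temperature = 0.7


gen = FakeGenerator()
with swept_params(gen, {"temperature": 2.0, "top_k": 5}):
    inside = (gen.temperature, gen.top_k)  # values a request would see
```

Setting attributes only changes model behaviour if the generator reads them when it builds each request, which is the compatibility concern the design notes raise.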
Design decisions
Probe-based rather than harness-based
The most complete solution may be a `HyperparamHarness` that runs any existing probe N times with different generator configs: full probe fidelity, zero base-class changes, normal parallelism per run. The trade-off is N full probe runs (one per param combo), separate report files per combo, and a new harness invocation path.

A probe-based approach fits the existing `_generator_precall_hook` pattern established by `promptinject` and `divergence`, ships as a single self-contained module, and produces one report with per-attempt notes. The cost is that source probe hooks and buffs are not fully inherited: only `_attempt_prestore_hook` is chained (see below). The harness remains the right answer for thorough sweeps of complex probes and is a possible follow-on.

Fail-fast on source probes with a custom `_generator_precall_hook`

`promptinject` and `divergence.Repeat` both define `_generator_precall_hook` with instance state that cannot be safely transferred to `HyperparamBasher` (e.g. `promptinject` reads `attempt.notes["settings"]` set by its own `_attempt_prestore_hook`; `divergence` reads `self.override_maxlen`). Rather than run silently with broken behaviour, `HyperparamBasher` raises `PluginConfigurationError` at initialisation. These probes are the correct candidates for the harness approach instead.

Generator compatibility: soft warning
`HyperparamBasher` sweeps params by calling `setattr(generator, param, value)` before each attempt. This works correctly for `openai.OpenAICompatible` and subclasses, which read instance attributes dynamically via `inspect.signature` when constructing API requests. Template-based generators such as `rest.RestGenerator` store the attribute but never interpolate it into the request body, so the sweep has no effect and results are misleading. `_validate_params_against_generator` cannot detect this because the attribute exists on the generator; it simply goes unused.

Rather than a hard `isinstance` gate (which would break valid future generators that expose dynamic params without subclassing `OpenAICompatible`), the probe emits a prominent `WARNING` log if the generator's MRO does not include `OpenAICompatible`. A cleaner long-term solution could be that generators opt in by declaring `supports_dynamic_inference_params = True` as a class attribute. This shifts the burden to generator authors who actually know whether their class interpolates instance attributes into requests, avoids class hierarchy coupling, and is future-proof for generators that don't subclass `OpenAICompatible`.

Per-combo results via sidecar index + post-run script
The core value of this probe is identifying which parameter settings cause failures. Per-combo detection requires harness-produced `detector_results`, which are only available after `probe()` returns: attempts are serialised before harness detection runs, so `attempt.detector_results` is always `{}` at generation time.

An earlier version of this commit solved this inline: `_summarise_by_combo()` loaded the primary detector via `_plugins.load_plugin`, ran detection across all completed attempts inside `probe()`, and stored results in `attempt.notes["hyperparam_detector_results"]` to survive JSONL serialisation. This worked, but on review the boundary violation was hard to justify: probes generate attempts, detectors classify them, and crossing that boundary doubles inference cost for any non-cached detector (e.g. an LLM-as-judge).

The current approach keeps the boundary clean:

- `notes["hyperparam_combo"]` is recorded on each attempt for direct grouping in the report JSONL.
- `garak.analyze.hyperparam_summary` is a standalone post-run script that reads `detector_results` from the harness-produced JSONL and prints the per-combo breakdown.
- The terminal prints a copy-pasteable invocation with the report path at the end of each run.

The UX trade-off (running one extra command after the probe) is preferable to a boundary violation with hidden cost. The inline version printed results immediately; the post-run script requires an extra step but operates on correct, harness-produced scores.

A follow-up would be to update report generation so that the report displays results by parameter setting.
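Returning to the generator-compatibility point in the design decisions, the MRO check and the proposed opt-in flag could be combined roughly as follows. This is a sketch under stated assumptions: the three classes are stand-ins rather than garak's real generators, `honours_swept_params` and `warn_if_incompatible` are hypothetical names, and matching the base class by name is a simplification used here to avoid importing garak.

```python
import logging


class OpenAICompatible:
    """Stand-in for garak's openai.OpenAICompatible (not the real class)."""


class TemplateGenerator:
    """Stand-in for a template-based generator that ignores swept attributes."""


class OptInGenerator:
    """Stand-in for a future generator using the proposed opt-in flag."""

    supports_dynamic_inference_params = True


def honours_swept_params(generator):
    """Best-effort guess at whether swept attributes reach the request."""
    # Proposed opt-in: the generator author declares support explicitly.
    if getattr(generator, "supports_dynamic_inference_params", False):
        return True
    # Fallback: look for OpenAICompatible anywhere in the class's MRO.
    return any(cls.__name__ == "OpenAICompatible" for cls in type(generator).__mro__)


def warn_if_incompatible(generator):
    """Log a warning instead of raising, so unknown generators still run."""
    ok = honours_swept_params(generator)
    if not ok:
        logging.warning(
            "%s may ignore swept inference parameters; results could be misleading",
            type(generator).__name__,
        )
    return ok


class CustomOpenAI(OpenAICompatible):
    """A subclass is accepted through the MRO fallback."""


results = [
    warn_if_incompatible(g)
    for g in (CustomOpenAI(), TemplateGenerator(), OptInGenerator())
]
```

Checking the flag first means a generator outside the `OpenAICompatible` hierarchy can silence the warning without any change to the probe.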
Verification
Smoke tested against `mistralai/mistral-nemo` via OpenRouter (`openai.OpenAICompatible`).

    {
      "run": {"generations": 2, "soft_probe_prompt_cap": 3},
      "plugins": {
        "generators": {
          "openai": {"OpenAICompatible": {"uri": "https://openrouter.ai/api/v1"}}
        },
        "probes": {
          "hyperparams": {
            "HyperparamBasher": {
              "source_probe": "packagehallucination.Python",
              "param_space": {"temperature": [0.0, 1.5, 2.0]},
              "sweep_strategy": "single"
            }
          }
        }
      }
    }

    export OPENAICOMPATIBLE_API_KEY=$OPENROUTER_API_KEY
    python -m garak \
      --model_type openai.OpenAICompatible \
      --model_name mistralai/mistral-nemo \
      --config

View per-combo results (run after garak completes):

    python -m garak.analyze.hyperparam_summary --report ~/.local/share/garak/.report.jsonl
Full verification smoke test configs and output: https://github.com/JakeBx/mtls-testing/tree/main/garak/param-basher
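For readers who want to see what the per-combo summary step involves, the following is a minimal sketch of grouping detector scores by parameter combo. It is not the code of `garak.analyze.hyperparam_summary`. The record layout (an `entry_type` of `"attempt"`, a `notes` dict, and `detector_results` mapping detector names to score lists), the dict shape of `hyperparam_combo`, the 0.5 failure threshold, and the detector name are all assumptions made for illustration.

```python
import json
from collections import defaultdict


def summarise(lines, threshold=0.5):
    """Count passes and failures per parameter combo (hypothetical helper)."""
    totals = defaultdict(lambda: [0, 0])  # combo key -> [passed, failed]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: attempts are marked with entry_type == "attempt".
        if record.get("entry_type") != "attempt":
            continue
        combo = (record.get("notes") or {}).get("hyperparam_combo")
        if combo is None:
            continue
        # Serialise the combo with sorted keys so it can act as a stable key.
        key = json.dumps(combo, sort_keys=True)
        for scores in (record.get("detector_results") or {}).values():
            for score in scores:
                # Assumption: a score at or above the threshold is a failure.
                totals[key][1 if score >= threshold else 0] += 1
    return {key: tuple(counts) for key, counts in totals.items()}


# Hand-written sample records, not real garak output.
sample = [
    json.dumps({
        "entry_type": "attempt",
        "notes": {"hyperparam_combo": {"temperature": 0.0}},
        "detector_results": {"hypothetical.Detector": [0.0, 0.0]},
    }),
    json.dumps({
        "entry_type": "attempt",
        "notes": {"hyperparam_combo": {"temperature": 2.0}},
        "detector_results": {"hypothetical.Detector": [1.0, 0.0]},
    }),
    json.dumps({"entry_type": "eval"}),
]
table = summarise(sample)
```

Because the function only reads scores already present in the report, it adds no inference cost, which is the property the post-run design is meant to preserve.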