Skip to content

feat: add hyperparams probe for inference parameter sweep (CWE-1434)#1772

Open
JakeBx wants to merge 3 commits into
NVIDIA:mainfrom
JakeBx:feat/hyperparam-basher
Open

feat: add hyperparams probe for inference parameter sweep (CWE-1434)#1772
JakeBx wants to merge 3 commits into
NVIDIA:mainfrom
JakeBx:feat/hyperparam-basher

Conversation

@JakeBx
Copy link
Copy Markdown
Contributor

@JakeBx JakeBx commented May 15, 2026

Adds garak/probes/hyperparams.py — a generic probe (HyperparamBasher) that sweeps generator inference parameters (temperature, top_p, top_k, etc.) across the prompts of any existing probe, measuring how model behaviour changes under non-default settings. Implements detection of CWE-1434: Insecure Setting of Generative AI/ML Model Inference Parameters. Also adds the CWE-1434 entry to garak/data/tags.misp.tsv.

Also adds garak/analyze/hyperparam_summary.py — a standalone post-run script that reads harness detector results from the report JSONL and prints a per-combo pass/fail table.

Closes #1233.


Design decisions

Probe-based rather than harness-based

The most complete solution may be a HyperparamHarness that runs any existing probe N times with different generator configs — full probe fidelity, zero base-class changes, normal parallelism per run. The trade-off is N full probe runs (one per param combo), separate report files per combo, and a new harness invocation path.

A probe-based approach fits the existing _generator_precall_hook pattern established by promptinject and divergence, ships as a single self-contained module, and produces one report with per-attempt notes. The cost is that source probe hooks and buffs are not fully inherited — only _attempt_prestore_hook is chained (see below). The harness remains the right answer for thorough sweeps of complex probes and is a possible follow-on.

Fail-fast on source probes with a custom _generator_precall_hook

promptinject and divergence.Repeat both define _generator_precall_hook with instance state that cannot be safely transferred to HyperparamBasher (e.g. promptinject reads attempt.notes["settings"] set by its own _attempt_prestore_hook; divergence reads self.override_maxlen). Rather than run silently with broken behaviour, HyperparamBasher raises PluginConfigurationError at initialisation. These probes are the correct candidates for the harness approach instead.

Generator compatibility — soft warning

HyperparamBasher sweeps params by calling setattr(generator, param, value) before each attempt. This works correctly for openai.OpenAICompatible and subclasses, which read instance attributes dynamically via inspect.signature when constructing API requests. Template-based generators such as rest.RestGenerator store the attribute but never interpolate it into the request body — the sweep has no effect and results are misleading. _validate_params_against_generator cannot detect this because the attribute exists on the generator; it simply goes unused.

Rather than a hard isinstance gate (which would break valid future generators that expose dynamic params without subclassing OpenAICompatible), the probe emits a prominent WARNING log if the generator's MRO does not include OpenAICompatible. A cleaner long-term solution could be that generators opt in by declaring supports_dynamic_inference_params = True as a class attribute. This shifts the burden to generator authors who actually know whether their class interpolates instance attributes into requests, avoids class hierarchy coupling, and is future-proof for generators that don't subclass OpenAICompatible.

Per-combo results via sidecar index + post-run script

The core value of this probe is identifying which parameter settings cause failures. Per-combo detection requires harness-produced detector_results, which are only available after probe() returns — attempts are serialised before harness detection runs, so attempt.detector_results is always {} at generation time.

An earlier version of this commit solved this inline: _summarise_by_combo() loaded the primary detector via _plugins.load_plugin, ran detection across all completed attempts inside probe(), and stored results in attempt.notes["hyperparam_detector_results"] to survive JSONL serialisation. This worked, but on review the architectural impact was a bit too icky — probes generate attempts, detectors classify them, and crossing that boundary doubles inference cost for any non-cached detector (e.g. an LLM-as-judge).

The current approach keeps the boundary clean:

  • Each attempt carries notes["hyperparam_combo"] for direct grouping in the report jsonl.
  • garak.analyze.hyperparam_summary is a standalone post-run script that reads detector_results from the harness-produced JSONL and prints the per-combo breakdown. The terminal prints a copy-pasteable invocation with the report path at the end of each run.

The UX trade-off (running one extra command after the probe) is preferable to a boundary violation with hidden cost. The inline version printed results immediately; the post-run script requires an extra step but operates on correct, harness-produced scores.

A follow up would be to update the report generation to display in the report results by parameter setting.


Verification

Smoke tested against mistralai/mistral-nemo via OpenRouter (openai.OpenAICompatible).

{
    "run": {"generations": 2, "soft_probe_prompt_cap": 3},
    "plugins": {
        "generators": {"openai": {"OpenAICompatible": {"uri": "https://openrouter.ai/api/v1"}}},
        "probes": {
            "hyperparams": {
                "HyperparamBasher": {
                    "source_probe": "packagehallucination.Python",
                    "param_space": {"temperature": [0.0, 1.5, 2.0]},
                    "sweep_strategy": "single"
                }
            }
        }
    }
}

export OPENAICOMPATIBLE_API_KEY=$OPENROUTER_API_KEY
python -m garak
--model_type openai.OpenAICompatible
--model_name mistralai/mistral-nemo
--config

View per-combo results (run after garak completes):
python -m garak.analyze.hyperparam_summary --report ~/.local/share/garak/.report.jsonl

Full verification smoke test configs and output: https://github.com/JakeBx/mtls-testing/tree/main/garak/param-basher

@JakeBx JakeBx force-pushed the feat/hyperparam-basher branch from 349e6ea to 5715905 Compare May 15, 2026 04:16
@JakeBx JakeBx marked this pull request as ready for review May 16, 2026 10:51
@JakeBx JakeBx force-pushed the feat/hyperparam-basher branch from 83d48f8 to 5a69e35 Compare May 16, 2026 22:59
…eping

Adds garak/probes/hyperparams.py — a generic probe (HyperparamBasher) that
sweeps generator inference parameters across the prompts of any existing
probe, measuring how model behaviour changes under non-default settings.
Implements CWE-1434 detection. Adds CWE-1434 entry to tags.misp.tsv.

Random sweep builds the full param_space Cartesian product upfront and draws
without replacement via np.random.default_rng, avoiding the seen-set rejection
loop that could silently under-sample dense spaces. Supports random_seed for
reproducible sweeps without contaminating global random state.

Emits a prominent WARNING (logged and printed to terminal) if the generator's
MRO does not include OpenAICompatible, catching the silent-failure mode where
template-based generators store swept attributes but never interpolate them
into requests.

Closes NVIDIA#1233.

Signed-off-by: JakeBx <jacob.j.lee@live.com>
@JakeBx JakeBx force-pushed the feat/hyperparam-basher branch from 5b38923 to 0e064a8 Compare May 17, 2026 02:16
Remove _summarise_by_combo() from HyperparamBasher, which violated the
probe/detector boundary by loading and running a detector inside probe().
This also doubled inference cost for non-cached detectors and
relied on a timing workaround (writing to attempt.notes) to survive JSONL
serialisation before harness detection runs.

Replace with garak/analyze/hyperparam_summary.py: standalone post-run script that reads
detector_results from the harness-produced JSONL and prints the per-combo
pass/fail table; invoke with `python -m garak.analyze.hyperparam_summary --report <path>
probe() now prints a copy-pasteable command with the report path after each run.

Signed-off-by: JakeBx <jacob.j.lee@live.com>
@JakeBx JakeBx force-pushed the feat/hyperparam-basher branch from 2a65393 to e102573 Compare May 17, 2026 12:42
Signed-off-by: JakeBx <jacob.j.lee@live.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inference Parameter Manipulation

1 participant