Support HY-MT evaluation workflows by hanbitmyths · Pull Request #2482 · microsoft/Olive

hanbitmyths · 2026-05-31T06:33:48Z

Summary

Update lm-eval integration to preserve recipe bootstrap settings.
Improve ORT GenAI evaluation provider handling and runtime GenAI search options.
Keep model-load-only Hugging Face kwargs out of generation config loading.
Add accelerator normalization coverage for WebGPU export-only and runtime-required paths.
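
To illustrate the third item, here is a minimal sketch of separating model-load-only kwargs from the kwargs used for generation config loading. The helper name `generation_config_kwargs` and the constant `MODEL_LOAD_ONLY_KWARGS` are hypothetical and are not taken from the Olive source. The key names are the ones listed in the review overview of this PR.

```python
# Hypothetical sketch: these names are illustrative, not Olive's actual code.
# Keys that only make sense when instantiating model weights.
MODEL_LOAD_ONLY_KWARGS = frozenset(
    {"torch_dtype", "device_map", "max_memory", "quantization_config"}
)


def generation_config_kwargs(load_kwargs):
    """Return a copy of load_kwargs without the model-load-only keys."""
    return {
        key: value
        for key, value in (load_kwargs or {}).items()
        if key not in MODEL_LOAD_ONLY_KWARGS
    }


if __name__ == "__main__":
    kwargs = {"torch_dtype": "float16", "device_map": "auto", "trust_remote_code": True}
    print(generation_config_kwargs(kwargs))
```

Returning a filtered copy instead of mutating the input leaves the original kwargs intact for the later model load.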

Validation

Ran py_compile for olive/evaluator/lmeval_ort.py, olive/evaluator/olive_evaluator.py, olive/model/handler/mixin/hf.py, and test/hardware/test_accelerator.py.
Ran git diff --check for the touched Olive files.
VS Code diagnostics reported no errors for touched accelerator/evaluator files.
Targeted pytest was attempted but blocked by missing pydantic in this environment.

Copilot

Pull request overview

This PR enables HY-MT evaluation workflows by tightening lm-eval/ORT-GenAI integration: it preserves recipe-provided bootstrap_iters, normalizes ORT GenAI provider names and mirrors the exported past_present_share_buffer setting at runtime, prevents model-loading-only kwargs from leaking into generation config loading, and extends accelerator tests with WebGPU coverage.

Changes:

LMEvaluator now forwards a configurable bootstrap_iters to lm_eval.simple_evaluate.
LMEvalORTGenAIEvaluator maps Olive EP names to ORT GenAI provider names, reads past_present_share_buffer from genai_config.json, and renames device to _device.
HfMixin.get_hf_generation_config filters out torch_dtype/device_map/max_memory/quantization_config before calling get_generation_config.
New accelerator normalization tests for WebGPU in both skip-EP-check and runtime-required paths.
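
The provider-name mapping and the runtime `past_present_share_buffer` lookup described above could look roughly like the sketch below. Everything here is an assumption for illustration: the mapping table, the fallback rule, both function names, and the placement of the flag under a `search` section of `genai_config.json` are not confirmed by this PR's text.

```python
import json
from pathlib import Path

# Assumed mapping for illustration only; the real table in lmeval_ort.py may differ.
EP_TO_GENAI_PROVIDER = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "cuda",
    "DmlExecutionProvider": "dml",
    "WebGpuExecutionProvider": "webgpu",
}


def to_genai_provider(ep_name):
    """Map an Olive execution provider name to a short provider name."""
    if ep_name in EP_TO_GENAI_PROVIDER:
        return EP_TO_GENAI_PROVIDER[ep_name]
    # Hypothetical fallback: drop the "ExecutionProvider" suffix and lowercase.
    suffix = "ExecutionProvider"
    base = ep_name[: -len(suffix)] if ep_name.endswith(suffix) else ep_name
    return base.lower()


def read_past_present_share_buffer(model_dir, default=False):
    """Read the exported flag so the runtime search options can mirror it."""
    config_path = Path(model_dir) / "genai_config.json"
    if not config_path.is_file():
        return default
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the flag is stored under the "search" section.
    return bool(config.get("search", {}).get("past_present_share_buffer", default))
```

Falling back to a default when the file or key is missing keeps evaluation usable for models exported without the setting.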

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
olive/evaluator/olive_evaluator.py	Adds `bootstrap_iters` plumbing into `LMEvaluator` and `simple_evaluate`.
olive/evaluator/lmeval_ort.py	Adds ORT GenAI provider name normalization, runtime `past_present_share_buffer`, and renames `self.device` to `self._device`.
olive/model/handler/mixin/hf.py	Excludes model-load-only kwargs when loading HF generation config.
test/hardware/test_accelerator.py	Adds WebGPU coverage for accelerator normalization skip/required paths.
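
As a rough illustration of the `bootstrap_iters` plumbing in the first row, the sketch below builds the keyword arguments that would be passed on to `lm_eval.simple_evaluate`. The helper `build_simple_evaluate_kwargs` is hypothetical and does not appear in the PR. It assumes only that `simple_evaluate` accepts `tasks`, `batch_size`, `limit`, and `bootstrap_iters`.

```python
def build_simple_evaluate_kwargs(tasks, batch_size=1, limit=None, bootstrap_iters=None):
    """Build kwargs for lm_eval.simple_evaluate (hypothetical helper).

    bootstrap_iters is forwarded only when the recipe sets it, so lm-eval's
    own default applies otherwise. An explicit 0 is preserved.
    """
    kwargs = {"tasks": list(tasks), "batch_size": batch_size, "limit": limit}
    if bootstrap_iters is not None:
        kwargs["bootstrap_iters"] = bootstrap_iters
    return kwargs


if __name__ == "__main__":
    kwargs = build_simple_evaluate_kwargs(["wmt16-en-de"], bootstrap_iters=0)
    print(kwargs)
    # With lm-eval installed and a model wrapper `lm` available:
    # results = lm_eval.simple_evaluate(model=lm, **kwargs)
```

Checking `is not None` instead of truthiness matters here, because a recipe may set `bootstrap_iters` to 0 on purpose.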

Support HY-MT evaluation workflows

6e9890e

Copilot AI review requested due to automatic review settings May 31, 2026 06:33

Copilot started reviewing on behalf of hanbitmyths May 31, 2026 06:33

Copilot AI reviewed May 31, 2026

Comment thread olive/evaluator/lmeval_ort.py

github-advanced-security AI found potential problems May 31, 2026

Comment thread olive/evaluator/lmeval_ort.py Fixed

hanbitmyths and others added 4 commits May 31, 2026 06:43

Address ORT GenAI evaluator review comments

cd03fff

Merge branch 'main' into sunghcho/hunyuan

0d86a8b

Fix accelerator test formatting

d2273c3

Merge branch 'main' into sunghcho/hunyuan

d2ff73d

xiaoyu-work approved these changes Jun 5, 2026

xiaoyu-work merged commit 3c13f57 into main Jun 5, 2026
13 checks passed

xiaoyu-work deleted the sunghcho/hunyuan branch June 5, 2026 20:39

xiaoyu-work merged 5 commits into main from sunghcho/hunyuan
