bench: warm up SEAL kernels before timed benchmarks (#625) #741
Merged
kimlaine merged 2 commits into microsoft:main on Apr 28, 2026
The first sealbench batch is 2-6x slower than later batches due to cold instruction cache and an uninitialized SEAL memory pool, which produced the "n=2048 keygen faster than n=1024" symptom in microsoft#625. Add a silent warmup pass after precomputation that runs one encrypt + decrypt + add + multiply (+ relinearize when key-switching is on) per registered BMEnv. Bench-only change; no library code touched. Implements the fix proposed by @rickwebiii in the issue. Closes microsoft#625
Contributor
Thank you, this is overall a nice improvement. Would you be able to add a flag that optionally disables this warmup, something like --no-warmup? This could just be read from argv. The rationale is that sometimes it's actually desirable to measure the cold start.
Contributor
Author
no problem
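One way the requested flag could be read from argv is sketched below. This is a minimal sketch, not the code that was merged: `consume_flag` is a hypothetical helper name, and the only assumption about the surrounding program is that the flag should be removed from argv before the remaining arguments are handed to the benchmark library's own parser.

```cpp
#include <cstring>

// Hypothetical helper (not part of SEAL or google-benchmark).
// Returns true if `flag` appears in argv[1..argc-1]. Every occurrence is
// removed by shifting later arguments down, and argc is updated to match.
bool consume_flag(int &argc, char **argv, const char *flag)
{
    bool found = false;
    int out = 1; // argv[0] (the program name) is always kept
    for (int in = 1; in < argc; ++in)
    {
        if (std::strcmp(argv[in], flag) == 0)
        {
            found = true; // drop this argument
        }
        else
        {
            argv[out++] = argv[in];
        }
    }
    if (out < argc)
    {
        argv[out] = nullptr; // keep the argument list null-terminated
    }
    argc = out;
    return found;
}
```

In `main()`, a call such as `const bool do_warmup = !consume_flag(argc, argv, "--no-warmup");` placed before `benchmark::Initialize(&argc, argv)` would keep the custom flag away from google-benchmark's argument handling, and the warmup pass could then be skipped when `do_warmup` is false.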
Closes #625.
Summary
The first batch of sealbench results in a fresh process can be 2-6× slower than subsequent batches due to cold instruction cache, page faults, and an uninitialized SEAL memory pool. This produced the symptom @rickwebiii reported in #625 where "poly degree 2048 keygen was faster than 1024, which makes no sense", and the same first-batch / second-batch ratios he documented in the issue body (KeyGen Secret 392 µs → 75 µs, Decrypt 140 µs → 59 µs, etc.).
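The first-batch effect described above can be observed outside sealbench with a small timing harness. The sketch below is illustrative only: `time_batches` and `first_batch_ratio` are hypothetical names, the harness uses `std::chrono::steady_clock` rather than google-benchmark, and `op` stands in for whatever operation is being measured.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Times `batches` consecutive batches of `iters_per_batch` calls to `op` and
// returns the mean time per call for each batch, in microseconds.
// Precondition: iters_per_batch > 0.
template <typename Op>
std::vector<double> time_batches(Op &&op, std::size_t batches, std::size_t iters_per_batch)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> mean_us;
    mean_us.reserve(batches);
    for (std::size_t b = 0; b < batches; ++b)
    {
        const auto start = clock::now();
        for (std::size_t i = 0; i < iters_per_batch; ++i)
        {
            op();
        }
        const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
        mean_us.push_back(elapsed.count() / static_cast<double>(iters_per_batch));
    }
    return mean_us;
}

// Ratio of the first batch's mean to the average of the remaining batches.
// Returns 1.0 when there are fewer than two batches or the remaining
// batches average to zero, so the result is always well defined.
double first_batch_ratio(const std::vector<double> &mean_us)
{
    if (mean_us.size() < 2)
    {
        return 1.0;
    }
    double rest = 0.0;
    for (std::size_t i = 1; i < mean_us.size(); ++i)
    {
        rest += mean_us[i];
    }
    rest /= static_cast<double>(mean_us.size() - 1);
    return rest > 0.0 ? mean_us[0] / rest : 1.0;
}
```

Applied to the figures quoted from the issue, `first_batch_ratio({392.0, 75.0})` is roughly 5.2 for KeyGen Secret and `first_batch_ratio({140.0, 59.0})` is roughly 2.4 for Decrypt, both within the 2-6× range.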
This PR implements the fix exactly as the original reporter proposed:
What changes
native/bench/bench.cpp only. No library code touched, no impact on users who do not build sealbench.
After the existing precomputation in main(), a new warmup_family() helper iterates the bm_env_map and runs one encrypt + decrypt + add + multiply per BMEnv (with an extra relinearize when key-switching is on), discarding the results. CKKS additionally exercises the encoder. The warmup primes:
A Running warmup pass ... line prints between the existing Running precomputations ... banner and the google-benchmark output, so it is visible to users.
Verification
Built Release -O3 on Apple M-series, full sealbench across all default parameter sets:
sealbench exit code
First-vs-second-batch variability tightens on this hardware (e.g. KeyGen Public: 2.7% gap baseline → 0.6% gap with warmup). The dramatic 2-6× ratios reported in #625 are hardware-dependent (the original report was on M1 Air); on M-series the baseline gap is already smaller, but the warmup eliminates it by construction regardless of system.
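The shape of the warmup pass described under What changes can be sketched without the SEAL headers. Everything here is a stand-in chosen for illustration: `EnvSketch` replaces the real BMEnv, each operation is a plain callable rather than a SEAL call, and `warmup_all` is a hypothetical name, not the PR's warmup_family().

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for one benchmark environment. The real BMEnv holds
// SEAL objects; here each operation is just a callable with no result.
struct EnvSketch
{
    bool key_switching = false;
    std::function<void()> encrypt, decrypt, add, multiply, relinearize;
};

// Runs each operation once per environment, untimed, and discards any
// results. Returns the number of operations executed so that callers and
// tests can check that every environment was visited.
std::size_t warmup_all(const std::map<std::string, EnvSketch> &envs)
{
    std::size_t ops = 0;
    // Invoke an operation if one is set, and count it.
    auto run = [&ops](const std::function<void()> &op) {
        if (op)
        {
            op();
            ++ops;
        }
    };
    for (const auto &entry : envs)
    {
        const EnvSketch &env = entry.second;
        run(env.encrypt);
        run(env.decrypt);
        run(env.add);
        run(env.multiply);
        if (env.key_switching)
        {
            run(env.relinearize); // only meaningful when key-switching is on
        }
    }
    return ops;
}
```

Running the pass once per environment, rather than once overall, matters in this sketch because each environment stands for a different parameter set, and state primed for one is not assumed to carry over to the next.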
Test plan