CostModel: batch-aware peak-memory single-term optimization + DRY cleanup #559
Open
evaleev wants to merge 38 commits into
…mization)
Adds doc/dev/specs spec for a CostModel abstraction that unifies single-term optimization cost and runtime evaluation policy: peak-memory objective (pebbling recurrence), batchability-aware footprint (monomial in batch-index extents, peak_full/peak_slice two-mode, persistence-gated frontier), three built-in models (DenseFLOPs, DensePeakSize, DensePeakSizeBatched), and custom injection.
… term
Phase 1 oracle revealed the pebbling recurrence computed a Sethi-Ullman (register-style) peak that omits resident input leaves. Adopt the realistic all-co-resident tensor peak: add per-subset L[n] (leaf-size sum) and the bystander term L[other]+peak[child] to the DP; oracle (7.0 on the 3-leaf example) and DP now agree. Same clean subset DP and optimal substructure.
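The difference between a register-style peak and an all-co-resident peak can be illustrated on a fixed binary tree. The sketch below is not the subset DP from this PR: Node, Peak, peak_of and example_tree are hypothetical names, and the recurrence assumes that a subtree's leaves are released once its result has been formed, which the commit message does not spell out. The sizes are made up and unrelated to the 3-leaf oracle case.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy tree node: leaves have left == right == -1.
struct Node {
  double size;
  int left = -1;
  int right = -1;
};

// peak: best peak memory for the subtree; leaves: sum of its leaf sizes (L[n]).
struct Peak {
  double peak;
  double leaves;
};

// co_resident == true adds the bystander term L[other] while a child is evaluated;
// co_resident == false gives the register-style peak that ignores waiting leaves.
Peak peak_of(const std::vector<Node>& t, int n, bool co_resident) {
  assert(n >= 0 && n < static_cast<int>(t.size()));
  const Node& node = t[n];
  if (node.left < 0) return {node.size, node.size};
  const Peak a = peak_of(t, node.left, co_resident);
  const Peak b = peak_of(t, node.right, co_resident);
  const double sa = t[node.left].size;
  const double sb = t[node.right].size;
  const double combine = sa + sb + node.size;  // both operands plus the result
  auto order = [&](const Peak& first, double s_first, const Peak& second) {
    const double bystander = co_resident ? second.leaves : 0.0;
    return std::max({first.peak + bystander,  // evaluating the first child
                     s_first + second.peak,   // first result held, second evaluated
                     combine});
  };
  const double best = std::min(order(a, sa, b), order(b, sb, a));
  return {best, a.leaves + b.leaves};
}

// Made-up example: R = (A*B)*C with sizes A=4, B=4, C=2, A*B=1, R=1.
std::vector<Node> example_tree() {
  return {Node{4.0}, Node{4.0}, Node{2.0}, Node{1.0, 0, 1}, Node{1.0, 3, 2}};
}
```

With these sizes, peak_of(example_tree(), 4, true).peak is 11 and the register-style variant gives 9: the latter does not charge for C waiting in memory while A*B is being formed.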
…m + public dispatch)
The DensePeakSize enumerator, the peak_cost wrapper, and the public single_term_opt Metric template parameter were undocumented. Describe the peak-memory objective (all-co-resident model, order is a real lever) and its Phase-1 limitation (no subnet CSE).
Make the DensePeakSizeBatched formulation concrete under the all-co-resident model: explicit peak_full/peak_slice recurrence (full bystander terms, frontier substitution), local batchable-frontier gate (batch-index internal AND persistent), the validation strategy (slice mode reuses Phase-1 peak_cost at batched extents; full mode vs a tree x order x batch-choice oracle), and the Phase-2 OptimizeOptions plumbing (pre-CostModel).
Batchable indices slice independently (peak[n][B], B subset of the term's batchable indices) rather than as one group, which would under-count and mislead on multi-aux terms. Batch decision for Ki taken at the node where Ki is internalized; objective peak[root][empty]; m=1 collapses to two-mode and ties to Phase-1 peak_cost. Oracle and validation updated for per-index (incl. a two-distinct-aux case).
DP gains a [B] dimension over the term's distinct batchable indices; oracle threads the per-index slice context; reconstruction gets a full numeric memory-simulation check. All-sliced corner ties to Phase-1 peak_cost.
Add DensePeakSizeBatched to ObjectiveFunction and two new OptimizeOptions fields (is_batchable_index, batch_target_size). Implement in detail namespace (single_term.hpp):
- batchable_index_list: distinct batchable indices in appearance order
- sliced_footprints: 2^m tables of subset_footprints, one per sliced-set B
- leaf_volatile_mask: bitmask of volatile leaf tensors (mirrors inline mask)
Test: "per-index batchability tables" SECTION verifies that aux.size()==2 for two distinct F-space indices, tables.size()==4 (2^2 sliced-sets), the all-sliced footprint is strictly smaller than the unsliced one, and that slicing only F1 shrinks only the F1-leaf footprint.
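The 2^m sliced-set tables can be pictured with a single tensor and a bitmask B over the batchable indices. This is a minimal sketch under stated assumptions: sliced_footprint, footprint_table and example_table are hypothetical names, a footprint is taken to be the product of index extents, and the scalar target and the extents (F1=100, F2=50, i=10) are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Extents = std::map<std::string, std::size_t>;

// Element count of one tensor when the batchable indices selected by
// sliced_mask (bit k <-> batchable[k]) are capped at `target`.
std::size_t sliced_footprint(const std::vector<std::string>& tensor_indices,
                             const Extents& extent,
                             const std::vector<std::string>& batchable,
                             unsigned sliced_mask, std::size_t target) {
  std::size_t volume = 1;
  for (const auto& ix : tensor_indices) {
    std::size_t e = extent.at(ix);
    for (std::size_t k = 0; k < batchable.size(); ++k) {
      const bool sliced = ((sliced_mask >> k) & 1u) != 0u;
      if (sliced && batchable[k] == ix) e = std::min(e, target);
    }
    volume *= e;
  }
  return volume;
}

// One footprint per sliced-set B: 2^m entries for m batchable indices.
std::vector<std::size_t> footprint_table(const std::vector<std::string>& tensor_indices,
                                         const Extents& extent,
                                         const std::vector<std::string>& batchable,
                                         std::size_t target) {
  assert(batchable.size() < 16);  // keep the toy table small
  std::vector<std::size_t> table(std::size_t{1} << batchable.size());
  for (unsigned B = 0; B < table.size(); ++B)
    table[B] = sliced_footprint(tensor_indices, extent, batchable, B, target);
  return table;
}

// Invented example: a leaf carrying {F1, i}, batchable indices {F1, F2}, target 8.
std::vector<std::size_t> example_table() {
  const Extents extent{{"F1", 100}, {"F2", 50}, {"i", 10}};
  return footprint_table({"F1", "i"}, extent, {"F1", "F2"}, 8);
}
```

In this example the table has four entries; slicing F1 shrinks the footprint from 1000 to 80 elements, while slicing only F2 leaves it unchanged because the tensor does not carry F2.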
…side)
One compile-time generic driver run_single_term_dp<Model> owns the subset lattice + bipartition enumeration; each objective becomes a CostModel type (State + Context + leaf/init/relax/finalize/reconstruct). Four built-ins (AdditiveModel x2, PeakModel, PeakBatchedModel) map their existing DPs; behavior-preserving (existing tests/oracles are the regression net). Evaluator face and mpqc deferred to Phase 4.
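The split between a generic subset-lattice driver and per-objective models can be sketched as follows. This is a deliberately reduced illustration, not the interface from this PR: toy_subset_dp, ToyAdditiveModel, ToyBottleneckModel and the leaf/join/better members are hypothetical, the toy costs are invented, and the Context, finalize and reconstruct parts are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of the leaf sizes selected by bitmask `subset`.
inline double subset_weight(const std::vector<double>& leaf_size, unsigned subset) {
  double w = 0.0;
  for (std::size_t i = 0; i < leaf_size.size(); ++i)
    if ((subset >> i) & 1u) w += leaf_size[i];
  return w;
}

// Toy additive objective: every join costs the total weight of the merged subset.
struct ToyAdditiveModel {
  using State = double;
  std::vector<double> leaf_size;
  State leaf(std::size_t) const { return 0.0; }
  State join(unsigned a, unsigned b, State sa, State sb) const {
    return sa + sb + subset_weight(leaf_size, a | b);
  }
  bool better(State x, State y) const { return x < y; }
};

// Toy max-type objective: the state is the most expensive single join so far.
struct ToyBottleneckModel {
  using State = double;
  std::vector<double> leaf_size;
  State leaf(std::size_t) const { return 0.0; }
  State join(unsigned a, unsigned b, State sa, State sb) const {
    const double step = subset_weight(leaf_size, a) * subset_weight(leaf_size, b);
    return std::max({sa, sb, step});
  }
  bool better(State x, State y) const { return x < y; }
};

// Generic driver: owns the subset lattice and the bipartition enumeration;
// the model only supplies leaf states, the join rule and the comparison.
template <typename Model>
typename Model::State toy_subset_dp(const Model& model, std::size_t n_leaves) {
  assert(n_leaves >= 1 && n_leaves < 16);
  const unsigned full = (1u << n_leaves) - 1u;
  std::vector<typename Model::State> best(full + 1u);
  for (unsigned s = 1; s <= full; ++s) {  // sub-masks are always numerically smaller
    if ((s & (s - 1u)) == 0u) {           // exactly one leaf in the subset
      std::size_t i = 0;
      while (((s >> i) & 1u) == 0u) ++i;
      best[s] = model.leaf(i);
      continue;
    }
    bool first = true;
    for (unsigned a = (s - 1u) & s; a != 0u; a = (a - 1u) & s) {
      const unsigned b = s ^ a;
      if (a < b) continue;  // visit each unordered bipartition once
      const auto cand = model.join(a, b, best[a], best[b]);
      if (first || model.better(cand, best[s])) {
        best[s] = cand;
        first = false;
      }
    }
  }
  return best[full];
}
```

For leaf sizes {1, 2, 3} the additive toy model yields 9 and the bottleneck toy model yields 6 from the same driver; swapping the objective only swaps the model type.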
4 tasks, behavior-preserving, model-by-model (AdditiveModel, PeakModel, PeakBatchedModel) + concept/custom-model test. Old standalone DP/cost functions kept as reference oracles; per-objective equivalence tests + full existing suite green are the gates.
… cost; cover volatile path in equivalence test
…r optimizer + eval)
Bundle the batchability triple into one BatchPolicy{is_batchable_index,
batch_target_size (per-index), is_volatile_leaf} consumed by both the optimizer
(OptimizeOptions embeds it) and a thin eval-layer make_evaluator adapter (lifts
the Tensor volatile predicate to EvalNode). Generalizes batch_target_size from
scalar to per-index function. Two stages: SeQuant (policy+adapter+ripple), then
mpqc (construct once, feed both, delete dup). Behavior-preserving.
A1 batch_target_size scalar->per-index function; A2 BatchPolicy struct embedded in OptimizeOptions; A3 eval-layer make_evaluator adapter; B1 mpqc construct-once + feed-both + delete-dup (CSV-CCk energy-match validation). Behavior-preserving; A->B mpqc-compile window noted.
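The "construct once, feed both" idea can be sketched with a single policy object read by an optimizer-side and an evaluator-side helper. Everything here is a stand-in chosen for illustration: ToyIndex and ToyTensor replace the real Index and Tensor types, ToyBatchPolicy and ToyOptimizeOptions only mirror the field names mentioned above, and planned_extent, batch_count and the "indices starting with F are batchable" rule are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

using ToyIndex = std::string;   // stand-in for the real Index type
using ToyTensor = std::string;  // stand-in for the real Tensor type

// The batchability triple bundled into one object.
struct ToyBatchPolicy {
  std::function<bool(const ToyIndex&)> is_batchable_index;
  std::function<std::size_t(const ToyIndex&)> batch_target_size;
  std::function<bool(const ToyTensor&)> is_volatile_leaf;
};

// Options embed the policy instead of carrying three loose fields.
struct ToyOptimizeOptions {
  ToyBatchPolicy batch_policy;
  double footprint_weight = 1.0;
};

// Optimizer-side view: the extent a cost model would assume for `ix`.
std::size_t planned_extent(const ToyOptimizeOptions& opts, const ToyIndex& ix,
                           std::size_t extent) {
  const auto& p = opts.batch_policy;
  return p.is_batchable_index(ix) ? std::min(extent, p.batch_target_size(ix)) : extent;
}

// Evaluator-side view: how many batches a runtime loop over `ix` would take.
std::size_t batch_count(const ToyBatchPolicy& p, const ToyIndex& ix, std::size_t extent) {
  if (!p.is_batchable_index(ix)) return 1;
  const std::size_t b = std::min(extent, p.batch_target_size(ix));
  assert(b > 0);
  return (extent + b - 1) / b;  // ceiling division
}

// Invented policy: indices named F* are batchable with target 8; leaf "T" is volatile.
ToyBatchPolicy make_example_policy() {
  ToyBatchPolicy p;
  p.is_batchable_index = [](const ToyIndex& ix) { return !ix.empty() && ix[0] == 'F'; };
  p.batch_target_size = [](const ToyIndex&) { return std::size_t{8}; };
  p.is_volatile_leaf = [](const ToyTensor& t) { return t == "T"; };
  return p;
}
```

Because both helpers read the same ToyBatchPolicy, the slice size the optimizer plans for (8 for F1 with extent 100) and the batch count the evaluator loops over (13) cannot disagree about which indices are batched.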
Replace every std::size_t batch/target_batch_size parameter on the
batched optimizer path and in make_batched_custom_evaluator with
std::function<std::size_t(Index const&)> batch_target_size.
Slicing applies min(extent(ix), batch_target_size(ix)), so a constant
lambda [](Index const&){ return N; } reproduces old scalar-N results.
Changed:
- OptimizeOptions::batch_target_size: size_t -> function<size_t(Index)>
- sliced_footprints, peak_dp_batched, peak_cost_batched,
single_term_opt_peak_batched_impl, reconstructed_batched_peak: same
- detail::single_term_opt and public single_term_opt overload: same
- PeakBatchedModel::batch member: size_t -> function<size_t(Index)>
- make_batched_custom_evaluator: target_batch_size is now a function;
call sites pass target_batch_size(*K) and target_batch_size(*Kk)
to mode_batches (which still takes a scalar)
Tests: all existing batched test calls updated to pass constant lambdas;
new SECTION("per-index batch_target_size honored") verifies that
distinct per-index sizes produce different peak costs.
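The scalar-to-function change and its backward-compatibility claim can be checked on a toy volume computation. This is a sketch only: ToyIndex, IndexExtent, TargetFn, sliced_volume_scalar, sliced_volume and example_indices are hypothetical names, every listed index is assumed to be batchable, and the extents are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ToyIndex = std::string;  // stand-in for the real Index type
using IndexExtent = std::pair<ToyIndex, std::size_t>;
using TargetFn = std::function<std::size_t(const ToyIndex&)>;

// Old style: one scalar target shared by all indices.
std::size_t sliced_volume_scalar(const std::vector<IndexExtent>& ixs, std::size_t target) {
  std::size_t volume = 1;
  for (const auto& ie : ixs) volume *= std::min(ie.second, target);
  return volume;
}

// New style: the target is looked up per index; slicing applies
// min(extent(ix), batch_target_size(ix)).
std::size_t sliced_volume(const std::vector<IndexExtent>& ixs, const TargetFn& target) {
  std::size_t volume = 1;
  for (const auto& ie : ixs) volume *= std::min(ie.second, target(ie.first));
  return volume;
}

// Invented example: two batchable indices with extents 100 and 50.
std::vector<IndexExtent> example_indices() {
  return {{"K1", 100}, {"K2", 50}};
}
```

A constant lambda such as [](const ToyIndex&) { return std::size_t{8}; } gives the same volume (64) as the scalar call with 8, whereas a lambda returning 8 for K1 and 16 for K2 gives 128.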
…k through the models
…maining uses through the models
…dp, *_impl, PeakRes)
…hs' mutual agreement
The make_evaluator-vs-hand-built comparison asserted Tight (exact) equality between two independent batched-summation evaluations, whose accumulation order is thread-non-deterministic; this flaked by a few ULPs. Both paths already compare Loose against the reference for the same reason; make the mutual comparison Loose too.
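Why exact equality is fragile here can be shown with a plain reordered sum. The sketch below is an assumption-laden illustration: sum_in_order, sum_reversed and approx_equal are hypothetical helpers, and the relative tolerance passed to approx_equal is an arbitrary stand-in, since the actual Tight and Loose thresholds are not stated in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Accumulate left to right.
double sum_in_order(const std::vector<double>& xs) {
  double s = 0.0;
  for (double x : xs) s += x;
  return s;
}

// Accumulate right to left, mimicking a different batch completion order.
double sum_reversed(const std::vector<double>& xs) {
  double s = 0.0;
  for (std::size_t i = xs.size(); i > 0; --i) s += xs[i - 1];
  return s;
}

// Relative comparison with an absolute floor of 1 on the scale.
bool approx_equal(double a, double b, double rel_tol) {
  const double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
  return std::fabs(a - b) <= rel_tol * scale;
}

// Small input whose two accumulation orders need not round identically.
std::vector<double> example_terms() {
  return {0.1, 0.2, 0.3};
}
```

Floating-point addition is not associative, so the two sums may differ in the last bits; a test that uses approx_equal with a small relative tolerance accepts both orders, while an exact == comparison can fail depending on which batch finishes first.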
Krzmbrzl reviewed Jun 20, 2026
Comment on lines +39 to +42
    prod, opts.idx_to_extent, subnet_cse,
    opts.batch_policy.is_volatile_leaf, opts.volatile_weight,
    opts.footprint_weight, opts.batch_policy.is_batchable_index,
    opts.batch_policy.batch_target_size);
Collaborator
I guess it would be easier to just pass opts in its entirety into the function instead of every member separately, no?
Collaborator
These files should be removed before merging.
Summary
Adds a batch-aware, peak-memory, customizable cost model for single-term tensor-network optimization, and removes the duplicated DP code it supersedes.
Two existing objectives (DenseFLOPs, DenseSize) are generalized into a CostModel framework with two new objectives and one shared batchability policy:
- DensePeakSize: all-co-resident (model A) peak-memory objective via a pebbling DP, instead of summed-FLOPs/size. Validates against an independent brute-force oracle.
- DensePeakSizeBatched: per-index multi-mode batched peak (peak[n][B]); each batchable index slices independently; persistence-gated.
- CostModel generic driver (run_single_term_opt<Model>): one compile-time subset-lattice/bipartition driver; each objective's recurrence + reconstruction lives in its own model type (AdditiveModel, PeakModel, PeakBatchedModel). Public, so users can drive a custom objective directly.
- BatchPolicy (core/batch_policy.hpp): one batchability source (is_batchable_index, per-index batch_target_size, is_volatile_leaf) feeding both the optimizer and the runtime batched evaluator (make_evaluator adapter over make_batched_custom_evaluator), so the two can't drift.
It also lands the full-DRY cleanup: peak_cost / peak_cost_batched / reconstructed_batched_peak now delegate to the models, and the five now-dead standalone DP functions (peak_dp, peak_dp_batched, single_term_opt_impl, single_term_opt_peak_impl, single_term_opt_peak_batched_impl) plus struct PeakRes are removed; each recurrence now exists in exactly one place. The independent brute_force_min_peak / batched_min_peak oracles remain as cross-checks.
Design docs
Specs and plans under doc/dev/{specs,plans}/ (2026-06-18 / 2026-06-19).
Testing
- [optimize] unit suite green; new objectives cross-checked against independent brute-force oracles and reconstruction-simulation checks; CostModel concept conformance (positive + negative).
- … Loose tolerance, since batched summation accumulation order is thread-non-deterministic.)
- … he10 batched) reproduces the reference energy to 7e-15.
Follow-up
MPQC will repin MPQC_TRACKED_SEQUANT_TAG to this once merged.