CostModel: batch-aware peak-memory single-term optimization + DRY cleanup by evaleev · Pull Request #559 · ValeevGroup/SeQuant

evaleev · 2026-06-19T19:48:17Z

Summary

Adds a batch-aware, peak-memory, customizable cost model for single-term tensor-network optimization, and removes the duplicated DP code it supersedes.

Two existing objectives (DenseFLOPs, DenseSize) are generalized into a CostModel framework with two new objectives and one shared batchability policy:

DensePeakSize: all-co-resident (model A) peak-memory objective computed by a pebbling DP, instead of summed FLOPs/size. Validated against an independent brute-force oracle.
DensePeakSizeBatched — per-index multi-mode batched peak (peak[n][B]): each batchable index slices independently; persistence-gated.
CostModel generic driver (run_single_term_opt<Model>) — one compile-time subset-lattice/bipartition driver; each objective's recurrence + reconstruction lives in its own model type (AdditiveModel, PeakModel, PeakBatchedModel). Public, so users can drive a custom objective directly.
BatchPolicy (core/batch_policy.hpp) — one batchability source (is_batchable_index, per-index batch_target_size, is_volatile_leaf) feeding both the optimizer and the runtime batched evaluator (make_evaluator adapter over make_batched_custom_evaluator), so the two can't drift.

It also lands the full-DRY cleanup: peak_cost/peak_cost_batched/reconstructed_batched_peak now delegate to the models, and the five now-dead standalone DP functions (peak_dp, peak_dp_batched, single_term_opt_impl, single_term_opt_peak_impl, single_term_opt_peak_batched_impl) plus struct PeakRes are removed — each recurrence now exists in exactly one place. The independent brute_force_min_peak/batched_min_peak oracles remain as cross-checks.
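The driver/model split described above can be illustrated with a minimal sketch. This is not SeQuant's code: `run_single_term_dp_sketch`, `AdditiveFlopsSketch`, the bitmask encoding of indices, and the FLOP proxy are all assumptions made for the example, and the Context, finalize, and reconstruct pieces are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical generic driver: owns the subset lattice and the bipartition
// enumeration; the model supplies State, leaf(), init(), and relax().
template <typename Model>
typename Model::State run_single_term_dp_sketch(const Model& model,
                                                std::size_t nleaves) {
  using State = typename Model::State;
  const std::size_t nsub = std::size_t{1} << nleaves;
  std::vector<State> best(nsub);
  for (std::size_t n = 1; n < nsub; ++n) {
    if ((n & (n - 1)) == 0) {  // exactly one bit set: a single leaf
      best[n] = model.leaf(n);
      continue;
    }
    best[n] = model.init();
    // Walk the proper non-empty submasks l of n; r is the complement in n.
    for (std::size_t l = (n - 1) & n; l != 0; l = (l - 1) & n) {
      const std::size_t r = n ^ l;
      if (l < r) continue;  // visit each unordered pair {l, r} once
      model.relax(best[n], l, best[l], r, best[r]);
    }
  }
  return best[nsub - 1];
}

// Hypothetical additive model. Leaf i carries a bitmask of its indices; each
// index is assumed to appear on at most two leaves, so XOR-ing the masks of a
// subset yields its open (uncontracted) indices.
struct AdditiveFlopsSketch {
  struct State {
    double cost = 0.0;
    std::size_t left = 0, right = 0;  // best bipartition, for reconstruction
  };
  std::vector<std::uint32_t> leaf_indices;
  std::vector<double> extent;  // extent[b] is the extent of index bit b

  std::uint32_t open_indices(std::size_t subset) const {
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < leaf_indices.size(); ++i)
      if ((subset >> i) & 1) mask ^= leaf_indices[i];
    return mask;
  }
  double volume(std::uint32_t mask) const {
    double v = 1.0;
    for (std::size_t b = 0; b < extent.size(); ++b)
      if ((mask >> b) & 1) v *= extent[b];
    return v;
  }
  State leaf(std::size_t) const { return State{}; }
  State init() const {
    State s;
    s.cost = std::numeric_limits<double>::infinity();
    return s;
  }
  // Additive recurrence: children's costs plus a FLOP proxy for this step
  // (product of extents over the union of the children's open indices).
  void relax(State& best, std::size_t l, const State& sl, std::size_t r,
             const State& sr) const {
    const double total =
        sl.cost + sr.cost + volume(open_indices(l) | open_indices(r));
    if (total < best.cost) {
      best.cost = total;
      best.left = l;
      best.right = r;
    }
  }
};

// Matrix chain A(i,j) B(j,k) C(k,l) with extents 10, 100, 5, 50.
inline void matrix_chain_example() {
  AdditiveFlopsSketch model{{0x3u, 0x6u, 0xCu}, {10.0, 100.0, 5.0, 50.0}};
  const auto root = run_single_term_dp_sketch(model, 3);
  // (A*B)*C costs 10*100*5 + 10*5*50 = 7500 under this proxy.
  assert(root.cost == 7500.0);
}
```

Swapping in a different objective only means writing another type with the same three hooks; the enumeration loop stays untouched, which is the point of keeping each recurrence in one place.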

Design docs

Specs and plans under doc/dev/{specs,plans}/ (2026-06-18 / 2026-06-19).

Testing

Full [optimize] unit suite green; new objectives cross-checked against independent brute-force oracles and reconstruction-simulation checks; CostModel concept conformance (positive + negative).
Full SeQuant unit suite (all tags) green. (Also fixes a pre-existing flaky eval test: the two batched-eval paths agree only to Loose tolerance, since batched summation accumulation order is thread-non-deterministic.)
Behavior-preserving: existing objective results unchanged bit-for-bit.
Downstream check: MPQC CSV-CCk (he10 batched) reproduces the reference energy to 7e-15.

Follow-up

MPQC will repin MPQC_TRACKED_SEQUANT_TAG to this once merged.

…mization) Adds doc/dev/specs spec for a CostModel abstraction that unifies single-term optimization cost and runtime evaluation policy: peak-memory objective (pebbling recurrence), batchability-aware footprint (monomial in batch-index extents, peak_full/peak_slice two-mode, persistence-gated frontier), three built-in models (DenseFLOPs, DensePeakSize, DensePeakSizeBatched), and custom injection.

… term Phase 1 oracle revealed the pebbling recurrence computed a Sethi-Ullman (register-style) peak that omits resident input leaves. Adopt the realistic all-co-resident tensor peak: add per-subset L[n] (leaf-size sum) and the bystander term L[other]+peak[child] to the DP; oracle (7.0 on the 3-leaf example) and DP now agree. Same clean subset DP and optimal substructure.
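One possible reading of the recurrence described above, as a toy sketch. Only the leaf-size sum L[n] and the bystander term L[other]+peak[child] come from the commit message. The remaining terms (the first child's result is held while the second is evaluated, both children plus the result are live during the contraction, and leaves are freed once consumed) are assumptions of this example, as are all names and the bitmask encoding. It does not attempt to reproduce the 7.0 example, whose inputs are not given here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical toy network: leaf i carries a bitmask of its indices; each
// index is assumed to appear on at most two leaves.
struct ToyNetworkSketch {
  std::vector<std::uint32_t> leaf_indices;
  std::vector<double> extent;  // extent[b] is the extent of index bit b

  // Size of the tensor obtained by contracting all leaves in `subset`.
  double result_size(std::size_t subset) const {
    std::uint32_t open = 0;
    for (std::size_t i = 0; i < leaf_indices.size(); ++i)
      if ((subset >> i) & 1) open ^= leaf_indices[i];
    double v = 1.0;
    for (std::size_t b = 0; b < extent.size(); ++b)
      if ((open >> b) & 1) v *= extent[b];
    return v;
  }
};

// Minimum peak memory over all binary trees and child orders (assumed model).
inline double min_peak_sketch(const ToyNetworkSketch& net) {
  const std::size_t nsub = std::size_t{1} << net.leaf_indices.size();
  std::vector<double> size(nsub, 0.0), L(nsub, 0.0), peak(nsub, 0.0);
  for (std::size_t n = 1; n < nsub; ++n) {
    size[n] = net.result_size(n);
    if ((n & (n - 1)) == 0) {  // single leaf: it is simply resident
      L[n] = peak[n] = size[n];
      continue;
    }
    const std::size_t low = n & (~n + 1);  // lowest set bit of n
    L[n] = L[low] + L[n ^ low];            // leaf-size sum of the subset
    peak[n] = std::numeric_limits<double>::infinity();
    // Ordered bipartitions: l is evaluated first, then r.
    for (std::size_t l = (n - 1) & n; l != 0; l = (l - 1) & n) {
      const std::size_t r = n ^ l;
      const double cand = std::max({
          peak[l] + L[r],              // r's leaves are bystanders during l
          size[l] + peak[r],           // l's result is held during r
          size[l] + size[r] + size[n]  // both inputs and the output are live
      });
      peak[n] = std::min(peak[n], cand);
    }
  }
  return peak[nsub - 1];
}

// A(i,j) B(j,k) C(k,l) with extents 1, 4, 2, 3: leaf sizes 4, 8, 6.
inline void three_leaf_peak_example() {
  ToyNetworkSketch net{{0x3u, 0x6u, 0xCu}, {1.0, 4.0, 2.0, 3.0}};
  // Best tree is (A*B)*C; its peak of 20 exceeds the leaf-size sum of 18.
  assert(min_peak_sketch(net) == 20.0);
}
```

Without the L[r] term the first branch would charge only peak[l], which is the register-style undercount the commit message refers to.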

The DensePeakSize enumerator, the peak_cost wrapper, and the public single_term_opt Metric template parameter were undocumented. Describe the peak-memory objective (all-co-resident model, order is a real lever) and its Phase-1 limitation (no subnet CSE).

Make the DensePeakSizeBatched formulation concrete under the all-co-resident model: explicit peak_full/peak_slice recurrence (full bystander terms, frontier substitution), local batchable-frontier gate (batch-index internal AND persistent), the validation strategy (slice mode reuses Phase-1 peak_cost at batched extents; full mode vs a tree x order x batch-choice oracle), and the Phase-2 OptimizeOptions plumbing (pre-CostModel).

Batchable indices slice independently (peak[n][B], B subset of the term's batchable indices) rather than as one group, which would under-count and mislead on multi-aux terms. Batch decision for Ki taken at the node where Ki is internalized; objective peak[root][empty]; m=1 collapses to two-mode and ties to Phase-1 peak_cost. Oracle and validation updated for per-index (incl. a two-distinct-aux case).

DP gains a [B] dimension over the term's distinct batchable indices; oracle threads the per-index slice context; reconstruction gets a full numeric memory-simulation check. All-sliced corner ties to Phase-1 peak_cost.

Add DensePeakSizeBatched to ObjectiveFunction and two new OptimizeOptions fields (is_batchable_index, batch_target_size). Implement in detail namespace (single_term.hpp):
- batchable_index_list: distinct batchable indices in appearance order
- sliced_footprints: 2^m tables of subset_footprints, one per sliced-set B
- leaf_volatile_mask: bitmask of volatile leaf tensors (mirrors inline mask)

Test: "per-index batchability tables" SECTION verifies that aux.size()==2 for two distinct F-space indices, tables.size()==4 (2^2 sliced-sets), the all-sliced footprint is strictly smaller than the unsliced one, and that slicing only F1 shrinks only the F1-leaf footprint.
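To make the "2^m tables, one per sliced-set B" idea concrete, here is a small sketch. It is a simplification and not the implementation in single_term.hpp: it tabulates per-leaf footprints rather than per-subset ones, uses a single scalar slice size, and every name in it (`sliced_footprints_sketch`, the string-labelled indices) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using ExtentMap = std::map<std::string, std::size_t>;
using Leaf = std::vector<std::string>;  // index labels of one leaf tensor

// tables[B][i] is the footprint of leaf i when the batchable indices selected
// by bitmask B (bit k <-> batchable[k]) are sliced down to `slice` elements.
inline std::vector<std::vector<std::size_t>> sliced_footprints_sketch(
    const std::vector<Leaf>& leaves, const ExtentMap& extent,
    const std::vector<std::string>& batchable, std::size_t slice) {
  const std::size_t m = batchable.size();
  std::vector<std::vector<std::size_t>> tables(std::size_t{1} << m);
  for (std::size_t B = 0; B < tables.size(); ++B) {
    for (const Leaf& leaf : leaves) {
      std::size_t footprint = 1;
      for (const std::string& ix : leaf) {
        std::size_t e = extent.at(ix);
        for (std::size_t k = 0; k < m; ++k)
          if (((B >> k) & 1) && batchable[k] == ix) e = std::min(e, slice);
        footprint *= e;
      }
      tables[B].push_back(footprint);
    }
  }
  return tables;
}

// Two leaves, each carrying one distinct auxiliary index (F1 or F2).
inline void two_aux_example() {
  const std::vector<Leaf> leaves{{"F1", "a", "i"}, {"F2", "b", "j"}};
  const ExtentMap extent{{"F1", 100}, {"F2", 100}, {"a", 10},
                         {"b", 10},   {"i", 5},    {"j", 5}};
  const auto tables = sliced_footprints_sketch(leaves, extent, {"F1", "F2"}, 20);
  assert(tables.size() == 4);       // 2^2 sliced-sets
  assert(tables[1][0] == 1000);     // slicing F1 shrinks the F1 leaf
  assert(tables[1][1] == 5000);     // and leaves the F2 leaf untouched
  assert(tables[3][0] < tables[0][0] && tables[3][1] < tables[0][1]);
}
```

The asserts mirror the shape of the checks listed in the commit message (table count, all-sliced smaller than unsliced, slicing one index affects only its leaf), on made-up extents.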

…side) One compile-time generic driver run_single_term_dp<Model> owns the subset lattice + bipartition enumeration; each objective becomes a CostModel type (State + Context + leaf/init/relax/finalize/reconstruct). Four built-ins (AdditiveModel x2, PeakModel, PeakBatchedModel) map their existing DPs; behavior-preserving (existing tests/oracles are the regression net). Evaluator face and mpqc deferred to Phase 4.

4 tasks, behavior-preserving, model-by-model (AdditiveModel, PeakModel, PeakBatchedModel) + concept/custom-model test. Old standalone DP/cost functions kept as reference oracles; per-objective equivalence tests + full existing suite green are the gates.

…r optimizer + eval) Bundle the batchability triple into one BatchPolicy{is_batchable_index, batch_target_size (per-index), is_volatile_leaf} consumed by both the optimizer (OptimizeOptions embeds it) and a thin eval-layer make_evaluator adapter (lifts the Tensor volatile predicate to EvalNode). Generalizes batch_target_size from scalar to per-index function. Two stages: SeQuant (policy+adapter+ripple), then mpqc (construct once, feed both, delete dup). Behavior-preserving.
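The "construct once, feed both" idea can be sketched as follows. The struct mirrors the three members named above, but the types `IndexSketch` and `TensorSketch`, the two consumer functions, and the ceiling-division batching rule are assumptions of this example and not SeQuant's or mpqc's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

struct IndexSketch { std::string label; std::size_t extent; };  // hypothetical
struct TensorSketch { std::string label; };                     // hypothetical

// One policy object holding the batchability triple.
struct BatchPolicySketch {
  std::function<bool(IndexSketch const&)> is_batchable_index;
  std::function<std::size_t(IndexSketch const&)> batch_target_size;
  std::function<bool(TensorSketch const&)> is_volatile_leaf;
};

// Slice length implied by the policy, clamped to at least 1.
inline std::size_t slice_length(BatchPolicySketch const& p,
                                IndexSketch const& ix) {
  if (!p.is_batchable_index(ix)) return std::max<std::size_t>(1, ix.extent);
  return std::max<std::size_t>(1, std::min(ix.extent, p.batch_target_size(ix)));
}

// Optimizer side: the extent a cost model would charge for ix.
inline std::size_t costed_extent(BatchPolicySketch const& p,
                                 IndexSketch const& ix) {
  return p.is_batchable_index(ix) ? slice_length(p, ix) : ix.extent;
}

// Evaluator side: how many batches a runtime loop over ix would execute.
inline std::size_t num_batches(BatchPolicySketch const& p,
                               IndexSketch const& ix) {
  const std::size_t slice = slice_length(p, ix);
  return (ix.extent + slice - 1) / slice;  // ceiling division
}

// Example policy: indices whose label starts with 'K' are batchable in
// slices of 32; tensors labelled "t" are volatile.
inline BatchPolicySketch example_policy() {
  return BatchPolicySketch{
      [](IndexSketch const& ix) {
        return !ix.label.empty() && ix.label.front() == 'K';
      },
      [](IndexSketch const&) { return std::size_t{32}; },
      [](TensorSketch const& t) { return t.label == "t"; }};
}

inline void shared_policy_example() {
  const BatchPolicySketch policy = example_policy();
  const IndexSketch K{"K1", 100};
  // Both consumers derive their numbers from the same policy object, so the
  // batches the evaluator runs always cover the extent the optimizer sliced.
  assert(costed_extent(policy, K) == 32);
  assert(num_batches(policy, K) * costed_extent(policy, K) >= K.extent);
}
```

Because both sides call through the same object, changing the batch size or the batchability predicate cannot update one consumer and silently miss the other.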

A1 batch_target_size scalar->per-index function; A2 BatchPolicy struct embedded in OptimizeOptions; A3 eval-layer make_evaluator adapter; B1 mpqc construct-once + feed-both + delete-dup (CSV-CCk energy-match validation). Behavior-preserving; A->B mpqc-compile window noted.

Replace every std::size_t batch/target_batch_size parameter on the batched optimizer path and in make_batched_custom_evaluator with std::function<std::size_t(Index const&)> batch_target_size. Slicing applies min(extent(ix), batch_target_size(ix)), so a constant lambda [](Index const&){ return N; } reproduces old scalar-N results.

Changed:
- OptimizeOptions::batch_target_size: size_t -> function<size_t(Index)>
- sliced_footprints, peak_dp_batched, peak_cost_batched, single_term_opt_peak_batched_impl, reconstructed_batched_peak: same
- detail::single_term_opt and public single_term_opt overload: same
- PeakBatchedModel::batch member: size_t -> function<size_t(Index)>
- make_batched_custom_evaluator: target_batch_size is now a function; call sites pass target_batch_size(*K) and target_batch_size(*Kk) to mode_batches (which still takes a scalar)

Tests: all existing batched test calls updated to pass constant lambdas; new SECTION("per-index batch_target_size honored") verifies that distinct per-index sizes produce different peak costs.
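The scalar-to-function change and the min(extent, batch_target_size) rule can be shown in a few lines. `IndexSketch` is a hypothetical stand-in for SeQuant's Index, and `sliced_extent` and `constant_target` are helper names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

struct IndexSketch { std::string label; std::size_t extent; };  // hypothetical

using BatchTargetSize = std::function<std::size_t(IndexSketch const&)>;

// Extent of ix after slicing: never larger than the index itself.
inline std::size_t sliced_extent(IndexSketch const& ix,
                                 BatchTargetSize const& batch_target_size) {
  return std::min(ix.extent, batch_target_size(ix));
}

// A constant function reproduces the old scalar behaviour.
inline BatchTargetSize constant_target(std::size_t n) {
  return [n](IndexSketch const&) { return n; };
}

inline void per_index_example() {
  const IndexSketch K1{"K1", 100}, K2{"K2", 100}, small{"k", 8};
  // Scalar-equivalent policy: every index gets the same target.
  assert(sliced_extent(K1, constant_target(32)) == 32);
  assert(sliced_extent(small, constant_target(32)) == 8);  // clamped to extent
  // Per-index policy: different targets for different indices.
  const BatchTargetSize per_index = [](IndexSketch const& ix) {
    return ix.label == "K1" ? std::size_t{16} : std::size_t{64};
  };
  assert(sliced_extent(K1, per_index) == 16);
  assert(sliced_extent(K2, per_index) == 64);
}
```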

…hs' mutual agreement The make_evaluator-vs-hand-built comparison asserted Tight (exact) equality between two independent batched-summation evaluations, whose accumulation order is thread-non-deterministic; this flaked by a few ULPs. Both paths already compare Loose against the reference for the same reason; make the mutual comparison Loose too.
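The underlying effect is that floating-point addition is not associative, so two evaluations that accumulate the same terms in different orders need not agree bit-for-bit. A small illustration follows; the tolerance value and the helper names are assumptions of this example and are not the Loose/Tight thresholds used in the test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Relative comparison: |a - b| <= rel_tol * max(|a|, |b|).
inline bool approx_equal(double a, double b, double rel_tol) {
  return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

// The same three terms accumulated in two different orders.
inline double sum_left_first(double x, double y, double z) {
  return (x + y) + z;
}
inline double sum_right_first(double x, double y, double z) {
  return x + (y + z);
}

inline void accumulation_order_example() {
  const double a = sum_left_first(0.1, 0.2, 0.3);
  const double b = sum_right_first(0.1, 0.2, 0.3);
  // With IEEE-754 doubles these two results typically differ in the last
  // bit, so an exact comparison is fragile while a loose one is stable.
  assert(approx_equal(a, b, 1e-12));
}
```

When the accumulation order depends on thread scheduling, the difference can change from run to run, which is why an exact comparison between two batched evaluations can pass most of the time and still fail occasionally.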

Krzmbrzl · 2026-06-20T13:19:07Z

+        prod, opts.idx_to_extent, subnet_cse,
+        opts.batch_policy.is_volatile_leaf, opts.volatile_weight,
+        opts.footprint_weight, opts.batch_policy.is_batchable_index,
+        opts.batch_policy.batch_target_size);


I guess it would be easier to just pass opts in its entirety into the function instead of every member separately, no?

Krzmbrzl · 2026-06-20T13:21:52Z

These files should be removed before merging.

evaleev added 30 commits June 18, 2026 20:47

docs: Phase 1 implementation plan for batch-aware cost model (peak-memory objective)

17a9e94

optimize: add subset_footprints primitive for peak costing

c779059

optimize: add brute-force min-peak oracle for tests

37558ff

optimize: add DensePeakSize peak-memory DP (validated vs oracle)

c2180b8

docs: strengthen Phase 1 Task 4 tests (reconstruction-achieves-optimum + public dispatch)

adafabe

optimize: expose DensePeakSize via optimize(); add peak regression battery

aa63940

docs: Phase 2 implementation plan (DensePeakSizeBatched, two-mode DP + batch oracle)

a16d061

docs: rewrite Phase 2 plan for per-index multi-mode batching

cc42ec2

optimize: per-index batch-aware min-peak oracle for tests

427635f

optimize: multi-mode (per-index) DensePeakSizeBatched DP

66ecb24

optimize: reconstruct (numeric-checked) + dispatch DensePeakSizeBatched

15a5502

optimize: clarify reconstructed_batched_peak independence; tidy nt==0 guard

770acff

optimize: generic run_single_term_opt driver + AdditiveModel (FLOPs/Size)

9209954

optimize: fix AdditiveModel::finalize to apply volatile weight in CSE cost; cover volatile path in equivalence test

3b22e52

optimize: PeakModel via generic driver (DensePeakSize)

86525e0

optimize: PeakBatchedModel via generic driver (DensePeakSizeBatched)

ad07ab2

optimize: CostModel concept + conformance + custom-model extension-point test

8251da1

optimize: strengthen per-index batch_target_size test; update stale scalar docs

6c56be8

optimize: introduce BatchPolicy, embed in OptimizeOptions

4a9a88f

evaleev added 8 commits June 19, 2026 11:48

eval: make_evaluator(BatchPolicy) adapter over make_batched_custom_evaluator

292b7d7

docs: plan for CostModel old-impl removal (full DRY) + negative concept test

2b5facb

docs: correct SeQuant build dir to cmake-build-release in removal plan

76f3db2

optimize: route peak_cost/peak_cost_batched/reconstructed_batched_peak through the models

c3674d3

optimize tests: drop model-vs-old-impl equivalence SECTIONs, route remaining uses through the models

a4990ee

optimize: delete dead standalone single-term DP drivers/tables (peak_dp, *_impl, PeakRes)

7a7b3f4

optimize tests: add negative-direction CostModel concept check

56eb634

Krzmbrzl reviewed Jun 20, 2026

evaleev wants to merge 38 commits into master from feature/cost-model-batch-aware
