CostModel: batch-aware peak-memory single-term optimization + DRY cleanup #559

Open: evaleev wants to merge 38 commits into master from feature/cost-model-batch-aware

@evaleev evaleev commented Jun 19, 2026

Member

Summary

Adds a batch-aware, peak-memory, customizable cost model for single-term tensor-network optimization, and removes the duplicated DP code it supersedes.

Two existing objectives (DenseFLOPs, DenseSize) are generalized into a CostModel framework with two new objectives and one shared batchability policy:

  • DensePeakSize — all-co-resident (model A) peak-memory objective computed by a pebbling DP, instead of summed FLOPs or sizes. Validated against an independent brute-force oracle.
  • DensePeakSizeBatched — per-index multi-mode batched peak (peak[n][B]): each batchable index slices independently; persistence-gated.
  • CostModel generic driver (run_single_term_opt<Model>) — one compile-time subset-lattice/bipartition driver; each objective's recurrence + reconstruction lives in its own model type (AdditiveModel, PeakModel, PeakBatchedModel). Public, so users can drive a custom objective directly.
  • BatchPolicy (core/batch_policy.hpp) — one batchability source (is_batchable_index, per-index batch_target_size, is_volatile_leaf) feeding both the optimizer and the runtime batched evaluator (make_evaluator adapter over make_batched_custom_evaluator), so the two can't drift.
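
To make the "one driver, one model type per objective" split concrete, here is a minimal sketch of a subset-lattice/bipartition driver parameterized on a model type. Everything in it is hypothetical and not taken from this PR: the names `run_toy_single_term_opt` and `ToyAdditiveModel`, the `leaf`/`combine`/`better` interface, and the bitmask encoding of leaves and indices. The real CostModel interface (State + Context + leaf/init/relax/finalize/reconstruct) is richer, and a C++20 concept could constrain `Model` where this sketch uses a plain template.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model interface (NOT SeQuant's actual CostModel API):
//   State leaf(mask)                       cost state of a single leaf
//   State combine(l, r, state_l, state_r)  cost state of contracting l with r
//   bool  better(a, b)                     true if a is preferable to b
template <typename Model>
typename Model::State run_toy_single_term_opt(const Model& model,
                                              unsigned nleaves) {
  using State = typename Model::State;
  assert(nleaves < 32);
  const std::uint32_t full = (1u << nleaves) - 1u;
  std::vector<State> best(full + 1u);
  // Visiting masks in increasing order guarantees that every proper subset
  // of s has been solved before s itself.
  for (std::uint32_t s = 1; s <= full; ++s) {
    if ((s & (s - 1u)) == 0u) {  // single leaf
      best[s] = model.leaf(s);
      continue;
    }
    bool have = false;
    // Enumerate the non-empty proper subsets l of s; r is the complement.
    for (std::uint32_t l = (s - 1u) & s; l != 0u; l = (l - 1u) & s) {
      const std::uint32_t r = s ^ l;
      if (l < r) continue;  // visit each unordered bipartition once
      const State cand = model.combine(l, r, best[l], best[r]);
      if (!have || model.better(cand, best[s])) {
        best[s] = cand;
        have = true;
      }
    }
  }
  return best[full];
}

// Toy additive (FLOP-like) objective: the cost of one contraction is the
// product of the extents of all indices carried by either operand.
struct ToyAdditiveModel {
  using State = double;
  std::vector<std::uint32_t> leaf_indices;  // index bitmask per leaf
  std::vector<double> extent;               // extent per index bit
  std::uint32_t external;                   // indices of the final result

  // Indices still open on the intermediate built from leaf subset s.
  std::uint32_t open_indices(std::uint32_t s) const {
    std::uint32_t inside = 0, outside = 0;
    for (std::size_t t = 0; t < leaf_indices.size(); ++t) {
      if ((s >> t) & 1u) inside |= leaf_indices[t];
      else outside |= leaf_indices[t];
    }
    return inside & (outside | external);
  }
  double volume(std::uint32_t idx) const {
    double v = 1.0;
    for (std::size_t i = 0; i < extent.size(); ++i)
      if ((idx >> i) & 1u) v *= extent[i];
    return v;
  }
  State leaf(std::uint32_t) const { return 0.0; }
  State combine(std::uint32_t l, std::uint32_t r, State cl, State cr) const {
    return cl + cr + volume(open_indices(l) | open_indices(r));
  }
  bool better(State a, State b) const { return a < b; }
};
```

Swapping the objective then means swapping the model type while the enumeration stays untouched, which is the property the driver in this PR is meant to provide.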

It also lands the full-DRY cleanup: peak_cost/peak_cost_batched/reconstructed_batched_peak now delegate to the models, and the five now-dead standalone DP functions (peak_dp, peak_dp_batched, single_term_opt_impl, single_term_opt_peak_impl, single_term_opt_peak_batched_impl) plus struct PeakRes are removed — each recurrence now exists in exactly one place. The independent brute_force_min_peak/batched_min_peak oracles remain as cross-checks.

Design docs

Specs and plans are under doc/dev/{specs,plans}/ (dated 2026-06-18 and 2026-06-19).

Testing

  • Full [optimize] unit suite green; new objectives cross-checked against independent brute-force oracles and reconstruction-simulation checks; CostModel concept conformance (positive + negative).
  • Full SeQuant unit suite (all tags) green. (Also fixes a pre-existing flaky eval test: the two batched-eval paths agree only to Loose tolerance, since batched summation accumulation order is thread-non-deterministic.)
  • Behavior-preserving: existing objective results unchanged bit-for-bit.
  • Downstream check: MPQC CSV-CCk (he10 batched) reproduces the reference energy to 7e-15.

Follow-up

MPQC will repin MPQC_TRACKED_SEQUANT_TAG to this once merged.

evaleev added 30 commits June 18, 2026 20:47
…mization)

Adds doc/dev/specs spec for a CostModel abstraction that unifies single-term
optimization cost and runtime evaluation policy: peak-memory objective (pebbling
recurrence), batchability-aware footprint (monomial in batch-index extents,
peak_full/peak_slice two-mode, persistence-gated frontier), three built-in
models (DenseFLOPs, DensePeakSize, DensePeakSizeBatched), and custom injection.
… term

Phase 1 oracle revealed the pebbling recurrence computed a Sethi-Ullman
(register-style) peak that omits resident input leaves. Adopt the realistic
all-co-resident tensor peak: add per-subset L[n] (leaf-size sum) and the
bystander term L[other]+peak[child] to the DP; oracle (7.0 on the 3-leaf
example) and DP now agree. Same clean subset DP and optimal substructure.
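
One possible reading of the all-co-resident recurrence, as a sketch rather than the code in this PR: while the first child is evaluated, the other child's leaves are resident (the bystander term L[other] + peak[child]). The handling of the second child (the first child's result is held as a bystander) and of the final product step (both operands plus the result) are assumptions of this sketch, as are the name `toy_min_peak` and the `result_size` table indexed by leaf-subset bitmask.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// result_size[s] is the footprint of the tensor produced from leaf subset s
// (a bitmask); for a single-leaf s it is that leaf's own size.
double toy_min_peak(const std::vector<double>& result_size, unsigned nleaves) {
  assert(nleaves >= 1 && nleaves < 32);
  const std::uint32_t full = (1u << nleaves) - 1u;
  assert(result_size.size() == static_cast<std::size_t>(full) + 1u);
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> L(full + 1u, 0.0);     // summed leaf sizes per subset
  std::vector<double> peak(full + 1u, inf);  // minimal peak to build subset
  for (std::uint32_t s = 1; s <= full; ++s) {
    const std::uint32_t low = s & (~s + 1u);  // lowest set bit of s
    if (s == low) {                           // single leaf
      L[s] = peak[s] = result_size[s];
      continue;
    }
    L[s] = L[low] + L[s ^ low];
    // l is evaluated first, r second; both orders of each split are visited.
    for (std::uint32_t l = (s - 1u) & s; l != 0u; l = (l - 1u) & s) {
      const std::uint32_t r = s ^ l;
      const double first = L[r] + peak[l];             // r's leaves stand by
      const double second = result_size[l] + peak[r];  // l's result stands by
      const double product =
          result_size[l] + result_size[r] + result_size[s];
      peak[s] = std::min(peak[s], std::max({first, second, product}));
    }
  }
  return peak[full];
}
```

Unlike a summed-cost objective, the result depends on which child is evaluated first, which is why the loop visits ordered splits.
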
The DensePeakSize enumerator, the peak_cost wrapper, and the public
single_term_opt Metric template parameter were undocumented. Describe the
peak-memory objective (all-co-resident model, order is a real lever) and its
Phase-1 limitation (no subnet CSE).

Make the DensePeakSizeBatched formulation concrete under the all-co-resident
model: explicit peak_full/peak_slice recurrence (full bystander terms, frontier
substitution), local batchable-frontier gate (batch-index internal AND
persistent), the validation strategy (slice mode reuses Phase-1 peak_cost at
batched extents; full mode vs a tree x order x batch-choice oracle), and the
Phase-2 OptimizeOptions plumbing (pre-CostModel).

Batchable indices slice independently (peak[n][B], B subset of the term's
batchable indices) rather than as one group, which would under-count and
mislead on multi-aux terms. Batch decision for Ki taken at the node where Ki is
internalized; objective peak[root][empty]; m=1 collapses to two-mode and ties to
Phase-1 peak_cost. Oracle and validation updated for per-index (incl. a
two-distinct-aux case).

DP gains a [B] dimension over the term's distinct batchable indices; oracle
threads the per-index slice context; reconstruction gets a full numeric
memory-simulation check. All-sliced corner ties to Phase-1 peak_cost.

Add DensePeakSizeBatched to ObjectiveFunction and two new
OptimizeOptions fields (is_batchable_index, batch_target_size).

Implement in detail namespace (single_term.hpp):
- batchable_index_list: distinct batchable indices in appearance order
- sliced_footprints: 2^m tables of subset_footprints, one per sliced-set B
- leaf_volatile_mask: bitmask of volatile leaf tensors (mirrors inline mask)

Test: "per-index batchability tables" SECTION verifies that aux.size()==2
for two distinct F-space indices, tables.size()==4 (2^2 sliced-sets), the
all-sliced footprint is strictly smaller than the unsliced one, and that
slicing only F1 shrinks only the F1-leaf footprint.
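
The 2^m table layout can be sketched as follows. This is a simplified illustration, not the `sliced_footprints` from single_term.hpp: it tabulates leaf footprints only (the real helper builds subset footprints), and the function name, the integer index ids, and the parallel `batchable`/`batch_target` vectors are all hypothetical. Bit k of the table number B says whether the k-th batchable index is sliced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using ToyLeaf = std::vector<std::size_t>;  // index ids carried by one leaf

// Returns tables[B][leaf]: the footprint of each leaf when exactly the
// batchable indices selected by bitmask B are sliced.
std::vector<std::vector<double>> toy_sliced_footprints(
    const std::vector<ToyLeaf>& leaves,
    const std::vector<std::size_t>& extent,          // extent per index id
    const std::vector<std::size_t>& batchable,       // batchable index ids
    const std::vector<std::size_t>& batch_target) {  // slice size per entry
  assert(batchable.size() == batch_target.size());
  const std::size_t m = batchable.size();
  std::vector<std::vector<double>> tables(std::size_t{1} << m);
  for (std::size_t B = 0; B < tables.size(); ++B) {
    std::vector<std::size_t> eff = extent;  // effective extents under B
    for (std::size_t k = 0; k < m; ++k)
      if ((B >> k) & 1u)
        eff[batchable[k]] = std::min(extent[batchable[k]], batch_target[k]);
    for (const ToyLeaf& leaf : leaves) {
      double fp = 1.0;
      for (std::size_t ix : leaf) fp *= static_cast<double>(eff[ix]);
      tables[B].push_back(fp);
    }
  }
  return tables;
}
```

With two batchable indices this yields four tables, and slicing one index changes only the footprints of leaves that carry it, which mirrors the properties the SECTION above checks.
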
…side)

One compile-time generic driver run_single_term_dp<Model> owns the subset
lattice + bipartition enumeration; each objective becomes a CostModel type
(State + Context + leaf/init/relax/finalize/reconstruct). Four built-ins
(AdditiveModel x2, PeakModel, PeakBatchedModel) map their existing DPs;
behavior-preserving (existing tests/oracles are the regression net). Evaluator
face and mpqc deferred to Phase 4.

4 tasks, behavior-preserving, model-by-model (AdditiveModel, PeakModel,
PeakBatchedModel) + concept/custom-model test. Old standalone DP/cost functions
kept as reference oracles; per-objective equivalence tests + full existing suite
green are the gates.
… cost; cover volatile path in equivalence test
…r optimizer + eval)

Bundle the batchability triple into one BatchPolicy{is_batchable_index,
batch_target_size (per-index), is_volatile_leaf} consumed by both the optimizer
(OptimizeOptions embeds it) and a thin eval-layer make_evaluator adapter (lifts
the Tensor volatile predicate to EvalNode). Generalizes batch_target_size from
scalar to per-index function. Two stages: SeQuant (policy+adapter+ripple), then
mpqc (construct once, feed both, delete dup). Behavior-preserving.
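
A rough sketch of the "one policy, two consumers" idea, using stand-in types. `ToyIndex`, `ToyTensor`, `ToyEvalNode`, `ToyBatchPolicy`, `ToyOptimizeOptions`, `lift_volatile_predicate`, and `make_toy_policy` are all hypothetical; only the three member names come from the commit message, and the real signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Stand-ins for the library's Index, Tensor and EvalNode types.
struct ToyIndex { std::string label; std::size_t extent; };
struct ToyTensor { std::string label; };
struct ToyEvalNode { ToyTensor tensor; bool is_leaf; };

// The batchability triple bundled into one object.
struct ToyBatchPolicy {
  std::function<bool(ToyIndex const&)> is_batchable_index;
  std::function<std::size_t(ToyIndex const&)> batch_target_size;
  std::function<bool(ToyTensor const&)> is_volatile_leaf;
};

// Optimizer side: the options embed the policy instead of three loose fields.
struct ToyOptimizeOptions { ToyBatchPolicy batch_policy; };

// Evaluator side: lift the tensor-level predicate to a node-level predicate.
std::function<bool(ToyEvalNode const&)> lift_volatile_predicate(
    ToyBatchPolicy const& policy) {
  assert(policy.is_volatile_leaf);
  return [pred = policy.is_volatile_leaf](ToyEvalNode const& node) {
    return node.is_leaf && pred(node.tensor);
  };
}

// Example policy: indices labelled K* are batchable, with slices of 16;
// the tensor labelled "g" is treated as a volatile leaf.
ToyBatchPolicy make_toy_policy() {
  return ToyBatchPolicy{
      [](ToyIndex const& ix) { return !ix.label.empty() && ix.label[0] == 'K'; },
      [](ToyIndex const&) { return std::size_t{16}; },
      [](ToyTensor const& t) { return t.label == "g"; }};
}
```

Because both consumers read the same object, a change to the policy reaches the optimizer and the evaluator together.
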
A1 batch_target_size scalar->per-index function; A2 BatchPolicy struct embedded
in OptimizeOptions; A3 eval-layer make_evaluator adapter; B1 mpqc construct-once
+ feed-both + delete-dup (CSV-CCk energy-match validation). Behavior-preserving;
A->B mpqc-compile window noted.

Replace every std::size_t batch/target_batch_size parameter on the
batched optimizer path and in make_batched_custom_evaluator with
std::function<std::size_t(Index const&)> batch_target_size.

Slicing applies min(extent(ix), batch_target_size(ix)), so a constant
lambda [](Index const&){ return N; } reproduces old scalar-N results.
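
The slicing rule is small enough to sketch in isolation. `ToyIndex` is a stand-in for the library's Index, and `sliced_extent` and `constant_batch` are hypothetical helper names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

struct ToyIndex { std::string label; std::size_t extent; };  // stand-in type
using BatchTargetSize = std::function<std::size_t(ToyIndex const&)>;

// Effective extent of ix once sliced: never larger than the full extent.
std::size_t sliced_extent(ToyIndex const& ix, BatchTargetSize const& target) {
  assert(target);  // an empty std::function would throw on call
  return std::min(ix.extent, target(ix));
}

// A constant function reproduces the old scalar-N behaviour.
BatchTargetSize constant_batch(std::size_t n) {
  return [n](ToyIndex const&) { return n; };
}
```

A per-index policy is then just a different callable, for example one that returns a smaller slice for one label than for the others.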

Changed:
- OptimizeOptions::batch_target_size: size_t -> function<size_t(Index)>
- sliced_footprints, peak_dp_batched, peak_cost_batched,
  single_term_opt_peak_batched_impl, reconstructed_batched_peak: same
- detail::single_term_opt and public single_term_opt overload: same
- PeakBatchedModel::batch member: size_t -> function<size_t(Index)>
- make_batched_custom_evaluator: target_batch_size is now a function;
  call sites pass target_batch_size(*K) and target_batch_size(*Kk)
  to mode_batches (which still takes a scalar)

Tests: all existing batched test calls updated to pass constant lambdas;
new SECTION("per-index batch_target_size honored") verifies that
distinct per-index sizes produce different peak costs.

evaleev added 8 commits June 19, 2026 11:48
…hs' mutual agreement

The make_evaluator-vs-hand-built comparison asserted Tight (exact) equality
between two independent batched-summation evaluations, whose accumulation order
is thread-non-deterministic; this flaked by a few ULPs. Both paths already
compare Loose against the reference for the same reason; make the mutual
comparison Loose too.
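
The underlying effect is that floating-point addition is not associative, so two summations of the same terms in different orders can differ in the last bits. A minimal sketch, with a hypothetical `approx_equal` helper and an illustrative tolerance; the actual Tight and Loose thresholds in the test suite are not shown in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Relative comparison: true if a and b agree to within rel_tol.
bool approx_equal(double a, double b, double rel_tol) {
  assert(rel_tol >= 0.0);
  return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

// Same terms, two accumulation orders.
double sum_forward(const std::vector<double>& v) {
  double s = 0.0;
  for (double x : v) s += x;
  return s;
}
double sum_backward(const std::vector<double>& v) {
  double s = 0.0;
  for (auto it = v.rbegin(); it != v.rend(); ++it) s += *it;
  return s;
}
```

Comparing `sum_forward(v)` and `sum_backward(v)` with `==` is fragile for the same reason the Tight assertion flaked; a relative tolerance is robust to the reordering.
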
Comment on lines +39 to +42
prod, opts.idx_to_extent, subnet_cse,
opts.batch_policy.is_volatile_leaf, opts.volatile_weight,
opts.footprint_weight, opts.batch_policy.is_batchable_index,
opts.batch_policy.batch_target_size);

Collaborator

I guess it would be easier to just pass opts in its entirety into the function instead of every member separately, no?

Collaborator

These files should be removed before merging.


2 participants