Skip to content

expressions: tree-general index deduction (Phase E) — inner-node general products#563

Merged
evaleev merged 4 commits into
evaleev/feature/general-product-exprfrom
evaleev/feature/general-product-tree-deduction
Jun 12, 2026
Merged

expressions: tree-general index deduction (Phase E) — inner-node general products#563
evaleev merged 4 commits into
evaleev/feature/general-product-exprfrom
evaleev/feature/general-product-tree-deduction

Conversation

@evaleev

@evaleev evaleev commented Jun 12, 2026

Copy link
Copy Markdown
Member

Stacked on #562. Completes the general-product story for arbitrary expression trees: general products (fused + contracted + free indices) no longer need to sit at the root of their own assignment.

What

  • Top-down index-set deduction (MultEngine::init_indices(target)): an index shared by a product's two children is fused iff the node's target carries it, contracted at the node otherwise; an index neither the sibling nor the target carries is consumed entirely within the child subtree and is dropped from its demand (pure intersection). The up-pass is the new available_indices() (per-subtree leaf-annotation union, valid before init); each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products). Expressions consumed without a target (reductions) retain the bottom-up contraction convention and remain unsupported for inner general products.
  • Streaming result re-permute for general products (replaces the interleaved-target gate): a target differing from the canonical result layout cannot be folded into the batched tile op (BatchedContractReduce must stay perm-free), so the product evaluates canonically (perm-free batched Summa over a slab-replicated pmap) and a UnaryEvalImpl wrapper re-permutes to the consumer's layout. Honors TA's implicit-permute contract: when the consumer fuses the permutation into its transposed GEMM (implicit_permute_outer), only tile ordinals/trange are remapped and contents stay canonical. This also lifts the former root-level interleaved-target limitation.

With this, the THC reconstruction evaluates in ONE expression:

g("p,q,r,s") = x("p,r1") * x("q,r1") * z("r1,r2") * x("r,r2") * x("s,r2");

(left-deep tree; r1 is fused where the two x factors meet, contracted where z joins, and dropped from demands above), verified against the explicit-intermediates staging.

Tests

  • New: THC one-expression vs explicit intermediates; minimal depth-2 (general feeding contraction, exercises the implicit-permute path); root-level non-canonical target (exercises the content-permuting wrapper); scratch deduction unit checks.
  • The former expression_general_product_inner_node_gated throw test is superseded (the expression now evaluates).
  • Full regression: general_product, einsum_*, expressions{,_sparse} (modulo the two pre-existing assign_subblock_block_base1 failures), tot_contraction/permutations — green.
  • mpqc c6h14/cc-pVDZ PNO-CCSD energy unchanged (3e-11, run-to-run noise level).

Out of scope (follow-ups)

  • ScalMultEngine (scaled products) keeps depth-1 semantics for inner general products.
  • ToT inner-index result permutations remain gated.
  • Mixed T*ToT in arbitrary trees (Phase F) — to be scoped separately.

evaleev added 3 commits June 11, 2026 21:03
…t target (Phase E)

An index shared by the two children of a product is fused iff the node's
target carries it, contracted otherwise; an index neither the sibling nor
the target carries is consumed within the child subtree and not demanded
of it. available_indices() (per-subtree leaf-annotation union, valid
before init) supplies the up-pass; each child dictates the ORDER of its
demand via preferred_layout() (canonical (fused, left-free, right-free)
for products, pass-through elsewhere). Expressions consumed without a
target (reductions) retain the bottom-up contraction convention.
…reaming re-permute

A target that differs from the canonical (fused..., left-free...,
right-free...) result layout cannot be folded into the batched tile op
(BatchedContractReduce must be perm-free); evaluate canonically (Summa
over a slab-replicated pmap) and re-permute to the target with a
streaming UnaryEvalImpl. Honors the implicit-permute contract: when the
consumer fuses the permutation into its own operation (transposed GEMM),
only the tile ordinals/trange are remapped and contents stay canonical.
Replaces the interleaved-target gate, enabling general products at inner
expression-tree nodes and non-canonical root targets.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Phase E support for top-down index-set deduction across arbitrary expression trees, enabling inner-node general products (fused + contracted + free indices) and lifting the prior “general product must be root” limitation. It also adds a streaming re-permute wrapper so general products can evaluate in canonical layout and re-permute to non-canonical consumer layouts without injecting permutations into the perm-free batched tile op.

Changes:

  • Implement top-down child-demand deduction via new available_indices() up-pass and preferred_layout() demand ordering.
  • Support non-canonical general-product result layouts by evaluating canonically and applying a streaming unary re-permute.
  • Update/expand general-product tests to cover inner-node cases and non-canonical targets (including THC one-expression reconstruction).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/general_product.cpp Replaces the former inner-node “gated throw” test with executing cases; adds coverage for non-canonical targets and depth-2 inner-node paths.
src/TiledArray/expressions/unary_engine.h Adds Phase-E plumbing (available_indices, preferred_layout) for unary nodes.
src/TiledArray/expressions/mult_engine.h Implements Phase-E top-down demand propagation and adds canonical product preferred_layout() for products.
src/TiledArray/expressions/leaf_engine.h Exposes leaf available_indices() and a pass-through preferred_layout().
src/TiledArray/expressions/cont_engine.h Adds streaming re-permute wrapper for general products with non-canonical targets; updates general init_struct logic accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +850 to +865
struct GeneralRepermuteOp {
typedef value_type result_type;
typedef value_type argument_type;
static constexpr bool is_consumable = false;
BipartitePermutation perm;
/// false when the consumer fuses the permutation into its own operation
/// (implicit permute, e.g. a transposed GEMM): then only the tile
/// ordinals/trange are remapped (by the host UnaryEvalImpl) and the tile
/// contents are delivered in the canonical layout
bool permute_contents = true;
result_type operator()(const argument_type& tile) const {
if (!permute_contents) return tile;
TiledArray::detail::Noop<value_type, value_type, false> noop;
return noop(tile, perm);
}
};

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f8f4090: GeneralRepermuteOp now stores a plain outer Permutation and is constructed with outer(perm_); the inner perm stays the responsibility of init_struct_general/implicit_permute_inner_. Verified general_product_suite (incl. all ToT repermute cases) stays green.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f8f4090. GeneralRepermuteOp now stores/applies only the outer Permutation (outer(perm_)); the wrapper exists for outer result-layout reorder only, and inner (within-cell) ToT permutation is handled separately via init_struct_general / implicit_permute_inner_. With the full bipartite perm this was a no-op while inner(perm_) is identity, but a latent inner double-apply if an inner perm were ever deferred to a downstream op.

Comment thread src/TiledArray/expressions/cont_engine.h Outdated
Comment on lines +332 to +335
BinaryEngine_::left_.init_indices(BinaryEngine_::left_.preferred_layout(
bipartite_demand(avail_l, avail_r, target_indices)));
BinaryEngine_::right_.init_indices(BinaryEngine_::right_.preferred_layout(
bipartite_demand(avail_r, avail_l, target_indices)));

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f8f4090 by materializing the demands as named lvalues. (Strictly, the original g(f(temp)) was already safe — the temporary lives to the end of the full expression — but the named-lvalue form is clearer and robust against future refactors that might store the returned reference.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in f8f4090 (materialize each child demand as a named lvalue before preferred_layout()). Strictly this was not UB as written: the bipartite_demand(...) temporary lives to the end of the full expression, and init_indices() consumes the returned reference (copying into indices_) within that same full expression, so it was well-defined. But leaf/binary/unary preferred_layout() do return a reference to their argument, so the named-lvalue form removes the bind-to-temporary footgun.

- GeneralRepermuteOp: store/apply only the OUTER result-layout permutation.
  The streaming re-permute wrapper exists to reorder a general product's
  outer (result) layout; inner (within-cell) permutation of ToT results is
  handled separately (init_struct_general / implicit_permute_inner_). Using
  the full bipartite perm_ would also permute inner cells -- a no-op when the
  inner perm is identity, but a latent double-apply if an inner perm is ever
  deferred to a downstream op. Pass outer(perm_) into the op; the host
  UnaryEvalImpl still receives full perm_ for ordinal/trange remap.
- MultEngine down-pass: materialize each child demand as a named lvalue
  before preferred_layout(), which returns a reference to its argument for
  leaf/binary engines -- avoids a needlessly fragile bind-to-temporary.
@evaleev evaleev merged commit cf8e9f0 into evaleev/feature/general-product-expr Jun 12, 2026
9 checks passed
@evaleev evaleev deleted the evaleev/feature/general-product-tree-deduction branch June 12, 2026 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants