expressions: tree-general index deduction (Phase E) — inner-node general products by evaleev · Pull Request #563 · ValeevGroup/tiledarray

evaleev · 2026-06-12T01:14:34Z

Stacked on #562. Completes the general-product story for arbitrary expression trees: general products (fused + contracted + free indices) no longer need to sit at the root of their own assignment.

What

Top-down index-set deduction (MultEngine::init_indices(target)): an index shared by a product's two children is fused iff the node's target carries it, contracted at the node otherwise; an index neither the sibling nor the target carries is consumed entirely within the child subtree and is dropped from its demand (pure intersection). The up-pass is the new available_indices() (per-subtree leaf-annotation union, valid before init); each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products). Expressions consumed without a target (reductions) retain the bottom-up contraction convention and remain unsupported for inner general products.
Streaming result re-permute for general products (replaces the interleaved-target gate): a target differing from the canonical result layout cannot be folded into the batched tile op (BatchedContractReduce must stay perm-free), so the product evaluates canonically (perm-free batched Summa over a slab-replicated pmap) and a UnaryEvalImpl wrapper re-permutes to the consumer's layout. Honors TA's implicit-permute contract: when the consumer fuses the permutation into its transposed GEMM (implicit_permute_outer), only tile ordinals/trange are remapped and contents stay canonical. This also lifts the former root-level interleaved-target limitation.

With this, the THC reconstruction evaluates in ONE expression:

g("p,q,r,s") = x("p,r1") * x("q,r1") * z("r1,r2") * x("r,r2") * x("s,r2");

(left-deep tree; r1 is fused where the two x factors meet, contracted where z joins, and dropped from demands above), verified against the explicit-intermediates staging.

Tests

New: THC one-expression vs explicit intermediates; minimal depth-2 (general feeding contraction, exercises the implicit-permute path); root-level non-canonical target (exercises the content-permuting wrapper); scratch deduction unit checks.
The former expression_general_product_inner_node_gated throw test is superseded (the expression now evaluates).
Full regression: general_product, einsum_*, expressions{,_sparse} (modulo the two pre-existing assign_subblock_block_base1 failures), tot_contraction/permutations — green.
mpqc c6h14/cc-pVDZ PNO-CCSD energy unchanged (3e-11, run-to-run noise level).

Out of scope (follow-ups)

ScalMultEngine (scaled products) keeps depth-1 semantics for inner general products.
ToT inner-index result permutations remain gated.
Mixed T*ToT in arbitrary trees (Phase F) — to be scoped separately.

…t target (Phase E) An index shared by the two children of a product is fused iff the node's target carries it, contracted otherwise; an index neither the sibling nor the target carries is consumed within the child subtree and not demanded of it. available_indices() (per-subtree leaf-annotation union, valid before init) supplies the up-pass; each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products, pass-through elsewhere). Expressions consumed without a target (reductions) retain the bottom-up contraction convention.

…reaming re-permute A target that differs from the canonical (fused..., left-free..., right-free...) result layout cannot be folded into the batched tile op (BatchedContractReduce must be perm-free); evaluate canonically (Summa over a slab-replicated pmap) and re-permute to the target with a streaming UnaryEvalImpl. Honors the implicit-permute contract: when the consumer fuses the permutation into its own operation (transposed GEMM), only the tile ordinals/trange are remapped and contents stay canonical. Replaces the interleaved-target gate, enabling general products at inner expression-tree nodes and non-canonical root targets.

…target)

Copilot

Pull request overview

Adds Phase E support for top-down index-set deduction across arbitrary expression trees, enabling inner-node general products (fused + contracted + free indices) and lifting the prior “general product must be root” limitation. It also adds a streaming re-permute wrapper so general products can evaluate in canonical layout and re-permute to non-canonical consumer layouts without injecting permutations into the perm-free batched tile op.

Changes:

Implement top-down child-demand deduction via new available_indices() up-pass and preferred_layout() demand ordering.
Support non-canonical general-product result layouts by evaluating canonically and applying a streaming unary re-permute.
Update/expand general-product tests to cover inner-node cases and non-canonical targets (including THC one-expression reconstruction).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`tests/general_product.cpp`	Replaces the former inner-node “gated throw” test with executing cases; adds coverage for non-canonical targets and depth-2 inner-node paths.
`src/TiledArray/expressions/unary_engine.h`	Adds Phase-E plumbing (`available_indices`, `preferred_layout`) for unary nodes.
`src/TiledArray/expressions/mult_engine.h`	Implements Phase-E top-down demand propagation and adds canonical product `preferred_layout()` for products.
`src/TiledArray/expressions/leaf_engine.h`	Exposes leaf `available_indices()` and a pass-through `preferred_layout()`.
`src/TiledArray/expressions/cont_engine.h`	Adds streaming re-permute wrapper for general products with non-canonical targets; updates general init_struct logic accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

evaleev · 2026-06-12T14:01:06Z

+  struct GeneralRepermuteOp {
+    typedef value_type result_type;
+    typedef value_type argument_type;
+    static constexpr bool is_consumable = false;
+    BipartitePermutation perm;
+    /// false when the consumer fuses the permutation into its own operation
+    /// (implicit permute, e.g. a transposed GEMM): then only the tile
+    /// ordinals/trange are remapped (by the host UnaryEvalImpl) and the tile
+    /// contents are delivered in the canonical layout
+    bool permute_contents = true;
+    result_type operator()(const argument_type& tile) const {
+      if (!permute_contents) return tile;
+      TiledArray::detail::Noop<value_type, value_type, false> noop;
+      return noop(tile, perm);
+    }
+  };


Fixed in f8f4090: GeneralRepermuteOp now stores a plain outer Permutation and is constructed with outer(perm_); the inner perm stays the responsibility of init_struct_general/implicit_permute_inner_. Verified general_product_suite (incl. all ToT repermute cases) stays green.

Fixed in f8f4090. GeneralRepermuteOp now stores/applies only the outer Permutation (outer(perm_)); the wrapper exists for outer result-layout reorder only, and inner (within-cell) ToT permutation is handled separately via init_struct_general / implicit_permute_inner_. With the full bipartite perm this was a no-op while inner(perm_) is identity, but a latent inner double-apply if an inner perm were ever deferred to a downstream op.

evaleev · 2026-06-12T14:01:08Z

+      BinaryEngine_::left_.init_indices(BinaryEngine_::left_.preferred_layout(
+          bipartite_demand(avail_l, avail_r, target_indices)));
+      BinaryEngine_::right_.init_indices(BinaryEngine_::right_.preferred_layout(
+          bipartite_demand(avail_r, avail_l, target_indices)));


Addressed in f8f4090 by materializing the demands as named lvalues. (Strictly, the original g(f(temp)) was already safe — the temporary lives to the end of the full expression — but the named-lvalue form is clearer and robust against future refactors that might store the returned reference.)

Applied in f8f4090 (materialize each child demand as a named lvalue before preferred_layout()). Strictly this was not UB as written: the bipartite_demand(...) temporary lives to the end of the full expression, and init_indices() consumes the returned reference (copying into indices_) within that same full expression, so it was well-defined. But leaf/binary/unary preferred_layout() do return a reference to their argument, so the named-lvalue form removes the bind-to-temporary footgun.

- GeneralRepermuteOp: store/apply only the OUTER result-layout permutation. The streaming re-permute wrapper exists to reorder a general product's outer (result) layout; inner (within-cell) permutation of ToT results is handled separately (init_struct_general / implicit_permute_inner_). Using the full bipartite perm_ would also permute inner cells -- a no-op when the inner perm is identity, but a latent double-apply if an inner perm is ever deferred to a downstream op. Pass outer(perm_) into the op; the host UnaryEvalImpl still receives full perm_ for ordinal/trange remap. - MultEngine down-pass: materialize each child demand as a named lvalue before preferred_layout(), which returns a reference to its argument for leaf/binary engines -- avoids a needlessly fragile bind-to-temporary.

evaleev added 3 commits June 11, 2026 21:03

tests: inner-node general products (THC, depth-2, non-canonical root …

7c5d1c2

…target)

evaleev mentioned this pull request Jun 12, 2026

expressions: mixed T x ToT products in arbitrary expression trees (Phase F) #564

Merged

evaleev requested a review from Copilot June 12, 2026 12:49

Copilot started reviewing on behalf of evaleev June 12, 2026 12:50 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

evaleev mentioned this pull request Jun 12, 2026

summa: 3-d (proc_h) process grid for batched general products #565

Open

evaleev merged commit cf8e9f0 into evaleev/feature/general-product-expr Jun 12, 2026
9 checks passed

evaleev deleted the evaleev/feature/general-product-tree-deduction branch June 12, 2026 15:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expressions: tree-general index deduction (Phase E) — inner-node general products#563

expressions: tree-general index deduction (Phase E) — inner-node general products#563
evaleev merged 4 commits into
evaleev/feature/general-product-exprfrom
evaleev/feature/general-product-tree-deduction

evaleev commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

evaleev Jun 12, 2026

Uh oh!

evaleev Jun 12, 2026

Uh oh!

Uh oh!

evaleev Jun 12, 2026

Uh oh!

evaleev Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

evaleev commented Jun 12, 2026

What

Tests

Out of scope (follow-ups)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

evaleev Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

evaleev Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

evaleev Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

evaleev Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants