expressions: tree-general index deduction (Phase E) — inner-node general products#563
Conversation
…t target (Phase E) An index shared by the two children of a product is fused iff the node's target carries it, contracted otherwise; an index neither the sibling nor the target carries is consumed within the child subtree and not demanded of it. available_indices() (per-subtree leaf-annotation union, valid before init) supplies the up-pass; each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products, pass-through elsewhere). Expressions consumed without a target (reductions) retain the bottom-up contraction convention.
…reaming re-permute A target that differs from the canonical (fused..., left-free..., right-free...) result layout cannot be folded into the batched tile op (BatchedContractReduce must be perm-free); evaluate canonically (Summa over a slab-replicated pmap) and re-permute to the target with a streaming UnaryEvalImpl. Honors the implicit-permute contract: when the consumer fuses the permutation into its own operation (transposed GEMM), only the tile ordinals/trange are remapped and contents stay canonical. Replaces the interleaved-target gate, enabling general products at inner expression-tree nodes and non-canonical root targets.
There was a problem hiding this comment.
Pull request overview
Adds Phase E support for top-down index-set deduction across arbitrary expression trees, enabling inner-node general products (fused + contracted + free indices) and lifting the prior “general product must be root” limitation. It also adds a streaming re-permute wrapper so general products can evaluate in canonical layout and re-permute to non-canonical consumer layouts without injecting permutations into the perm-free batched tile op.
Changes:
- Implement top-down child-demand deduction via new
available_indices()up-pass andpreferred_layout()demand ordering. - Support non-canonical general-product result layouts by evaluating canonically and applying a streaming unary re-permute.
- Update/expand general-product tests to cover inner-node cases and non-canonical targets (including THC one-expression reconstruction).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tests/general_product.cpp |
Replaces the former inner-node “gated throw” test with executing cases; adds coverage for non-canonical targets and depth-2 inner-node paths. |
src/TiledArray/expressions/unary_engine.h |
Adds Phase-E plumbing (available_indices, preferred_layout) for unary nodes. |
src/TiledArray/expressions/mult_engine.h |
Implements Phase-E top-down demand propagation and adds canonical product preferred_layout() for products. |
src/TiledArray/expressions/leaf_engine.h |
Exposes leaf available_indices() and a pass-through preferred_layout(). |
src/TiledArray/expressions/cont_engine.h |
Adds streaming re-permute wrapper for general products with non-canonical targets; updates general init_struct logic accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| struct GeneralRepermuteOp { | ||
| typedef value_type result_type; | ||
| typedef value_type argument_type; | ||
| static constexpr bool is_consumable = false; | ||
| BipartitePermutation perm; | ||
| /// false when the consumer fuses the permutation into its own operation | ||
| /// (implicit permute, e.g. a transposed GEMM): then only the tile | ||
| /// ordinals/trange are remapped (by the host UnaryEvalImpl) and the tile | ||
| /// contents are delivered in the canonical layout | ||
| bool permute_contents = true; | ||
| result_type operator()(const argument_type& tile) const { | ||
| if (!permute_contents) return tile; | ||
| TiledArray::detail::Noop<value_type, value_type, false> noop; | ||
| return noop(tile, perm); | ||
| } | ||
| }; |
There was a problem hiding this comment.
Fixed in f8f4090: GeneralRepermuteOp now stores a plain outer Permutation and is constructed with outer(perm_); the inner perm stays the responsibility of init_struct_general/implicit_permute_inner_. Verified general_product_suite (incl. all ToT repermute cases) stays green.
There was a problem hiding this comment.
Fixed in f8f4090. GeneralRepermuteOp now stores/applies only the outer Permutation (outer(perm_)); the wrapper exists for outer result-layout reorder only, and inner (within-cell) ToT permutation is handled separately via init_struct_general / implicit_permute_inner_. With the full bipartite perm this was a no-op while inner(perm_) is identity, but a latent inner double-apply if an inner perm were ever deferred to a downstream op.
| BinaryEngine_::left_.init_indices(BinaryEngine_::left_.preferred_layout( | ||
| bipartite_demand(avail_l, avail_r, target_indices))); | ||
| BinaryEngine_::right_.init_indices(BinaryEngine_::right_.preferred_layout( | ||
| bipartite_demand(avail_r, avail_l, target_indices))); |
There was a problem hiding this comment.
Addressed in f8f4090 by materializing the demands as named lvalues. (Strictly, the original g(f(temp)) was already safe — the temporary lives to the end of the full expression — but the named-lvalue form is clearer and robust against future refactors that might store the returned reference.)
There was a problem hiding this comment.
Applied in f8f4090 (materialize each child demand as a named lvalue before preferred_layout()). Strictly this was not UB as written: the bipartite_demand(...) temporary lives to the end of the full expression, and init_indices() consumes the returned reference (copying into indices_) within that same full expression, so it was well-defined. But leaf/binary/unary preferred_layout() do return a reference to their argument, so the named-lvalue form removes the bind-to-temporary footgun.
- GeneralRepermuteOp: store/apply only the OUTER result-layout permutation. The streaming re-permute wrapper exists to reorder a general product's outer (result) layout; inner (within-cell) permutation of ToT results is handled separately (init_struct_general / implicit_permute_inner_). Using the full bipartite perm_ would also permute inner cells -- a no-op when the inner perm is identity, but a latent double-apply if an inner perm is ever deferred to a downstream op. Pass outer(perm_) into the op; the host UnaryEvalImpl still receives full perm_ for ordinal/trange remap. - MultEngine down-pass: materialize each child demand as a named lvalue before preferred_layout(), which returns a reference to its argument for leaf/binary engines -- avoids a needlessly fragile bind-to-temporary.
cf8e9f0
into
evaleev/feature/general-product-expr
Stacked on #562. Completes the general-product story for arbitrary expression trees: general products (fused + contracted + free indices) no longer need to sit at the root of their own assignment.
What
MultEngine::init_indices(target)): an index shared by a product's two children is fused iff the node's target carries it, contracted at the node otherwise; an index neither the sibling nor the target carries is consumed entirely within the child subtree and is dropped from its demand (pure intersection). The up-pass is the newavailable_indices()(per-subtree leaf-annotation union, valid before init); each child dictates the ORDER of its demand viapreferred_layout()(canonical (fused, left-free, right-free) for products). Expressions consumed without a target (reductions) retain the bottom-up contraction convention and remain unsupported for inner general products.UnaryEvalImplwrapper re-permutes to the consumer's layout. Honors TA's implicit-permute contract: when the consumer fuses the permutation into its transposed GEMM (implicit_permute_outer), only tile ordinals/trange are remapped and contents stay canonical. This also lifts the former root-level interleaved-target limitation.With this, the THC reconstruction evaluates in ONE expression:
(left-deep tree; r1 is fused where the two x factors meet, contracted where z joins, and dropped from demands above), verified against the explicit-intermediates staging.
Tests
expression_general_product_inner_node_gatedthrow test is superseded (the expression now evaluates).assign_subblock_block_base1failures), tot_contraction/permutations — green.Out of scope (follow-ups)
ScalMultEngine(scaled products) keeps depth-1 semantics for inner general products.