Skip to content

[SPARK-56444][SQL] Fix equals/hashCode contract violations in BatchScanExec#55297

Open
anton5798 wants to merge 1 commit intoapache:masterfrom
anton5798:batchscan-equals-hashcode
Open

[SPARK-56444][SQL] Fix equals/hashCode contract violations in BatchScanExec#55297
anton5798 wants to merge 1 commit intoapache:masterfrom
anton5798:batchscan-equals-hashcode

Conversation

@anton5798
Copy link
Copy Markdown
Contributor

@anton5798 anton5798 commented Apr 10, 2026

What changes were proposed in this pull request?

Fix two contract violations in BatchScanExec.equals()/hashCode():

  1. Reflexivity violation: equals() had a this.batch != null guard, so x.equals(x) returned false when batch was null (which happens after serialization round-trip since scan is @transient).
  2. hashCode inconsistency: hashCode() did not include keyGroupedPartitioning, but equals() compared it — violating the contract that equal objects must have equal hash codes.

Why are the changes needed?

These violations can cause incorrect behavior when BatchScanExec instances are used in hash-based collections (e.g., HashMap, HashSet) or when comparing plans after serialization.

How was this patch tested?

Added unit tests in SparkPlanSuite verifying:

  • Reflexivity when batch is null
  • hashCode consistency with equals
  • hashCode sensitivity to keyGroupedPartitioning

Was this patch authored or co-authored using generative AI tooling?

Yes.

…anExec

### What changes were proposed in this pull request?

Fix two contract violations in `BatchScanExec.equals()`/`hashCode()`:

1. **Reflexivity violation**: `equals()` had a `this.batch != null` guard, so `x.equals(x)` returned `false` when `batch` was null (which happens after serialization round-trip since `scan` is `@transient`).
2. **hashCode inconsistency**: `hashCode()` did not include `keyGroupedPartitioning`, but `equals()` compared it — violating the contract that equal objects must have equal hash codes.

### Why are the changes needed?

These violations can cause incorrect behavior when `BatchScanExec` instances are used in hash-based collections (e.g., `HashMap`, `HashSet`) or when comparing plans after serialization.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests in `SparkPlanSuite` verifying:
- Reflexivity when `batch` is null
- hashCode consistency with equals
- hashCode sensitivity to `keyGroupedPartitioning`

### Was this patch authored or co-authored using generative AI tooling?

Yes.
@anton5798 anton5798 changed the title [SPARK-XXXXX][SQL] Fix equals/hashCode contract violations in BatchSc… [SPARK-XXXXX][SQL] Fix equals/hashCode contract violations in BatchScanExec Apr 10, 2026
@anton5798 anton5798 changed the title [SPARK-XXXXX][SQL] Fix equals/hashCode contract violations in BatchScanExec [SPARK-56444][SQL] Fix equals/hashCode contract violations in BatchScanExec Apr 10, 2026
Copy link
Copy Markdown
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending CI.

@anton5798, double check this PR that added this.batch != null in equals.

Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, small comment

keyGroupedPartitioning = Some(Seq(Literal(2)))
)
assert(!exec1.equals(exec2))
assert(exec1.hashCode() != exec2.hashCode())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would this fail with very low probability? should we remove?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants