Skip to content

[SPARK-56046][SQL] Typed SPJ partition key Reducers#54884

Open
peter-toth wants to merge 1 commit intoapache:masterfrom
peter-toth:SPARK-56046-typed-spj-reducers
Open

[SPARK-56046][SQL] Typed SPJ partition key Reducers#54884
peter-toth wants to merge 1 commit intoapache:masterfrom
peter-toth:SPARK-56046-typed-spj-reducers

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Mar 18, 2026

What changes were proposed in this pull request?

This PR adds a new method to SPJ partition key Reducers to return the type of a reduced partition key.

Why are the changes needed?

After the SPJ refactor some Iceberg SPJ tests, that join a hours transform partitioned table with a days transform partitioned table, started to fail. This is because after the refactor the keys of a KeyedPartitioning partitioning are InternalRowComparableWrappers, which include the type of the key, and when the partition keys are reduced the type of the reduced keys are inherited from their original type.
This means that when hours transformed hour keys are reduced to days, the keys actually remain having IntegerType type, while the days transformed keys have DateType type in Iceberg. This type difference causes that the left and right side InternalRowComparableWrappers are not considered equal despite their InternalRow raw key data are equal.

Before the refactor the type of (possibly reduced) partition keys were not stored in the partitioning. When the left and right side raw keys were compared in EnsureRequirement a common comparator was initialized with the type of the left side keys.
So in the Iceberg SPJ tests the IntegerType keys were forced to be interpreted as DateType, or the DateType keys were forced to be interpreted as IntegerType, depending on the join order of the tables.
The reason why this was not causing any issues is that the PhysicalDataType of both DateType and IntegerType logical types is PhysicalIntegerType.

This PR:

  • Introduce a new resultType() method of Reducer to return the correct type of the reduced keys.
  • Properly compares the left and right side reduced key types and return an error when they are not the same.
  • Adds a new spark.sql.legacy.allowIncompatibleTransformTypes.enabled=true flag to keep the old behavior and consider the reduced keys types the same if they share a common physical type.

Does this PR introduce any user-facing change?

Yes, the reduced key types are now properly compared and incompatibilities are reported to users, but the legacy flag can allow the old behaviour.

How was this patch tested?

Added new UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

@peter-toth peter-toth force-pushed the SPARK-56046-typed-spj-reducers branch from 580ca49 to fa4bce7 Compare March 18, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant