[SPARK-56046][SQL] Typed SPJ partition key Reducers #54884
Open
peter-toth wants to merge 1 commit into apache:master
What changes were proposed in this pull request?
This PR adds a new method to SPJ partition key `Reducer`s to return the type of a reduced partition key.

Why are the changes needed?
After the SPJ refactor, some Iceberg SPJ tests that join an `hours` transform partitioned table with a `days` transform partitioned table started to fail. This is because, after the refactor, the keys of a `KeyedPartitioning` partitioning are `InternalRowComparableWrapper`s, which include the type of the key, and when partition keys are reduced, the reduced keys inherit the type of the original keys.

This means that when `hours` transformed hour keys are reduced to days, the keys keep their `IntegerType` type, while the `days` transformed keys have `DateType` type in Iceberg. Because of this type difference, the left and right side `InternalRowComparableWrapper`s are not considered equal even though their raw `InternalRow` key data are equal.

Before the refactor, the types of the (possibly reduced) partition keys were not stored in the partitioning. When the left and right side raw keys were compared in `EnsureRequirements`, a common comparator was initialized with the type of the left side keys.

So in the Iceberg SPJ tests, the `IntegerType` keys were forced to be interpreted as `DateType`, or the `DateType` keys were forced to be interpreted as `IntegerType`, depending on the join order of the tables.

This did not cause any issues because the `PhysicalDataType` of both the `DateType` and `IntegerType` logical types is `PhysicalIntegerType`.

This PR:
- Adds a `resultType()` method to `Reducer` that returns the correct type of the reduced keys.
- Adds a `spark.sql.legacy.allowIncompatibleTransformTypes.enabled=true` flag to keep the old behavior and consider the types of the reduced keys the same if they share a common physical type.

Does this PR introduce any user-facing change?
Yes, the reduced key types are now properly compared and incompatibilities are reported to users, but the legacy flag can allow the old behaviour.
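
To make the type mismatch concrete, here is a minimal, self-contained Java sketch of the comparison described in this PR. It is not Spark or Iceberg code: `SpjKeySketch`, `TypedKey`, `LogicalType`, `reduceInheritingType`, `reduceWithResultType` and `keysMatch` are hypothetical names introduced only for illustration, and the `allowIncompatibleTypes` parameter merely stands in for the legacy flag. The sketch assumes that hour keys are hours since epoch and day keys are days since epoch.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical, simplified model for illustration; none of these are Spark or Iceberg classes.
final class SpjKeySketch {

    // Two logical types that share the same int-backed physical representation.
    enum LogicalType { INTEGER, DATE }

    // A raw partition key value paired with its logical type.
    static final class TypedKey {
        final LogicalType type;
        final int value;

        TypedKey(LogicalType type, int value) {
            this.type = type;
            this.value = value;
        }
    }

    // Assumed reduction: hours since epoch -> days since epoch.
    static final IntUnaryOperator HOURS_TO_DAYS = hours -> Math.floorDiv(hours, 24);

    // Behavior that caused the failures: the reduced key inherits the type of the original key.
    static TypedKey reduceInheritingType(IntUnaryOperator reducer, TypedKey key) {
        return new TypedKey(key.type, reducer.applyAsInt(key.value));
    }

    // Behavior with a typed reducer: the reducer states the type of the keys it produces.
    static TypedKey reduceWithResultType(IntUnaryOperator reducer, LogicalType resultType, TypedKey key) {
        return new TypedKey(resultType, reducer.applyAsInt(key.value));
    }

    // Keys match when their raw values are equal and their types agree.
    // With allowIncompatibleTypes the type check is skipped, which is only
    // sound here because both modeled types are int-backed.
    static boolean keysMatch(TypedKey left, TypedKey right, boolean allowIncompatibleTypes) {
        boolean typesAgree = left.type == right.type || allowIncompatibleTypes;
        return typesAgree && left.value == right.value;
    }

    public static void main(String[] args) {
        TypedKey hourKey = new TypedKey(LogicalType.INTEGER, 480005); // hours since epoch
        TypedKey dayKey = new TypedKey(LogicalType.DATE, 20000);      // days since epoch

        TypedKey inherited = reduceInheritingType(HOURS_TO_DAYS, hourKey);
        TypedKey typed = reduceWithResultType(HOURS_TO_DAYS, LogicalType.DATE, hourKey);

        System.out.println("inherited type, strict: " + keysMatch(inherited, dayKey, false));
        System.out.println("inherited type, legacy: " + keysMatch(inherited, dayKey, true));
        System.out.println("result type, strict:    " + keysMatch(typed, dayKey, false));
    }
}
```

In this model the reduced hour key and the day key carry the same raw value, yet the strict comparison rejects them while the reduced key still carries the inherited integer type. Either a reducer that reports the date type or a legacy-style comparison that ignores the logical type lets the two keys match.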
How was this patch tested?
Added new UTs.
Was this patch authored or co-authored using generative AI tooling?
No.