[python] Fix filter on data evolution table not working issue #7211
XiaoHongbo-Hope wants to merge 6 commits into apache:master
simple_null = self._filter_batch_simple_null(batch)
if simple_null is not None:
    return simple_null
if not self.predicate.has_null_check():
We can just use Predicate.to_arrow; it is the same as in the other table modes.
Updated
        schema=result.schema,
    )
except (TypeError, ValueError, pa.ArrowInvalid) as e:
    logger.debug(
Which exception is expected here?
Switched to Predicate.to_arrow; this code has been removed.
We can refactor this; this if should be in is_primary_key_table.
Updated
  if not self.predicate:
      return True
- if self.predicate_for_stats is None:
+ if self.predicate_for_stats is None or self.data_evolution:
Create a separate if. We should also add a comment to this if explaining why no filtering is done here.
Updated.
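A minimal sketch of the point raised above, for readers of the thread. It is not the pypaimon implementation: FileStats, stats_may_match and keep_file are hypothetical names. It also assumes that in a data evolution table a later file may rewrite only some columns of existing rows, so the min/max statistics of a single file do not describe the merged rows. Under that assumption, stats-based pruning has to be skipped in its own branch:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FileStats:
    """Hypothetical per-file statistics: column name -> (min, max)."""
    ranges: Dict[str, Tuple[int, int]]


def stats_may_match(stats: FileStats, column: str, value: int) -> bool:
    """Return False only when the stats prove no row in the file equals value."""
    bounds = stats.ranges.get(column)
    if bounds is None:
        return True  # no stats for this column, so the file cannot be pruned
    low, high = bounds
    return low <= value <= high


def keep_file(stats: FileStats, column: str, value: int, data_evolution: bool) -> bool:
    # Separate branch: with data evolution (assumption stated above) the stats
    # of one file do not describe the merged rows, so never prune here.
    if data_evolution:
        return True
    return stats_may_match(stats, column, value)


# File A holds the original rows; file B later rewrites only `score`.
file_a = FileStats({"id": (1, 3), "score": (0, 10)})
file_b = FileStats({"score": (90, 100)})

# Predicate: score == 95. Pruning file A by its stats would also drop its `id` column.
pruned = [keep_file(f, "score", 95, data_evolution=False) for f in (file_a, file_b)]
kept = [keep_file(f, "score", 95, data_evolution=True) for f in (file_a, file_b)]
```

With pruning enabled, file A is dropped even though it still supplies id for the matching rows; with the data evolution branch, both files are kept and the filter has to run later on the merged rows.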
    super().__init__(table, predicate, read_type, actual_split, row_tracking_enabled)

def _push_down_predicate(self) -> Optional[Predicate]:
    # Do not push predicate to file readers;
Add a detailed comment explaining why the predicate is not pushed down.
Added
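One way to picture why the predicate is applied after the merge rather than inside each file reader, as a rough sketch only. It assumes the files of a data evolution split are stitched together by row position; merge_columns, filter_after_merge and filter_before_merge are hypothetical helpers, and plain dicts of lists stand in for Arrow batches:

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Columns = Dict[str, List[int]]


def merge_columns(fragments: List[Columns]) -> List[Row]:
    """Stitch column fragments together by row position (hypothetical merge)."""
    merged: Columns = {}
    for fragment in fragments:
        merged.update(fragment)
    names = list(merged)
    return [dict(zip(names, values)) for values in zip(*merged.values())]


def filter_after_merge(fragments: List[Columns], predicate: Callable[[Row], bool]) -> List[Row]:
    """Merge first, then evaluate the predicate on complete rows."""
    return [row for row in merge_columns(fragments) if predicate(row)]


def filter_before_merge(fragments: List[Columns], column: str,
                        keep: Callable[[int], bool]) -> List[Row]:
    """What push-down would do: each fragment filters its own copy of `column`."""
    pushed = [
        {name: [v for v in values if name != column or keep(v)]
         for name, values in fragment.items()}
        for fragment in fragments
    ]
    return merge_columns(pushed)


ids = {"id": [1, 2, 3]}
scores = {"score": [5, 95, 7]}

correct = filter_after_merge([ids, scores], lambda row: row["score"] > 90)
misaligned = filter_before_merge([ids, scores], "score", lambda v: v > 90)
```

Here correct pairs score 95 with id 2, while misaligned pairs it with id 1, because the fragment holding score dropped rows that the fragment holding id still has.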
- return ConcatBatchReader(suppliers)
+ merge_reader = ConcatBatchReader(suppliers)
+ if self.predicate is not None:
+     # Only apply filter when all predicate columns are in read_type (e.g. projected schema).
What we return here is a complete row, right? So this check should apply to all table modes. It shouldn't be added here; it should be verified in SplitRead.__init__.
Updated
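The check discussed above can be sketched independently of the table mode. The helper names below are hypothetical, and field names are passed as plain strings rather than pypaimon types. The thread does not say what the real code does when a predicate column is missing from read_type (raise, widen the projection, or skip the filter), so this only shows the detection step:

```python
from typing import Iterable, List


def missing_predicate_fields(predicate_fields: Iterable[str],
                             read_fields: Iterable[str]) -> List[str]:
    """Return predicate columns that are absent from the projected read schema."""
    available = set(read_fields)
    return [name for name in predicate_fields if name not in available]


def can_apply_filter(predicate_fields: Iterable[str], read_fields: Iterable[str]) -> bool:
    """The row-level filter is only safe when every predicate column is read."""
    return not missing_predicate_fields(predicate_fields, read_fields)


full_projection = can_apply_filter(["score"], ["id", "score"])
narrow_projection = can_apply_filter(["score"], ["id"])
```

Doing this once at construction time, rather than inside one reader, keeps the rule the same for every table mode, which is what the review comment asks for.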
Purpose/Problem
Filter not working on data evolution read: when a predicate is provided, all rows are returned.
Tests
API and Format
Documentation