Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After the great work from @hhhizzz in #8733, we will (finally) have the ability to use a Bitmask filter representation when applying filters during Parquet decode.
#8733 automatically converts an existing RowSelection (aka a Vec<RowSelector> of ranges) into a bitmask for evaluation.
However, at the moment, when a filter is initially evaluated, it is always converted from Bitmask --> RowSelection here:
https://github.com/apache/arrow-rs/blob/911331aafa13f5e230440cf5d02feb245985c64e/parquet/src/arrow/arrow_reader/read_plan.rs#L168-L167
This is inefficient when a Bitmask is converted to a RowSelection only to be turned back into a Bitmask for evaluation.
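To make the round trip concrete, here is a minimal, self-contained sketch. It does not use the real parquet or arrow types: `Run` is a hypothetical stand-in for `RowSelector`, a plain `&[bool]` stands in for the Bitmask, and `mask_to_runs` / `runs_to_mask` are illustrative names, not functions from the crate.

```rust
/// Hypothetical stand-in for a row selector: a run of rows to read or skip.
#[derive(Debug, Clone, PartialEq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Collapse a boolean mask into alternating select/skip runs
/// (the Mask --> Selection direction).
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run if this row has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Expand runs back into a boolean mask (the Selection --> Mask direction).
fn runs_to_mask(runs: &[Run]) -> Vec<bool> {
    let mut mask = Vec::new();
    for run in runs {
        mask.extend(std::iter::repeat(!run.skip).take(run.row_count));
    }
    mask
}
```

Calling `runs_to_mask(&mask_to_runs(&mask))` returns the original mask, so when the mask form is what gets evaluated anyway, both passes are pure overhead.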
Describe the solution you'd like
Add a way to avoid converting from a Mask --> Selection when handling the result of evaluating predicates.
I think the tricky bit will be quickly inspecting a Mask to determine whether it should be turned back into a Selection (we can probably use the same heuristics that @hhhizzz added in #8733 for going the other way).
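One possible shape for such a check, as a rough sketch only: count the runs in the mask in a single pass and compare the average run length against a threshold. The average-run-length rule, the `min_avg_run_len` parameter, and the names `Representation`, `count_runs`, and `choose_representation` are all assumptions made for illustration; they are not taken from #8733, and a plain `&[bool]` stands in for the real mask type.

```rust
/// Hypothetical result of the decision: which representation to keep.
#[derive(Debug, PartialEq)]
enum Representation {
    Mask,
    Selection,
}

/// Count contiguous runs of equal values in the mask with one linear scan.
fn count_runs(mask: &[bool]) -> usize {
    if mask.is_empty() {
        return 0;
    }
    // Every position where adjacent values differ starts a new run.
    1 + mask.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Assumed heuristic: long average runs favor a Selection (few selectors),
/// short, fragmented runs favor keeping the Mask.
fn choose_representation(mask: &[bool], min_avg_run_len: usize) -> Representation {
    let runs = count_runs(mask);
    if runs == 0 {
        // Empty input: nothing to evaluate, either form works.
        return Representation::Selection;
    }
    // Equivalent to `mask.len() / runs >= min_avg_run_len` without rounding.
    if mask.len() >= runs * min_avg_run_len {
        Representation::Selection
    } else {
        Representation::Mask
    }
}
```

With a threshold of 4, an alternating mask such as `[true, false, true, false, ...]` stays a Mask, while four selected rows followed by four skipped rows becomes a Selection. A real implementation would likely count runs over the packed bits rather than a `bool` slice.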
Describe alternatives you've considered
Additional context
See here for discussion on the initial PR: