Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After the great work from @hhhizzz in #8733, we will (finally) have the ability to use a Bitmask filter representation when applying filters during Parquet decode.
During review, we noticed that the code that converts a RowSelection to a mask could likely be optimized further:
arrow-rs/parquet/src/arrow/arrow_reader/selection.rs
Lines 926 to 933 in 911331a
    fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
        let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
        let mut builder = BooleanBufferBuilder::new(total_rows);
        for selector in selectors {
            builder.append_n(selector.row_count, !selector.skip);
        }
        builder.finish()
    }
Describe the solution you'd like
Make predicate evaluation faster by optimizing the conversion from a RowSelection to a boolean mask.
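To make the idea concrete, here is a rough sketch of one possible direction, not necessarily what #8733 or any follow-up PR does. It assumes the mask can be built as packed `u64` words (least significant bit first): the buffer is zero-initialised so skipped runs cost nothing, and each selected run is written a whole word at a time. The `RowSelector` struct below is a local stand-in with the same two fields as the real one, and `mask_words_from_selectors` and `set_bit_range` are hypothetical names used only for illustration. Whether this actually beats `BooleanBufferBuilder::append_n` would need benchmarking.

```rust
/// Local stand-in for parquet's `RowSelector` (same two fields), so the
/// sketch compiles without the arrow/parquet crates.
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Build a packed, LSB-first bitmask where bit `i` is set if row `i` is selected.
/// Returns the mask words and the total number of rows covered.
fn mask_words_from_selectors(selectors: &[RowSelector]) -> (Vec<u64>, usize) {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    // Zero-initialised, so skipped runs need no writes at all.
    let mut words = vec![0u64; (total_rows + 63) / 64];
    let mut offset = 0usize;
    for s in selectors {
        if !s.skip {
            set_bit_range(&mut words, offset, offset + s.row_count);
        }
        offset += s.row_count;
    }
    (words, total_rows)
}

/// Set all bits in `[start, end)`, touching each word at most once.
fn set_bit_range(words: &mut [u64], start: usize, end: usize) {
    if start >= end {
        return; // empty run
    }
    let first = start / 64;
    let last = (end - 1) / 64;
    // Bits `start % 64 ..= 63` of the first word.
    let head = !0u64 << (start % 64);
    // Bits `0 ..= (end - 1) % 64` of the last word.
    let tail = !0u64 >> (63 - (end - 1) % 64);
    if first == last {
        words[first] |= head & tail;
    } else {
        words[first] |= head;
        for w in &mut words[first + 1..last] {
            *w = !0u64; // fully selected interior words
        }
        words[last] |= tail;
    }
}
```

For example, calling `mask_words_from_selectors` on "select 3, skip 70, select 2" yields two words covering 75 rows, with only bits 0 to 2 and 73 to 74 set.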
Describe alternatives you've considered
@hhhizzz mentions in #8733 (comment):
I'll learn the #6624 (comment) and try to improve it in the following PRs.
In other words, @XiangpengHao's PR here has many tricks we could reuse.
Additional context