[Parquet] Improve speed of conversion between RowSelection <--> Mask #8847

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

After the great work from @hhhizzz in #8733, we will (finally) have the ability to use a Bitmask filter representation when applying filters during Parquet decode.

During review, we noticed that the code that converts a RowSelection to a Mask could likely be optimized further:

fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut builder = BooleanBufferBuilder::new(total_rows);
    for selector in selectors {
        builder.append_n(selector.row_count, !selector.skip);
    }
    builder.finish()
}

Describe the solution you'd like

Make predicate evaluation faster by optimizing the conversion between a RowSelection and a mask.
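One possible direction, shown only as a rough sketch: allocate the bitmap zeroed once, skip over `skip` runs by advancing a cursor, and fill selected runs a 64-bit word at a time. To stay self-contained, this uses a local stand-in `RowSelector` struct and a plain `Vec<u64>` instead of the arrow `BooleanBuffer`. The names `mask_words_from_selectors` and `set_range` are hypothetical and not part of any crate. Whether this actually beats `append_n` would need benchmarking.

```rust
/// Local stand-in for the parquet `RowSelector` (assumption: only these two fields matter here).
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Packs the selectors into 64-bit words, one bit per row (1 = selected),
/// least-significant bit first. Returns the words and the total row count.
fn mask_words_from_selectors(selectors: &[RowSelector]) -> (Vec<u64>, usize) {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    // Zero-initialised, so skipped runs only need to advance the cursor.
    let mut words = vec![0u64; (total_rows + 63) / 64];
    let mut pos = 0usize;
    for s in selectors {
        if !s.skip {
            set_range(&mut words, pos, pos + s.row_count);
        }
        pos += s.row_count;
    }
    (words, total_rows)
}

/// Sets the bits in `[start, end)`, writing whole words wherever the run covers them.
fn set_range(words: &mut [u64], start: usize, end: usize) {
    if start >= end {
        return; // empty run, nothing to set
    }
    let first = start / 64;
    let last = (end - 1) / 64;
    // Bits from `start % 64` upward in the first word.
    let head = u64::MAX << (start % 64);
    // Bits up to and including `(end - 1) % 64` in the last word.
    let tail = u64::MAX >> (63 - (end - 1) % 64);
    if first == last {
        words[first] |= head & tail;
    } else {
        words[first] |= head;
        for w in &mut words[first + 1..last] {
            *w = u64::MAX;
        }
        words[last] |= tail;
    }
}
```

Wrapping the resulting words in an arrow `BooleanBuffer` is left out of the sketch. The point is only that long runs cost one store per 64 rows rather than per-bit work.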

Describe alternatives you've considered
@hhhizzz mentions on #8733 (comment):

I'll learn the #6624 (comment) and try to improve it in the following PRs.

In other words, @XiangpengHao's PR has many tricks we could reuse.

Additional context
