
[Parquet] Avoid Mask --> RowSelection --> Mask conversion to improve predicate pushdown performance #8844

@alamb


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

After the great work from @hhhizzz in #8733, we will (finally) have the ability to use a Bitmask filter representation when applying filters during Parquet decode.

#8733 automatically converts an existing RowSelection (aka a Vec<RowSelector> of ranges) into a bitmask for evaluation.

However, at the moment, when a filter is initially evaluated, it is always converted from Bitmask --> RowSelection here:
https://github.com/apache/arrow-rs/blob/911331aafa13f5e230440cf5d02feb245985c64e/parquet/src/arrow/arrow_reader/read_plan.rs#L168-L167

This is inefficient when a Bitmask is converted to a RowSelection only to be turned back into a Bitmask for evaluation.
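To make the round trip concrete, here is a minimal sketch of the two conversions. It uses only the standard library: the `RowSelector` struct below is a simplified stand-in for the type in the parquet crate, a plain `&[bool]` stands in for the Bitmask, and the function names `mask_to_selectors` and `selectors_to_mask` are hypothetical, not arrow-rs APIs.

```rust
// Simplified stand-in for the parquet crate's `RowSelector`: a run of
// `row_count` consecutive rows that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

// Mask --> Selection: run-length encode the boolean filter result.
// (Hypothetical helper, not an arrow-rs function.)
fn mask_to_selectors(mask: &[bool]) -> Vec<RowSelector> {
    let mut selectors: Vec<RowSelector> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        match selectors.last_mut() {
            // Same kind of run as the previous row: extend it.
            Some(last) if last.skip == skip => last.row_count += 1,
            // First row, or the value flipped: start a new run.
            _ => selectors.push(RowSelector { row_count: 1, skip }),
        }
    }
    selectors
}

// Selection --> Mask: expand every run back into one bool per row.
// (Hypothetical helper, not an arrow-rs function.)
fn selectors_to_mask(selectors: &[RowSelector]) -> Vec<bool> {
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut mask = Vec::with_capacity(total);
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}
```

Calling `selectors_to_mask(&mask_to_selectors(&mask))` returns the original mask, so in this sketch the round trip changes nothing about the data and only costs two extra passes over it.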

Describe the solution you'd like
Add a way to avoid the Mask --> Selection conversion for the result of evaluating predicates.

I think the tricky bit will be to quickly look at a Mask and determine whether it should be turned back into a Selection (we can probably reuse the heuristics that @hhhizzz added in #8733 for going the other way).
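One possible shape for such a check is sketched below. It assumes the decision is driven by the average run length in the mask: long runs favor a Selection, short fragmented runs favor keeping the Mask. Whether #8733 uses exactly this criterion is not confirmed here, and the `Strategy` enum, the `choose_strategy` function, and the threshold of 32 are all hypothetical. A real implementation would presumably scan the packed bitmap rather than a `&[bool]`.

```rust
// Representation to carry forward after evaluating a predicate.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    KeepMask,
    UseSelection,
}

// Hypothetical cutoff for illustration; not a value taken from arrow-rs.
const AVG_RUN_THRESHOLD: usize = 32;

// Decide which representation to use from the average run length.
// (Hypothetical helper, not an arrow-rs function.)
fn choose_strategy(mask: &[bool]) -> Strategy {
    if mask.is_empty() {
        // Nothing to convert, so leave the mask alone.
        return Strategy::KeepMask;
    }
    // A new run starts wherever two adjacent rows differ.
    let runs = 1 + mask.windows(2).filter(|w| w[0] != w[1]).count();
    let avg_run_len = mask.len() / runs;
    if avg_run_len >= AVG_RUN_THRESHOLD {
        Strategy::UseSelection
    } else {
        Strategy::KeepMask
    }
}
```

Under these assumptions, a mask that alternates every row stays a Mask, while a mask made of 100 selected rows followed by 100 skipped rows becomes a Selection. The check is a single pass over the mask and allocates nothing.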

Describe alternatives you've considered

Additional context
See here for discussion on the initial PR:


Labels: enhancement
