Configurable bitmap index encoding strategies for numeric fields in nested data columns #18722
...essing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java
Fixed
| public void testIngestAndScanSegmentsRollup() throws Exception | ||
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsWithSpec(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec) |
Check notice
Code scanning / CodeQL
Useless parameter Note test
processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
Dismissed
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsTsv(String name, NestedCommonFormatColumnFormatSpec spec) throws Exception | ||
| public void testIngestAndScanSegmentsTsv(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec) |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| public void testIngestAndScanSegmentsAndFilter() throws Exception | ||
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsAndFilter(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec) |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsAndRangeFilter( | ||
| String name, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsRealtimeAutoExplicit( | ||
| String name, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsAndFilterPartialPathArrayIndex( | ||
| String name, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsAndFilterPartialPath( | ||
| String name, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
| @Parameters(method = "getNestedColumnFormatSpec") | ||
| @TestCaseName("{0}") | ||
| public void testIngestAndScanSegmentsNestedColumnNotNullFilter( | ||
| String name, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
Pull Request Overview
This PR introduces configurable bitmap index encoding strategies for numeric fields in nested data columns, allowing users to choose between full dictionary-based indexing and nulls-only indexing to optimize storage.
Key Changes:
- Added `BitmapIndexEncodingStrategy` abstraction with two implementations: `DictionaryId` (full indexing) and `NullsOnly` (nulls-only indexing)
- Updated `NestedCommonFormatColumnFormatSpec` to include `numericFieldsBitmapIndexEncoding` configuration
- Refactored test utilities to use a new `SegmentBuilder` pattern for cleaner test code
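To make the storage trade-off between the two strategies concrete, here is a minimal, self-contained sketch. It is not the PR's code: `BitmapEncodingSketch`, its `Strategy` enum, the `Index` holder, and `build` are hypothetical names chosen for illustration, and `java.util.BitSet` stands in for whatever bitmap implementation the segment actually uses. It assumes that full indexing keeps one bitmap per distinct value plus one for nulls, while nulls-only indexing keeps just the null bitmap.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class BitmapEncodingSketch {
    /** Hypothetical stand-in for the two strategies named in the PR. */
    enum Strategy { DICTIONARY_ID, NULLS_ONLY }

    /** Holds the bitmaps produced for one numeric field. */
    static final class Index {
        final BitSet nullRows = new BitSet();
        final Map<Long, BitSet> valueRows = new TreeMap<>();

        /** The null bitmap is always present; value bitmaps only with full indexing. */
        int bitmapCount() {
            return 1 + valueRows.size();
        }
    }

    /** Builds the index for a column of nullable longs under the given strategy. */
    static Index build(Long[] rows, Strategy strategy) {
        Index index = new Index();
        for (int row = 0; row < rows.length; row++) {
            Long value = rows[row];
            if (value == null) {
                // Both strategies track which rows are null.
                index.nullRows.set(row);
            } else if (strategy == Strategy.DICTIONARY_ID) {
                // Only full indexing keeps a bitmap per distinct value.
                index.valueRows.computeIfAbsent(value, v -> new BitSet()).set(row);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Long[] rows = {10L, null, 20L, 10L, null, 30L};
        System.out.println(build(rows, Strategy.DICTIONARY_ID).bitmapCount()); // prints 4
        System.out.println(build(rows, Strategy.NULLS_ONLY).bitmapCount());    // prints 1
    }
}
```

In this sketch the nulls-only index still answers null and not-null filters from its single bitmap, but a filter on a specific value would have to scan the column values instead of reading a bitmap.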
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| BitmapIndexEncodingStrategy.java | New abstraction defining strategies for encoding bitmap indexes |
| NestedCommonFormatColumnFormatSpec.java | Added numericFieldsBitmapIndexEncoding field and updated serialization |
| GlobalDictionaryEncodedFieldColumnWriter.java | Refactored to use configurable bitmap encoding strategy |
| ScalarLongFieldColumnWriter.java | Set bitmap encoding strategy from column format spec |
| ScalarDoubleFieldColumnWriter.java | Set bitmap encoding strategy from column format spec |
| CompressedNestedDataComplexColumn.java | Updated to use format spec for bitmap encoding decisions |
| NestedDataColumnSupplier.java | Changed to use format spec instead of bitmap serde factory |
| NestedDataColumnSupplierV4.java | Changed to use format spec instead of bitmap serde factory |
| NestedDataColumnV3.java | Changed parameter type from BitmapSerdeFactory to format spec |
| NestedDataColumnV4.java | Changed parameter type from BitmapSerdeFactory to format spec |
| NestedDataColumnV5.java | Changed parameter type from BitmapSerdeFactory to format spec |
| NestedCommonFormatColumnPartSerde.java | Updated FormatSpec to include numericFieldsBitmapIndex |
| VariantFieldColumnWriter.java | Removed redundant writeColumnTo method |
| VariantArrayFieldColumnWriter.java | Removed redundant writeColumnTo method |
| ScalarStringFieldColumnWriter.java | Removed redundant writeColumnTo method |
| NestedDataTestUtils.java | Refactored with new SegmentBuilder pattern for test data creation |
| NestedDataScanQueryTest.java | Updated tests to use SegmentBuilder and test new bitmap strategies |
| NestedDataColumnSchemaTest.java | Updated test to include bitmap encoding strategy |
| NestedDataColumnSupplierTest.java | Fixed parameter in test (bitmapSerdeFactory → columnFormatSpec) |
| NestedCommonFormatColumnFormatSpecTest.java | Added test coverage for numericFieldsBitmapIndexEncoding |
| BitmapIndexEncodingStrategyTest.java | New test file for bitmap encoding strategy serialization |
| BuiltInTypesModuleTest.java | Updated test to verify numericFieldsBitmapIndexEncoding configuration |
...essing/src/main/java/org/apache/druid/segment/nested/NestedCommonFormatColumnFormatSpec.java
Outdated
| private IndexSpec indexSpec = IndexSpec.getDefault(); | ||
|
|
||
| /** | ||
| * Builder for an {@link IncrementalIndexSegment} or a list of{@link QueryableIndexSegment}, with some defaults: |
Copilot
AI
Nov 10, 2025
Missing space between 'of' and '{@link QueryableIndexSegment}'.
| * Builder for an {@link IncrementalIndexSegment} or a list of{@link QueryableIndexSegment}, with some defaults: | |
| * Builder for an {@link IncrementalIndexSegment} or a list of {@link QueryableIndexSegment}, with some defaults: |
| .build(); | ||
| Query<ScanResultValue> scanQuery = queryBuilder() | ||
| .columns("timestamp", "str", "double", "bool", "variant", | ||
| "variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays" |
Copilot
AI
Nov 10, 2025
Corrected spelling of 'variantEmtpyArray' to 'variantEmptyArray'.
| "variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays" | |
| "variantNumeric", "variantEmptyObj", "variantEmptyArray", "variantWithArrays" |
| .build(); | ||
| Query<ScanResultValue> scanQuery = queryBuilder() | ||
| .columns("timestamp", "str", "double", "bool", "variant", | ||
| "variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays" |
Copilot
AI
Nov 10, 2025
Corrected spelling of 'variantEmtpyArray' to 'variantEmptyArray'.
| "variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays" | |
| "variantNumeric", "variantEmptyObj", "variantEmptyArray", "variantWithArrays" |
…dCommonFormatColumnFormatSpec.java Co-authored-by: Copilot <[email protected]>
BitmapIndexEncodingStrategy to control the bitmap encoding in a nested column
Description
- Added `BitmapIndexEncodingStrategy` abstraction with two implementations: `DictionaryId` (full indexing) and `NullsOnly` (nulls-only indexing). It can be configured via `NumericFieldsBitmapIndexEncoding` in `NestedCommonFormatColumnFormatSpec`.
- Refactored `NestedDataScanQueryTest` to use parameterized testing for better coverage and maintainability, and added a `SegmentBuilder` class that builds segments for cleaner test code.
- Updated `GlobalDictionaryEncodedFieldColumnWriter` so that the size reported by `getSerializedColumnSize` matches what `writeColumnTo` writes, with both methods in the same class for consistency.
Key changed/added classes in this PR
- `BitmapIndexEncodingStrategy`
- `BitmapIndexEncodingStrategy.DictionaryId`
- `BitmapIndexEncodingStrategy.NullsOnly`
- `NestedCommonFormatColumnFormatSpec`
- `NestedDataScanQueryTest`
This PR has: