Skip to content

Conversation

@cecemei
Copy link
Contributor

@cecemei cecemei commented Nov 6, 2025

Description

  • Main change: users can now choose between full dictionary-based indexing and nulls-only indexing. This is done via a new BitmapIndexEncodingStrategy abstraction with two implementations: DictionaryId (full indexing) and NullsOnly (nulls-only indexing). It can be configured via NumericFieldsBitmapIndexEncoding in NestedCommonFormatColumnFormatSpec.
  • Refactored NestedDataScanQueryTest to use parameterized testing for better coverage and maintainability, added a SegmentBuilder class to build segment for cleaner test code.
  • Minor update on GlobalDictionaryEncodedFieldColumnWriter to make getSerializedColumnSize and writeColumnTo size match in the same class for consistency.
Key changed/added classes in this PR
  • BitmapIndexEncodingStrategy
  • BitmapIndexEncodingStrategy.DictionaryId
  • BitmapIndexEncodingStrategy.NullsOnly
  • NestedCommonFormatColumnFormatSpec
  • NestedDataScanQueryTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

public void testIngestAndScanSegmentsRollup() throws Exception
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsWithSpec(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsTsv(String name, NestedCommonFormatColumnFormatSpec spec) throws Exception
public void testIngestAndScanSegmentsTsv(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
public void testIngestAndScanSegmentsAndFilter() throws Exception
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsAndFilter(String name, boolean auto, NestedCommonFormatColumnFormatSpec spec)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsAndRangeFilter(
String name,

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsRealtimeAutoExplicit(
String name,

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsAndFilterPartialPathArrayIndex(
String name,

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsAndFilterPartialPath(
String name,

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@Parameters(method = "getNestedColumnFormatSpec")
@TestCaseName("{0}")
public void testIngestAndScanSegmentsNestedColumnNotNullFilter(
String name,

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'name' is never used.
@cecemei cecemei requested a review from Copilot November 10, 2025 14:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces configurable bitmap index encoding strategies for numeric fields in nested data columns, allowing users to choose between full dictionary-based indexing and nulls-only indexing to optimize storage.

Key Changes:

  • Added BitmapIndexEncodingStrategy abstraction with two implementations: DictionaryId (full indexing) and NullsOnly (nulls-only indexing)
  • Updated NestedCommonFormatColumnFormatSpec to include numericFieldsBitmapIndexEncoding configuration
  • Refactored test utilities to use a new SegmentBuilder pattern for cleaner test code

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
BitmapIndexEncodingStrategy.java New abstraction defining strategies for encoding bitmap indexes
NestedCommonFormatColumnFormatSpec.java Added numericFieldsBitmapIndexEncoding field and updated serialization
GlobalDictionaryEncodedFieldColumnWriter.java Refactored to use configurable bitmap encoding strategy
ScalarLongFieldColumnWriter.java Set bitmap encoding strategy from column format spec
ScalarDoubleFieldColumnWriter.java Set bitmap encoding strategy from column format spec
CompressedNestedDataComplexColumn.java Updated to use format spec for bitmap encoding decisions
NestedDataColumnSupplier.java Changed to use format spec instead of bitmap serde factory
NestedDataColumnSupplierV4.java Changed to use format spec instead of bitmap serde factory
NestedDataColumnV3.java Changed parameter type from BitmapSerdeFactory to format spec
NestedDataColumnV4.java Changed parameter type from BitmapSerdeFactory to format spec
NestedDataColumnV5.java Changed parameter type from BitmapSerdeFactory to format spec
NestedCommonFormatColumnPartSerde.java Updated FormatSpec to include numericFieldsBitmapIndex
VariantFieldColumnWriter.java Removed redundant writeColumnTo method
VariantArrayFieldColumnWriter.java Removed redundant writeColumnTo method
ScalarStringFieldColumnWriter.java Removed redundant writeColumnTo method
NestedDataTestUtils.java Refactored with new SegmentBuilder pattern for test data creation
NestedDataScanQueryTest.java Updated tests to use SegmentBuilder and test new bitmap strategies
NestedDataColumnSchemaTest.java Updated test to include bitmap encoding strategy
NestedDataColumnSupplierTest.java Fixed parameter in test (bitmapSerdeFactory → columnFormatSpec)
NestedCommonFormatColumnFormatSpecTest.java Added test coverage for numericFieldsBitmapIndexEncoding
BitmapIndexEncodingStrategyTest.java New test file for bitmap encoding strategy serialization
BuiltInTypesModuleTest.java Updated test to verify numericFieldsBitmapIndexEncoding configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

private IndexSpec indexSpec = IndexSpec.getDefault();

/**
* Builder for an {@link IncrementalIndexSegment} or a list of{@link QueryableIndexSegment}, with some defaults:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space between 'of' and '{@link QueryableIndexSegment}'.

Suggested change
* Builder for an {@link IncrementalIndexSegment} or a list of{@link QueryableIndexSegment}, with some defaults:
* Builder for an {@link IncrementalIndexSegment} or a list of {@link QueryableIndexSegment}, with some defaults:

Copilot uses AI. Check for mistakes.
.build();
Query<ScanResultValue> scanQuery = queryBuilder()
.columns("timestamp", "str", "double", "bool", "variant",
"variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays"
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'variantEmtpyArray' to 'variantEmptyArray'.

Suggested change
"variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays"
"variantNumeric", "variantEmptyObj", "variantEmptyArray", "variantWithArrays"

Copilot uses AI. Check for mistakes.
.build();
Query<ScanResultValue> scanQuery = queryBuilder()
.columns("timestamp", "str", "double", "bool", "variant",
"variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays"
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'variantEmtpyArray' to 'variantEmptyArray'.

Suggested change
"variantNumeric", "variantEmptyObj", "variantEmtpyArray", "variantWithArrays"
"variantNumeric", "variantEmptyObj", "variantEmptyArray", "variantWithArrays"

Copilot uses AI. Check for mistakes.
…dCommonFormatColumnFormatSpec.java

Co-authored-by: Copilot <[email protected]>
@cecemei cecemei marked this pull request as ready for review November 10, 2025 14:32
@cecemei cecemei changed the title index-strategy Add a new BitmapIndexEncodingStrategy to control the bitmap encoding in a nested column Nov 24, 2025
@cecemei cecemei changed the title Add a new BitmapIndexEncodingStrategy to control the bitmap encoding in a nested column Configurable bitmap index encoding strategies for numeric fields in nested data columns Nov 24, 2025
@cecemei cecemei requested a review from clintropolis November 25, 2025 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant