Optimise IN queries for json fields in flat collections #252
Merged
suresh-prakash merged 7 commits into hypertrace:main from suddendust:flat_collections_optimise_json_fields (Nov 21, 2025)
Commits
4a9945d Optimise IN queries for json fields in flat collections (suddendust)
ceb872c WIP (suddendust)
ef322b9 Fixed failing test cases (suddendust)
86ed593 WIP (suddendust)
a792d43 Spotless (suddendust)
04d6d92 Revert inadvertent changes (suddendust)
01c40e0 Fixed failing test cases (suddendust)
343 changes: 160 additions & 183 deletions
...store/src/integrationTest/java/org/hypertrace/core/documentstore/DocStoreQueryV1Test.java
Large diffs are not rendered by default.
50 changes: 0 additions & 50 deletions
...java/org/hypertrace/core/documentstore/expression/impl/JsonArrayIdentifierExpression.java
This file was deleted.
13 changes: 13 additions & 0 deletions
...-store/src/main/java/org/hypertrace/core/documentstore/expression/impl/JsonFieldType.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| package org.hypertrace.core.documentstore.expression.impl; | ||
|
|
||
| /** Represents the type of JSON fields in flat collections */ | ||
| public enum JsonFieldType { | ||
| STRING, | ||
| NUMBER, | ||
| BOOLEAN, | ||
| STRING_ARRAY, | ||
| NUMBER_ARRAY, | ||
| BOOLEAN_ARRAY, | ||
| OBJECT_ARRAY, | ||
| OBJECT | ||
| } |
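
The parsers below state that the field type is "guaranteed to be present by the selector", but the selector itself is not shown in this diff. As a rough sketch of how the enum could drive that choice, the following groups the types by the parser that suits them. `JsonFieldTypeRouting`, `ParserKind`, and `select` are hypothetical names introduced for illustration, and treating `OBJECT` (or a missing type) as a fallback case is an assumption, not something the PR confirms.

```java
import java.util.EnumSet;
import java.util.Set;

/** Local copy of the enum from the diff so this sketch compiles on its own. */
enum JsonFieldType {
  STRING, NUMBER, BOOLEAN, STRING_ARRAY, NUMBER_ARRAY, BOOLEAN_ARRAY, OBJECT_ARRAY, OBJECT
}

/** Hypothetical helper (not part of the PR): maps a field type to a parser strategy. */
final class JsonFieldTypeRouting {
  /** Hypothetical labels for the available IN-parser strategies. */
  enum ParserKind { PRIMITIVE, ARRAY, FALLBACK }

  private static final Set<JsonFieldType> PRIMITIVES =
      EnumSet.of(JsonFieldType.STRING, JsonFieldType.NUMBER, JsonFieldType.BOOLEAN);

  private static final Set<JsonFieldType> ARRAYS =
      EnumSet.of(
          JsonFieldType.STRING_ARRAY,
          JsonFieldType.NUMBER_ARRAY,
          JsonFieldType.BOOLEAN_ARRAY,
          JsonFieldType.OBJECT_ARRAY);

  static ParserKind select(JsonFieldType type) {
    if (type == null) {
      return ParserKind.FALLBACK; // assumption: no declared type means no optimised path
    }
    if (PRIMITIVES.contains(type)) {
      return ParserKind.PRIMITIVE;
    }
    if (ARRAYS.contains(type)) {
      return ParserKind.ARRAY;
    }
    return ParserKind.FALLBACK; // assumption: OBJECT has no optimised IN parser
  }

  public static void main(String[] args) {
    for (JsonFieldType t : JsonFieldType.values()) {
      System.out.println(t + " -> " + select(t));
    }
  }
}
```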
96 changes: 96 additions & 0 deletions
...umentstore/postgres/query/v1/parser/filter/PostgresInRelationalFilterParserJsonArray.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| package org.hypertrace.core.documentstore.postgres.query.v1.parser.filter; | ||
|
|
||
| import java.util.stream.Collectors; | ||
| import java.util.stream.StreamSupport; | ||
| import org.hypertrace.core.documentstore.expression.impl.JsonFieldType; | ||
| import org.hypertrace.core.documentstore.expression.impl.JsonIdentifierExpression; | ||
| import org.hypertrace.core.documentstore.expression.impl.RelationalExpression; | ||
| import org.hypertrace.core.documentstore.postgres.Params; | ||
|
|
||
| /** | ||
| * Optimized parser for IN operations on JSON array fields with type-specific casting. | ||
| * | ||
| * <p>Uses JSONB containment operator (@>) with typed jsonb_build_array for "contains any" | ||
| * semantics: | ||
| * | ||
| * <ul> | ||
| * <li><b>STRING_ARRAY:</b> {@code "document" -> 'tags' @> jsonb_build_array(?::text)} | ||
| * <li><b>NUMBER_ARRAY:</b> {@code "document" -> 'scores' @> jsonb_build_array(?::numeric)} | ||
| * <li><b>BOOLEAN_ARRAY:</b> {@code "document" -> 'flags' @> jsonb_build_array(?::boolean)} | ||
| * <li><b>OBJECT_ARRAY:</b> {@code "document" -> 'items' @> jsonb_build_array(?::jsonb)} | ||
| * </ul> | ||
| * | ||
| * <p>This checks if the JSON array contains ANY of the provided values, using efficient JSONB | ||
| * containment instead of defensive type checking. | ||
| */ | ||
| public class PostgresInRelationalFilterParserJsonArray | ||
| implements PostgresInRelationalFilterParserInterface { | ||
|
|
||
| @Override | ||
| public String parse( | ||
| final RelationalExpression expression, final PostgresRelationalFilterContext context) { | ||
| final String parsedLhs = expression.getLhs().accept(context.lhsParser()); | ||
| final Iterable<Object> parsedRhs = expression.getRhs().accept(context.rhsParser()); | ||
|
|
||
| // Extract field type for typed array handling (guaranteed to be present by selector) | ||
| JsonIdentifierExpression jsonExpr = (JsonIdentifierExpression) expression.getLhs(); | ||
| JsonFieldType fieldType = | ||
| jsonExpr | ||
| .getFieldType() | ||
| .orElseThrow( | ||
| () -> | ||
| new IllegalStateException( | ||
| "JsonFieldType must be present - this should have been caught by the selector")); | ||
|
|
||
| return prepareFilterStringForInOperator( | ||
| parsedLhs, parsedRhs, fieldType, context.getParamsBuilder()); | ||
| } | ||
|
|
||
| private String prepareFilterStringForInOperator( | ||
| final String parsedLhs, | ||
| final Iterable<Object> parsedRhs, | ||
| final JsonFieldType fieldType, | ||
| final Params.Builder paramsBuilder) { | ||
|
|
||
| // Determine the appropriate type cast for jsonb_build_array elements | ||
| String typeCast = getTypeCastForArray(fieldType); | ||
|
|
||
| // For JSON arrays, we use the @> containment operator | ||
| // Check if ANY of the RHS values is contained in the LHS array | ||
| String orConditions = | ||
| StreamSupport.stream(parsedRhs.spliterator(), false) | ||
| .map( | ||
| value -> { | ||
| paramsBuilder.addObjectParam(value); | ||
| return String.format("%s @> jsonb_build_array(?%s)", parsedLhs, typeCast); | ||
| }) | ||
| .collect(Collectors.joining(" OR ")); | ||
|
|
||
| // Wrap in parentheses if multiple conditions | ||
| return StreamSupport.stream(parsedRhs.spliterator(), false).count() > 1 | ||
| ? String.format("(%s)", orConditions) | ||
| : orConditions; | ||
| } | ||
|
|
||
| /** | ||
| * Returns the PostgreSQL type cast string for jsonb_build_array elements based on array type. | ||
| * | ||
| * @param fieldType The JSON field type (must not be null) | ||
| * @return Type cast string (e.g., "::text", "::numeric") | ||
| */ | ||
| private String getTypeCastForArray(JsonFieldType fieldType) { | ||
| switch (fieldType) { | ||
| case STRING_ARRAY: | ||
| return "::text"; | ||
| case NUMBER_ARRAY: | ||
| return "::numeric"; | ||
| case BOOLEAN_ARRAY: | ||
| return "::boolean"; | ||
| case OBJECT_ARRAY: | ||
| return "::jsonb"; | ||
| default: | ||
| throw new IllegalArgumentException( | ||
| "Unsupported array type: " + fieldType + ". Expected *_ARRAY types."); | ||
| } | ||
| } | ||
| } |
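
To make the shape of the generated predicate concrete, here is a minimal standalone sketch of the same "contains any" construction, without the document-store types. `JsonArrayInSketch` and `build` are hypothetical names, and a plain `List<Object>` stands in for `Params.Builder`. It uses a loop rather than a stream so that collecting bind values is not a side effect inside `map`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical standalone sketch of the array IN predicate; not the PR's class. */
final class JsonArrayInSketch {

  /**
   * Builds one "lhs @> jsonb_build_array(?cast)" term per value, joined with OR.
   * Bind values are appended to params in the same order as the placeholders.
   */
  static String build(String lhs, String typeCast, List<?> values, List<Object> params) {
    StringJoiner orConditions = new StringJoiner(" OR ");
    for (Object value : values) {
      params.add(value);
      orConditions.add(String.format("%s @> jsonb_build_array(?%s)", lhs, typeCast));
    }
    // Parenthesise only when there is more than one term, as in the diff above.
    return values.size() > 1 ? "(" + orConditions + ")" : orConditions.toString();
  }

  public static void main(String[] args) {
    List<Object> params = new ArrayList<>();
    String sql = build("\"document\" -> 'tags'", "::text", List.of("hygiene", "bath"), params);
    System.out.println(sql);
    System.out.println(params);
  }
}
```

Like the parser above, this sketch returns an empty string when no values are supplied, so a caller would need to reject or special-case an empty IN list before embedding the result in a WHERE clause.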
84 changes: 84 additions & 0 deletions
...tstore/postgres/query/v1/parser/filter/PostgresInRelationalFilterParserJsonPrimitive.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| package org.hypertrace.core.documentstore.postgres.query.v1.parser.filter; | ||
|
|
||
| import java.util.stream.Collectors; | ||
| import java.util.stream.StreamSupport; | ||
| import org.hypertrace.core.documentstore.expression.impl.JsonFieldType; | ||
| import org.hypertrace.core.documentstore.expression.impl.JsonIdentifierExpression; | ||
| import org.hypertrace.core.documentstore.expression.impl.RelationalExpression; | ||
| import org.hypertrace.core.documentstore.postgres.Params; | ||
|
|
||
| /** | ||
| * Optimized parser for IN operations on JSON primitive fields (string, number, boolean) with proper | ||
| * type casting. | ||
| * | ||
| * <p>Generates efficient SQL using {@code ->>} operator with appropriate PostgreSQL casting: | ||
| * | ||
| * <ul> | ||
| * <li><b>STRING:</b> {@code "document" ->> 'item' IN ('Soap', 'Shampoo')} | ||
| * <li><b>NUMBER:</b> {@code CAST("document" ->> 'price' AS NUMERIC) IN (10, 20)} | ||
| * <li><b>BOOLEAN:</b> {@code CAST("document" ->> 'active' AS BOOLEAN) IN (true, false)} | ||
| * </ul> | ||
| * | ||
| * <p>This is much more efficient than the defensive approach that checks both array and scalar | ||
| * types, and ensures correct type comparisons. | ||
| */ | ||
| public class PostgresInRelationalFilterParserJsonPrimitive | ||
| implements PostgresInRelationalFilterParserInterface { | ||
|
|
||
| @Override | ||
| public String parse( | ||
| final RelationalExpression expression, final PostgresRelationalFilterContext context) { | ||
| String parsedLhs = expression.getLhs().accept(context.lhsParser()); | ||
| final Iterable<Object> parsedRhs = expression.getRhs().accept(context.rhsParser()); | ||
|
|
||
| // Extract field type for proper casting (guaranteed to be present by selector) | ||
| JsonIdentifierExpression jsonExpr = (JsonIdentifierExpression) expression.getLhs(); | ||
| JsonFieldType fieldType = | ||
| jsonExpr | ||
| .getFieldType() | ||
| .orElseThrow( | ||
| () -> | ||
| new IllegalStateException( | ||
| "JsonFieldType must be present - this should have been caught by the selector")); | ||
|
|
||
| // For JSON primitives, we need ->> (text extraction) instead of -> (jsonb extraction) | ||
| // The LHS parser generates: "props"->'brand' (returns JSONB) | ||
| // We need: "props"->>'brand' (returns TEXT) | ||
| // Replace the last -> with ->> for primitive type extraction | ||
| int lastArrowIndex = parsedLhs.lastIndexOf("->"); | ||
| if (lastArrowIndex != -1) { | ||
| parsedLhs = | ||
| parsedLhs.substring(0, lastArrowIndex) + "->>" + parsedLhs.substring(lastArrowIndex + 2); | ||
| } | ||
|
|
||
| return prepareFilterStringForInOperator( | ||
| parsedLhs, parsedRhs, fieldType, context.getParamsBuilder()); | ||
| } | ||
|
|
||
| private String prepareFilterStringForInOperator( | ||
| final String parsedLhs, | ||
| final Iterable<Object> parsedRhs, | ||
| final JsonFieldType fieldType, | ||
| final Params.Builder paramsBuilder) { | ||
|
|
||
| String placeholders = | ||
| StreamSupport.stream(parsedRhs.spliterator(), false) | ||
| .map( | ||
| value -> { | ||
| paramsBuilder.addObjectParam(value); | ||
| return "?"; | ||
| }) | ||
| .collect(Collectors.joining(", ")); | ||
|
|
||
| // Apply appropriate casting based on field type | ||
| String lhsWithCast = parsedLhs; | ||
| if (fieldType == JsonFieldType.NUMBER) { | ||
| lhsWithCast = String.format("CAST(%s AS NUMERIC)", parsedLhs); | ||
| } else if (fieldType == JsonFieldType.BOOLEAN) { | ||
| lhsWithCast = String.format("CAST(%s AS BOOLEAN)", parsedLhs); | ||
| } | ||
| // STRING: no cast needed, since ->> already returns text (fieldType is never null here) | ||
|
|
||
| return String.format("%s IN (%s)", lhsWithCast, placeholders); | ||
| } | ||
| } | ||
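
The two steps in this parser (rewriting the last `->` to `->>`, then casting the left-hand side by field type) can be tried in isolation with the sketch below. `JsonPrimitiveInSketch`, `Kind`, `toTextExtraction`, and `build` are hypothetical names, and a plain `List<Object>` stands in for `Params.Builder`. The sample LHS string follows the form given in the code comments above (`"props"->'brand'`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical standalone sketch of the primitive IN predicate; not the PR's class. */
final class JsonPrimitiveInSketch {
  /** Hypothetical stand-in for the primitive members of JsonFieldType. */
  enum Kind { STRING, NUMBER, BOOLEAN }

  /** Replaces the last "->" (jsonb extraction) with "->>" (text extraction). */
  static String toTextExtraction(String lhs) {
    int lastArrow = lhs.lastIndexOf("->");
    return lastArrow == -1
        ? lhs
        : lhs.substring(0, lastArrow) + "->>" + lhs.substring(lastArrow + 2);
  }

  /** Builds "lhs IN (?, ?, ...)" with a CAST on the LHS for NUMBER and BOOLEAN. */
  static String build(String lhs, Kind kind, List<?> values, List<Object> params) {
    String textLhs = toTextExtraction(lhs);
    StringJoiner placeholders = new StringJoiner(", ");
    for (Object value : values) {
      params.add(value); // bind values keep the same order as the placeholders
      placeholders.add("?");
    }
    String lhsWithCast;
    switch (kind) {
      case NUMBER:
        lhsWithCast = "CAST(" + textLhs + " AS NUMERIC)";
        break;
      case BOOLEAN:
        lhsWithCast = "CAST(" + textLhs + " AS BOOLEAN)";
        break;
      default:
        lhsWithCast = textLhs; // STRING: ->> already yields text
    }
    return lhsWithCast + " IN (" + placeholders + ")";
  }

  public static void main(String[] args) {
    List<Object> params = new ArrayList<>();
    System.out.println(build("\"props\"->'price'", Kind.NUMBER, List.of(10, 20), params));
    System.out.println(build("\"props\"->'brand'", Kind.STRING, List.of("Dettol"), params));
  }
}
```

Because the rewrite works on the rendered SQL string, it relies on the last `->` in that string being the final path operator; a key whose name itself contains `->` would be rewritten in the wrong place.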
Is tweaking the parser to not do this a bigger lift?