-
-
Notifications
You must be signed in to change notification settings - Fork 150
Fix(hql): Correct IS_IN validation and runtime handling for internal node IDs #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (1)
-
helix-db/src/utils/filterable.rs, line 230 (link)logic: Edge's
idproperty returnsValue::String(viaValue::from(self.uuid())), while Node now returnsValue::Id. This inconsistency will cause the same panic ifIS_INis used on edge IDs.
2 files reviewed, 1 comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, no comments
…roperty handling ## Changes ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties (id, label, version, etc.) - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., Value::Id for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` to identify reserved vs user-defined properties - Reserved properties use direct struct field access - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation for reserved property optimization - Generates `Value::Id(ID::from(val.id))` for id field in filters - Maintains `get_property()` optimization for user-defined properties ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations - IS_IN now correctly expects `Type::Array` arguments with scalar elements - Added `get_reserved_property_type()` helper for type validation - Updated Node/Edge/Vector validation to check reserved properties first ### 5. Testing - Added IS_IN test with both reserved (id) and user-defined (field) properties - Verified correct code generation and type safety ## Impact Resolves the issues from original PR #666: - IS_IN with arrays now validates at compile-time - Reserved property access returns correct Value types for IS_IN operations - Queries like `WHERE(_::{id}::IS_IN(node_ids))` now compile and execute correctly Co-Authored-By: ishaksebsib <[email protected]>
#674) ## Summary This PR integrates the fixes from PR #666 into the current `arena-implementation` branch. PR #666 originally fixed IS_IN validation and runtime handling for reserved properties, but the current implementation has diverged significantly from when that PR was created. ## Problem **Analyzer Issue**: The type validator incorrectly grouped `IS_IN` with other boolean operations, expecting scalar arguments instead of arrays. This caused compilation errors for queries like `nodes::WHERE(_::{id}::IS_IN(node_ids))` when `node_ids` is `[ID]`. **Runtime Issue**: The current implementation removed `filterable.rs` and its special handling for reserved properties (id, label, etc.). Reserved properties are now struct fields, not in the properties HashMap, so `get_property("id")` returns `None`. **Root Cause**: 1. IS_IN validation expects scalars instead of arrays 2. Reserved properties lost special handling during refactoring 3. Type mismatch between `Value::String` and `Value::Id` for ID operations ## Solution ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties: Id, Label, Version, FromNode, ToNode, Deleted, Level, Distance, Data - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., `Value::Id` for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` in `analyzer/utils.rs` to identify reserved vs user-defined properties - Reserved properties use direct struct field access: `Value::Id(ID::from(val.id))` - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation in `traversal_steps.rs` for reserved properties - Generates optimized filter code with correct Value types - Example: `Value::Id(ID::from(val.id)).is_in(&data.node_ids)` ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations in `traversal_validation.rs` - IS_IN now correctly expects `Type::Array(Box<Type::Scalar(ft)))` arguments - Added `get_reserved_property_type()` helper for type-safe validation - Updated Node/Edge/Vector validation to check reserved properties first ## Files Changed - `helix-db/src/helixc/generator/traversal_steps.rs` - Added ReservedProp enum and Display logic - `helix-db/src/helixc/analyzer/utils.rs` - Updated gen_property_access() - `helix-db/src/helixc/analyzer/methods/traversal_validation.rs` - Fixed IS_IN validation and reserved property checking - `hql-tests/tests/is_in/` - Added test cases for IS_IN with reserved and user-defined properties ## Testing Added comprehensive IS_IN tests: ```hql QUERY GetNodes (fields: [String]) => node <- N<MyNode>::WHERE(_::{field}::IS_IN(fields)) RETURN node QUERY GetNodesByID (node_ids: [ID]) => node <- N<MyNode>::WHERE(_::{id}::IS_IN(node_ids)) RETURN node ``` **Verified**: - ✅ Queries compile successfully - ✅ Correct code generation for reserved properties (direct field access) - ✅ Correct code generation for user-defined properties (get_property) - ✅ Type safety maintained for array operations ## Generated Code Example For reserved property `id`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(Value::Id(ID::from(val.id)).is_in(&data.node_ids)) } else { Ok(false) } }) ``` For user-defined property `field`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(val .get_property("field") .map_or(false, |v| v.is_in(&data.fields))) } else { Ok(false) } }) ``` ## Impact - IS_IN operations now work correctly with arrays at compile-time and runtime - Reserved properties return proper Value types for IS_IN comparisons - Queries like `WHERE(_::{id}::IS_IN(node_ids))` compile and execute without panics - No breaking changes to existing functionality Co-Authored-By: ishaksebsib <[email protected]>
## Description <!-- Provide a brief description of the changes in this PR --> ## Related Issues <!-- Link to any related issues using #issue_number --> Closes #670 #666 #667 #672 #668 #661 #655 #654 #652 #436 ## Checklist when merging to main <!-- Mark items with "x" when completed --> - [ ] No compiler warnings (if applicable) - [ ] Code is formatted with `rustfmt` - [ ] No useless or dead code (if applicable) - [ ] Code is easy to understand - [ ] Doc comments are used for all functions, enums, structs, and fields (where appropriate) - [ ] All tests pass - [ ] Performance has not regressed (assuming change was not to fix a bug) - [ ] Version number has been updated in `helix-cli/Cargo.toml` and `helixdb/Cargo.toml` ## Additional Notes <!-- Add any additional information that would be helpful for reviewers --> <!-- greptile_comment --> <h2>Greptile Overview</h2> Updated On: 2025-11-07 00:19:04 UTC <h3>Greptile Summary</h3> This PR implements arena-based memory allocation for graph traversals and refactors the worker pool's channel selection mechanism. **Key Changes:** - **Arena Implementation**: Introduced `'arena` lifetime parameter throughout traversal operations (`in_e.rs`), replacing owned data with arena-allocated references for improved memory efficiency - **Worker Pool Refactor**: Replaced `flume::Selector` with a parity-based `try_recv()`/`recv()` pattern to handle two channels (`cont_rx` and `rx`) across multiple worker threads - **Badge Addition**: Added Manta Graph badge to README **Issues Found:** - **Worker Pool Channel Handling**: The new parity-based approach requires an even number of workers (≥2) and uses non-blocking `try_recv()` followed by blocking `recv()` on alternating channels. While this avoids a true busy-wait (since one `recv()` always blocks), the asymmetry means channels are polled at different frequencies, potentially causing channel starvation or unfair scheduling compared to the previous `Selector::wait()` approach. The arena implementation appears solid and follows Rust lifetime best practices. The worker pool change seems to be addressing a specific issue with core affinity (per commit `7437cf0f`), but the trade-off in channel fairness should be monitored. <details><summary><h3>Important Files Changed</h3></summary> File Analysis | Filename | Score | Overview | |----------|-------|----------| | README.md | 5/5 | Added Manta Graph badge to README - cosmetic documentation change with no functional impact | | helix-db/src/helix_engine/traversal_core/ops/in_/in_e.rs | 5/5 | Refactored to use arena-based lifetimes ('arena) instead of owned data, replacing separate InEdgesIterator struct with inline closures for better memory management | | helix-db/src/helix_gateway/worker_pool/mod.rs | 3/5 | Replaced flume Selector with parity-based try_recv/recv pattern requiring even worker count, but implementation has potential busy-wait issues that could cause high CPU usage | </details> </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant Client participant WorkerPool participant Worker1 as Worker (parity=true) participant Worker2 as Worker (parity=false) participant Router participant Storage Client->>WorkerPool: process(request) WorkerPool->>WorkerPool: Send request to req_rx channel par Worker1 Loop (parity=true) loop Every iteration Worker1->>Worker1: try_recv(cont_rx) - non-blocking alt Continuation available Worker1->>Worker1: Execute continuation function else Empty Worker1->>Worker1: Skip (no busy wait here) end Worker1->>Worker1: recv(rx) - BLOCKS until request alt Request received Worker1->>Router: Route request to handler Router->>Storage: Execute graph operation Storage-->>Router: Return result Router-->>Worker1: Response Worker1->>WorkerPool: Send response via ret_chan end end end par Worker2 Loop (parity=false) loop Every iteration Worker2->>Worker2: try_recv(rx) - non-blocking alt Request available Worker2->>Router: Route request to handler Router->>Storage: Execute graph operation Storage-->>Router: Return result Router-->>Worker2: Response Worker2->>WorkerPool: Send response via ret_chan else Empty Worker2->>Worker2: Skip (no busy wait here) end Worker2->>Worker2: recv(cont_rx) - BLOCKS until continuation alt Continuation received Worker2->>Worker2: Execute continuation function end end end WorkerPool-->>Client: Response ``` </details> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->
Description
Problem
IS_INwith other boolean operations, expecting scalar arguments instead of arrays. This caused compilation errors for queries likenodes::WHERE(_::{id}::IS_IN(node_ids))whennode_idsis[ID].IS_INto filter by internal node IDs caused a panic:"Value is not an id"atvalue.rs:1591. This happenedbecause
check_property("id")returnedValue::String(stringified UUID), butIS_IN'sis_inmethod requiresValue::IdforIDoperations (viaIntoPrimitive<ID>).Valuevariant for IDs).IS_INwith arrays failed at compile-time or runtime, preventing valid queries from working.What We Solved
IS_INvalidation intraversal_validation.rsto expectType::Array(Box<Type::Scalar(ft)))for arguments, ensuring array element compatibility.check_property("id")infilterable.rsto returnValue::Id(ID::from(self.id))instead ofValue::String(...), aligning withIntoPrimitive<ID>expectations.IS_INnow correctly validates and executes with arrays, resolving panics while preserving ID string serialization in responses.The fixes ensure type safety for array operations and proper handling of reserved properties.
Related Issues
None
Checklist when merging to main
rustfmthelix-cli/Cargo.tomlandhelixdb/Cargo.tomlAdditional Notes
None
Greptile Overview
Updated On: 2025-10-21 12:34:05 UTC
Summary
This PR fixes
IS_INoperator to work correctly with array arguments by addressing type validation at compile-time and value handling at runtime.Changes:
IS_INfrom scalar boolean operations, now correctly expectsType::Array(Box<Type::Scalar(ft)))and validates array element types match the field typecheck_property("id")for Node, Edge, and HVector to returnValue::Id(ID::from(self.id))instead ofValue::String, ensuring compatibility withIntoPrimitive<ID>trait used byValue::is_in()methodImpact:
Enables queries like
nodes::WHERE(_::{id}::IS_IN(node_ids))to compile and execute without panics, resolving both the analyzer's incorrect type expectations and the runtime's value variant mismatch.Important Files Changed
File Analysis
check_property("id")to returnValue::Id(ID::from(self.id))for both Node and Edge, fixing runtime panics inIS_INoperationscheck_property("id")to returnValue::Id(ID::from(self.id)), ensuring consistency with Node and Edge implementationsIS_INvalidation from scalar boolean operations to correctly expectType::Arrayarguments with scalar elements, fixing compile-time type checkingSequence Diagram
sequenceDiagram participant Query as HQL Query participant Analyzer as traversal_validation participant Runtime as check_property participant IsIn as Value is_in participant IntoPrim as IntoPrimitive Query->>Analyzer: nodes WHERE id IS_IN node_ids alt Before Analyzer Fix Note over Analyzer: IS_IN grouped with scalar ops Analyzer->>Analyzer: Expect Type Scalar Analyzer-->>Query: Error Expected scalar got array end alt After Analyzer Fix Note over Analyzer: IS_IN separated from scalar ops Analyzer->>Analyzer: Check IS_IN arg type Analyzer->>Analyzer: Expect Type Array of Scalar Analyzer->>Analyzer: Validate array element type Analyzer-->>Query: Compilation succeeds end Query->>Runtime: Execute filter by ID Runtime->>Runtime: check_property id alt Before Runtime Fix Runtime-->>IsIn: Value String uuid_string IsIn->>IntoPrim: into_primitive ID IntoPrim-->>IsIn: panic Value is not an id end alt After Runtime Fix Runtime-->>IsIn: Value Id ID from self id IsIn->>IntoPrim: into_primitive ID IntoPrim-->>IsIn: Returns ID reference IsIn-->>Query: Returns filter result end