Skip to content

Conversation

@ishaksebsib
Copy link
Contributor

@ishaksebsib ishaksebsib commented Oct 21, 2025

Description

Problem

  • Analyzer Issue: The type validator incorrectly grouped IS_IN with other boolean operations, expecting scalar arguments instead of arrays. This caused compilation errors for queries like nodes::WHERE(_::{id}::IS_IN(node_ids)) when node_ids is [ID].
  • Runtime Panic: after parsing, queries using IS_IN to filter by internal node IDs caused a panic: "Value is not an id" at value.rs:1591. This happened
    because check_property("id") returned Value::String (stringified UUID), but IS_IN's is_in method requires Value::Id for ID operations (via IntoPrimitive<ID>).
  • Root Cause: Type mismatch in both analysis (expecting scalars for arrays) and runtime (wrong Value variant for IDs).
  • Impact: IS_IN with arrays failed at compile-time or runtime, preventing valid queries from working.

What We Solved

  • Analyzer Fix: Separated IS_IN validation in traversal_validation.rs to expect Type::Array(Box<Type::Scalar(ft))) for arguments, ensuring array element compatibility.
  • Runtime Fix: Updated check_property("id") in filterable.rs to return Value::Id(ID::from(self.id)) instead of Value::String(...), aligning with IntoPrimitive<ID> expectations.
  • Result: IS_IN now correctly validates and executes with arrays, resolving panics while preserving ID string serialization in responses.

The fixes ensure type safety for array operations and proper handling of reserved properties.

Related Issues

None

Checklist when merging to main

  • No compiler warnings (if applicable)
  • Code is formatted with rustfmt
  • No useless or dead code (if applicable)
  • Code is easy to understand
  • Doc comments are used for all functions, enums, structs, and fields (where appropriate)
  • All tests pass
  • Performance has not regressed (assuming change was not to fix a bug)
  • Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

Additional Notes

None

Greptile Overview

Updated On: 2025-10-21 12:34:05 UTC

Summary

This PR fixes IS_IN operator to work correctly with array arguments by addressing type validation at compile-time and value handling at runtime.

Changes:

  • Analyzer (traversal_validation.rs): Separated IS_IN from scalar boolean operations, now correctly expects Type::Array(Box<Type::Scalar(ft))) and validates array element types match the field type
  • Runtime (filterable.rs, vector.rs): Changed check_property("id") for Node, Edge, and HVector to return Value::Id(ID::from(self.id)) instead of Value::String, ensuring compatibility with IntoPrimitive<ID> trait used by Value::is_in() method

Impact:
Enables queries like nodes::WHERE(_::{id}::IS_IN(node_ids)) to compile and execute without panics, resolving both the analyzer's incorrect type expectations and the runtime's value variant mismatch.

Important Files Changed

File Analysis

Filename Score Overview
helix-db/src/utils/filterable.rs 5/5 Changed check_property("id") to return Value::Id(ID::from(self.id)) for both Node and Edge, fixing runtime panics in IS_IN operations
helix-db/src/helix_engine/vector_core/vector.rs 5/5 Updated HVector's check_property("id") to return Value::Id(ID::from(self.id)), ensuring consistency with Node and Edge implementations
helix-db/src/helixc/analyzer/methods/traversal_validation.rs 5/5 Separated IS_IN validation from scalar boolean operations to correctly expect Type::Array arguments with scalar elements, fixing compile-time type checking

Sequence Diagram

sequenceDiagram
    participant Query as HQL Query
    participant Analyzer as traversal_validation
    participant Runtime as check_property
    participant IsIn as Value is_in
    participant IntoPrim as IntoPrimitive

    Query->>Analyzer: nodes WHERE id IS_IN node_ids
    
    alt Before Analyzer Fix
        Note over Analyzer: IS_IN grouped with scalar ops
        Analyzer->>Analyzer: Expect Type Scalar
        Analyzer-->>Query: Error Expected scalar got array
    end
    
    alt After Analyzer Fix
        Note over Analyzer: IS_IN separated from scalar ops
        Analyzer->>Analyzer: Check IS_IN arg type
        Analyzer->>Analyzer: Expect Type Array of Scalar
        Analyzer->>Analyzer: Validate array element type
        Analyzer-->>Query: Compilation succeeds
    end
    
    Query->>Runtime: Execute filter by ID
    Runtime->>Runtime: check_property id
    
    alt Before Runtime Fix
        Runtime-->>IsIn: Value String uuid_string
        IsIn->>IntoPrim: into_primitive ID
        IntoPrim-->>IsIn: panic Value is not an id
    end
    
    alt After Runtime Fix
        Runtime-->>IsIn: Value Id ID from self id
        IsIn->>IntoPrim: into_primitive ID
        IntoPrim-->>IsIn: Returns ID reference
        IsIn-->>Query: Returns filter result
    end
Loading

@ishaksebsib ishaksebsib marked this pull request as ready for review October 21, 2025 09:37
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. helix-db/src/utils/filterable.rs, line 230 (link)

    logic: Edge's id property returns Value::String (via Value::from(self.uuid())), while Node now returns Value::Id. This inconsistency will cause the same panic if IS_IN is used on edge IDs.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@ishaksebsib
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

xav-db added a commit that referenced this pull request Oct 29, 2025
…roperty handling

## Changes

### 1. Added Reserved Property Support
- Created `ReservedProp` enum for built-in properties (id, label, version, etc.)
- Added `Step::ReservedPropertyAccess` variant to generate direct field access
- Reserved properties now return correct Value types (e.g., Value::Id for id field)

### 2. Updated Property Access Generator
- Modified `gen_property_access()` to identify reserved vs user-defined properties
- Reserved properties use direct struct field access
- User-defined properties continue using `get_property()` HashMap lookup

### 3. Enhanced Boolean Operation Optimization
- Updated `WhereRef` Display implementation for reserved property optimization
- Generates `Value::Id(ID::from(val.id))` for id field in filters
- Maintains `get_property()` optimization for user-defined properties

### 4. Fixed IS_IN Validation
- Separated IS_IN from scalar boolean operations
- IS_IN now correctly expects `Type::Array` arguments with scalar elements
- Added `get_reserved_property_type()` helper for type validation
- Updated Node/Edge/Vector validation to check reserved properties first

### 5. Testing
- Added IS_IN test with both reserved (id) and user-defined (field) properties
- Verified correct code generation and type safety

## Impact
Resolves the issues from original PR #666:
- IS_IN with arrays now validates at compile-time
- Reserved property access returns correct Value types for IS_IN operations
- Queries like `WHERE(_::{id}::IS_IN(node_ids))` now compile and execute correctly

Co-Authored-By: ishaksebsib <[email protected]>
xav-db added a commit that referenced this pull request Oct 29, 2025
#674)

## Summary

This PR integrates the fixes from PR #666 into the current
`arena-implementation` branch. PR #666 originally fixed IS_IN validation
and runtime handling for reserved properties, but the current
implementation has diverged significantly from when that PR was created.

## Problem

**Analyzer Issue**: The type validator incorrectly grouped `IS_IN` with
other boolean operations, expecting scalar arguments instead of arrays.
This caused compilation errors for queries like
`nodes::WHERE(_::{id}::IS_IN(node_ids))` when `node_ids` is `[ID]`.

**Runtime Issue**: The current implementation removed `filterable.rs`
and its special handling for reserved properties (id, label, etc.).
Reserved properties are now struct fields, not in the properties
HashMap, so `get_property("id")` returns `None`.

**Root Cause**: 
1. IS_IN validation expects scalars instead of arrays
2. Reserved properties lost special handling during refactoring
3. Type mismatch between `Value::String` and `Value::Id` for ID
operations

## Solution

### 1. Added Reserved Property Support
- Created `ReservedProp` enum for built-in properties: Id, Label,
Version, FromNode, ToNode, Deleted, Level, Distance, Data
- Added `Step::ReservedPropertyAccess` variant to generate direct field
access
- Reserved properties now return correct Value types (e.g., `Value::Id`
for id field)

### 2. Updated Property Access Generator
- Modified `gen_property_access()` in `analyzer/utils.rs` to identify
reserved vs user-defined properties
- Reserved properties use direct struct field access:
`Value::Id(ID::from(val.id))`
- User-defined properties continue using `get_property()` HashMap lookup

### 3. Enhanced Boolean Operation Optimization
- Updated `WhereRef` Display implementation in `traversal_steps.rs` for
reserved properties
- Generates optimized filter code with correct Value types
- Example: `Value::Id(ID::from(val.id)).is_in(&data.node_ids)`

### 4. Fixed IS_IN Validation
- Separated IS_IN from scalar boolean operations in
`traversal_validation.rs`
- IS_IN now correctly expects `Type::Array(Box<Type::Scalar(ft)))`
arguments
- Added `get_reserved_property_type()` helper for type-safe validation
- Updated Node/Edge/Vector validation to check reserved properties first

## Files Changed

- `helix-db/src/helixc/generator/traversal_steps.rs` - Added
ReservedProp enum and Display logic
- `helix-db/src/helixc/analyzer/utils.rs` - Updated
gen_property_access()
- `helix-db/src/helixc/analyzer/methods/traversal_validation.rs` - Fixed
IS_IN validation and reserved property checking
- `hql-tests/tests/is_in/` - Added test cases for IS_IN with reserved
and user-defined properties

## Testing

Added comprehensive IS_IN tests:
```hql
QUERY GetNodes (fields: [String]) =>
    node <- N<MyNode>::WHERE(_::{field}::IS_IN(fields))
    RETURN node

QUERY GetNodesByID (node_ids: [ID]) =>
    node <- N<MyNode>::WHERE(_::{id}::IS_IN(node_ids))
    RETURN node
```

**Verified**:
- ✅ Queries compile successfully
- ✅ Correct code generation for reserved properties (direct field
access)
- ✅ Correct code generation for user-defined properties (get_property)
- ✅ Type safety maintained for array operations

## Generated Code Example

For reserved property `id`:
```rust
.filter_ref(|val, txn|{
    if let Ok(val) = val {
        Ok(Value::Id(ID::from(val.id)).is_in(&data.node_ids))
    } else {
        Ok(false)
    }
})
```

For user-defined property `field`:
```rust
.filter_ref(|val, txn|{
    if let Ok(val) = val {
        Ok(val
        .get_property("field")
        .map_or(false, |v| v.is_in(&data.fields)))
    } else {
        Ok(false)
    }
})
```

## Impact

- IS_IN operations now work correctly with arrays at compile-time and
runtime
- Reserved properties return proper Value types for IS_IN comparisons
- Queries like `WHERE(_::{id}::IS_IN(node_ids))` compile and execute
without panics
- No breaking changes to existing functionality

Co-Authored-By: ishaksebsib <[email protected]>
@xav-db xav-db mentioned this pull request Oct 29, 2025
8 tasks
xav-db added a commit that referenced this pull request Nov 7, 2025
## Description
<!-- Provide a brief description of the changes in this PR -->

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #670 #666 #667 #672  #668 #661 #655 #654 #652 #436 

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

Updated On: 2025-11-07 00:19:04 UTC

<h3>Greptile Summary</h3>


This PR implements arena-based memory allocation for graph traversals
and refactors the worker pool's channel selection mechanism.

**Key Changes:**
- **Arena Implementation**: Introduced `'arena` lifetime parameter
throughout traversal operations (`in_e.rs`), replacing owned data with
arena-allocated references for improved memory efficiency
- **Worker Pool Refactor**: Replaced `flume::Selector` with a
parity-based `try_recv()`/`recv()` pattern to handle two channels
(`cont_rx` and `rx`) across multiple worker threads
- **Badge Addition**: Added Manta Graph badge to README

**Issues Found:**
- **Worker Pool Channel Handling**: The new parity-based approach
requires an even number of workers (≥2) and uses non-blocking
`try_recv()` followed by blocking `recv()` on alternating channels.
While this avoids a true busy-wait (since one `recv()` always blocks),
the asymmetry means channels are polled at different frequencies,
potentially causing channel starvation or unfair scheduling compared to
the previous `Selector::wait()` approach.

The arena implementation appears solid and follows Rust lifetime best
practices. The worker pool change seems to be addressing a specific
issue with core affinity (per commit `7437cf0f`), but the trade-off in
channel fairness should be monitored.

<details><summary><h3>Important Files Changed</h3></summary>



File Analysis



| Filename | Score | Overview |
|----------|-------|----------|
| README.md | 5/5 | Added Manta Graph badge to README - cosmetic
documentation change with no functional impact |
| helix-db/src/helix_engine/traversal_core/ops/in_/in_e.rs | 5/5 |
Refactored to use arena-based lifetimes ('arena) instead of owned data,
replacing separate InEdgesIterator struct with inline closures for
better memory management |
| helix-db/src/helix_gateway/worker_pool/mod.rs | 3/5 | Replaced flume
Selector with parity-based try_recv/recv pattern requiring even worker
count, but implementation has potential busy-wait issues that could
cause high CPU usage |

</details>


</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Client
    participant WorkerPool
    participant Worker1 as Worker (parity=true)
    participant Worker2 as Worker (parity=false)
    participant Router
    participant Storage

    Client->>WorkerPool: process(request)
    WorkerPool->>WorkerPool: Send request to req_rx channel
    
    par Worker1 Loop (parity=true)
        loop Every iteration
            Worker1->>Worker1: try_recv(cont_rx) - non-blocking
            alt Continuation available
                Worker1->>Worker1: Execute continuation function
            else Empty
                Worker1->>Worker1: Skip (no busy wait here)
            end
            Worker1->>Worker1: recv(rx) - BLOCKS until request
            alt Request received
                Worker1->>Router: Route request to handler
                Router->>Storage: Execute graph operation
                Storage-->>Router: Return result
                Router-->>Worker1: Response
                Worker1->>WorkerPool: Send response via ret_chan
            end
        end
    end
    
    par Worker2 Loop (parity=false)
        loop Every iteration
            Worker2->>Worker2: try_recv(rx) - non-blocking
            alt Request available
                Worker2->>Router: Route request to handler
                Router->>Storage: Execute graph operation
                Storage-->>Router: Return result
                Router-->>Worker2: Response
                Worker2->>WorkerPool: Send response via ret_chan
            else Empty
                Worker2->>Worker2: Skip (no busy wait here)
            end
            Worker2->>Worker2: recv(cont_rx) - BLOCKS until continuation
            alt Continuation received
                Worker2->>Worker2: Execute continuation function
            end
        end
    end

    WorkerPool-->>Client: Response
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
@xav-db xav-db closed this Nov 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants