
Conversation

@christinaexyou christinaexyou commented Aug 18, 2025

Description

This PR modifies the existing /v1/chat/completions endpoint and adds a /v1/models endpoint to make the server OpenAI-compatible.

Changes:

  • nemoguardrails/server/api.py - Introduces new classes corresponding to OpenAI's standard chat completion and models APIs
  • tests/test_api.py, tests/test_server_calls_with_state.py & tests/test_threads.py - Updates the message content path to match the new ResponseBody class and adds new tests for the /v1/models API
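For illustration, a minimal client-side sketch of the new request/response shapes (field names follow the OpenAI chat-completions spec; the config name and all values below are made up):

```python
import json

# Hypothetical OpenAI-style request; on the server, "model" is mapped
# to a guardrails config_id for backward compatibility.
payload = {
    "model": "my_config",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.2,
}
body = json.dumps(payload)

# Shape of an OpenAI-compatible response body (illustrative values).
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "my_config",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ],
}

# Clients now read the reply through the choices array rather than a
# top-level "messages" key.
reply = response["choices"][0]["message"]["content"]
print(reply)  # Hi there!
```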

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@christinaexyou
Author

Updated nemoguardrails/colang/v2_x/runtime/runtime.py and tests/test_server_calls_with_state.py because tests were failing: Colang 2.x does not support assistant messages as input. Fixed by adding logic to convert the State object to a dict and passing the state instead of the assistant message. cc: @Pouyanpi @tgasser-nv
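A rough stdlib-only sketch of that conversion (the State class and its fields here are stand-ins; the actual runtime uses NeMo Guardrails' own serializers, e.g. json_to_state):

```python
import json
from dataclasses import asdict, dataclass

# Stand-in for the Colang 2.x State object (illustrative fields only).
@dataclass
class State:
    flow_states: dict
    last_event: str

def state_to_dict(state: State) -> dict:
    # Serialize to a JSON-safe dict so the state can travel in the
    # API response and be sent back by the client on the next turn.
    return {"state": json.dumps(asdict(state))}

def dict_to_state(payload: dict) -> State:
    # Reverse the conversion when the client passes the state back in.
    return State(**json.loads(payload["state"]))

s = State(flow_states={"main": "active"}, last_event="UtteranceUserAction")
roundtrip = dict_to_state(state_to_dict(s))
print(roundtrip == s)  # True
```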

@codecov-commenter

Codecov Report

❌ Patch coverage is 81.42857% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.70%. Comparing base (eb29437) to head (f494cf2).

Files with missing lines | Patch % | Lines
nemoguardrails/server/api.py | 84.84% | 10 Missing ⚠️
nemoguardrails/colang/v2_x/runtime/runtime.py | 25.00% | 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1340      +/-   ##
===========================================
+ Coverage    71.66%   71.70%   +0.04%     
===========================================
  Files          171      171              
  Lines        17015    17071      +56     
===========================================
+ Hits         12193    12240      +47     
- Misses        4822     4831       +9     
Flag | Coverage Δ
python | 71.70% <81.42%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
nemoguardrails/colang/v2_x/runtime/runtime.py | 73.84% <25.00%> (-0.41%) ⬇️
nemoguardrails/server/api.py | 64.58% <84.84%> (+5.00%) ⬆️

@Pouyanpi Pouyanpi added the enhancement New feature or request label Oct 14, 2025
@RobGeada
Contributor

Please don't merge; we want to refactor this a little more.

@christinaexyou christinaexyou marked this pull request as draft October 16, 2025 14:04
default=None,
description="A state object that should be used to continue the interaction.",
)
# Standard OpenAI completion parameters
Contributor

Let's bring the OpenAI schema into a separate file, perhaps server/schemas/openai.
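Sketching what such a server/schemas/openai.py module might contain (a hedged outline assuming Pydantic models, as the Field definitions in this diff suggest; class and field sets are abbreviated and not authoritative for the PR):

```python
from typing import Optional

from pydantic import BaseModel, Field

class OpenAIRequestFields(BaseModel):
    """Standard OpenAI chat-completion request parameters (abbreviated)."""

    model: Optional[str] = Field(
        default=None, description="Model name; mapped to a guardrails config_id."
    )
    max_tokens: Optional[int] = Field(
        default=None, description="Maximum number of tokens to generate."
    )
    temperature: Optional[float] = Field(
        default=None, description="Sampling temperature."
    )
    top_p: Optional[float] = Field(
        default=None, description="Top-p sampling parameter."
    )

class Message(BaseModel):
    role: str
    content: str

class Choice(BaseModel):
    index: Optional[int] = Field(
        default=None, description="The index of the choice in the list of choices."
    )
    message: Optional[Message] = None
    finish_reason: Optional[str] = None

req = OpenAIRequestFields(model="my_config", temperature=0.2)
choice = Choice(index=0, message=Message(role="assistant", content="Hi!"))
print(req.model, choice.message.content)  # my_config Hi!
```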

default=None,
description="Top-p sampling parameter.",
)
stop: Optional[str] = Field(
Contributor

stop needs to be Optional[Union[str, List[str]]]: the OpenAI API accepts either a single stop string or a list of stop sequences.
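A small sketch of the corrected field (the container class name here is illustrative), showing that both forms validate:

```python
from typing import List, Optional, Union

from pydantic import BaseModel, Field

# Illustrative container for the corrected field; the OpenAI API
# accepts a single stop string or a list of up to four strings.
class StopParams(BaseModel):
    stop: Optional[Union[str, List[str]]] = Field(
        default=None, description="Sequence(s) at which generation stops."
    )

print(StopParams(stop="END").stop)       # END
print(StopParams(stop=["a", "b"]).stop)  # ['a', 'b']
```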

index: Optional[int] = Field(
default=None, description="The index of the choice in the list of choices."
)
messages: Optional[dict] = Field(
Contributor

typo: messages needs to be message (no s) to match the OpenAI choice object

@christinaexyou christinaexyou force-pushed the openai-server-compat branch 3 times, most recently from 2ec47ee to 96f759f Compare October 24, 2025 14:50
@christinaexyou christinaexyou marked this pull request as ready for review November 4, 2025 18:44
@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR adds OpenAI API compatibility to NeMo Guardrails by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures.

Key Changes:

  • Created nemoguardrails/server/schemas/openai.py with Pydantic models for OpenAI-compatible requests and responses
  • Modified RequestBody to inherit from OpenAIRequestFields, mapping model parameter to config_id for backward compatibility
  • Updated ResponseBody to include OpenAI standard fields (id, object, created, choices) while maintaining NeMo-specific extensions (state, llm_output, log)
  • Added /v1/models endpoint that lists available guardrails configurations in OpenAI model format
  • Implemented Colang 2.x state serialization support in runtime.py
  • Updated all tests to verify new response structure

Issues Found:

  • Critical: OpenAI parameters (max_tokens, temperature, etc.) are only applied when thread_id is present due to incorrect indentation (nemoguardrails/server/api.py:460-472)
  • Syntax Error: Test assertion uses wrong response path in tests/test_threads.py:143

Confidence Score: 2/5

  • This PR should NOT be merged until the critical indentation bug is fixed - OpenAI parameters won't work without thread_id
  • Score reflects a critical logic error that breaks OpenAI parameter functionality for non-threaded requests, plus a test syntax error that will cause test failures
  • Pay immediate attention to nemoguardrails/server/api.py (lines 460-472 need unindenting) and tests/test_threads.py (line 143 needs correction)

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/server/schemas/openai.py 5/5 New file defining OpenAI-compatible schemas. Well-structured with proper Pydantic models for requests and responses.
nemoguardrails/server/api.py 2/5 Adds /v1/models endpoint and modifies /v1/chat/completions for OpenAI compatibility. Critical bug: OpenAI parameters only applied when using thread_id due to incorrect indentation.
tests/test_threads.py 3/5 Updates thread tests for new response format. Contains syntax error on line 143 with incorrect response path.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as /v1/chat/completions
    participant RequestBody
    participant DataStore
    participant LLMRails
    participant ResponseBody

    Client->>API: POST with model/config_id + messages
    API->>RequestBody: Validate request
    RequestBody->>RequestBody: Map model → config_id
    RequestBody->>RequestBody: Apply OpenAI parameters
    
    alt thread_id provided
        API->>DataStore: Fetch thread messages
        DataStore-->>API: Previous messages
        API->>API: Prepend thread history
    end
    
    alt streaming enabled
        API->>LLMRails: generate_async (streaming)
        LLMRails-->>Client: StreamingResponse
    else non-streaming
        API->>LLMRails: generate_async
        LLMRails-->>API: GenerationResponse
        
        alt thread_id provided
            API->>DataStore: Save updated thread
        end
        
        API->>ResponseBody: Build OpenAI response
        ResponseBody->>ResponseBody: Add choices, message, id, created
        ResponseBody->>ResponseBody: Include NeMo extensions (state, log)
        ResponseBody-->>Client: JSON response
    end

6 files reviewed, 2 comments


Comment on lines 460 to 472
generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
Contributor

logic: OpenAI parameters only applied when using thread_id - these lines are incorrectly indented inside the if body.thread_id: block (starts at line 428). Move this block outside and before the streaming check to ensure parameters work for all requests.

Suggested change (the same block, dedented out of the if body.thread_id: scope so it runs for every request, just before the streaming check):

generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
if (

)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["messages"][0]["content"] == "Hello again!"
Contributor

syntax: Incorrect response path - should be res["choices"][0]["message"]["content"] to match OpenAI format

Suggested change
assert res["choices"]["messages"][0]["content"] == "Hello again!"
assert res["choices"][0]["message"]["content"] == "Hello again!"

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR adds OpenAI API compatibility by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures. The changes include new schema definitions, test updates, and enhanced state handling for Colang 2.x.

Critical Issues Found:

  • Import/syntax errors: Old ResponseBody class not removed, causing conflicts with the new OpenAI-compatible version and referencing missing imports (BaseModel, GenerationLog)
  • Logic bugs: OpenAI parameters (max_tokens, temperature, etc.) are set but never used - both streaming and non-streaming paths pass body.options instead of the modified generation_options
  • Test error: Incorrect array indexing in test_threads.py line 143

Major Changes:

  • Added /v1/models endpoint that lists available guardrails configurations in OpenAI format
  • Restructured response format to include choices array with OpenAI-compatible structure while maintaining backward compatibility with NeMo-Guardrails fields (state, llm_output, etc.)
  • Enhanced Colang 2.x runtime to deserialize state from API calls
  • Updated all tests to use new response structure

Confidence Score: 1/5

  • This PR has critical syntax errors that will prevent the module from loading and logic bugs that break core functionality
  • Score reflects three critical issues: (1) syntax errors from conflicting ResponseBody classes and missing imports will cause immediate module load failures, (2) logic bug where OpenAI parameters are set but never used means the API won't work as intended, (3) test syntax error. These must be fixed before merge.
  • Pay immediate attention to nemoguardrails/server/api.py (has syntax and logic errors) and tests/test_threads.py (has syntax error)

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/server/api.py 1/5 Critical syntax errors (old ResponseBody class conflicts with import) and logic bugs (OpenAI parameters not applied to generation calls)
nemoguardrails/server/schemas/openai.py 5/5 New OpenAI schema definitions - well structured with proper field descriptions and backward compatibility fields
tests/test_threads.py 2/5 Updated to new response format but has incorrect array indexing on line 143

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant Rails as LLMRails
    participant Datastore
    participant LLM

    Client->>API: POST /v1/chat/completions
    Note over API: RequestBody with model/config_id mapping
    API->>API: Validate thread_id (if provided)
    
    alt Thread ID provided
        API->>Datastore: get("thread-{thread_id}")
        Datastore-->>API: Previous messages
        API->>API: Prepend thread messages
    end
    
    API->>API: Set generation_options from OpenAI params
    Note over API: max_tokens, temperature, top_p, etc.
    
    alt Streaming mode
        API->>Rails: generate_async(messages, streaming_handler, options)
        Rails->>LLM: Generate with streaming
        LLM-->>API: Stream tokens
        API-->>Client: StreamingResponse
    else Non-streaming mode
        API->>Rails: generate_async(messages, options, state)
        Rails->>LLM: Generate completion
        LLM-->>Rails: Response
        Rails-->>API: GenerationResponse
        
        alt Thread ID provided
            API->>Datastore: set("thread-{thread_id}", updated messages)
        end
        
        API->>API: Build OpenAI-compatible ResponseBody
        Note over API: Add choices array with message
        API-->>Client: ResponseBody (OpenAI format + extensions)
    end
    
    Client->>API: GET /v1/models
    API->>API: List config directories
    API->>API: Convert to OpenAI Model format
    API-->>Client: ModelsResponse

Additional Comments (3)

  1. nemoguardrails/server/api.py, lines 263-281

    syntax: Old ResponseBody class should be removed - it conflicts with the new OpenAI-compatible ResponseBody imported from openai.py (line 43). This class also references BaseModel and GenerationLog which are no longer imported, causing NameErrors when the module loads.

  2. nemoguardrails/server/api.py, line 538

    logic: OpenAI parameters not applied in streaming mode - should use generation_options instead of body.options

  3. nemoguardrails/server/api.py, line 546

    logic: OpenAI parameters not applied - should use generation_options instead of body.options

6 files reviewed, 4 comments


)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["message"][0]["content"] == "Hello again!"
Contributor

syntax: Wrong array index placement - choices is an array, not a dict

Suggested change
assert res["choices"]["message"][0]["content"] == "Hello again!"
assert res["choices"][0]["message"]["content"] == "Hello again!"

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR successfully refactors the NeMo Guardrails server to support OpenAI-compatible API endpoints.

Key Changes:

  • Adds new /v1/models endpoint that lists available guardrails configurations in OpenAI model format
  • Refactors /v1/chat/completions response to use OpenAI's choices array structure with Choice objects
  • Maps OpenAI request fields (model, max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) to internal llm_params
  • Maintains backward compatibility by including NeMo-specific fields (state, llm_output, output_data, log) in responses
  • Enables Colang 2.x state deserialization from dict format for stateful conversations
  • Updates all tests to validate new response structure

Architecture:
The PR introduces a clean separation of concerns by creating nemoguardrails/server/schemas/openai.py with OpenAI-specific models, while extending RequestBody to inherit from OpenAIRequestFields. The model field is automatically mapped to config_id via a validator, and OpenAI parameters are converted to llm_params before passing to the LLM generation.

Confidence Score: 5/5

  • Safe to merge - well-structured refactoring with comprehensive test coverage and backward compatibility
  • The implementation is clean and follows OpenAI's API specification correctly. Previous indentation issues have been resolved. All tests are updated appropriately, and the changes maintain backward compatibility by keeping NeMo-specific fields in responses.
  • No files require special attention - all changes are well-implemented

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/server/schemas/openai.py 5/5 New file defining OpenAI-compatible schema classes (OpenAIRequestFields, Choice, ResponseBody, Model, ModelsResponse) - well-structured with clear field descriptions
nemoguardrails/server/api.py 4/5 Adds /v1/models endpoint and refactors /v1/chat/completions to use OpenAI-compatible response format, maps OpenAI parameters to llm_params
nemoguardrails/colang/v2_x/runtime/runtime.py 5/5 Adds state deserialization support for Colang 2.x from dict format using json_to_state - enables state continuity in API calls

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant RequestBody
    participant RailsConfig
    participant LLMRails
    participant DataStore
    participant LLM as Language Model

    Client->>API: POST /v1/chat/completions
    API->>RequestBody: Validate request (OpenAI params)
    RequestBody->>RequestBody: Map model → config_id
    RequestBody->>RequestBody: Map OpenAI params → llm_params
    
    alt thread_id provided
        API->>DataStore: Fetch thread messages
        DataStore-->>API: Return thread history
        API->>API: Prepend thread messages
    end
    
    API->>RailsConfig: Load config by config_id
    RailsConfig-->>API: Return LLMRails instance
    
    alt streaming enabled
        API->>LLMRails: generate_async (streaming)
        LLMRails->>LLM: Stream tokens
        LLM-->>Client: Server-sent events
    else non-streaming
        API->>LLMRails: generate_async (options with llm_params)
        LLMRails->>LLM: Generate with parameters
        LLM-->>LLMRails: Response
        LLMRails-->>API: GenerationResponse
        
        alt thread_id provided
            API->>DataStore: Save updated thread messages
        end
        
        API->>API: Build ResponseBody (OpenAI format)
        API-->>Client: Return choices with message
    end
    
    Client->>API: GET /v1/models
    API->>RailsConfig: List available configs
    RailsConfig-->>API: config_ids
    API->>API: Convert to Model objects
    API-->>Client: ModelsResponse with data array

6 files reviewed, no comments


@greptile-apps greptile-apps bot left a comment

Greptile Summary

Modified the /v1/chat/completions endpoint and added /v1/models to be OpenAI-compatible while maintaining backward compatibility with NeMo-Guardrails specific features.

Major changes:

  • Introduced new OpenAPI schema definitions (openai.py) with OpenAIRequestFields, ResponseBody, Choice, Model, and ModelsResponse classes
  • Updated RequestBody to inherit from OpenAIRequestFields, mapping model field to config_id for backward compatibility
  • Modified response format to match OpenAI spec with id, object, created, model, and choices array structure
  • Added /v1/models endpoint that lists available guardrails configurations as OpenAI-compatible models
  • OpenAI parameters (max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) are now mapped to llm_params
  • Enhanced Colang 2.x runtime to deserialize state dicts from API calls
  • Updated all tests to work with the new response format
  • Error responses now return OpenAI-compatible structure with finish_reason="error"

Backward compatibility:

  • NeMo-Guardrails specific fields (state, llm_output, output_data, log) are preserved in responses
  • Existing config_id parameter continues to work alongside the new model parameter
  • The options parameter and all existing functionality remain unchanged

Confidence Score: 5/5

  • This PR is safe to merge with excellent test coverage and proper backward compatibility
  • The implementation is well-designed with comprehensive test coverage across all endpoints and response formats. The code properly maintains backward compatibility by preserving all existing NeMo-Guardrails fields while adding OpenAI compatibility. The OpenAI parameter mapping is correctly implemented outside the thread_id block (as noted in previous comments). State deserialization is properly handled with appropriate error checking. All tests have been updated to reflect the new response structure.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/server/schemas/openai.py 5/5 New file introducing OpenAI-compatible schema definitions for request/response formats, well-structured with proper field descriptions
nemoguardrails/server/api.py 4/5 Modified to support OpenAI compatibility: added /v1/models endpoint, updated response format to match OpenAI spec, and added parameter mapping
nemoguardrails/colang/v2_x/runtime/runtime.py 5/5 Added state deserialization from dict format to support API state continuation, properly handles serialized state from API calls

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant RequestBody
    participant Rails as LLMRails
    participant DataStore
    participant Runtime as RuntimeV2_x

    Client->>API: POST /v1/chat/completions
    Note over Client,API: OpenAI-compatible request<br/>{model, messages, temperature, etc}
    
    API->>RequestBody: Validate & parse request
    Note over RequestBody: Map `model` → `config_id`<br/>Map OpenAI params → llm_params
    
    RequestBody->>API: body.options.llm_params populated
    
    alt thread_id provided
        API->>DataStore: get(thread-{thread_id})
        DataStore-->>API: Previous messages
        Note over API: Prepend thread messages
    end
    
    alt state provided
        API->>Runtime: Pass state dict
        Runtime->>Runtime: json_to_state(state["state"])
        Note over Runtime: Deserialize state for Colang 2.x
    end
    
    alt streaming enabled
        API->>Rails: generate_async(messages, options, state)
        Rails-->>API: StreamingResponse
        API-->>Client: Server-sent events stream
    else non-streaming
        API->>Rails: generate_async(messages, options, state)
        Rails-->>API: GenerationResponse
        Note over API: Build OpenAI-compatible response<br/>{id, object, created, model, choices}
        
        alt thread_id provided
            API->>DataStore: set(thread-{thread_id}, updated_messages)
        end
        
        API-->>Client: ResponseBody with choices array
    end
    
    Client->>API: GET /v1/models
    API->>API: List config directories
    Note over API: Convert configs → Model objects
    API-->>Client: ModelsResponse {data: [models]}

6 files reviewed, no comments



codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 87.17949% with 10 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
nemoguardrails/server/api.py | 81.57% | 7 Missing ⚠️
nemoguardrails/colang/v2_x/runtime/runtime.py | 25.00% | 3 Missing ⚠️



Labels

enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants