Commit 11010d0

docs: edit 1483 (#1489)
1 parent 415f816 commit 11010d0

1 file changed: +82 -52 lines changed

docs/user-guides/configuration-guide/llm-configuration.md

Lines changed: 82 additions & 52 deletions
@@ -59,85 +59,127 @@ For more details about the command and its usage, see the [CLI documentation](..
 
 ### Using LLMs with Reasoning Traces
 
-```{warning}
-**Breaking Change in v0.18.0**: The `reasoning_config` field and its options (`remove_reasoning_traces`, `start_token`, `end_token`) have been removed. The `rails.output.apply_to_reasoning_traces` field has also been removed. Use output rails to guardrail reasoning traces instead.
+```{deprecated} 0.18.0
+The `reasoning_config` field and its options `remove_reasoning_traces`, `start_token`, and `end_token` are deprecated. The `rails.output.apply_to_reasoning_traces` field has also been deprecated. Instead, use output rails to guardrail reasoning traces, as introduced in this section.
 ```
 
-Reasoning-capable LLMs such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) and [NVIDIA Llama 3.1 Nemotron Ultra 253B V1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1) include reasoning traces in their responses, typically wrapped in tokens like `<think>` and `</think>`. NeMo Guardrails automatically extracts these traces and makes them available throughout your guardrails configuration via the `$bot_thinking` variable in Colang flows and `bot_thinking` in Python contexts.
+Reasoning-capable LLMs such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) and [NVIDIA Llama 3.1 Nemotron Ultra 253B V1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1) include reasoning traces in their responses, typically wrapped in tokens such as `<think>` and `</think>`.
+
+The NeMo Guardrails toolkit automatically extracts these traces and makes them available to set up in your guardrails configuration through the following variables:
+
+- In Colang flows, use the `$bot_thinking` variable.
+- In Python contexts, use the `bot_thinking` variable.
 
 #### Guardrailing Reasoning Traces with Output Rails
 
-The primary approach is to use output rails to inspect and control reasoning traces. This allows you to:
+Use output rails to inspect and control reasoning traces. This allows you to:
 
-- Block responses based on problematic reasoning patterns
-- Enhance moderation decisions with reasoning context
-- Monitor and filter sensitive information in reasoning
+- Block responses based on problematic reasoning patterns.
+- Enhance moderation decisions with reasoning context.
+- Monitor and filter sensitive information in reasoning.
 
-Here's a minimal example:
+##### Prepare Configuration Files
 
-```yaml
-models:
-  - type: main
-    engine: nim
-    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
-  - type: self_check_output
-    model: <your_moderation_model>
-    engine: <your_engine>
+The following configuration files show a minimal configuration for guardrailing reasoning traces with output rails.
 
-rails:
-  output:
-    flows:
-      - self check output
-```
+1. Configure output rails in `config.yml`:
 
-**prompts.yml**:
+   ```yaml
+   models:
+     - type: main
+       engine: nim
+       model: nvidia/llama-3.1-nemotron-ultra-253b-v1
+     - type: self_check_output
+       model: <your_moderation_model>
+       engine: <your_engine>
 
-```yaml
-prompts:
-  - task: self_check_output
-    content: |
-      Your task is to check if the bot message complies with company policy.
+   rails:
+     output:
+       flows:
+         - self check output
+   ```
 
-      Bot message: "{{ bot_response }}"
+1. Configure the prompt to access the reasoning traces in `prompts.yml`:
 
-      {% if bot_thinking %}
-      Bot reasoning: "{{ bot_thinking }}"
-      {% endif %}
+   ```yaml
+   prompts:
+     - task: self_check_output
+       content: |
+         Your task is to check if the bot message complies with company policy.
 
-      Should this be blocked (Yes or No)?
-      Answer:
-```
+         Bot message: "{{ bot_response }}"
+
+         {% if bot_thinking %}
+         Bot reasoning: "{{ bot_thinking }}"
+         {% endif %}
 
-For more detailed examples of guardrailing reasoning traces, see [Guardrailing Bot Reasoning Content](../../advanced/bot-thinking-guardrails.md).
+         Should this be blocked (Yes or No)?
+         Answer:
+   ```
+
+For more detailed examples of guardrailing reasoning traces, refer to [Guardrailing Bot Reasoning Content](../../advanced/bot-thinking-guardrails.md).
 
 #### Accessing Reasoning Traces in API Responses
 
-##### With GenerationOptions (Structured Access)
+There are two ways to access reasoning traces in API responses: with generation options and without generation options.
+
+Read the option **With GenerationOptions** when you:
+
+- Need structured access to reasoning and response separately.
+- Are building a new application.
+- Need access to other structured fields such as state, output_data, or llm_metadata.
 
-When you pass `GenerationOptions` to the API, the function returns a `GenerationResponse` object with structured fields, including `reasoning_content` for accessing reasoning traces separately from the main response:
+Read the option **Without GenerationOptions** when you:
+
+- Need backward compatibility with existing code.
+- Want the raw response with inline reasoning tags.
+- Are integrating with systems that expect tagged strings.
+
+##### With GenerationOptions for Structured Access
+
+When you pass `GenerationOptions` to the API, the function returns a `GenerationResponse` object with structured fields. This approach provides clean separation between the reasoning traces and the final response content, making it easier to process each component independently.
+
+The `reasoning_content` field contains the extracted reasoning traces, while `response` contains the main LLM response. This structured access pattern is recommended for new applications as it provides type safety and clear access to all response metadata.
+
+The following example demonstrates how to use `GenerationOptions` in a guardrails async generation call, `rails.generate_async`, to access reasoning traces.
 
 ```python
 from nemoguardrails import RailsConfig, LLMRails
 from nemoguardrails.rails.llm.options import GenerationOptions
 
+# Load the guardrails configuration
 config = RailsConfig.from_path("./config")
 rails = LLMRails(config)
 
+# Create a GenerationOptions object to enable structured responses
 options = GenerationOptions()
+
+# Make an async call with GenerationOptions
 result = await rails.generate_async(
     messages=[{"role": "user", "content": "What is 2+2?"}],
     options=options
 )
 
+# Access reasoning traces separately from the response
 if result.reasoning_content:
     print("Reasoning:", result.reasoning_content)
 
+# Access the main response content
 print("Response:", result.response[0]["content"])
 ```
 
-##### Without GenerationOptions (Tagged String)
+The following example output shows the reasoning traces and the main response content from the guardrailed generation result.
+
+```
+Reasoning: Let me calculate: 2 plus 2 equals 4.
+Response: The answer is 4.
+```
+
+##### Without GenerationOptions for Tagged String
+
+When calling without `GenerationOptions`, such as by using a dict or string response, reasoning is wrapped in `<think>` tags.
 
-When calling without `GenerationOptions` (e.g., via dict/string response), reasoning is wrapped in `<think>` tags:
+The following example demonstrates how to access reasoning traces without using `GenerationOptions`.
 
 ```python
 response = rails.generate(
@@ -147,25 +189,13 @@ response = rails.generate(
 print(response["content"])
 ```
 
-Output:
+The response is wrapped in `<think>` tags as shown in the following example output.
 
 ```
 <think>Let me calculate: 2 plus 2 equals 4.</think>
 The answer is 4.
 ```
 
-**Which pattern should you use?**
-
-Use **Pattern 1 (With GenerationOptions)** when:
-- You need structured access to reasoning and response separately
-- You're building a new application
-- You need access to other structured fields (state, output_data, llm_metadata, etc.)
-
-Use **Pattern 2 (Without GenerationOptions)** when:
-- You need backward compatibility with existing code
-- You want the raw response with inline reasoning tags
-- You're integrating with systems that expect tagged strings
-
 ### NIM for LLMs
 
 [NVIDIA NIM](https://docs.nvidia.com/nim/index.html) is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations.
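
Note on the tagged-string pattern documented in this diff: a caller that receives the `<think>`-wrapped string may want to separate the reasoning from the visible answer. The following is a minimal sketch using only the Python standard library. The helper name `split_reasoning` is hypothetical and is not part of NeMo Guardrails. It assumes the response contains at most one `<think>...</think>` block, matching the example output in the diff.

```python
import re
from typing import Optional, Tuple

# Matches one <think>...</think> block; DOTALL lets the reasoning span lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(tagged: str) -> Tuple[Optional[str], str]:
    """Split a tagged response into (reasoning, answer).

    Hypothetical helper: assumes at most one <think> block is present.
    Returns (None, text) when no block is found.
    """
    match = THINK_BLOCK.search(tagged)
    if match is None:
        return None, tagged.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (tagged[: match.start()] + tagged[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    content = "<think>Let me calculate: 2 plus 2 equals 4.</think>\nThe answer is 4."
    reasoning, answer = split_reasoning(content)
    print("Reasoning:", reasoning)
    print("Response:", answer)
```

When structured access is an option, the `reasoning_content` field described in the diff avoids this string parsing entirely. The sketch is only relevant for the backward-compatible path.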
