
Commit 551069e

[Doc] Update structured output doc with upstream link (#4015)
### What this PR does / why we need it?

Currently, the usage of the structured output feature in vllm-ascend is exactly the same as in vllm. Thus, IMO, it's better to remove this doc directly, to avoid the case where the upstream doc changes and we don't update our doc in time, which can be misleading to users.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: shen-shanshan <[email protected]>
1 parent 06a6693 commit 551069e

1 file changed (+3, −149)

docs/source/user_guide/feature_guide/structured_output.md

Lines changed: 3 additions & 149 deletions
@@ -10,154 +10,8 @@ In simple terms, structured decoding gives LLMs a "template" to follow. Users pr
 
 ![structured decoding](./images/structured_output_1.png)
 
-### Structured output in vllm-ascend
+## Usage in vllm-ascend
 
-Currently, vllm-ascend supports **xgrammar** and **guidance** backends for structured output with the vllm v1 engine.
+Currently, the usage of the structured output feature in vllm-ascend is exactly the same as in vllm.
 
-XGrammar introduces a new technique that batches constrained decoding through a pushdown automaton (PDA). You can think of a PDA as a "collection of FSMs, where each FSM represents a context-free grammar (CFG)." One significant advantage of the PDA is its recursive nature, allowing us to execute multiple state transitions. XGrammar also includes additional optimizations (for those who are interested) to reduce grammar compilation overhead. You can find more details about the guidance backend in its own documentation.
-
-## How to use structured output?
-
-### Online inference
-
-You can generate structured outputs using OpenAI's Completions and Chat APIs. The following parameters are supported, and must be added as extra parameters:
-
-- `guided_choice`: the output will be exactly one of the choices.
-- `guided_regex`: the output will follow the regex pattern.
-- `guided_json`: the output will follow the JSON schema.
-- `guided_grammar`: the output will follow the context-free grammar.
-
-Structured outputs are supported by default in an OpenAI-compatible server. You can specify the backend by passing the `--guided-decoding-backend` flag to `vllm serve`. The default backend is `auto`, which tries to choose an appropriate backend based on the details of the request. You may also choose a specific backend, along with some options.
-
-The following are examples for each of the cases, starting with `guided_choice`, as it's the easiest one:
-
-```python
-from openai import OpenAI
-client = OpenAI(
-    base_url="http://localhost:8000/v1",
-    api_key="-",
-)
-
-completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
-    messages=[
-        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
-    ],
-    extra_body={"guided_choice": ["positive", "negative"]},
-)
-print(completion.choices[0].message.content)
-```
-
-The next example shows how to use `guided_regex`. The idea is to generate an email address, given a simple regex template:
-
-```python
-completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
-    messages=[
-        {
-            "role": "user",
-            "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: [email protected]\n",
-        }
-    ],
-    extra_body={"guided_regex": r"\w+@\w+\.com\n", "stop": ["\n"]},
-)
-print(completion.choices[0].message.content)
-```
-
-One of the most relevant features in structured text generation is the option to generate valid JSON with pre-defined fields and formats. To achieve this, we can use the `guided_json` parameter in two different ways:
-
-- Using a JSON Schema.
-- Defining a Pydantic model and then extracting the JSON Schema from it.
-
-The next example shows how to use the `guided_json` parameter with a Pydantic model:
-
-```python
-from pydantic import BaseModel
-from enum import Enum
-
-class CarType(str, Enum):
-    sedan = "sedan"
-    suv = "SUV"
-    truck = "Truck"
-    coupe = "Coupe"
-
-class CarDescription(BaseModel):
-    brand: str
-    model: str
-    car_type: CarType
-
-json_schema = CarDescription.model_json_schema()
-
-completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
-    messages=[
-        {
-            "role": "user",
-            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
-        }
-    ],
-    extra_body={"guided_json": json_schema},
-)
-print(completion.choices[0].message.content)
-```
-
-Finally, we have the `guided_grammar` option, which is probably the most difficult to use, but it's really powerful. It allows us to define complete languages, like SQL queries. It works by using a context-free EBNF grammar. As an example, we can define a specific format of simplified SQL queries:
-
-```python
-simplified_sql_grammar = """
-root ::= select_statement
-
-select_statement ::= "SELECT " column " from " table " where " condition
-
-column ::= "col_1 " | "col_2 "
-
-table ::= "table_1 " | "table_2 "
-
-condition ::= column "= " number
-
-number ::= "1 " | "2 "
-"""
-
-completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
-    messages=[
-        {
-            "role": "user",
-            "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
-        }
-    ],
-    extra_body={"guided_grammar": simplified_sql_grammar},
-)
-print(completion.choices[0].message.content)
-```
-
-Find more examples [here](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/structured_outputs.py).
-
-### Offline inference
-
-To use structured output, we need to configure guided decoding using the `GuidedDecodingParams` class inside `SamplingParams`. The main available options inside `GuidedDecodingParams` are:
-
-- json
-- regex
-- choice
-- grammar
-
-One example of using the choice parameter is shown below:
-
-```python
-from vllm import LLM, SamplingParams
-from vllm.sampling_params import GuidedDecodingParams
-
-llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",
-          guided_decoding_backend="xgrammar")
-
-guided_decoding_params = GuidedDecodingParams(choice=["Positive", "Negative"])
-sampling_params = SamplingParams(guided_decoding=guided_decoding_params)
-outputs = llm.generate(
-    prompts="Classify this sentiment: vLLM is wonderful!",
-    sampling_params=sampling_params,
-)
-print(outputs[0].outputs[0].text)
-```
-
-Find more examples of other usages [here](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/structured_outputs.py).
+
+Find more examples and explanations of these usages in the [vLLM official documentation](https://docs.vllm.ai/en/stable/features/structured_outputs/).
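
Since the replacement doc's whole point is that nothing differs from upstream, a quick smoke test is the clearest illustration of what "usage is exactly the same" means in practice. The sketch below sends one structured-output request to an OpenAI-compatible server. The endpoint, model name, and `guided_json` extra parameter are reused from the removed examples above and should be treated as assumptions; the authoritative, current parameter names live in the upstream doc that the new text links to. It also shows the hand-written JSON Schema variant of `guided_json`, which the removed doc listed as an option but never demonstrated.

```python
# Minimal sketch of an online structured-output request. The endpoint,
# model name, and `guided_json` extra parameter are taken from the removed
# examples above and are assumptions here; check the upstream vLLM doc
# linked in the new text for the current API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")

# A hand-written JSON Schema, i.e. the first `guided_json` variant the
# removed doc mentioned, instead of one generated from a Pydantic model.
car_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
    },
    "required": ["brand", "model"],
}

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Give the brand and model of an iconic car from the 90's as JSON.",
        }
    ],
    extra_body={"guided_json": car_schema},
)
print(completion.choices[0].message.content)
```

If the server enforces the schema, the reply parses as JSON with exactly the `brand` and `model` keys, and the same request against a vllm-ascend server should behave identically, which is precisely the claim the new one-line doc makes.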
