Commit 25a47cd

mzegla and dtrawins committed
Docs fixes (#2502)
Co-authored-by: Dariusz Trawinski <[email protected]>
1 parent 9e48bf0 commit 25a47cd

3 files changed: +20 −24 lines changed


demos/continuous_batching/README.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Meta-Llama-3-8B-Instruct
 The default configuration of the `LLMExecutor` should work in most cases but the parameters can be tuned inside the `node_options` section in the `graph.pbtxt` file.
 Note that the `models_path` parameter in the graph file can be an absolute path or relative to the `base_path` from `config.json`.
-Check the [LLM calculator documentation](./llm_calculator.md) to learn about configuration options.
+Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn about configuration options.

 > **Note:** The parameter `cache_size` in the graph represents KV cache size in GB. Reduce the value if you don't have enough RAM on the host.
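As a hedged aside on the `models_path` note above: the resolution rule can be sketched with plain shell, using made-up paths rather than anything taken from the demo:

```bash
# base_path comes from config.json; a relative models_path in graph.pbtxt
# resolves against it, while an absolute models_path is used as-is.
# The paths below are made up for illustration.
base_path=/workspace/Meta-Llama-3-8B-Instruct
models_path=./
realpath -m "$base_path/$models_path"   # -> /workspace/Meta-Llama-3-8B-Instruct
```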

docs/llm/quickstart.md

Lines changed: 10 additions & 10 deletions
@@ -52,7 +52,7 @@ node: {
 }
 }
 }
-' > TinyLlama-1.1B-Chat-v1.0/graph.pbtxt
+' >> TinyLlama-1.1B-Chat-v1.0/graph.pbtxt
 ```

 4. Create server `config.json` file:
@@ -67,7 +67,7 @@ echo '
 }
 ]
 }
-' > config.json
+' >> config.json
 ```
 5. Deploy:
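Both `>`-to-`>>` changes above alter the shell redirection semantics: `>` truncates the target file before writing, while `>>` appends to it. A throwaway illustration:

```bash
# '>' truncates the file before writing; '>>' appends, creating the file if absent.
# demo.txt is a scratch file used only for this illustration.
echo 'first'  > demo.txt    # demo.txt contains: first
echo 'second' > demo.txt    # demo.txt contains: second (first was overwritten)
echo 'third' >> demo.txt    # demo.txt contains: second, third
cat demo.txt
```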

@@ -113,28 +113,28 @@ curl -s http://localhost:8000/v3/chat/completions \
 }'| jq .
 ```
 ```json
+{
   "choices": [
     {
       "finish_reason": "stop",
       "index": 0,
       "logprobs": null,
       "message": {
-        "content": "OpenVINO is a software development kit (SDK) for machine learning (ML) and deep learning (DL) applications. It is developed",
+        "content": "OpenVINO is a software toolkit developed by Intel that enables developers to accelerate the training and deployment of deep learning models on Intel hardware.",
         "role": "assistant"
       }
     }
   ],
-  "created": 1718401064,
+  "created": 1718607923,
   "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
   "object": "chat.completion"
 }
-
 ```
 **Note:** If you want to get the response chunks streamed back as they are generated, change the `stream` parameter in the request to `true`.

-## References:
-- [Efficient LLM Serving - reference](./reference.md)
-- [Chat Completions API](./model_server_rest_api_chat.md)
-- [Completions API](./model_server_rest_api_completions.md)
-- [Demo with Llama3 serving](./../demos/continuous_batching/)
+## References
+- [Efficient LLM Serving - reference](reference.md)
+- [Chat Completions API](../model_server_rest_api_chat.md)
+- [Completions API](../model_server_rest_api_completions.md)
+- [Demo with Llama3 serving](../../demos/continuous_batching/README.md)
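Regarding the streaming note in this hunk: a minimal sketch of the same call with `stream` set to `true`; the prompt below is assumed, since the request body is not shown in this diff:

```bash
# Hypothetical streaming variant of the quickstart request. With "stream": true,
# an OpenAI-compatible endpoint returns incremental "data:" chunks instead of a
# single JSON object. The prompt is assumed, not taken from the original request.
curl -s http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "stream": true,
    "messages": [{"role": "user", "content": "What is OpenVINO?"}]
  }'
```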

docs/llm/reference.md

Lines changed: 9 additions & 13 deletions
@@ -73,13 +73,7 @@ node: {
 }
 ```

-Above node configuration should be used as a template since user is not expected to change most of it's content. Fields that can be safely changed are:
-- `name`
-- `input_stream: "HTTP_REQUEST_PAYLOAD:input"` - in case you want to change input name
-- `output_stream: "HTTP_RESPONSE_PAYLOAD:output"` - in case you want to change input name
-- `node_options`
-
-From this options only `node_options` really requires user attention as they specify LLM engine parameters. The rest of them can remain unchanged.
+The node configuration above should be used as a template, since the user is not expected to change most of its content. Only `node_options` requires user attention, as it specifies the LLM engine parameters; the rest of the configuration can remain unchanged.

 The calculator supports the following `node_options` for tuning the pipeline configuration:
 - `required string models_path` - location of the model directory (can be relative);
@@ -109,10 +103,12 @@ In node configuration we set `models_path` indicating location of the directory
 ├── template.jinja
 ```

-Main model as well as tokenizer and detokenizer are loaded from `.xml` and `.bin` files and all of them are required. `tokenizer_config.json` and `template.jinja` are loaded to read information required for chat template processing. Chat template is used only on `/chat/completions` endpoint. Template is not applied for calls to `/completions`, so it doesn't have to exist, if you plan to work only with `/completions`.
+The main model as well as the tokenizer and detokenizer are loaded from `.xml` and `.bin` files, and all of them are required. `tokenizer_config.json` and `template.jinja` are loaded to read information required for chat template processing.

 ### Chat template

+The chat template is used only on the `/chat/completions` endpoint. It is not applied for calls to `/completions`, so it doesn't have to exist if you plan to work only with `/completions`.
+
 Loading chat template proceeds as follows:
 1. If `tokenizer.jinja` is present, try to load template from it.
 2. If there is no `tokenizer.jinja` and `tokenizer_config.json` exists, try to read template from its `chat_template` field. If it's not present, use default template.
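The lookup order in the list above can be sketched as plain file checks, outside any OVMS code; the model directory name is assumed:

```bash
# Hypothetical sketch of the chat template lookup order (not OVMS code).
MODEL_DIR=./TinyLlama-1.1B-Chat-v1.0   # assumed directory name
if [ -f "$MODEL_DIR/tokenizer.jinja" ]; then
  echo "load the template from tokenizer.jinja"
elif [ -f "$MODEL_DIR/tokenizer_config.json" ] && \
     grep -q '"chat_template"' "$MODEL_DIR/tokenizer_config.json"; then
  echo "read the template from the chat_template field of tokenizer_config.json"
else
  echo "fall back to the default template"
fi
```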
@@ -134,12 +130,12 @@ When default template is loaded, servable accepts `/chat/completions` calls when
 As it's in preview, this feature has a set of limitations:

-- Limited support for [API parameters](./model_server_rest_api_chat.md#request),
+- Limited support for [API parameters](../model_server_rest_api_chat.md#request),
 - Only one node with LLM calculator can be deployed at once,
 - Metrics related to text generation are planned to be added later,
 - Improvements in stability and recovery mechanisms are also expected

-## References:
-- [Chat Completions API](./model_server_rest_api_chat.md)
-- [Completions API](./model_server_rest_api_completions.md)
-- [Demo](./../demos/continuous_batching/)
+## References
+- [Chat Completions API](../model_server_rest_api_chat.md)
+- [Completions API](../model_server_rest_api_completions.md)
+- [Demo](../../demos/continuous_batching/README.md)
