Commit 97bf422

finbarrtimbers authored and Bofeng BF1 Xue committed
Update FAQ on interleaving sliding windows support (vllm-project#29796)
Signed-off-by: Finbarr Timbers <[email protected]>
Signed-off-by: Bofeng BF1 Xue <[email protected]>
1 parent 03cfc91 commit 97bf422

File tree

1 file changed: +0 −2 lines changed


docs/contributing/model/basic.md

Lines changed: 0 additions & 2 deletions
@@ -113,8 +113,6 @@ See [this page](registration.md) for instructions on how to register your new mo
 
 ### How to support models with interleaving sliding windows?
 
-For models with interleaving sliding windows (e.g. `google/gemma-2-2b-it` and `mistralai/Ministral-8B-Instruct-2410`), the scheduler will treat the model as a full-attention model, i.e., kv-cache of all tokens will not be dropped. This is to make sure prefix caching works with these models. Sliding window only appears as a parameter to the attention kernel computation.
-
 To support a model with interleaving sliding windows, we need to take care of the following details:
 
 - Make sure the model's `config.json` contains `layer_types`.
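
As an illustration of the `layer_types` requirement mentioned in the doc, here is a minimal sketch of what such a `config.json` entry might look like. It assumes the Hugging Face convention of labeling each layer `"sliding_attention"` or `"full_attention"`; the model type, layer count, and window size shown are hypothetical, so check the actual config of the model you are adding.

```python
# Minimal sketch (assumptions noted above): a config.json fragment for a model
# that interleaves sliding-window and full-attention layers. The layer labels
# follow the Hugging Face "sliding_attention"/"full_attention" convention; all
# other values here are hypothetical placeholders.
import json

config = {
    "model_type": "my_interleaved_model",  # hypothetical model type
    "num_hidden_layers": 4,                # hypothetical layer count
    "sliding_window": 4096,                # window size used by the sliding layers
    "layer_types": [
        "sliding_attention",  # layer 0
        "full_attention",     # layer 1
        "sliding_attention",  # layer 2
        "full_attention",     # layer 3
    ],
}

print(json.dumps(config, indent=2))
```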

0 commit comments
