Commit 2497bbb

[Misc] Update pooling example (#5002)
### What this PR does / why we need it?

Since the param `task` has been deprecated, we should use the latest unified standard parameters for pooling models; this makes the examples clearer.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: wangli <[email protected]>
1 parent bb7b74c commit 2497bbb

File tree

2 files changed: +3 −3 lines

docs/source/tutorials/Qwen3_embedding.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -40,7 +40,7 @@ export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
 ### Online Inference
 
 ```bash
-vllm serve Qwen/Qwen3-Embedding-8B --task embed
+vllm serve Qwen/Qwen3-Embedding-8B --runner pooling
 ```
 
 Once your server is started, you can query the model with input prompts.
@@ -81,7 +81,7 @@ if __name__=="__main__":
     input_texts = queries + documents
 
     model = LLM(model="Qwen/Qwen3-Embedding-8B",
-                task="embed",
+                runner="pooling",
                 distributed_executor_backend="mp")
 
     outputs = model.embed(input_texts)
````
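Once the server from the tutorial is running, it can be queried over vLLM's OpenAI-compatible embeddings endpoint. A minimal usage sketch, assuming the default port 8000 and the model name from the tutorial (the request body shape follows the OpenAI embeddings API that vLLM mirrors):

```shell
# Query the running vLLM server for embeddings (assumes default port 8000)
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-Embedding-8B",
        "input": ["What is the capital of China?"]
      }'
```

The response contains a `data` array whose entries carry the embedding vectors, matching what `model.embed` returns in the offline example.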

examples/offline_embed.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@ def main():
     ]
     input_texts = queries + documents
 
-    model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")
+    model = LLM(model="Qwen/Qwen3-Embedding-0.6B", runner="pooling")
 
     outputs = model.embed(input_texts)
     embeddings = torch.tensor([o.outputs.embedding for o in outputs])
```
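Downstream of `model.embed`, embedding vectors are typically compared by cosine similarity to score queries against documents. A minimal sketch in plain Python, with small dummy vectors standing in for the real `o.outputs.embedding` values (so it runs without vLLM or torch):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Dummy 4-dim vectors standing in for real embedding outputs
query_vec = [0.1, 0.3, 0.5, 0.2]
doc_vec = [0.2, 0.1, 0.4, 0.3]
score = cosine_similarity(query_vec, doc_vec)
```

In the actual example, the same computation is done in batch on the `embeddings` tensor (e.g. normalizing rows and taking a matrix product).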
