Commit e4d043e

[Doc] add qwen3 reranker
Signed-off-by: TingW09 <[email protected]>
1 parent 5658ff2 commit e4d043e

3 files changed: 9 additions, 8 deletions


docs/source/tutorials/Qwen3_embedding.md

Lines changed: 4 additions & 5 deletions
@@ -1,4 +1,4 @@
-# Qwen3-Embedding-8B
+# Qwen3-Embedding
 
 ## Introduction
 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This guide describes how to run the model with vLLM Ascend. Note that only 0.9.2rc1 and higher versions of vLLM Ascend support the model.
@@ -91,8 +91,7 @@ Processed prompts: 100%|██████████████████
 [[0.7477798461914062, 0.07548339664936066], [0.0886271521449089, 0.6311039924621582]]
 ```
 
-## Accuracy Evaluation
-will be provided later...
-
 ## Performance
-will be provided later...
+```bash
+vllm bench serve --model Qwen3-embedding --backend openai-embeddings --dataset-name random --tokenizer /data/Qwen3-reembedding --host 127.0.0.1 --port 8888 --endpoint /v1/embeddings
+```
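The benchmark command added above targets an OpenAI-compatible embeddings endpoint and assumes a server is already listening on 127.0.0.1:8888. A minimal sketch of bringing up such a server and sanity-checking `/v1/embeddings` before benchmarking might look like the following; the weight path `/data/Qwen3-Embedding-0.6B` and the served model name are illustrative placeholders, not part of this commit, and the exact serve flags (for example, whether an explicit embedding/pooling task flag is needed) depend on the vLLM / vLLM Ascend version.

```bash
# Sketch only: start an OpenAI-compatible embedding server on the port the
# benchmark targets. The weight path and served model name are placeholders;
# some vLLM versions also require an explicit embedding/pooling task flag.
vllm serve /data/Qwen3-Embedding-0.6B \
    --served-model-name Qwen3-embedding \
    --host 127.0.0.1 \
    --port 8888

# Sanity-check the embeddings endpoint before running `vllm bench serve`.
curl -s http://127.0.0.1:8888/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen3-embedding", "input": "What is the capital of China?"}'
```

If the server is up, the curl call should return a JSON body with an embedding vector under `data`.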

docs/source/tutorials/Qwen3_reranker.md

Lines changed: 3 additions & 1 deletion
@@ -162,4 +162,6 @@ If you run this script successfully, you will see a list of scores printed to th
 ```
 
 ## Performance
-will be provided later...
+```bash
+vllm bench serve --model Qwen3-reranker --backend vllm-rerank --dataset-name random-rerank --tokenizer /data/Qwen3-reranker --host 127.0.0.1 --port 8888 --endpoint /v1/rerank
+```
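Similarly, the rerank benchmark assumes a reranker server is already running at 127.0.0.1:8888 and exposes a `/v1/rerank` endpoint. A minimal request sketch is shown below; the served model name and the example query/documents are illustrative only.

```bash
# Sketch only: send one rerank request to the endpoint the benchmark exercises.
# Assumes the server is already up and serving a model named "Qwen3-reranker".
curl -s http://127.0.0.1:8888/v1/rerank \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen3-reranker",
          "query": "What is the capital of China?",
          "documents": [
            "The capital of China is Beijing.",
            "Gravity makes objects fall toward the ground."
          ]
        }'
```

The response should contain one relevance score per document, with the Beijing sentence scoring highest.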

docs/source/user_guide/support_matrix/supported_models.md

Lines changed: 2 additions & 2 deletions
@@ -46,8 +46,8 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| Qwen3-Embedding || [Qwen3-Embedding tutorials](../../tutorials/Qwen3-Embedding.md) |||||||||||||||||||
-| Qwen3-Reranker || [Qwen3-Reranker tutorials](../../tutorials/Qwen3-Reranker.md) |||||||||||||||||||
+| Qwen3-Embedding || |||||||||||||||||||
+| Qwen3-Reranker || |||||||||||||||||||
 | Molmo || [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |||||||||||||||||||
 | XLM-RoBERTa-based || |||||||||||||||||||
 | Bert || |||||||||||||||||||
