Commit e4d043e

[Doc] add qwen3 reranker
Signed-off-by: TingW09 <[email protected]>
1 parent 5658ff2 commit e4d043e

3 files changed: 9 additions, 8 deletions


docs/source/tutorials/Qwen3_embedding.md

Lines changed: 4 additions & 5 deletions
@@ -1,4 +1,4 @@
-# Qwen3-Embedding-8B
+# Qwen3-Embedding
 
 ## Introduction
 The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This guide describes how to run the model with vLLM Ascend. Note that only 0.9.2rc1 and higher versions of vLLM Ascend support the model.
@@ -91,8 +91,7 @@ Processed prompts: 100%|██████████████████
 [[0.7477798461914062, 0.07548339664936066], [0.0886271521449089, 0.6311039924621582]]
 ```
 
-## Accuracy Evaluation
-will be provided later...
-
 ## Performance
-will be provided later...
+```bash
+vllm bench serve --model Qwen3-embedding --backend openai-embeddings --dataset-name random --tokenizer /data/Qwen3-reembedding --host 127.0.0.1 --port 8888 --endpoint /v1/embeddings
+```
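The benchmark command added above targets an OpenAI-compatible embeddings endpoint and assumes a server is already listening on 127.0.0.1:8888. A minimal sketch of bringing up such a server and sanity-checking `/v1/embeddings` before benchmarking might look like the following; the weight path `/data/Qwen3-Embedding-0.6B` and the served model name are illustrative placeholders, not part of this commit, and the exact serve flags (for example, whether an explicit embedding/pooling task flag is needed) depend on the vLLM / vLLM Ascend version.

```bash
# Sketch only: start an OpenAI-compatible embedding server on the port the
# benchmark targets. The weight path and served model name are placeholders;
# some vLLM versions also require an explicit embedding/pooling task flag.
vllm serve /data/Qwen3-Embedding-0.6B \
    --served-model-name Qwen3-embedding \
    --host 127.0.0.1 \
    --port 8888

# Sanity-check the embeddings endpoint before running `vllm bench serve`.
curl -s http://127.0.0.1:8888/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen3-embedding", "input": "What is the capital of China?"}'
```

If the server is up, the curl call should return a JSON body with an embedding vector under `data`.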

docs/source/tutorials/Qwen3_reranker.md

Lines changed: 3 additions & 1 deletion
@@ -162,4 +162,6 @@ If you run this script successfully, you will see a list of scores printed to th
 ```
 
 ## Performance
-will be provided later...
+```bash
+vllm bench serve --model Qwen3-reranker --backend vllm-rerank --dataset-name random-rerank --tokenizer /data/Qwen3-reranker --host 127.0.0.1 --port 8888 --endpoint /v1/rerank
+```
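Similarly, the rerank benchmark assumes a reranker server is already running at 127.0.0.1:8888 and exposes a `/v1/rerank` endpoint. A minimal request sketch is shown below; the served model name and the example query/documents are illustrative only.

```bash
# Sketch only: send one rerank request to the endpoint the benchmark exercises.
# Assumes the server is already up and serving a model named "Qwen3-reranker".
curl -s http://127.0.0.1:8888/v1/rerank \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen3-reranker",
          "query": "What is the capital of China?",
          "documents": [
            "The capital of China is Beijing.",
            "Gravity makes objects fall toward the ground."
          ]
        }'
```

The response should contain one relevance score per document, with the Beijing sentence scoring highest.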

docs/source/user_guide/support_matrix/supported_models.md

Lines changed: 2 additions & 2 deletions
@@ -46,8 +46,8 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| Qwen3-Embedding || [Qwen3-Embedding tutorials](../../tutorials/Qwen3-Embedding.md) |||||||||||||||||||
-| Qwen3-Reranker || [Qwen3-Reranker tutorials](../../tutorials/Qwen3-Reranker.md) |||||||||||||||||||
+| Qwen3-Embedding || |||||||||||||||||||
+| Qwen3-Reranker || |||||||||||||||||||
 | Molmo || [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |||||||||||||||||||
 | XLM-RoBERTa-based || |||||||||||||||||||
 | Bert || |||||||||||||||||||
