apache · uchenily · Nov 3, 2025
diff --git a/docs/ai/vector-search.md b/docs/ai/vector-search.md
@@ -311,6 +311,7 @@ On 768-D Cohere-MEDIUM-1M and Cohere-LARGE-10M datasets, SQ8 reduces index size
 |---------|-----|----------------------|------------|-----------|------------|-------|
 | Cohere-MEDIUM-1M | 768D | Doris (FLAT) | 5.647 GB (2.533 + 3.114) | 2.533 GB | 3.114 GB | 1M vectors |
 | Cohere-MEDIUM-1M | 768D | Doris SQ INT8 | 3.501 GB (2.533 + 0.992) | 2.533 GB | 0.992 GB | INT8 symmetric quantization |
+| Cohere-MEDIUM-1M | 768D | Doris PQ (pq_m=384, pq_nbits=8) | 3.149 GB (2.535 + 0.614) | 2.535 GB | 0.614 GB | product quantization |
 | Cohere-LARGE-10M | 768D | Doris (FLAT) | 56.472 GB (25.328 + 31.145) | 25.328 GB | 31.145 GB | 10M vectors |
 | Cohere-LARGE-10M | 768D | Doris SQ INT8 | 35.016 GB (25.329 + 9.687) | 25.329 GB | 9.687 GB | INT8 quantization |
 
@@ -319,7 +320,9 @@ Quantization introduces extra build-time overhead because each distance computat
 Similarly, Doris also supports product quantization, but note that when using PQ, additional parameters need to be provided:
 
 - `pq_m`: Indicates how many sub-vectors to split the original high-dimensional vector into (vector dimension dim must be divisible by pq_m).
-- `pq_nbits`: Indicates the number of bits for each sub-vector quantization, which determines the size of each subspace codebook (k = 2 ^ pq_nbits), in faiss pq_nbits is generally required to be no greater than 24.
+- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (2^pq_nbits centroids). In Faiss, pq_nbits is generally required to be no greater than 24.
+
+Note that PQ requires sufficient training data: the number of training points must be at least as large as the number of clusters in each codebook (n >= 2^pq_nbits).
 
 ```sql
 CREATE TABLE sift_1M (

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search.md
@@ -287,6 +287,7 @@ PROPERTIES (
 |--------|----------|---------------|------------|----------|----------|------|
 | Cohere-MEDIUM-1M | 768D | Doris (FLAT)    | 5.647 GB (2.533 + 3.114) | 2.533 GB | 3.114 GB | 1M vectors, raw vectors + HNSW FLAT index |
 | Cohere-MEDIUM-1M | 768D | Doris SQ INT8   | 3.501 GB (2.533 + 0.992) | 2.533 GB | 0.992 GB | INT8 symmetric quantization |
+| Cohere-MEDIUM-1M | 768D | Doris PQ (pq_m=384, pq_nbits=8) | 3.149 GB (2.535 + 0.614) | 2.535 GB | 0.614 GB | product quantization |
 | Cohere-LARGE-10M | 768D | Doris (FLAT)    | 56.472 GB (25.328 + 31.145) | 25.328 GB | 31.145 GB | 10M vectors |
 | Cohere-LARGE-10M | 768D | Doris SQ INT8   | 35.016 GB (25.329 + 9.687) | 25.329 GB | 9.687 GB | INT8 quantization, significantly smaller index |
 
@@ -295,7 +296,9 @@ PROPERTIES (
 Similarly, Doris also supports product quantization (PQ). Note that PQ requires the following additional parameters:
 
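 - `pq_m`: Indicates how many sub-vectors the original high-dimensional vector is split into (the vector dimension dim must be divisible by pq_m).
-- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (k = 2 ^ pq_nbits); in Faiss, pq_nbits is generally required to be no greater than 24.
+- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (2^pq_nbits centroids). In Faiss, pq_nbits is generally required to be no greater than 24.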
+
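+Note in particular that PQ has a minimum training-data requirement: there must be at least as many training points as cluster centroids in each codebook (that is, the number of training points n >= 2^pq_nbits).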
 
 ```sql
 CREATE TABLE sift_1M (

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search.md
@@ -287,6 +287,7 @@ PROPERTIES (
 |--------|----------|---------------|------------|----------|----------|------|
 | Cohere-MEDIUM-1M | 768D | Doris (FLAT)    | 5.647 GB (2.533 + 3.114) | 2.533 GB | 3.114 GB | 1M vectors, raw vectors + HNSW FLAT index |
 | Cohere-MEDIUM-1M | 768D | Doris SQ INT8   | 3.501 GB (2.533 + 0.992) | 2.533 GB | 0.992 GB | INT8 symmetric quantization |
+| Cohere-MEDIUM-1M | 768D | Doris PQ (pq_m=384, pq_nbits=8) | 3.149 GB (2.535 + 0.614) | 2.535 GB | 0.614 GB | product quantization |
 | Cohere-LARGE-10M | 768D | Doris (FLAT)    | 56.472 GB (25.328 + 31.145) | 25.328 GB | 31.145 GB | 10M vectors |
 | Cohere-LARGE-10M | 768D | Doris SQ INT8   | 35.016 GB (25.329 + 9.687) | 25.329 GB | 9.687 GB | INT8 quantization, significantly smaller index |
 
@@ -295,7 +296,9 @@ PROPERTIES (
 Similarly, Doris also supports product quantization (PQ). Note that PQ requires the following additional parameters:
 
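 - `pq_m`: Indicates how many sub-vectors the original high-dimensional vector is split into (the vector dimension dim must be divisible by pq_m).
-- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (k = 2 ^ pq_nbits); in Faiss, pq_nbits is generally required to be no greater than 24.
+- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (2^pq_nbits centroids). In Faiss, pq_nbits is generally required to be no greater than 24.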
+
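+Note in particular that PQ has a minimum training-data requirement: there must be at least as many training points as cluster centroids in each codebook (that is, the number of training points n >= 2^pq_nbits).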
 
 ```sql
 CREATE TABLE sift_1M (

diff --git a/versioned_docs/version-4.x/ai/vector-search.md b/versioned_docs/version-4.x/ai/vector-search.md
@@ -311,6 +311,7 @@ On 768-D Cohere-MEDIUM-1M and Cohere-LARGE-10M datasets, SQ8 reduces index size
 |---------|-----|----------------------|------------|-----------|------------|-------|
 | Cohere-MEDIUM-1M | 768D | Doris (FLAT) | 5.647 GB (2.533 + 3.114) | 2.533 GB | 3.114 GB | 1M vectors |
 | Cohere-MEDIUM-1M | 768D | Doris SQ INT8 | 3.501 GB (2.533 + 0.992) | 2.533 GB | 0.992 GB | INT8 symmetric quantization |
+| Cohere-MEDIUM-1M | 768D | Doris PQ (pq_m=384, pq_nbits=8) | 3.149 GB (2.535 + 0.614) | 2.535 GB | 0.614 GB | product quantization |
 | Cohere-LARGE-10M | 768D | Doris (FLAT) | 56.472 GB (25.328 + 31.145) | 25.328 GB | 31.145 GB | 10M vectors |
 | Cohere-LARGE-10M | 768D | Doris SQ INT8 | 35.016 GB (25.329 + 9.687) | 25.329 GB | 9.687 GB | INT8 quantization |
 
@@ -319,7 +320,9 @@ Quantization introduces extra build-time overhead because each distance computat
 Similarly, Doris also supports product quantization, but note that when using PQ, additional parameters need to be provided:
 
 - `pq_m`: Indicates how many sub-vectors to split the original high-dimensional vector into (vector dimension dim must be divisible by pq_m).
-- `pq_nbits`: Indicates the number of bits for each sub-vector quantization, which determines the size of each subspace codebook (k = 2 ^ pq_nbits), in faiss pq_nbits is generally required to be no greater than 24.
+- `pq_nbits`: Indicates the number of bits used to quantize each sub-vector, which determines the size of each subspace codebook (2^pq_nbits centroids). In Faiss, pq_nbits is generally required to be no greater than 24.
+
+Note that PQ requires sufficient training data: the number of training points must be at least as large as the number of clusters in each codebook (n >= 2^pq_nbits).
 
 ```sql
 CREATE TABLE sift_1M (
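
The PQ constraints documented in this change (dim divisible by `pq_m`, `pq_nbits` no greater than 24, and at least 2^pq_nbits training points) can be checked before an index is built. Below is a minimal Python sketch under stated assumptions: `check_pq_params` and `pq_code_bytes` are hypothetical helpers, not part of Doris or Faiss, and the 24-bit cap only mirrors the docs' "generally" wording rather than a verified Faiss limit for every version.

```python
# Hypothetical helpers (not part of Doris or Faiss) that sanity-check
# product-quantization parameters against the constraints in the docs.


def check_pq_params(dim: int, pq_m: int, pq_nbits: int, n_train: int) -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    # dim must split evenly into pq_m sub-vectors of equal length.
    # (The pq_m <= 0 check short-circuits before the modulo, avoiding division by zero.)
    if pq_m <= 0 or dim % pq_m != 0:
        problems.append(f"dim={dim} is not divisible by pq_m={pq_m}")
    # Assumption: mirror the docs' statement that Faiss generally requires pq_nbits <= 24.
    if not 1 <= pq_nbits <= 24:
        problems.append(f"pq_nbits={pq_nbits} is outside 1..24")
    # Each sub-codebook has 2^pq_nbits centroids, so training needs at least that many points.
    k = 2 ** pq_nbits
    if n_train < k:
        problems.append(f"n_train={n_train} < 2^pq_nbits={k}")
    return problems


def pq_code_bytes(pq_m: int, pq_nbits: int) -> int:
    """Bytes for one PQ code: pq_m sub-codes of pq_nbits bits each, rounded up to whole bytes."""
    return (pq_m * pq_nbits + 7) // 8


if __name__ == "__main__":
    # Parameters from the Cohere-MEDIUM-1M PQ row in the table.
    print(check_pq_params(dim=768, pq_m=384, pq_nbits=8, n_train=1_000_000))  # prints []
    print(pq_code_bytes(pq_m=384, pq_nbits=8))  # prints 384
```

For the Cohere-MEDIUM-1M row, each vector is stored as a 384-byte PQ code instead of 3072 bytes of raw float32 values (`768 * 4`), so 1M vectors need about 3.84 × 10^8 bytes of codes. The sketch does not model the HNSW graph or other index metadata, which presumably account for the rest of the 0.614 GB index size reported in the table.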