
Commit a83fc4d

Authored by apsonawane, hariharans29, github-actions[bot], edgchen1, and toothache
ORT 1.23.2 cherrypick 1 (#26368)
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] Replace 90a-virtual with 90-virtual for forward compatibility - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] Add new API to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
1 parent d9b2048 · commit a83fc4d


51 files changed · +964 −105 lines changed

VERSION_NUMBER

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.23.1
+1.23.2

cmake/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ option(onnxruntime_USE_VSINPU "Build with VSINPU support" OFF)
 cmake_dependent_option(onnxruntime_USE_FLASH_ATTENTION "Build flash attention kernel for scaled dot product attention" ON "onnxruntime_USE_CUDA" OFF)
 option(onnxruntime_USE_LEAN_ATTENTION "Build lean attention kernel for scaled dot product attention" OFF)
 cmake_dependent_option(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION "Build memory efficient attention kernel for scaled dot product attention" ON "onnxruntime_USE_CUDA" OFF)
-cmake_dependent_option(onnxruntime_USE_FPA_INTB_GEMM "Build FpA IntB gemm cuda kernels" ON "onnxruntime_USE_CUDA" OFF)
+option(onnxruntime_USE_FPA_INTB_GEMM "Build FpA IntB gemm cuda kernels" OFF)

 option(onnxruntime_BUILD_FOR_NATIVE_MACHINE "Enable this option for turning on optimization specific to this machine" OFF)
 option(onnxruntime_USE_AVX "Use AVX instructions" OFF)

docs/ContribOperators.md

Lines changed: 36 additions & 18 deletions
@@ -3121,13 +3121,13 @@ This version of the operator has been available since version 1 of the 'com.micr

 <dl>
 <dt><tt>input</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
+<dd>2D input tensor with shape (num_tokens, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
 <dt><tt>router_probs</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, num_experts)</dd>
+<dd>2D input tensor with shape (num_tokens, num_experts)</dd>
 <dt><tt>fc1_experts_weights</tt> : T</dt>
-<dd>3D input tensor with shape (num_experts, inter_size, hidden_size), or (num_experts, 2 * inter_size, hidden_size) for swiglu</dd>
+<dd>3D input tensor with shape (num_experts, fusion_size * inter_size, hidden_size), where fusion_size is 2 for fused swiglu, and 1 otherwise</dd>
 <dt><tt>fc1_experts_bias</tt> (optional) : T</dt>
-<dd>2D optional input tensor with shape (num_experts, inter_size), or (num_experts, 2 * inter_size) for swiglu</dd>
+<dd>2D optional input tensor with shape (num_experts, fusion_size * inter_size)</dd>
 <dt><tt>fc2_experts_weights</tt> : T</dt>
 <dd>3D input tensor with shape (num_experts, hidden_size, inter_size)</dd>
 <dt><tt>fc2_experts_bias</tt> (optional) : T</dt>

@@ -3142,7 +3142,7 @@ This version of the operator has been available since version 1 of the 'com.micr

 <dl>
 <dt><tt>output</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
+<dd>2D input tensor with shape (num_tokens, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
 </dl>

 #### Type Constraints
@@ -4532,7 +4532,23 @@ This version of the operator has been available since version 1 of the 'com.micr

 ### <a name="com.microsoft.QMoE"></a><a name="com.microsoft.qmoe">**com.microsoft.QMoE**</a>

-Quantized MoE
+Quantized mixture of experts (MoE).
+
+Only weights are quantized, with symmetric quantization.
+The quantized weights are stored in column-major order per expert.
+The quantization block size can be specified. If it is not provided, column-wise quantization is used.
+
+The SwiGLU (Swish-Gated Linear Unit) activation function is defined as:
+  g = xW + b
+  l = xV + c
+  G = clamp(g, max=limit)
+  L = clamp(l, min=-limit, max=limit)
+  swiglu = G * sigmoid(alpha * G) * (L + beta)
+where x is the input, W and V are weight matrices, b and c are bias vectors, and alpha, beta and limit are constant float parameters.
+When swiglu_fusion=0, the two GEMMs are not fused; they are FC1 and FC3 in the inputs.
+When swiglu_fusion=1, the two GEMMs are fused so that g and l are computed in a single GEMM (FC1), and g and l are interleaved on each row of size 2 * inter_size.
+When swiglu_fusion=2, the two GEMMs are fused, and g and l are concatenated on each row.

 #### Version
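The SwiGLU description in the hunk above maps to a few lines of NumPy. Below is a minimal sketch of the fused path (swiglu_fusion=1), assuming g occupies the even columns of the fused GEMM output and l the odd columns; the function name and the alpha/beta/limit defaults are illustrative, not part of the spec:

```python
import numpy as np

def fused_swiglu(x, w_fused, b_fused, alpha=1.702, beta=1.0, limit=7.0):
    # Single GEMM producing rows of size 2 * inter_size (swiglu_fusion=1).
    gl = x @ w_fused.T + b_fused          # (num_tokens, 2 * inter_size)
    g = gl[:, 0::2]                       # assumed: g interleaved at even columns
    l = gl[:, 1::2]                       # assumed: l interleaved at odd columns
    G = np.minimum(g, limit)              # G = clamp(g, max=limit)
    L = np.clip(l, -limit, limit)         # L = clamp(l, min=-limit, max=limit)
    return G * (1.0 / (1.0 + np.exp(-alpha * G))) * (L + beta)

# Toy sizes: num_tokens=2, hidden_size=8, inter_size=4.
x = np.random.randn(2, 8).astype(np.float32)
w = np.random.randn(2 * 4, 8).astype(np.float32)   # (2 * inter_size, hidden_size)
b = np.zeros(2 * 4, dtype=np.float32)
print(fused_swiglu(x, w, b).shape)                 # (2, 4) == (num_tokens, inter_size)
```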
@@ -4547,6 +4563,8 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>Beta parameter used in activation function.</dd>
 <dt><tt>activation_type</tt> : string</dt>
 <dd>Activation function to use. Choose from relu, gelu, silu, swiglu and identity. Default is relu</dd>
+<dt><tt>block_size</tt> : int</dt>
+<dd>Size of each quantization block along the K (input feature) dimension. Must be a power of two and ≥ 16 (e.g., 16, 32, 64, 128). If provided, both hidden_size and inter_size must be divisible by the block size. Otherwise, there is no blocking and a whole column shares one scaling factor.</dd>
 <dt><tt>expert_weight_bits</tt> : int</dt>
 <dd>Number of bits used in quantized weights. Default is 4 bits</dd>
 <dt><tt>k</tt> : int</dt>
@@ -4565,34 +4583,34 @@ This version of the operator has been available since version 1 of the 'com.micr

 <dl>
 <dt><tt>input</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
+<dd>2D tensor with shape (num_tokens, hidden_size), or 3D tensor with shape (batch_size, sequence_length, hidden_size)</dd>
 <dt><tt>router_probs</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, num_experts)</dd>
+<dd>2D tensor with shape (num_tokens, num_experts)</dd>
 <dt><tt>fc1_experts_weights</tt> : T1</dt>
-<dd>3D input tensor with shape (num_experts, inter_size, hidden_size), or (num_experts, inter_size, hidden_size / 2) for 4 bits. For swiglu, shape can be (num_experts, 2 * inter_size, hidden_size), or (num_experts, 2 * inter_size, hidden_size / 2) for 4 bits.</dd>
+<dd>3D tensor with shape (num_experts, fusion_size * inter_size, hidden_size / pack_size). The fusion_size is 2 for fused swiglu, or 1 otherwise. The pack_size is 8 / expert_weight_bits.</dd>
 <dt><tt>fc1_scales</tt> : T2</dt>
-<dd>2D input tensor with shape (num_experts, inter_size), or (num_experts, 2 * inter_size) for swiglu</dd>
+<dd>2D tensor with shape (num_experts, fusion_size * inter_size), or 3D tensor with shape (num_experts, fusion_size * inter_size, hidden_size / block_size) when block_size is provided.</dd>
 <dt><tt>fc1_experts_bias</tt> (optional) : T</dt>
-<dd>2D optional input tensor with shape (num_experts, inter_size), or (num_experts, 2 * inter_size) for swiglu</dd>
+<dd>2D optional tensor with shape (num_experts, fusion_size * inter_size)</dd>
 <dt><tt>fc2_experts_weights</tt> : T1</dt>
-<dd>3D input tensor with shape (num_experts, hidden_size, inter_size) or (num_experts, hidden_size, inter_size / 2) for 4 bits</dd>
+<dd>3D tensor with shape (num_experts, hidden_size, inter_size / pack_size)</dd>
 <dt><tt>fc2_scales</tt> : T2</dt>
-<dd>2D input tensor with shape (num_experts, hidden_size)</dd>
+<dd>2D tensor with shape (num_experts, hidden_size), or 3D tensor with shape (num_experts, hidden_size, inter_size / block_size) when block_size is provided.</dd>
 <dt><tt>fc2_experts_bias</tt> (optional) : T</dt>
-<dd>2D optional input tensor with shape (num_experts, hidden_size)</dd>
+<dd>2D optional tensor with shape (num_experts, hidden_size)</dd>
 <dt><tt>fc3_experts_weights</tt> (optional) : T1</dt>
-<dd>3D optional input tensor with shape (num_experts, inter_size, hidden_size) or (num_experts, inter_size, hidden_size / 2)</dd>
+<dd>3D optional tensor with shape (num_experts, inter_size, hidden_size / pack_size)</dd>
 <dt><tt>fc3_scales</tt> (optional) : T2</dt>
-<dd>2D optional input tensor with shape (num_experts, inter_size)</dd>
+<dd>2D optional tensor with shape (num_experts, inter_size), or 3D optional tensor with shape (num_experts, inter_size, hidden_size / block_size) when block_size is provided.</dd>
 <dt><tt>fc3_experts_bias</tt> (optional) : T</dt>
-<dd>2D optional input tensor with shape (num_experts, inter_size)</dd>
+<dd>2D optional tensor with shape (num_experts, inter_size)</dd>
 </dl>

 #### Outputs

 <dl>
 <dt><tt>output</tt> : T</dt>
-<dd>2D input tensor with shape (num_rows, hidden_size) or 3D input tensor with shape (batch_size, sequence_length, hidden_size)</dd>
+<dd>Output tensor with the same shape as the input</dd>
 </dl>

 #### Type Constraints
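The packed-shape rules in this hunk (pack_size = 8 / expert_weight_bits, block-wise vs. column-wise scales) are easy to get wrong, so here is a small Python sketch that derives the expected fc1/fc2 weight and scale shapes. The helper name and the example sizes are illustrative, not part of the spec:

```python
def qmoe_shapes(num_experts, hidden_size, inter_size,
                expert_weight_bits=4, block_size=None, fused_swiglu=True):
    """Derive expected QMoE tensor shapes from the spec above (shape arithmetic only)."""
    pack_size = 8 // expert_weight_bits       # 2 for 4-bit weights, 1 for 8-bit
    fusion_size = 2 if fused_swiglu else 1    # fused swiglu doubles the fc1 rows
    shapes = {
        "fc1_experts_weights": (num_experts, fusion_size * inter_size, hidden_size // pack_size),
        "fc2_experts_weights": (num_experts, hidden_size, inter_size // pack_size),
    }
    if block_size is None:
        # Column-wise quantization: a whole column shares one scaling factor.
        shapes["fc1_scales"] = (num_experts, fusion_size * inter_size)
        shapes["fc2_scales"] = (num_experts, hidden_size)
    else:
        # Block-wise quantization along the K (input feature) dimension.
        assert hidden_size % block_size == 0 and inter_size % block_size == 0
        shapes["fc1_scales"] = (num_experts, fusion_size * inter_size, hidden_size // block_size)
        shapes["fc2_scales"] = (num_experts, hidden_size, inter_size // block_size)
    return shapes

# Example: 8 experts, hidden_size = inter_size = 2880, 4-bit weights, block_size = 32.
for name, shape in qmoe_shapes(8, 2880, 2880, 4, 32).items():
    print(name, shape)
```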

docs/python/README.rst

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@ For more information on ONNX Runtime, please see `aka.ms/onnxruntime <https://ak

 Changes
 -------
+1.23.2
+^^^^^^
+
+Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.23.2
+

 1.23.1
 ^^^^^^

js/common/lib/version.ts

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@
 // This file is generated by /js/scripts/update-version.ts
 // Do not modify file content manually.

-export const version = '1.23.1';
+export const version = '1.23.2';

js/common/package-lock.json

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

js/common/package.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
   "license": "MIT",
   "type": "module",
   "name": "onnxruntime-common",
-  "version": "1.23.1",
+  "version": "1.23.2",
   "repository": {
     "url": "https://github.com/Microsoft/onnxruntime.git",
     "type": "git"

js/node/lib/version.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,4 +4,4 @@
44
// This file is generated by /js/scripts/update-version.ts
55
// Do not modify file content manually.
66

7-
export const version = '1.23.1';
7+
export const version = '1.23.2';

js/node/package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

js/node/package.json

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
       6
     ]
   },
-  "version": "1.23.1",
+  "version": "1.23.2",
   "dependencies": {
     "adm-zip": "^0.5.16",
     "global-agent": "^3.0.0",
