Commit 0d61efc: Qualcomm AI Engine Direct - Add MHA2SHA pass (pytorch#15438)
### Background
We observed that quantizing and compiling the original SHA (single-head attention) model
takes a significant amount of time, while switching to the MHA (multi-head
attention) model speeds up this process. We therefore investigated whether
converting the MHA model to SHA after quantization is feasible. However, we
cannot perform this conversion during the to_edge transformation, because
splitting the convolution weights into per-head SHA weights would require
modifying the state_dict, which is not permitted at that stage. Therefore, we
decided to apply this pass during qnn_preprocess.
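To make the constraint concrete, the sketch below shows the kind of per-head weight split such a pass performs on a fused 1x1-convolution projection. This is a hedged illustration only: `split_mha_conv_weight`, the shapes, and the head count are assumptions for this example, not the actual pass implemented in qnn_preprocess.

```python
# Illustrative sketch (assumption, not the actual MHA2SHA pass): split a fused
# MHA projection, implemented as a 1x1 conv whose out_channels = n_heads * head_dim,
# into one weight tensor per head.
import torch


def split_mha_conv_weight(weight: torch.Tensor, n_heads: int) -> list:
    out_channels, in_channels, kh, kw = weight.shape
    head_dim = out_channels // n_heads
    # Each head keeps its own contiguous slice of the output channels.
    return [weight[h * head_dim:(h + 1) * head_dim].clone() for h in range(n_heads)]


if __name__ == "__main__":
    n_heads, head_dim, hidden = 8, 64, 512
    fused = torch.randn(n_heads * head_dim, hidden, 1, 1)  # fused MHA 1x1 conv weight
    per_head = split_mha_conv_weight(fused, n_heads)
    assert len(per_head) == n_heads
    assert per_head[0].shape == (head_dim, hidden, 1, 1)
```

Performing this split after quantization means the new per-head weights cannot be written back into the original state_dict, which is why the conversion runs in qnn_preprocess rather than during to_edge.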
### Summary
- Integrated the MHA-to-SHA pass and implemented it in qnn_preprocess
- Refactored MHA in static llama
- Added SpinQuant R3 support and masked softmax for the MHA model in static llama
- Combined the n_heads key-value caches into a single cache per layer to decrease the number of inputs and outputs, which improves performance (see the sketch after this list)
- Deprecated the ShiftPointer KV updater mode
  - Since each layer now has its own KV cache, the V cache no longer benefits from ShiftPointer, which previously avoided copying the new V cache into the input V cache. To prevent user confusion, ShiftPointer mode has been deprecated.
- Applied the correct input template for SmolLM2 135M
- Corrected the quantization annotation for reshape
- Removed outdated code from CanonicalizeConv
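As a rough illustration of the cache-merging point above, the sketch below counts cache I/Os before and after folding the per-head caches into one cache per layer. The shapes, names, and counts are assumptions for the example, not the actual static llama cache layout.

```python
# Illustrative sketch (assumed shapes): merging per-head KV caches into a single
# per-layer cache cuts the number of cache inputs/outputs from
# 2 * n_layers * n_heads down to 2 * n_layers.
import torch

n_layers, n_heads, head_dim, cache_len = 4, 8, 64, 128

# Before: one K and one V cache tensor per head per layer.
k_per_head = [[torch.zeros(1, cache_len, head_dim) for _ in range(n_heads)] for _ in range(n_layers)]
v_per_head = [[torch.zeros(1, cache_len, head_dim) for _ in range(n_heads)] for _ in range(n_layers)]
num_cache_ios_before = 2 * n_layers * n_heads  # 64 cache inputs (and as many outputs)

# After: one K and one V cache tensor per layer, with the head dimension folded in.
k_per_layer = [torch.zeros(1, n_heads, cache_len, head_dim) for _ in range(n_layers)]
v_per_layer = [torch.zeros(1, n_heads, cache_len, head_dim) for _ in range(n_layers)]
num_cache_ios_after = 2 * n_layers  # 8 cache inputs (and as many outputs)

print(num_cache_ios_before, num_cache_ios_after)
```

Because each layer's V cache is now a single tensor, the ShiftPointer trick of avoiding the copy from the new V cache into the input V cache no longer applies, which is the motivation for deprecating that mode.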
### Results
Following the [README
settings](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama/README.md),
we tested on SM8750 with QNN 2.37 and compared the new `convert_mha_to_sha`
pass against the original SHA structure:
<img width="1731" height="734" alt="image"
src="https://github.com/user-attachments/assets/2b1c2b66-77c0-4662-a035-900ad9091d67"
/>
File tree (39 files changed, +1222 / -1004 lines):
- backends/qualcomm: _passes, quantizer, serialization, tests, utils
- examples/qualcomm/oss_scripts/llama: artifacts, assets, model, runner