@Alcpz Alcpz commented Nov 5, 2025

While testing #16739, perplexities for LFM2 skyrocketed. @ggerganov pointed out that some matrix shapes were probably not supported.

Some LFM2 layers have two batches, so their MUL_MATs were only partially computed, which produced incorrect results. See #16739 (comment).

This patch adds basic support for tensors with ne2 > 1, using very naive chunking modeled on the non-repack MUL_MAT path.
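To illustrate the failure mode and the fix, here is a minimal sketch of batching a 2D matrix multiplication over a third dimension. It is not the ggml code: `matmul_2d` and `matmul_batched` are hypothetical names, the tensors are plain row-major float arrays, and the sketch assumes the weight matrix is shared across batches while the activations and the output carry `ne2` slices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D kernel: dst[m x n] = a[m x k] * b[k x n], all row-major.
// It stands in for a 2D-only GEMM; it is not the ggml repack kernel.
static void matmul_2d(const float * a, const float * b, float * dst,
                      std::size_t m, std::size_t k, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i*k + p] * b[p*n + j];
            }
            dst[i*n + j] = acc;
        }
    }
}

// Batched wrapper (assumption: `a` is shared, `b` and `dst` hold ne2 slices).
// Calling matmul_2d only once would fill just the first slice of `dst`,
// which is the kind of partial result described above.
static void matmul_batched(const float * a, const float * b, float * dst,
                           std::size_t m, std::size_t k, std::size_t n,
                           std::size_t ne2) {
    for (std::size_t i2 = 0; i2 < ne2; ++i2) {
        matmul_2d(a, b + i2*k*n, dst + i2*m*n, m, k, n);
    }
}
```

With `ne2 = 2`, a kernel that ignores the batch dimension leaves the second output slice untouched, so every later layer consumes stale or uninitialized values. The loop over `i2` is the simplest way to cover all slices; how the work is then split across threads is a separate choice this sketch does not address.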

Perplexities using this patch:

# REPACK ON
[1]9.9763,[2]15.1558,[3]13.9708,[4]13.7465,[5]14.2039,[6]14.6234,[7]14.6543,[8]15.6984,[9]16.7691,[10]17.1773,[11]16.9814,[12]17.2111,[13]17.7539,[14]17.2013,[15]16.8515,[16]16.9276,[17]15.8386,[18]16.1010,[19]15.9863,[20]15.7344,
Final estimate: PPL = 15.7344 +/- 0.63198
# REPACK OFF
[1]9.9763,[2]15.1558,[3]13.9708,[4]13.7465,[5]14.2039,[6]14.6234,[7]14.6543,[8]15.6984,[9]16.7691,[10]17.1773,[11]16.9814,[12]17.2111,[13]17.7539,[14]17.2013,[15]16.8515,[16]16.9276,[17]15.8386,[18]16.1010,[19]15.9863,[20]15.7344,
Final estimate: PPL = 15.7344 +/- 0.63198

I can provide logs for other models if needed.

Repro commands:

# Build options: GGML_CPU_REPACK=ON|OFF GGML_BLAS=OFF GGML_METAL=OFF

# The model list is left unquoted so the loop runs once per model.
for model in unsloth/Qwen3-8B-128K-GGUF:Q4_0 LiquidAI/LFM2-1.2B-GGUF:Q4_0 LiquidAI/LFM2-2.6B-GGUF:Q4_0; do
  ./bin/llama-perplexity -hf "$model" -f ./wikitext-2-raw/wiki.test.raw --chunks 100 -dev none
done

Other models:

# Qwen 3 Repack
perplexities_build-cpu-aarm64_Qwen3-8B-128K-GGUF:Q4_0.txt-[1]7.6803,[2]10.1811,[3]9.4260,[4]9.0666,[5]9.2647,[6]9.6980,[7]9.8774,[8]10.4100,[9]10.9424,[10]11.4185,[11]11.4938,[12]11.6893,[13]12.1807,[14]11.7433,[15]11.5808,[16]11.7468,[17]11.0987,[18]11.2603,[19]11.0962,[20]11.1735,[21]10.8974,[22]10.9181,[23]10.4976,[24]9.9920,[25]9.7800,[26]9.5234,[27]9.2917,[28]9.1358,[29]9.1840,[30]9.1386,[31]9.1237,[32]9.1164,[33]8.9839,[34]9.0213,[35]9.0949,[36]9.2154,[37]9.3887,[38]9.4512,[39]9.4129,[40]9.4693,[41]9.4650,[42]9.3915,[43]9.4305,[44]9.4552,[45]9.4605,[46]9.4598,[47]9.6747,[48]9.7829,[49]9.7476,[50]9.8248,[51]9.8489,[52]9.8696,[53]9.9087,[54]9.9802,[55]10.0069,[56]10.0701,[57]10.0272,[58]10.0532,[59]10.1151,[60]10.1645,[61]10.1961,[62]10.2441,[63]10.3282,[64]10.3811,[65]10.4620,[66]10.5540,[67]10.6382,[68]10.6259,[69]10.6246,[70]10.6129,[71]10.6290,[72]10.6951,[73]10.7189,[74]10.7327,[75]10.6625,[76]10.6244,[77]10.6562,[78]10.6942,[79]10.6103,[80]10.5880,[81]10.5408,[82]10.5761,[83]10.5308,[84]10.5104,[85]10.5348,[86]10.6326,[87]10.6827,[88]10.6733,[89]10.6861,[90]10.6783,[91]10.7371,[92]10.6980,[93]10.7394,[94]10.7430,[95]10.7241,[96]10.7199,[97]10.6880,[98]10.6990,[99]10.6692,[100]10.7254,
perplexities_build-cpu-aarm64_Qwen3-8B-128K-GGUF:Q4_0.txt:Final estimate: PPL = 10.7254 +/- 0.20427

# Qwen 3 REPACK OFF
perplexities_build-cpu-aarm64-norepack_Qwen3-8B-128K-GGUF:Q4_0.txt-[1]7.6803,[2]10.1811,[3]9.4260,[4]9.0666,[5]9.2647,[6]9.6980,[7]9.8774,[8]10.4100,[9]10.9424,[10]11.4185,[11]11.4938,[12]11.6893,[13]12.1807,[14]11.7433,[15]11.5808,[16]11.7468,[17]11.0987,[18]11.2603,[19]11.0962,[20]11.1735,[21]10.8974,[22]10.9181,[23]10.4976,[24]9.9920,[25]9.7800,[26]9.5234,[27]9.2917,[28]9.1358,[29]9.1840,[30]9.1386,[31]9.1237,[32]9.1164,[33]8.9839,[34]9.0213,[35]9.0949,[36]9.2154,[37]9.3887,[38]9.4512,[39]9.4129,[40]9.4693,[41]9.4650,[42]9.3915,[43]9.4305,[44]9.4552,[45]9.4605,[46]9.4598,[47]9.6747,[48]9.7829,[49]9.7476,[50]9.8248,[51]9.8489,[52]9.8696,[53]9.9087,[54]9.9802,[55]10.0069,[56]10.0701,[57]10.0272,[58]10.0532,[59]10.1151,[60]10.1645,[61]10.1961,[62]10.2441,[63]10.3282,[64]10.3811,[65]10.4620,[66]10.5540,[67]10.6382,[68]10.6259,[69]10.6246,[70]10.6129,[71]10.6290,[72]10.6951,[73]10.7189,[74]10.7327,[75]10.6625,[76]10.6244,[77]10.6562,[78]10.6942,[79]10.6103,[80]10.5880,[81]10.5408,[82]10.5761,[83]10.5308,[84]10.5104,[85]10.5348,[86]10.6326,[87]10.6827,[88]10.6733,[89]10.6861,[90]10.6783,[91]10.7371,[92]10.6980,[93]10.7394,[94]10.7430,[95]10.7241,[96]10.7199,[97]10.6880,[98]10.6990,[99]10.6692,[100]10.7254,
perplexities_build-cpu-aarm64-norepack_Qwen3-8B-128K-GGUF:Q4_0.txt:Final estimate: PPL = 10.7254 +/- 0.20427

# LFM2 REPACK
perplexities_build-cpu-aarm64_LFM2-2.6B-GGUF:Q4_0.txt-[1]7.0724,[2]11.2417,[3]11.3736,[4]11.0566,[5]11.2978,[6]11.8576,[7]12.1547,[8]12.8728,[9]13.8226,[10]14.0957,[11]13.6415,[12]13.7865,[13]14.1242,[14]13.5275,[15]13.2750,[16]13.1469,[17]12.3869,[18]12.5628,[19]12.4196,[20]12.2570,[21]11.8653,[22]11.8657,[23]11.5625,[24]11.3099,[25]11.2837,[26]11.0172,[27]10.9685,[28]10.9421,[29]10.8844,[30]11.0062,[31]10.9984,[32]11.1214,[33]11.0812,[34]11.0926,[35]11.0572,[36]11.1630,[37]11.3042,[38]11.1564,[39]11.3252,[40]11.2555,[41]11.2296,[42]11.2722,[43]11.3182,[44]11.2066,[45]11.2418,[46]11.3877,[47]11.5001,[48]11.4392,[49]11.4613,[50]11.5636,[51]11.5742,[52]11.5927,[53]11.6412,[54]11.6469,[55]11.7139,[56]11.7273,[57]11.7956,[58]11.8651,[59]11.9185,[60]11.9757,[61]11.9816,[62]12.0535,[63]12.1499,[64]12.2589,[65]12.3879,[66]12.4853,[67]12.4684,[68]12.4438,[69]12.4475,[70]12.4592,[71]12.5043,[72]12.5274,[73]12.5598,[74]12.5025,[75]12.4682,[76]12.4976,[77]12.5186,[78]12.4596,[79]12.3959,[80]12.3615,[81]12.4195,[82]12.4745,[83]12.4321,[84]12.4450,[85]12.5002,[86]12.5583,[87]12.5979,[88]12.5772,[89]12.5398,[90]12.5321,[91]12.4828,[92]12.5500,[93]12.5727,[94]12.5613,[95]12.5658,[96]12.5653,[97]12.5379,[98]12.5156,[99]12.5447,[100]12.5589,
perplexities_build-cpu-aarm64_LFM2-2.6B-GGUF:Q4_0.txt:Final estimate: PPL = 12.5589 +/- 0.21849

# LFM2 REPACK OFF
perplexities_build-cpu-aarm64-norepack_LFM2-2.6B-GGUF:Q4_0.txt-[1]7.0724,[2]11.2417,[3]11.3736,[4]11.0566,[5]11.2978,[6]11.8576,[7]12.1547,[8]12.8728,[9]13.8226,[10]14.0957,[11]13.6415,[12]13.7865,[13]14.1242,[14]13.5275,[15]13.2750,[16]13.1469,[17]12.3869,[18]12.5628,[19]12.4196,[20]12.2570,[21]11.8653,[22]11.8657,[23]11.5625,[24]11.3099,[25]11.2837,[26]11.0172,[27]10.9685,[28]10.9421,[29]10.8844,[30]11.0062,[31]10.9984,[32]11.1214,[33]11.0812,[34]11.0926,[35]11.0572,[36]11.1630,[37]11.3042,[38]11.1564,[39]11.3252,[40]11.2555,[41]11.2296,[42]11.2722,[43]11.3182,[44]11.2066,[45]11.2418,[46]11.3877,[47]11.5001,[48]11.4392,[49]11.4613,[50]11.5636,[51]11.5742,[52]11.5927,[53]11.6412,[54]11.6469,[55]11.7139,[56]11.7273,[57]11.7956,[58]11.8651,[59]11.9185,[60]11.9757,[61]11.9816,[62]12.0535,[63]12.1499,[64]12.2589,[65]12.3879,[66]12.4853,[67]12.4684,[68]12.4438,[69]12.4475,[70]12.4592,[71]12.5043,[72]12.5274,[73]12.5598,[74]12.5025,[75]12.4682,[76]12.4976,[77]12.5186,[78]12.4596,[79]12.3959,[80]12.3615,[81]12.4195,[82]12.4745,[83]12.4321,[84]12.4450,[85]12.5002,[86]12.5583,[87]12.5979,[88]12.5772,[89]12.5398,[90]12.5321,[91]12.4828,[92]12.5500,[93]12.5727,[94]12.5613,[95]12.5658,[96]12.5653,[97]12.5379,[98]12.5156,[99]12.5447,[100]12.5589,
perplexities_build-cpu-aarm64-norepack_LFM2-2.6B-GGUF:Q4_0.txt:Final estimate: PPL = 12.5589 +/- 0.21849

@Alcpz Alcpz changed the title from "ggml-cpu: handle 3d tensors in repack mul_mat" to "ggml-cpu: handle 3d tensors in repack mat_mul" Nov 5, 2025
@github-actions github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Nov 5, 2025
@Alcpz Alcpz force-pushed the Alcpz/batched_repack_mul_mat branch from eadb483 to 950671d Compare November 5, 2025 18:03
@Alcpz Alcpz marked this pull request as draft November 5, 2025 18:46
@Alcpz Alcpz marked this pull request as ready for review November 6, 2025 12:24