
Conversation

@ngxson
Collaborator

@ngxson ngxson commented Nov 6, 2025

Need help with testing this

Model: https://huggingface.co/moonshotai/Kimi-K2-Thinking
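
A rough sketch of how a tester could drive the conversion end to end (hypothetical: the huggingface_hub download step and the output filename are my assumptions; only the convert_hf_to_gguf.py invocation itself reflects this PR):

# Hypothetical test driver -- assumes this branch is checked out and
# huggingface_hub is installed; paths and filenames are placeholders.
from huggingface_hub import snapshot_download
import subprocess

model_dir = snapshot_download("moonshotai/Kimi-K2-Thinking")  # full checkpoint, very large download
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outtype", "q8_0", "--outfile", "kimi-k2-thinking-q8_0.gguf"],
    check=True,  # raise if the conversion script exits non-zero
)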

@github-actions github-actions bot added the python (python script changes) label Nov 6, 2025
@csabakecskemeti
Contributor

@ngxson still downloading the model but will test and report back!

@ngxson
Collaborator Author

ngxson commented Nov 6, 2025

The output GGUF quantized to Q8_0 will be over 1 terabyte. Now I doubt whether I even have enough memory to test it.

@csabakecskemeti
Contributor

csabakecskemeti commented Nov 6, 2025

Over how much? :) I have ~1.1 TB RAM + 64 GB VRAM

@ngxson
Collaborator Author

ngxson commented Nov 6, 2025

python convert_hf_to_gguf.py --outfile model.gguf --outtype q8_0 .

The output GGUF will be 1.09 TB.
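
That figure is consistent with a quick back-of-the-envelope check (the parameter count here is my own approximation): Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights.

# Rough size estimate for a Q8_0 conversion (parameter count is approximate)
params = 1.03e12            # ~1T total parameters assumed for Kimi-K2
bytes_per_weight = 34 / 32  # Q8_0 block: 32 int8 quants + 2-byte fp16 scale
print(f"{params * bytes_per_weight / 1e12:.2f} TB")  # -> 1.09 TB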

@ubergarm

ubergarm commented Nov 6, 2025

Exciting, thanks for looking into this one y'all!

Well, it started off strong, but then died with RuntimeError: Tensor on device cpu is not on the expected device meta!.

I'm on a CPU-only rig with 1.5 TB RAM and plenty of disk space, but no GPUs. For what it's worth, I have triton-cpu installed instead of triton in my Python venv.

I also had to manually press Y to accept, in lieu of passing trust_remote_code=True.
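
For reference, a minimal standalone illustration of the kind of cpu/meta device mix that produces that message (this is just my guess at the failure mode, not this script's actual code path): a PyTorch "meta" tensor is a shape-only placeholder with no storage, so any op combining it with a real cpu tensor errors out.

import torch

# a "meta" tensor carries only shape/dtype metadata, no actual storage
placeholder = torch.empty(4, 4, device="meta")
real = torch.ones(4, 4)  # ordinary cpu tensor

try:
    placeholder + real  # mixing meta and cpu tensors in one op
except RuntimeError as e:
    print(e)  # device-mismatch error similar to the one above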

👈 Details: command and full logs
$ numactl -N 1 -m 1 \
python \
    convert_hf_to_gguf.py \
    --outtype bf16 \
    --split-max-size 50G \
    --outfile /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF \
    /mnt/data/models/moonshotai/Kimi-K2-Thinking/

INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekV3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00005-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00006-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00007-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00008-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00009-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00010-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00011-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00012-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00013-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00014-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00015-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00016-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00018-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00019-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00020-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00021-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00022-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00023-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00024-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00025-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00026-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00027-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00028-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00029-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00030-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00031-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00032-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00033-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00034-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00035-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00036-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00037-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00038-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00039-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00040-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00041-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00042-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00043-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00044-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00045-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00046-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00047-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00048-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00049-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00050-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00051-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00052-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00053-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00054-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00055-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00056-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00057-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00058-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00059-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00060-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00061-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00062-of-000062.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.1.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.2.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.3.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.4.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.5.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.6.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.7.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.8.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.9.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.10.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.11.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.12.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.13.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.14.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.15.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.16.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.17.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.18.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.19.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.20.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.21.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.22.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.23.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.24.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.25.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.26.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.27.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.28.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.28.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.28.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.29.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.29.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.29.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.30.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.30.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.30.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.31.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.31.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.31.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.32.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.32.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.32.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.33.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.33.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.33.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.34.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.34.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.34.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.35.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.35.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.35.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.36.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.36.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.36.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.37.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.37.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.37.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.38.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.38.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.38.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.39.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.39.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.39.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.40.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.40.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.40.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.41.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.41.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.41.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.42.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.42.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.42.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.43.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.43.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.43.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.44.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.44.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.44.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.45.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.45.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.45.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.46.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.46.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.46.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.47.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.47.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.47.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.48.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.48.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.48.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.49.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.49.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.49.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.50.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.50.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.50.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.51.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.51.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.51.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.52.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.52.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.52.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.53.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.53.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.53.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.54.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.54.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.54.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.55.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.55.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.55.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.56.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.56.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.56.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.57.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.57.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.57.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.58.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.58.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.58.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.59.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.59.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.59.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.60.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 262144
INFO:hf-to-gguf:gguf: embedding length = 7168
INFO:hf-to-gguf:gguf: feed forward length = 18432
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:gguf.vocab:Setting special token type bos to 163584
INFO:gguf.vocab:Setting special token type eos to 163586
INFO:gguf.vocab:Setting special token type pad to 163839
INFO:gguf.vocab:Setting chat_template to {%- macro render_content(msg) -%}
    {%- set c = msg.get('content') -%}
    {%- if c is string -%}
      {{ c }}
    {%- elif c is not none -%}
      {% for content in c -%}
        {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
          <|media_start|>image<|media_content|><|media_pad|><|media_end|>
        {% else -%}
          {{ content['text'] }}
        {%- endif -%}
      {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

{% macro set_roles(message) -%}
  {%- set role_name =  message.get('name') or  message['role'] -%}
  {%- if message['role'] == 'user' -%}
    <|im_user|>{{role_name}}<|im_middle|>
  {%- elif message['role'] == 'assistant' -%}
    <|im_assistant|>{{role_name}}<|im_middle|>
  {%- else -%}
    <|im_system|>{{role_name}}<|im_middle|>
  {%- endif -%}
{%- endmacro -%}


{%- macro render_toolcalls(message) -%}
  <|tool_calls_section_begin|>
  {%- for tool_call in message['tool_calls'] -%}
    {%- set formatted_id = tool_call['id'] -%}
    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
  {%- endfor -%}
  <|tool_calls_section_end|>
{%- endmacro -%}


{# Find last non-tool-call assistant message #}
{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}
        {%- set ns.last_non_tool_call_assistant_msg = idx -%}
        {%- break -%}
    {%- endif -%}
{%- endfor -%}

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}

{%- if tools -%}
  <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>
{%- endif -%}

{%- for message in hist_msgs -%}
  {%- if loop.first and messages[0]['role'] != 'system' -%}
  <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
  {%- endif -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    <think></think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
      {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}

{%- for message in suffix_msgs -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    {%- set rc = message.get('reasoning_content', '') -%}
    <think>{{rc}}</think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
     {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}


{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00001-of-00046.gguf: n_tensors = 918, total_size = 46.3G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00002-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00003-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00004-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00005-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00006-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00007-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00008-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00009-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00010-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00011-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00012-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00013-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00014-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00015-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00016-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00017-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00018-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00019-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00020-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00021-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00022-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00023-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00024-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00025-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00026-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00027-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00028-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00029-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00030-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00031-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00032-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00033-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00034-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00035-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00036-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00037-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00038-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00039-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00040-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00041-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00042-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00043-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00044-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00045-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00046-of-00046.gguf: n_tensors = 2, total_size = 22.5G

Shard (0/46): 0.00byte [00:00, ?byte/s]

Writing:   1%|          | 21.4G/2.05T [00:30<45:06, 751Mbyte/s]
Shard (1/46):  51%|█████▏    | 23.8G/46.3G [00:33<00:32, 684Mbyte/s]

Writing:   1%|          | 23.8G/2.05T [00:33<49:27, 684Mbyte/s]
Traceback (most recent call last):
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10314, in <module>
    main()
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10308, in main
    model_instance.write()
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 634, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "/home/w/projects/llama.cpp/gguf-py/gguf/gguf_writer.py", line 456, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 220, in tofile
    eager = LazyNumpyTensor.to_eager(self)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 179, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 170, in simple_to_eager
    _t._data = _t._func(*_t._args, **_t._kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 309, in _fn
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 149, in _fn
    result = fn(**bound.arguments)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1139, in _ref
    output = prim(a, b)
             ^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1746, in mul
    return prims.mul(a, b)
           ^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 109, in meta_kernel
    return fake_impl_holder.kernel(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/utils.py", line 22, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/library.py", line 1430, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/custom_ops.py", line 627, in fake_impl
    return self._abstract_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims/__init__.py", line 404, in _prim_elementwise_meta
    utils.check_same_device(*args_, allow_cpu_scalar_tensors=True)
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/__init__.py", line 878, in check_same_device
    raise RuntimeError(msg)
RuntimeError: Tensor on device cpu is not on the expected device meta!

Shard (1/46):  51%|█████▏    | 23.8G/46.3G [00:36<00:34, 655Mbyte/s]

Writing:   1%|          | 23.8G/2.05T [00:36<51:36, 655Mbyte/s]

@ngxson

This comment was marked as outdated.

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Last commit should fix the error. I successfully converted the first layer of the model to GGUF.

@ubergarm
Copy link

ubergarm commented Nov 6, 2025

Huh, not sure how I got so far the first time. This time it ballooned RAM and the oom-killer got me, even running across both NUMA nodes for the full 1.5TB and going with q8_0 output instead of bf16...

kimi-k2-thinking-convert-fun-lmao-oomkiller

I don't need to pass anything to enable lazy conversion, pretty sure, right?

So it seems like it goes through all the non-routed experts first pretty quickly with fairly low RAM, but then it slows down once it hits the routed experts and memory usage increases monotonically from that point:

👈 Partial Logs with comment
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,  torch.bfloat16 --> Q8_0, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,       torch.bfloat16 --> Q8_0, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,       torch.bfloat16 --> Q8_0, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,    torch.bfloat16 --> Q8_0, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,       torch.bfloat16 --> Q8_0, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,       torch.bfloat16 --> Q8_0, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> Q8_0, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> Q8_0, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {7168}
# runs smoothly up to here, but then it really slows down and RAM usage keeps going up
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Yes, lazy should be enabled by default.

I'm trying another way: directly mapping the quantization to Q4_0. The only disadvantage is that this will downcast the scales from bf16 to f16.

".scales",
)
]
elif quant_method == "compressed-tensors":
Copy link
Collaborator


Might want to check for quant_config["format"] == "pack-quantized" near here instead of in dequant_compressed_tensors, because the compressed-tensors method has multiple formats which could technically be supported eventually (notably, float-quantized seems relatively similar to (but not quite like) the fp8 method).

@csabakecskemeti
Copy link
Contributor

Q8 same as for @ubergarm, memory ballooned
Screenshot From 2025-11-06 15-15-25

else:
    unpacked = unpacked.to(weight.device)  # is this needed?
for i in range(pack_factor):
    unpacked[:, i::pack_factor] = (weight >> (num_bits * i)) & mask
Copy link
Collaborator

@compilade compilade Nov 6, 2025


Lazy tensors don't handle __setitem__ correctly, I think (or it causes eager evaluation). That's because the function returns None and so the change tree can't really be updated with how it's currently implemented.

Prefer explicit concatenation instead if possible (like with torch.cat, torch.stack, etc.). (this should help with memory usage)

Alternatively, there are other ways to unpack without concatenation, like the broadcasting shifts done in gguf-py/gguf/quants.py.
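
For illustration, a rough sketch of that broadcasting-shift style of unpacking (untested against this PR's code; it assumes weight is the packed int32 tensor and reproduces the same output column order as the i::pack_factor assignment above):

import torch

def unpack_int4_broadcast(weight: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # weight: int32 tensor whose last dimension holds the packed values
    pack_factor = 32 // num_bits
    mask = (1 << num_bits) - 1
    shifts = num_bits * torch.arange(pack_factor, device=weight.device, dtype=torch.int32)
    # Broadcast the shifts over a new trailing dimension instead of assigning
    # into sub-ranges with __setitem__, then flatten back to the unpacked shape.
    unpacked = (weight.unsqueeze(-1) >> shifts) & mask
    return unpacked.reshape(*weight.shape[:-1], -1)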

Copy link
Collaborator Author


Hmm yeah, I need to go offline in the next few minutes. Feel free to push directly to this branch if you have any suggestions!

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Made a hack for repacking int4 to Q4_0, I pushed it in another branch: https://github.com/ngxson/llama.cpp/tree/xsn/convert_kimi_k2_quant_repack

IMPORTANT: This requires deleting the "quantization_config" section in config.json; you can also rename it:

[screenshot: config.json with the quantization_config section renamed]
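
For reference, a tiny sketch of doing that edit programmatically (the path and the replacement key name are just placeholders; any key the converter doesn't recognize works):

import json
from pathlib import Path

cfg_path = Path("/mnt/data/models/moonshotai/Kimi-K2-Thinking/config.json")  # adjust to your checkout
cfg = json.loads(cfg_path.read_text())

# Rename the section so the converter treats the checkpoint as unquantized.
if "quantization_config" in cfg:
    cfg["_quantization_config_disabled"] = cfg.pop("quantization_config")
    cfg_path.write_text(json.dumps(cfg, indent=2))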

@ubergarm
Copy link

ubergarm commented Nov 6, 2025

Running xsn/convert_kimi_k2_quant_repack now after editing the config.json as you mentioned. Seems to be going well! Memory usage is staying low, so I put it back on a single NUMA node.

The output splits seem to be missing the model name, which was also the case on this PR branch, pretty sure:

INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf: n_tensors = 99, total_size = 49.9G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00002-of-00013.gguf: n_tensors = 95, total_size = 49.2G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00003-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00004-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00005-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00006-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00007-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00008-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00009-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00010-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00011-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00012-of-00013.gguf: n_tensors = 89, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00013-of-00013.gguf: n_tensors = 3, total_size = 4.7G
Shard (3/13):  13%|█▎        | 6.34G/49.1G [00:08<00:55, 766Mbyte/s]
Writing:  18%|█▊        | 105G/595G [01:15<11:12, 727Mbyte/s]

Regarding casting bf16 -> f16 for the block scales, I added a quick print(scale), ran it with --no-lazy, and at a glance they seemed to be very small numbers, less than 1.0. I didn't check them all, nor add any checks to see whether they exceed ±65k (the f16 max), which could possibly clip.
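
For what it's worth, a rough sketch of the kind of exhaustive check I mean (assuming the block scales live in tensors whose names end in weight_scale, which is a guess based on the compressed-tensors layout):

import torch
from pathlib import Path
from safetensors import safe_open

F16_MAX = 65504.0  # largest finite f16 value
model_dir = Path("/mnt/data/models/moonshotai/Kimi-K2-Thinking")

for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        for name in f.keys():
            if name.endswith("weight_scale"):
                s = f.get_tensor(name).to(torch.float32)
                peak = s.abs().max().item()
                if peak > F16_MAX:
                    print(f"{name}: max |scale| = {peak:.4g} would clip in f16")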

Have to go for now to play DND, will check later. If this finishes I'll try to generate an imatrix and see how the numbers look. Thanks for all the help!

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 6, 2025

I'm also running the Q4 hack...
Screenshot From 2025-11-06 15-50-51

Will report back once it's done

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

btw @ubergarm I've just pushed a small fix to the repack branch: ngxson@505f8be

What I worry about is that the packed layout of compressed-tensors could be reversed relative to ggml's, but we won't know until we actually run the model. If that's the case, we will need something like the transform_nibble_layout used for GPT-OSS.

A fun story: I wrote the code to repack GPT-OSS to GGML's MXFP4 just 2 days before its release. Repacking the nibble layout was a real pain.

@bartowski1182
Copy link
Contributor

I'm trying with your latest changes now

@ubergarm
Copy link

ubergarm commented Nov 7, 2025

Aye, it generates an output GGUF of roughly the right size, but I got errors trying to start it up:

edit: to be clear, I was using xsn/convert_kimi_k2_quant_repack@caf0e4230:

srv    load_model: loading model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
gguf_init_from_file_impl: tensor 'blk.1.ffn_gate_exps.weight' has offset 4165955584, expected 13678637056
gguf_init_from_file_impl: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf', try reducing --n-gpu-layers if you're running out of VRAM
srv    load_model: failed to load model, '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

For funzies I tried to start it on ik's fork too with errors there too:

llama_model_load: error loading model: tensor 'blk.5.ffn_down_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
main : failed to init

Good run though! My impression is there is really only one main quant of this: q8_0 for attn/shexp/the first dense layer and q4_0 for all routed experts. Maybe one could shrink the non-routed experts a little bit, but historically they were best left q8_0 imo.

So I hope DevQuasar and bartowski have better luck with the more recent PR! Gotta run tho 🫶

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

Similar to what was mentioned above (with the Q4 hack):

gguf_init_from_file_impl: tensor 'blk.1.ffn_gate_exps.weight' has offset 3699564544, expected 13212246016
gguf_init_from_file_impl: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/kecso/8t_nvme/moonshotai.Kimi-K2-Thinking-GGUF/Q4/moonshotai.Kimi-K2-Thinking.Q4_0-00001-of-00045.gguf
llama_model_load_from_file_impl: failed to load model
main: error: unable to load model

@bartowski1182
Copy link
Contributor

Conversion succeeded, but when loaded it doesn't give coherent responses, just endlessly repeats tokens

@csabakecskemeti
Copy link
Contributor

@bartowski1182 you haven't had the memory ballooning issue? Or do you just have enough memory?

@bartowski1182
Copy link
Contributor

I've got 768GB

@csabakecskemeti
Copy link
Contributor

Interesting...
Which branch have you tried, xsn/convert_kimi_k2_quant or the Q4 hack (xsn/convert_kimi_k2_quant_repack)?

@bartowski1182
Copy link
Contributor

Tried both; xsn/convert_kimi_k2_quant initially OOMed, then just didn't work at all after some changes.

xsn/convert_kimi_k2_quant_repack runs exceptionally slowly and then doesn't produce good output.

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

The latest conversion from the repack branch generated a 12.9GB GGUF for me... no clue why.

SHOOT - I had reverted the config.json, so the quantization config wasn't commented out.

Retrying.

@csabakecskemeti
Copy link
Contributor

Same as @bartowski1182, the repack produces gibberish
Screenshot From 2025-11-06 17-42-31

@compilade
Copy link
Collaborator

compilade commented Nov 7, 2025

@ngxson I've implemented my suggestion from #17064 (comment) (with broadcasting shifts instead of assignment into sub-ranges), as well as a few other sub-formats of the compressed-tensors quant method, in a separate PR:

I didn't test on Kimi-K2-Thinking, but I did test it on a (hopefully) similarly-packed model. If anyone wants to try it, you're welcome to do so.

Note that I didn't implement repacking (only requantization), so --outtype applies (and that unfortunately means no direct Q4_0 because of #10008 (comment), although maybe that could be revisited).

@compilade
Copy link
Collaborator

compilade commented Nov 7, 2025

What I worry about is that the packed layout of compressed-tensors could be reversed relative to ggml's

That seems to be the case, unfortunately. In Q4_0, the first 16 values of a block of 32 are in the low nibbles, while the next 16 are in the high nibbles.

In pack-quantized tensors from the compressed-tensors quant method, the values are in contiguous nibbles (the first 8 are in the first int32 value of a .weight_packed tensor, starting from the least significant part with the first nibble, and so on).

So the nibble layout does need to be transformed.
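
To make the difference concrete, a small numpy sketch of the two packings and what lands in the first byte (just an illustration of the layouts, not the converter's code; it treats the pack-quantized int32 container as its little-endian bytes and assumes the codes are already unpacked to one uint8 per weight):

import numpy as np

def pack_q4_0_nibbles(codes: np.ndarray) -> np.ndarray:
    # Q4_0: per 32-value block, values 0..15 go in the low nibbles of the 16
    # data bytes and values 16..31 go in the high nibbles of the same bytes.
    blocks = codes.reshape(-1, 32)
    return (blocks[:, :16] | (blocks[:, 16:] << 4)).astype(np.uint8)

def pack_contiguous_nibbles(codes: np.ndarray) -> np.ndarray:
    # pack-quantized: consecutive values occupy consecutive nibbles starting
    # from the least significant one, so byte k holds value 2k (low nibble)
    # and value 2k+1 (high nibble).
    pairs = codes.reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).astype(np.uint8)

codes = np.arange(32, dtype=np.uint8) % 16
print(hex(int(pack_q4_0_nibbles(codes)[0, 0])))     # 0x0: codes[0] low nibble, codes[16] high nibble
print(hex(int(pack_contiguous_nibbles(codes)[0])))  # 0x10: codes[0] low nibble, codes[1] high nibble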

@DocShotgun
Copy link
Contributor

I'd worry that we lose some of the benefit of the 4-bit QAT of Kimi-K2-Thinking if we dequant to bf16 and then requant. From what I'm reading, the blockwise INT4 of Kimi-K2-Thinking can't be directly converted to a GGUF-compatible format for inference?

@CHNtentes
Copy link

CHNtentes commented Nov 7, 2025

I'd worry that we lose some of the benefit of the 4-bit QAT of Kimi-K2-Thinking if we dequant to bf16 and then requant. From what I'm reading, the blockwise INT4 of Kimi-K2-Thinking can't be directly converted to a GGUF-compatible format for inference?

Is there a way to know the block size of GGUF's K-quants? For GPTQ / AWQ etc., we can find it in config.json. If it's possible to do block-size-32 quants in llama.cpp, maybe we can directly use the original INT4 weights and scales?

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Closing this in favor of #17069

@ngxson ngxson closed this Nov 7, 2025
@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Is there a way to know the block size of GGUF's K-quants? For GPTQ / AWQ etc., we can find it in config.json. If it's possible to do block-size-32 quants in llama.cpp, maybe we can directly use the original INT4 weights and scales?

The block size table is in gguf-py/gguf/constants.py; the first column of the table is the block size (element count) and the second column is the block size in bytes:

QK_K = 256
GGML_QUANT_SIZES: dict[GGMLQuantizationType, tuple[int, int]] = {
    GGMLQuantizationType.F32:     (1, 4),
    GGMLQuantizationType.F16:     (1, 2),
    GGMLQuantizationType.Q4_0:    (32, 2 + 16),
    GGMLQuantizationType.Q4_1:    (32, 2 + 2 + 16),
    GGMLQuantizationType.Q5_0:    (32, 2 + 4 + 16),
    GGMLQuantizationType.Q5_1:    (32, 2 + 2 + 4 + 16),
    GGMLQuantizationType.Q8_0:    (32, 2 + 32),
    GGMLQuantizationType.Q8_1:    (32, 4 + 4 + 32),
    GGMLQuantizationType.Q2_K:    (256, 2 + 2 + QK_K // 16 + QK_K // 4),
    GGMLQuantizationType.Q3_K:    (256, 2 + QK_K // 4 + QK_K // 8 + 12),
    GGMLQuantizationType.Q4_K:    (256, 2 + 2 + QK_K // 2 + 12),
    GGMLQuantizationType.Q5_K:    (256, 2 + 2 + QK_K // 2 + QK_K // 8 + 12),
    GGMLQuantizationType.Q6_K:    (256, 2 + QK_K // 2 + QK_K // 4 + QK_K // 16),
    GGMLQuantizationType.Q8_K:    (256, 4 + QK_K + QK_K // 8),
    GGMLQuantizationType.IQ2_XXS: (256, 2 + QK_K // 4),
    GGMLQuantizationType.IQ2_XS:  (256, 2 + QK_K // 4 + QK_K // 32),
    GGMLQuantizationType.IQ3_XXS: (256, 2 + QK_K // 4 + QK_K // 8),
    GGMLQuantizationType.IQ1_S:   (256, 2 + QK_K // 8 + QK_K // 16),
    GGMLQuantizationType.IQ4_NL:  (32, 2 + 16),
    GGMLQuantizationType.IQ3_S:   (256, 2 + QK_K // 4 + QK_K // 8 + QK_K // 32 + 4),
    GGMLQuantizationType.IQ2_S:   (256, 2 + QK_K // 4 + QK_K // 16),
    GGMLQuantizationType.IQ4_XS:  (256, 2 + 2 + QK_K // 2 + QK_K // 64),
    GGMLQuantizationType.I8:      (1, 1),
    GGMLQuantizationType.I16:     (1, 2),
    GGMLQuantizationType.I32:     (1, 4),
    GGMLQuantizationType.I64:     (1, 8),
    GGMLQuantizationType.F64:     (1, 8),
    GGMLQuantizationType.IQ1_M:   (256, QK_K // 8 + QK_K // 16  + QK_K // 32),
    GGMLQuantizationType.BF16:    (1, 2),
    GGMLQuantizationType.TQ1_0:   (256, 2 + 4 * 13),
    GGMLQuantizationType.TQ2_0:   (256, 2 + 64),
    GGMLQuantizationType.MXFP4:   (32, 1 + 16),
}
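
As a quick example of reading that table, the per-block element counts and byte sizes give the effective bits per weight (trivial sketch using the gguf-py package):

from gguf.constants import GGMLQuantizationType, GGML_QUANT_SIZES

for t in (GGMLQuantizationType.Q4_0, GGMLQuantizationType.Q8_0, GGMLQuantizationType.Q4_K):
    block_elems, block_bytes = GGML_QUANT_SIZES[t]
    print(f"{t.name}: {block_elems} elements/block, {block_bytes} bytes/block, "
          f"{8 * block_bytes / block_elems:.2f} bits per weight")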

@jukofyork
Copy link
Collaborator

Closing this in favor of #17069

Does that PR directly convert the INT4 values to Q4_0 or is it doing a round trip to BF16 and then Q4_0?

@jukofyork
Copy link
Collaborator

jukofyork commented Nov 7, 2025

If it doesn't, then looking at the source, it looks like this is what eventually gets called when there is no imatrix:

// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}

Can we prove this is going to convert back to the original values when we do a round trip via BF16? For example:

  • Imagine we have a block of 32 INT4 values that doesn't contain both 0b0000 and 0b1111.

What will happen in this case? I have a feeling the n=16 lattice will get superimposed on top of an n<16 lattice and not work for all non-power-of-2 n lattices?

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

I think I've made it work an alternative way.
I've built a conversion utility inspired by the DeepSeek V3 dequantizer:
int4-to-bf16

Both the Q3 and Q2 GGUF seem to be working:
kimi-think-proof

Experimental quants uploading (please allow some more time for the upload) here:
DevQuasar/moonshotai.Kimi-K2-Thinking-GGUF

Feel free to test the quants and the converter

@jukofyork
Copy link
Collaborator

jukofyork commented Nov 7, 2025

I think I've made it work an alternative way. I've built a conversion utility inspired by the DeepSeek V3 dequantizer: int4-to-bf16

Both the Q3 and Q2 GGUF seem to be working: Screenshot From 2025-11-07 06-48-30

Experimental quants uploading (please allow some more time for the upload) here: DevQuasar/moonshotai.Kimi-K2-Thinking-GGUF

Feel free to test the quants and the converter

This will work, but not losslessly, for the same reason as the other PR.

The fundamental problem here is that the QAT-trained blocks of 32 nibbles might not take up the full range of values. If the original block has a range of 0b0001 to 0b1111 then the Q4_0 code will still try to create a lattice of 16 values from 0b0000 to 0b1111.

It's easier to see if you look at a 2-bit version, e.g.:

If the original quant only had the half-nibbles 0b00 (0), 0b01 (1) and 0b10 (2), then quantize_row_q4_0_ref can't recover this and you will end up with this sort of thing:

OLD: 0        1        2
NEW: 0     1     2     3
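
A small numpy sketch of that failure mode in the 4-bit case (this mirrors the reference quantizer above on a single block and assumes the original QAT weights are a symmetric int4 code in [-8, 7] times a per-block scale; it's an illustration, not the converter's actual code path):

import numpy as np

def q4_0_roundtrip(q: np.ndarray, s: float) -> np.ndarray:
    # Dequantize the original int4 codes, requantize with Q4_0's scale choice
    # (d = max / -8, where max is the value with the largest magnitude), and
    # return what Q4_0 would reconstruct.
    x = q.astype(np.float32) * s
    d = x[np.argmax(np.abs(x))] / -8.0
    inv_d = 0.0 if d == 0.0 else 1.0 / d
    codes = np.minimum(15, (x * inv_d + 8.5).astype(np.int8))
    return (codes.astype(np.float32) - 8.0) * d

q = np.array([-7, -3, 1, 7] * 8, dtype=np.int8)  # block never uses the code -8
print(q * 0.01)                 # original values: exact multiples of the scale
print(q4_0_roundtrip(q, 0.01))  # different lattice: -0.07, -0.02625, 0.00875, 0.06125, ...

With a block that does use the full code range, d works out to the original scale and the same values come back exactly.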

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

The fundamental problem here is that the QAT-trained blocks of 32 nibbles might not take up the full range of values

Is there any reason it cannot take the full range? IIUC from the original compressed-tensors dequant code, it should take up the full int4 range.

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Hmm, never mind, I think I understand what you're saying now. Did you mean that it's possible that the training code prevents using 0b0000 (and not the quant/dequant code)? If that's the case then yes, it's possible that an int4 value can jump to another value in Q4_0 even if Q4_0 somehow supported BF16 scales.

@jukofyork
Copy link
Collaborator

Hmm, never mind, I think I understand what you're saying now. Did you mean that it's possible that the training code prevents using 0b0000 (and not the quant/dequant code)? If that's the case then yes, it's possible that an int4 value can jump to another value in Q4_0 even if Q4_0 somehow supported BF16 scales.

Yeah, if we were just to use something like this to first quantise:

// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}

then turn these back into BF16 or F32 and rerun the same code, we wouldn't lose anything, as this line:

        const float d  = max / -8;

is implicitly assuming that there will be a lower value of 0b0000 and an upper value of 0b1111.


But because the model was QAT-trained, it's quite likely that not every block of 32 will maintain a lower value of 0b0000 and an upper value of 0b1111 for its nibbles.

It's not something you can change after the QAT training (which likely used some form of regularisation term on the intervals and/or stochastic rounding), so the only way to maintain the full range for all blocks would be to adjust it during training (which would probably be really hard/awkward to do for "Adam-like" optimisers with the extra "memory" parameters, as you would have to keep adjusting these too).

If it can be shown that somehow they have done this, and like the output of the quantize_row_q4_0_ref it is guaranteed for every 32-element block the lowest nibble will be 0b0000 and the largest nibble will be 0b1111, then it wouldn't be a problem and quantize_row_q4_0_ref could (almost) losslessly recover an equivalent set of values (assuming the BF16 --> F16 scales doesn't overflow, which seems unlikely).

The maximum relative error from converting from BF16 --> F16 will be something really tiny and is related to the way they represent sub-normals (if it wasn't for this then it would be essentially lossless as my bit shifting example erroneously showed in the other thread).

@jukofyork
Copy link
Collaborator

If it can be shown that somehow they have done this, and like the output of the quantize_row_q4_0_ref it is guaranteed for every 32-element block the lowest nibble will be 0b0000 and the largest nibble will be 0b1111, then it wouldn't be a problem and quantize_row_q4_0_ref could (almost) losslessly recover an equivalent set of values (assuming the BF16 --> F16 scales doesn't overflow, which seems unlikely).

It's definitely worth testing whether this is the case, as it would save a lot of hassle if it is! I'm about another day away from getting the model at 4MB/s, sadly 😦
