
Conversation

@ngxson
Collaborator

@ngxson ngxson commented Nov 6, 2025

Need help with testing this

Model: https://huggingface.co/moonshotai/Kimi-K2-Thinking
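
A rough sketch of how a tester could drive the conversion end to end (hypothetical: the huggingface_hub download step and the output filename are my assumptions; only the convert_hf_to_gguf.py invocation itself reflects this PR):

# Hypothetical test driver -- assumes this branch is checked out and
# huggingface_hub is installed; paths and filenames are placeholders.
from huggingface_hub import snapshot_download
import subprocess

model_dir = snapshot_download("moonshotai/Kimi-K2-Thinking")  # full checkpoint, very large download
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outtype", "q8_0", "--outfile", "kimi-k2-thinking-q8_0.gguf"],
    check=True,  # raise if the conversion script exits non-zero
)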

@github-actions github-actions bot added the python (python script changes) label Nov 6, 2025
@csabakecskemeti
Contributor

@ngxson still downloading the model but will test and report back!

@ngxson
Collaborator Author

ngxson commented Nov 6, 2025

The output GGUF quantized to Q8_0 will be over 1 terabyte. Now I doubt whether I even have enough memory to test it.

@csabakecskemeti
Contributor

csabakecskemeti commented Nov 6, 2025

Over how much? :) I have ~1.1 TB RAM + 64 GB VRAM

@ngxson
Collaborator Author

ngxson commented Nov 6, 2025

python convert_hf_to_gguf.py --outfile model.gguf --outtype q8_0 .

The output GGUF will be 1.09 TB.
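
That figure is consistent with a quick back-of-the-envelope check (the parameter count here is my own approximation): Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights.

# Rough size estimate for a Q8_0 conversion (parameter count is approximate)
params = 1.03e12            # ~1T total parameters assumed for Kimi-K2
bytes_per_weight = 34 / 32  # Q8_0 block: 32 int8 quants + 2-byte fp16 scale
print(f"{params * bytes_per_weight / 1e12:.2f} TB")  # -> 1.09 TB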

@ubergarm

ubergarm commented Nov 6, 2025

Exciting, thanks for looking into this one y'all!

Well, it started off strong, but then died with RuntimeError: Tensor on device cpu is not on the expected device meta!.

I'm on a CPU-only rig with 1.5 TB RAM and plenty of disk space, but no GPUs. For what it's worth, I have triton-cpu installed instead of triton in my Python venv.

I also had to manually press Y to accept, in lieu of passing trust_remote_code=True.
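
For reference, a minimal standalone illustration of the kind of cpu/meta device mix that produces that message (this is just my guess at the failure mode, not this script's actual code path): a PyTorch "meta" tensor is a shape-only placeholder with no storage, so any op combining it with a real cpu tensor errors out.

import torch

# a "meta" tensor carries only shape/dtype metadata, no actual storage
placeholder = torch.empty(4, 4, device="meta")
real = torch.ones(4, 4)  # ordinary cpu tensor

try:
    placeholder + real  # mixing meta and cpu tensors in one op
except RuntimeError as e:
    print(e)  # device-mismatch error similar to the one above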

👈 Details: command and full logs
$ numactl -N 1 -m 1 \
python \
    convert_hf_to_gguf.py \
    --outtype bf16 \
    --split-max-size 50G \
    --outfile /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF \
    /mnt/data/models/moonshotai/Kimi-K2-Thinking/

INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekV3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00005-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00006-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00007-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00008-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00009-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00010-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00011-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00012-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00013-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00014-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00015-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00016-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00018-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00019-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00020-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00021-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00022-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00023-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00024-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00025-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00026-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00027-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00028-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00029-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00030-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00031-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00032-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00033-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00034-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00035-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00036-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00037-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00038-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00039-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00040-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00041-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00042-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00043-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00044-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00045-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00046-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00047-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00048-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00049-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00050-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00051-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00052-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00053-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00054-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00055-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00056-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00057-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00058-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00059-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00060-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00061-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00062-of-000062.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.1.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.2.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.3.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.4.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.5.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.6.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.7.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.8.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.9.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.10.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.11.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.12.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.13.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.14.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.15.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.16.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.17.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.18.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.19.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.20.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.21.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.22.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.23.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.24.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.25.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.26.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.27.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.28.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.28.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.28.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.29.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.29.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.29.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.30.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.30.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.30.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.31.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.31.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.31.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.32.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.32.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.32.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.33.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.33.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.33.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.34.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.34.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.34.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.35.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.35.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.35.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.36.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.36.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.36.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.37.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.37.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.37.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.38.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.38.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.38.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.39.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.39.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.39.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.40.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.40.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.40.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.41.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.41.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.41.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.42.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.42.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.42.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.43.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.43.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.43.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.44.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.44.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.44.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.45.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.45.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.45.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.46.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.46.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.46.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.47.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.47.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.47.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.48.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.48.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.48.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.49.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.49.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.49.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.50.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.50.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.50.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.51.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.51.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.51.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.52.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.52.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.52.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.53.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.53.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.53.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.54.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.54.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.54.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.55.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.55.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.55.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.56.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.56.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.56.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.57.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.57.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.57.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.58.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.58.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.58.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.59.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.59.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.59.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.60.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 262144
INFO:hf-to-gguf:gguf: embedding length = 7168
INFO:hf-to-gguf:gguf: feed forward length = 18432
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:gguf.vocab:Setting special token type bos to 163584
INFO:gguf.vocab:Setting special token type eos to 163586
INFO:gguf.vocab:Setting special token type pad to 163839
INFO:gguf.vocab:Setting chat_template to {%- macro render_content(msg) -%}
    {%- set c = msg.get('content') -%}
    {%- if c is string -%}
      {{ c }}
    {%- elif c is not none -%}
      {% for content in c -%}
        {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
          <|media_start|>image<|media_content|><|media_pad|><|media_end|>
        {% else -%}
          {{ content['text'] }}
        {%- endif -%}
      {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

{% macro set_roles(message) -%}
  {%- set role_name =  message.get('name') or  message['role'] -%}
  {%- if message['role'] == 'user' -%}
    <|im_user|>{{role_name}}<|im_middle|>
  {%- elif message['role'] == 'assistant' -%}
    <|im_assistant|>{{role_name}}<|im_middle|>
  {%- else -%}
    <|im_system|>{{role_name}}<|im_middle|>
  {%- endif -%}
{%- endmacro -%}


{%- macro render_toolcalls(message) -%}
  <|tool_calls_section_begin|>
  {%- for tool_call in message['tool_calls'] -%}
    {%- set formatted_id = tool_call['id'] -%}
    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
  {%- endfor -%}
  <|tool_calls_section_end|>
{%- endmacro -%}


{# Find last non-tool-call assistant message #}
{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}
        {%- set ns.last_non_tool_call_assistant_msg = idx -%}
        {%- break -%}
    {%- endif -%}
{%- endfor -%}

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}

{%- if tools -%}
  <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>
{%- endif -%}

{%- for message in hist_msgs -%}
  {%- if loop.first and messages[0]['role'] != 'system' -%}
  <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
  {%- endif -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    <think></think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
      {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}

{%- for message in suffix_msgs -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    {%- set rc = message.get('reasoning_content', '') -%}
    <think>{{rc}}</think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
     {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}


{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00001-of-00046.gguf: n_tensors = 918, total_size = 46.3G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00002-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00003-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00004-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00005-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00006-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00007-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00008-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00009-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00010-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00011-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00012-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00013-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00014-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00015-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00016-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00017-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00018-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00019-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00020-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00021-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00022-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00023-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00024-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00025-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00026-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00027-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00028-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00029-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00030-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00031-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00032-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00033-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00034-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00035-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00036-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00037-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00038-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00039-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00040-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00041-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00042-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00043-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00044-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00045-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00046-of-00046.gguf: n_tensors = 2, total_size = 22.5G

Shard (0/46): 0.00byte [00:00, ?byte/s]

Writing:   1%|          | 21.4G/2.05T [00:30<45:06, 751Mbyte/s]
Shard (1/46):  51%|█████▏    | 23.8G/46.3G [00:33<00:32, 684Mbyte/s]

Writing:   1%|          | 23.8G/2.05T [00:33<49:27, 684Mbyte/s]
Traceback (most recent call last):
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10314, in <module>
    main()
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10308, in main
    model_instance.write()
  File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 634, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "/home/w/projects/llama.cpp/gguf-py/gguf/gguf_writer.py", line 456, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 220, in tofile
    eager = LazyNumpyTensor.to_eager(self)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 179, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 170, in simple_to_eager
    _t._data = _t._func(*_t._args, **_t._kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 309, in _fn
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 149, in _fn
    result = fn(**bound.arguments)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1139, in _ref
    output = prim(a, b)
             ^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1746, in mul
    return prims.mul(a, b)
           ^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 109, in meta_kernel
    return fake_impl_holder.kernel(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/utils.py", line 22, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/library.py", line 1430, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/custom_ops.py", line 627, in fake_impl
    return self._abstract_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims/__init__.py", line 404, in _prim_elementwise_meta
    utils.check_same_device(*args_, allow_cpu_scalar_tensors=True)
  File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/__init__.py", line 878, in check_same_device
    raise RuntimeError(msg)
RuntimeError: Tensor on device cpu is not on the expected device meta!

Shard (1/46):  51%|█████▏    | 23.8G/46.3G [00:36<00:34, 655Mbyte/s]

Writing:   1%|          | 23.8G/2.05T [00:36<51:36, 655Mbyte/s]

@ngxson

This comment was marked as outdated.

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Last commit should fix the error. I successfully converted the first layer of the model to GGUF.

@ubergarm
Copy link

ubergarm commented Nov 6, 2025

Huh, not sure how I got so far the first time. This time it ballooned RAM and the oom-killer got me, even running across both NUMA nodes for the full 1.5TB and going with q8_0 output instead of bf16...

kimi-k2-thinking-convert-fun-lmao-oomkiller

I don't need to pass anything to enable lazy conversion, pretty sure, right?

So it seems like it goes through all the non-routed experts first pretty quickly with fairly low RAM, but then it slows down once it hits the routed experts and memory usage increases monotonically from that point:

👈 Partial Logs with comment
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,  torch.bfloat16 --> Q8_0, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,       torch.bfloat16 --> Q8_0, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,       torch.bfloat16 --> Q8_0, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,    torch.bfloat16 --> Q8_0, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,       torch.bfloat16 --> Q8_0, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,       torch.bfloat16 --> Q8_0, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> Q8_0, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> Q8_0, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {7168}
# runs smoothly up to here, but then it really slows down and RAM usage keeps going up
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.float32 --> Q8_0, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.float32 --> Q8_0, shape = {2048, 7168, 384}

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Yes, lazy should be enabled by default.

I'm trying another way: directly mapping the quantization to Q4_0. The only disadvantage is that this will downcast the scales from bf16 to f16.

".scales",
)
]
elif quant_method == "compressed-tensors":
Copy link
Collaborator


Might want to check for quant_config["format"] == "pack-quantized" near here instead of in dequant_compressed_tensors, because the compressed-tensors method has multiple formats which could technically be supported eventually (notably, float-quantized seems relatively similar to (but not quite like) the fp8 method).

@csabakecskemeti
Copy link
Contributor

Q8 same as for @ubergarm, memory ballooned
Screenshot From 2025-11-06 15-15-25

else:
    unpacked = unpacked.to(weight.device)  # is this needed?
for i in range(pack_factor):
    unpacked[:, i::pack_factor] = (weight >> (num_bits * i)) & mask
Copy link
Collaborator

@compilade compilade Nov 6, 2025


Lazy tensors don't handle __setitem__ correctly, I think (or it causes eager evaluation). That's because the function returns None and so the change tree can't really be updated with how it's currently implemented.

Prefer explicit concatenation instead if possible (like with torch.cat, torch.stack, etc.). (this should help with memory usage)

Alternatively, there are other ways to unpack without concatenation, like the broadcasting shifts done in gguf-py/gguf/quants.py.
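
For illustration, a rough sketch of that broadcasting-shift style of unpacking (untested against this PR's code; it assumes weight is the packed int32 tensor and reproduces the same output column order as the i::pack_factor assignment above):

import torch

def unpack_int4_broadcast(weight: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # weight: int32 tensor whose last dimension holds the packed values
    pack_factor = 32 // num_bits
    mask = (1 << num_bits) - 1
    shifts = num_bits * torch.arange(pack_factor, device=weight.device, dtype=torch.int32)
    # Broadcast the shifts over a new trailing dimension instead of assigning
    # into sub-ranges with __setitem__, then flatten back to the unpacked shape.
    unpacked = (weight.unsqueeze(-1) >> shifts) & mask
    return unpacked.reshape(*weight.shape[:-1], -1)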

Copy link
Collaborator Author


Hmm yeah, I need to go offline in the next few minutes. Feel free to push directly to this branch if you have any suggestions!

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

Made a hack for repacking int4 to Q4_0, I pushed it in another branch: https://github.com/ngxson/llama.cpp/tree/xsn/convert_kimi_k2_quant_repack

IMPORTANT: This requires deleting the "quantization_config" section in config.json; you can also rename it:

[screenshot: config.json with the quantization_config section renamed]
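
For reference, a tiny sketch of doing that edit programmatically (the path and the replacement key name are just placeholders; any key the converter doesn't recognize works):

import json
from pathlib import Path

cfg_path = Path("/mnt/data/models/moonshotai/Kimi-K2-Thinking/config.json")  # adjust to your checkout
cfg = json.loads(cfg_path.read_text())

# Rename the section so the converter treats the checkpoint as unquantized.
if "quantization_config" in cfg:
    cfg["_quantization_config_disabled"] = cfg.pop("quantization_config")
    cfg_path.write_text(json.dumps(cfg, indent=2))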

@ubergarm
Copy link

ubergarm commented Nov 6, 2025

Running xsn/convert_kimi_k2_quant_repack now after editing the config.json as you mentioned. Seems to be going well! Memory usage is staying low, so I put it back on a single NUMA node.

The output splits seem to be missing the model name, which was also the case on this PR branch, pretty sure:

INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf: n_tensors = 99, total_size = 49.9G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00002-of-00013.gguf: n_tensors = 95, total_size = 49.2G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00003-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00004-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00005-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00006-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00007-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00008-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00009-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00010-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00011-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00012-of-00013.gguf: n_tensors = 89, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00013-of-00013.gguf: n_tensors = 3, total_size = 4.7G
Shard (3/13):  13%|█▎        | 6.34G/49.1G [00:08<00:55, 766Mbyte/s]
Writing:  18%|█▊        | 105G/595G [01:15<11:12, 727Mbyte/s]

Regarding casting bf16 -> f16 for the block scales, I added a quick print(scale), ran it with --no-lazy, and at a glance they seemed to be very small numbers, less than 1.0. I didn't check them all, nor add any checks to see whether they exceed ±65k (the f16 max), which could possibly clip.
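
For what it's worth, a rough sketch of the kind of exhaustive check I mean (assuming the block scales live in tensors whose names end in weight_scale, which is a guess based on the compressed-tensors layout):

import torch
from pathlib import Path
from safetensors import safe_open

F16_MAX = 65504.0  # largest finite f16 value
model_dir = Path("/mnt/data/models/moonshotai/Kimi-K2-Thinking")

for shard in sorted(model_dir.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        for name in f.keys():
            if name.endswith("weight_scale"):
                s = f.get_tensor(name).to(torch.float32)
                peak = s.abs().max().item()
                if peak > F16_MAX:
                    print(f"{name}: max |scale| = {peak:.4g} would clip in f16")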

Have to go for now to play DND, will check later. If this finishes I'll try to generate an imatrix and see how the numbers look. Thanks for all the help!

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 6, 2025

I'm also running the Q4 hack...
Screenshot From 2025-11-06 15-50-51

Will report back once it's done

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 6, 2025

btw @ubergarm I've just pushed a small fix to the repack branch: ngxson@505f8be

What I worry about is that the packed layout of compressed-tensors could be reversed relative to ggml's, but we won't know until we actually run the model. If that's the case, we will need something like the transform_nibble_layout used for GPT-OSS.

A fun story: I wrote the code to repack GPT-OSS to GGML's MXFP4 just 2 days before its release. Repacking the nibble layout was a real pain.

@bartowski1182
Copy link
Contributor

I'm trying with your latest changes now

@ubergarm
Copy link

ubergarm commented Nov 7, 2025

Aye, it generates an output GGUF of roughly the right size, but I got errors trying to start it up:

edit: to be clear, I was using xsn/convert_kimi_k2_quant_repack@caf0e4230:

srv    load_model: loading model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
gguf_init_from_file_impl: tensor 'blk.1.ffn_gate_exps.weight' has offset 4165955584, expected 13678637056
gguf_init_from_file_impl: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf', try reducing --n-gpu-layers if you're running out of VRAM
srv    load_model: failed to load model, '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

For funzies I tried to start it on ik's fork too with errors there too:

llama_model_load: error loading model: tensor 'blk.5.ffn_down_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
main : failed to init

Good run though! My impression is there is really only one main quant of this: q8_0 for attn/shexp/the first dense layer and q4_0 for all routed experts. Maybe one could shrink the non-routed experts a little bit, but historically they were best left q8_0 imo.

So I hope DevQuasar and bartowski have better luck with the more recent PR! Gotta run tho 🫶

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

Similar to what was mentioned above (with the Q4 hack):

gguf_init_from_file_impl: tensor 'blk.1.ffn_gate_exps.weight' has offset 3699564544, expected 13212246016
gguf_init_from_file_impl: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/kecso/8t_nvme/moonshotai.Kimi-K2-Thinking-GGUF/Q4/moonshotai.Kimi-K2-Thinking.Q4_0-00001-of-00045.gguf
llama_model_load_from_file_impl: failed to load model
main: error: unable to load model

@bartowski1182
Copy link
Contributor

Conversion succeeded, but when loaded it doesn't give coherent responses, just endlessly repeats tokens

@csabakecskemeti
Copy link
Contributor

@bartowski1182 you haven't had the memory ballooning issue? Or do you just have enough memory?

@bartowski1182
Copy link
Contributor

I've got 768GB

@csabakecskemeti
Copy link
Contributor

Interesting...
Which branch have you tried, xsn/convert_kimi_k2_quant or the Q4 hack (xsn/convert_kimi_k2_quant_repack)?

@bartowski1182
Copy link
Contributor

Tried both; xsn/convert_kimi_k2_quant initially OOMed, then just didn't work at all after some changes.

xsn/convert_kimi_k2_quant_repack runs exceptionally slowly and then doesn't produce good output.

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

The latest conversion from the repack branch generated a 12.9GB GGUF for me... no clue why.

SHOOT - I had reverted the config.json, so the quantization config wasn't commented out.

Retrying.

@csabakecskemeti
Copy link
Contributor

Same as @bartowski1182, the repack produces gibberish
Screenshot From 2025-11-06 17-42-31

@compilade
Copy link
Collaborator

compilade commented Nov 7, 2025

@ngxson I've implemented my suggestion from #17064 (comment) (with broadcasting shifts instead of assignment into sub-ranges), as well as a few other sub-formats of the compressed-tensors quant method, in a separate PR:

I didn't test on Kimi-K2-Thinking, but I did test it on a (hopefully) similarly-packed model. If anyone wants to try it, you're welcome to do so.

Note that I didn't implement repacking (only requantization), so --outtype applies (and that unfortunately means no direct Q4_0 because of #10008 (comment), although maybe that could be revisited).

@compilade
Copy link
Collaborator

compilade commented Nov 7, 2025

What I worry about is that the packed layout of compressed-tensors could be reversed relative to ggml's

That seems to be the case, unfortunately. In Q4_0, the first 16 values of a block of 32 are in the low nibbles, while the next 16 are in the high nibbles.

In pack-quantized tensors from the compressed-tensors quant method, the values are in contiguous nibbles (the first 8 are in the first int32 value of a .weight_packed tensor, starting from the least significant part with the first nibble, and so on).

So the nibble layout does need to be transformed.
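
To make the difference concrete, a small numpy sketch of the two packings and what lands in the first byte (just an illustration of the layouts, not the converter's code; it treats the pack-quantized int32 container as its little-endian bytes and assumes the codes are already unpacked to one uint8 per weight):

import numpy as np

def pack_q4_0_nibbles(codes: np.ndarray) -> np.ndarray:
    # Q4_0: per 32-value block, values 0..15 go in the low nibbles of the 16
    # data bytes and values 16..31 go in the high nibbles of the same bytes.
    blocks = codes.reshape(-1, 32)
    return (blocks[:, :16] | (blocks[:, 16:] << 4)).astype(np.uint8)

def pack_contiguous_nibbles(codes: np.ndarray) -> np.ndarray:
    # pack-quantized: consecutive values occupy consecutive nibbles starting
    # from the least significant one, so byte k holds value 2k (low nibble)
    # and value 2k+1 (high nibble).
    pairs = codes.reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).astype(np.uint8)

codes = np.arange(32, dtype=np.uint8) % 16
print(hex(int(pack_q4_0_nibbles(codes)[0, 0])))     # 0x0: codes[0] low nibble, codes[16] high nibble
print(hex(int(pack_contiguous_nibbles(codes)[0])))  # 0x10: codes[0] low nibble, codes[1] high nibble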

@DocShotgun
Copy link
Contributor

I'd worry that we lose some of the benefit of the 4-bit QAT of Kimi-K2-Thinking if we dequant to bf16 and then requant. From what I'm reading, the blockwise INT4 of Kimi-K2-Thinking can't be directly converted to a GGUF-compatible format for inference?

@CHNtentes
Copy link

CHNtentes commented Nov 7, 2025

I'd worry that we lose some of the benefit of the 4-bit QAT of Kimi-K2-Thinking if we dequant to bf16 and then requant. From what I'm reading, the blockwise INT4 of Kimi-K2-Thinking can't be directly converted to a GGUF-compatible format for inference?

Is there a way to know the block size of GGUF's K-quants? For GPTQ / AWQ etc., we can find it in config.json. If it's possible to do block-size-32 quants in llama.cpp, maybe we can directly use the original INT4 weights and scales?

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Closing this in favor of #17069

@ngxson ngxson closed this Nov 7, 2025
@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Is there a way to know the block size of GGUF's K-quants? For GPTQ / AWQ etc., we can find it in config.json. If it's possible to do block-size-32 quants in llama.cpp, maybe we can directly use the original INT4 weights and scales?

The block size table is in gguf-py/gguf/constants.py; the first column of the table is the block size (element count) and the second column is the block size in bytes:

QK_K = 256
GGML_QUANT_SIZES: dict[GGMLQuantizationType, tuple[int, int]] = {
    GGMLQuantizationType.F32:     (1, 4),
    GGMLQuantizationType.F16:     (1, 2),
    GGMLQuantizationType.Q4_0:    (32, 2 + 16),
    GGMLQuantizationType.Q4_1:    (32, 2 + 2 + 16),
    GGMLQuantizationType.Q5_0:    (32, 2 + 4 + 16),
    GGMLQuantizationType.Q5_1:    (32, 2 + 2 + 4 + 16),
    GGMLQuantizationType.Q8_0:    (32, 2 + 32),
    GGMLQuantizationType.Q8_1:    (32, 4 + 4 + 32),
    GGMLQuantizationType.Q2_K:    (256, 2 + 2 + QK_K // 16 + QK_K // 4),
    GGMLQuantizationType.Q3_K:    (256, 2 + QK_K // 4 + QK_K // 8 + 12),
    GGMLQuantizationType.Q4_K:    (256, 2 + 2 + QK_K // 2 + 12),
    GGMLQuantizationType.Q5_K:    (256, 2 + 2 + QK_K // 2 + QK_K // 8 + 12),
    GGMLQuantizationType.Q6_K:    (256, 2 + QK_K // 2 + QK_K // 4 + QK_K // 16),
    GGMLQuantizationType.Q8_K:    (256, 4 + QK_K + QK_K // 8),
    GGMLQuantizationType.IQ2_XXS: (256, 2 + QK_K // 4),
    GGMLQuantizationType.IQ2_XS:  (256, 2 + QK_K // 4 + QK_K // 32),
    GGMLQuantizationType.IQ3_XXS: (256, 2 + QK_K // 4 + QK_K // 8),
    GGMLQuantizationType.IQ1_S:   (256, 2 + QK_K // 8 + QK_K // 16),
    GGMLQuantizationType.IQ4_NL:  (32, 2 + 16),
    GGMLQuantizationType.IQ3_S:   (256, 2 + QK_K // 4 + QK_K // 8 + QK_K // 32 + 4),
    GGMLQuantizationType.IQ2_S:   (256, 2 + QK_K // 4 + QK_K // 16),
    GGMLQuantizationType.IQ4_XS:  (256, 2 + 2 + QK_K // 2 + QK_K // 64),
    GGMLQuantizationType.I8:      (1, 1),
    GGMLQuantizationType.I16:     (1, 2),
    GGMLQuantizationType.I32:     (1, 4),
    GGMLQuantizationType.I64:     (1, 8),
    GGMLQuantizationType.F64:     (1, 8),
    GGMLQuantizationType.IQ1_M:   (256, QK_K // 8 + QK_K // 16  + QK_K // 32),
    GGMLQuantizationType.BF16:    (1, 2),
    GGMLQuantizationType.TQ1_0:   (256, 2 + 4 * 13),
    GGMLQuantizationType.TQ2_0:   (256, 2 + 64),
    GGMLQuantizationType.MXFP4:   (32, 1 + 16),
}
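
As a quick example of reading that table, the per-block element counts and byte sizes give the effective bits per weight (trivial sketch using the gguf-py package):

from gguf.constants import GGMLQuantizationType, GGML_QUANT_SIZES

for t in (GGMLQuantizationType.Q4_0, GGMLQuantizationType.Q8_0, GGMLQuantizationType.Q4_K):
    block_elems, block_bytes = GGML_QUANT_SIZES[t]
    print(f"{t.name}: {block_elems} elements/block, {block_bytes} bytes/block, "
          f"{8 * block_bytes / block_elems:.2f} bits per weight")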

@jukofyork
Copy link
Collaborator

Closing this in favor of #17069

Does that PR directly convert the INT4 values to Q4_0 or is it doing a round trip to BF16 and then Q4_0?

@jukofyork
Copy link
Collaborator

jukofyork commented Nov 7, 2025

If it doesn't, then looking at the source, it looks like this is what eventually gets called when there is no imatrix:

// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}

Can we prove this is going to convert back to the original values when we do a round trip via BF16? For example:

  • Imagine we have a block of 32 INT4 values that doesn't contain both 0b0000 and 0b1111.

What will happen in this case? I have a feeling the n=16 lattice will get superimposed on top of an n<16 lattice and not work for all non-power-of-2 n lattices?

@csabakecskemeti
Copy link
Contributor

csabakecskemeti commented Nov 7, 2025

I think I've made it work an alternative way.
I've built a conversion utility inspired by the DeepSeek V3 dequantizer:
int4-to-bf16

Both the Q3 and Q2 GGUF seem to be working:
kimi-think-proof

Experimental quants uploading (please allow some more time for the upload) here:
DevQuasar/moonshotai.Kimi-K2-Thinking-GGUF

Feel free to test the quants and the converter

@jukofyork
Copy link
Collaborator

jukofyork commented Nov 7, 2025

I think I've made it work an alternative way. I've built a conversion utility inspired by the DeepSeek V3 dequantizer: int4-to-bf16

Both the Q3 and Q2 GGUF seem to be working: Screenshot From 2025-11-07 06-48-30

Experimental quants uploading (please allow some more time for the upload) here: DevQuasar/moonshotai.Kimi-K2-Thinking-GGUF

Feel free to test the quants and the converter

This will work, but not losslessly, for the same reason as the other PR.

The fundamental problem here is that the QAT-trained blocks of 32 nibbles might not take up the full range of values. If the original block has a range of 0b0001 to 0b1111 then the Q4_0 code will still try to create a lattice of 16 values from 0b0000 to 0b1111.

It's easier to see if you look at a 2-bit version, e.g.:

If the original quant only had the half-nibbles 0b00 (0), 0b01 (1) and 0b10 (2), then quantize_row_q4_0_ref can't recover this and you will end up with this sort of thing:

OLD: 0        1        2
NEW: 0     1     2     3
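
A small numpy sketch of that failure mode in the 4-bit case (this mirrors the reference quantizer above on a single block and assumes the original QAT weights are a symmetric int4 code in [-8, 7] times a per-block scale; it's an illustration, not the converter's actual code path):

import numpy as np

def q4_0_roundtrip(q: np.ndarray, s: float) -> np.ndarray:
    # Dequantize the original int4 codes, requantize with Q4_0's scale choice
    # (d = max / -8, where max is the value with the largest magnitude), and
    # return what Q4_0 would reconstruct.
    x = q.astype(np.float32) * s
    d = x[np.argmax(np.abs(x))] / -8.0
    inv_d = 0.0 if d == 0.0 else 1.0 / d
    codes = np.minimum(15, (x * inv_d + 8.5).astype(np.int8))
    return (codes.astype(np.float32) - 8.0) * d

q = np.array([-7, -3, 1, 7] * 8, dtype=np.int8)  # block never uses the code -8
print(q * 0.01)                 # original values: exact multiples of the scale
print(q4_0_roundtrip(q, 0.01))  # different lattice: -0.07, -0.02625, 0.00875, 0.06125, ...

With a block that does use the full code range, d works out to the original scale and the same values come back exactly.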

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

The fundamental problem here is that the QAT-trained blocks of 32 nibbles might not take up the full range of values

Is there any reason it cannot take the full range? IIUC from the original compressed-tensors dequant code, it should take up the full int4 range.

@ngxson
Copy link
Collaborator Author

ngxson commented Nov 7, 2025

Hmm, never mind, I think I understand what you're saying now. Did you mean that it's possible that the training code prevents using 0b0000 (and not the quant/dequant code)? If that's the case then yes, it's possible that an int4 value can jump to another value in Q4_0 even if Q4_0 somehow supported BF16 scales.

@jukofyork
Copy link
Collaborator

Hmm, never mind, I think I understand what you're saying now. Did you mean that it's possible that the training code prevents using 0b0000 (and not the quant/dequant code)? If that's the case then yes, it's possible that an int4 value can jump to another value in Q4_0 even if Q4_0 somehow supported BF16 scales.

Yeah, if we were just to use something like this to first quantise:

// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}

then turn these back into BF16 or F32 and rerun the same code, we wouldn't lose anything, as this line:

        const float d  = max / -8;

is implicitly assuming that there will be a lower value of 0b0000 and an upper value of 0b1111.


But because the model was QAT-trained, it's quite likely that not every block of 32 will maintain a lower value of 0b0000 and an upper value of 0b1111 for its nibbles.

It's not something you can change after the QAT training (which likely used some form of regularisation term on the intervals and/or stochastic rounding), so the only way to maintain the full range for all blocks would be to adjust it during training (which would probably be really hard/awkward to do for "Adam-like" optimisers with the extra "memory" parameters, as you would have to keep adjusting these too).

If it can be shown that somehow they have done this, and like the output of the quantize_row_q4_0_ref it is guaranteed for every 32-element block the lowest nibble will be 0b0000 and the largest nibble will be 0b1111, then it wouldn't be a problem and quantize_row_q4_0_ref could (almost) losslessly recover an equivalent set of values (assuming the BF16 --> F16 scales doesn't overflow, which seems unlikely).

The maximum relative error from converting from BF16 --> F16 will be something really tiny and is related to the way they represent sub-normals (if it wasn't for this then it would be essentially lossless as my bit shifting example erroneously showed in the other thread).

@jukofyork
Copy link
Collaborator

If it can be shown that somehow they have done this, and like the output of the quantize_row_q4_0_ref it is guaranteed for every 32-element block the lowest nibble will be 0b0000 and the largest nibble will be 0b1111, then it wouldn't be a problem and quantize_row_q4_0_ref could (almost) losslessly recover an equivalent set of values (assuming the BF16 --> F16 scales doesn't overflow, which seems unlikely).

It's definitely worth testing whether this is the case, as it would save a lot of hassle if it is! I'm about another day away from getting the model at 4MB/s, sadly 😦
