@bartowski1182
Contributor

Adds support for upcoming AfmoeForCausalLM

Tokenizer is public ahead of model launch to avoid breaking conversion code

Make sure to read the contributing guidelines before submitting a PR

@bartowski1182 bartowski1182 marked this pull request as ready for review November 13, 2025 17:36
Comment on lines +14 to +15
inpL = ggml_scale(ctx0, inpL, sqrtf(float(n_embd)));
cb(inpL, "inp_embd_scaled", -1);
Collaborator

not very important to fix right now, but if the model supports multimodal in the future, you may need to skip scaling if the input is non-text:

// important: do not normalize weights for raw embeddings input (i.e. encoded image embeddings)
if (ubatch.token) {
    inpL = ggml_scale(ctx0, inpL, sqrtf(n_embd));
    cb(inpL, "inp_scaled", -1);
}
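
The idea in the snippet above can be illustrated outside of ggml with a minimal, self-contained sketch. The `Batch` struct and `scale_input_embeddings` function below are hypothetical names invented for illustration and are not part of llama.cpp; the sketch only assumes that the scale factor is `sqrt(n_embd)` and that it should be applied to token (text) input but not to raw embeddings.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batch (not the llama.cpp ubatch type):
// records whether the input came from token ids or from raw embeddings.
struct Batch {
    bool has_tokens; // true for text tokens, false for raw (e.g. image) embeddings
};

// Scale embeddings by sqrt(n_embd) only when the input is token-based.
// Raw embeddings are returned unchanged.
std::vector<float> scale_input_embeddings(std::vector<float> embd,
                                          std::size_t n_embd,
                                          const Batch & batch) {
    if (batch.has_tokens) {
        const float scale = std::sqrt(static_cast<float>(n_embd));
        for (float & v : embd) {
            v *= scale;
        }
    }
    return embd;
}
```

With `n_embd = 4`, token input is multiplied by 2 while raw embedding input passes through untouched.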

Contributor Author

Interesting, good to know, thanks!

@CISC
Collaborator

CISC commented Nov 14, 2025

@bartowski1182 @ggerganov Merging in a little while unless you have anything more to add.

@CISC CISC merged commit e1fcf8b into ggml-org:master Nov 14, 2025
76 checks passed
basnijholt pushed a commit to basnijholt/llama.cpp that referenced this pull request Nov 16, 2025
* Add AFMOE model support

* Update to vocab

* Add model sizing

* Undo Rope change for ARCEE model

* Address review comments

* Update modeling code is_sliding -> use_rope, replace hard-coded logic

* Fix AFMOE tokenizer

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update AFMoE tokenizer class identification to be more unique

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
Labels: model (Model specific), python (python script changes)

3 participants