@JohannesGaessler JohannesGaessler commented Nov 7, 2025

See #16988 (comment).

The logic for preventing misaligned pointers on strides not divisible by the size of the data type is stricter than necessary: for a single column we only need to check the stride of dimension 0. The strides in question are for src0 rather than src1; strictly speaking, we would need to be checking that tensor too.

@am17an can you share the model and command line you used to test this?

@JohannesGaessler
Collaborator Author

Sorry, I just realized that I misdiagnosed the problem, I'll push another version.

am17an commented Nov 7, 2025

I tested this on your server with `llama-bench -m /opt/models/LFM2-8B-A1B-F16.gguf`

@JohannesGaessler force-pushed the cuda-fix-should_use_mmvf branch from 9779b58 to 2bd5465 on November 7, 2025 16:54
    if (src0_ne[0] % 2 != 0) {
        return false;
    }

Suggested change

return false;
}
}

@JohannesGaessler
Collaborator Author

Okay, so the problem should be fixed now. We were checking that the stride for dimension 0 is divisible by the size of e.g. half2, but that is incorrect: the correct check for dimension 0 is nb[0] == ggml_element_size(src0). For MMVF my initial idea was wrong because I had confused which tensor would have the bad stride.
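
For readers following along, here is a minimal sketch of the kind of stride check described above. It is not the code from this PR: `toy_tensor` and `strides_ok_for_vector_loads` are hypothetical names made up for illustration, and the rule for the higher dimensions (stride divisible by the vector size) is an assumption based on the alignment concern mentioned earlier in the thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a tensor, holding only the fields this sketch needs.
struct toy_tensor {
    int64_t ne[4];     // number of elements per dimension
    size_t  nb[4];     // stride in bytes per dimension
    size_t  type_size; // size of a single element in bytes
};

// Hypothetical helper: returns true if rows of t can be read with vector loads
// of vec_size bytes (e.g. 4 bytes for a pair of 16-bit floats).
static bool strides_ok_for_vector_loads(const toy_tensor & t, size_t vec_size) {
    // Dimension 0 must be densely packed, i.e. its stride equals the element size.
    // Divisibility by vec_size is the wrong test here: a stride of 4 bytes for
    // 2-byte elements is divisible by 4 but the elements are not contiguous.
    if (t.nb[0] != t.type_size) {
        return false;
    }
    // Assumption: for the remaining dimensions the stride must be a multiple of
    // vec_size so that every row starts at an aligned address.
    for (int i = 1; i < 4; ++i) {
        if (t.nb[i] % vec_size != 0) {
            return false;
        }
    }
    return true;
}
```

With 2-byte elements and `vec_size == 4`, a tensor with `nb[0] == 4` would pass a divisibility test on dimension 0 but is rejected here, which is the distinction the comment above draws.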

@github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Nov 7, 2025
@JohannesGaessler JohannesGaessler merged commit 6515610 into ggml-org:master Nov 7, 2025
62 of 66 checks passed