Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support #16900
AMD Radeon Pro VII
AMD Radeon RX 6800 XT
Intel A770
RTX 3090
jeffbolznv
left a comment
I only did a quick read through. I'll do some perf testing soon.
As usual, I appear to have caused an llvmpipe issue. I'll look into it.
Some initial perf results: I reran some of the models with the biggest deltas. Most seem to be noise, except the improvement for gpt-oss MXFP4, which is real.
The funny thing about that is that I didn't even enable the MMVQ path for Nvidia Turing+ on MXFP4. Not sure what is going on there. I still have some tuning to do here; my Strix Halo device isn't liking this PR yet.
Add k-quant mul_mat_vec support, and enable MUL_MAT_ID integer dot vector path.
Tuning this is quite difficult. I've included an attempt, but I'm not done. I'll add performance numbers later.
Q3_K and Q6_K currently don't perform well at all; I'm still trying to figure out why.
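For readers unfamiliar with the integer dot approach, here is a minimal sketch of the general idea in Python. It is not the shader code from this PR. It assumes a simplified symmetric int8 block format (one float scale per 32 values) instead of the real k-quant layouts, and all names (`BLOCK`, `quantize`, `mul_mat_vec_q`, `mul_mat_id_vec_q`) are hypothetical. Both the weights and the activation vector are quantized, each block's dot product is accumulated in integers, and the result is scaled by the two block scales. The `MUL_MAT_ID` variant is modelled simply as picking one expert matrix by index before doing the same matrix-vector product.

```python
BLOCK = 32  # hypothetical block size; real k-quants use larger super-blocks


def quantize_block(xs):
    """Symmetric int8 quantization of one block, so that x ~= scale * q."""
    amax = max(abs(x) for x in xs)
    scale = amax / 127.0 if amax > 0 else 1.0
    qs = [max(-127, min(127, round(x / scale))) for x in xs]
    return scale, qs


def quantize(xs):
    """Split a vector into blocks and quantize each block independently."""
    assert len(xs) % BLOCK == 0
    return [quantize_block(xs[i:i + BLOCK]) for i in range(0, len(xs), BLOCK)]


def int_dot(row_q, vec_q):
    """Dot product of two quantized vectors using integer accumulation."""
    total = 0.0
    for (sw, qw), (sx, qx) in zip(row_q, vec_q):
        acc = sum(a * b for a, b in zip(qw, qx))  # pure integer arithmetic
        total += sw * sx * acc                    # apply both scales once per block
    return total


def mul_mat_vec_q(rows_q, vec_q):
    """Quantized matrix-vector product: one integer dot product per row."""
    return [int_dot(row_q, vec_q) for row_q in rows_q]


def mul_mat_id_vec_q(experts_q, expert_id, vec_q):
    """MUL_MAT_ID-style product: select an expert matrix by id, then multiply."""
    return mul_mat_vec_q(experts_q[expert_id], vec_q)
```

The per-block integer accumulation is the part that maps onto hardware integer dot instructions; how well that pays off depends on the quant layout and the device, which is where the tuning difficulty mentioned above comes in.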