@topolarity

More conservative version of #21 (thanks to @el-oso for the inspiration).

This makes the package (nearly) --trim compatible and also provides a nice preference for users who do not wish for HostCPUFeatures.jl to ever cause an "invalidation storm", even if it means running with the wrong CPU info.

If the CPU info measured at build (precompile) time does not match the CPU info at runtime, users will see a warning:

┌ Warning: Runtime invalidation was disabled, but the CPU info is out-of-date.
│ Will continue with incorrect CPU name (from build time).
└ @ HostCPUFeatures ~/repos/HostCPUFeatures.jl/src/HostCPUFeatures.jl:62
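
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the helper name `reconcile_cpu_name` and its keyword argument are hypothetical, and only the warning text is taken from the output shown above.

```julia
# Hypothetical helper: decide which CPU name to use when the build-time
# and runtime values disagree.
function reconcile_cpu_name(build_name::AbstractString, runtime_name::AbstractString;
                            allow_invalidation::Bool)
    # Nothing to do when the precompiled info is still accurate.
    build_name == runtime_name && return build_name
    if allow_invalidation
        # Adopting the runtime value means redefining methods, which is
        # what causes the "invalidation storm".
        return runtime_name
    end
    @warn "Runtime invalidation was disabled, but the CPU info is out-of-date.\nWill continue with incorrect CPU name (from build time)."
    return build_name
end

# With invalidation disabled, the stale build-time name is kept.
reconcile_cpu_name("skylake-avx512", "icelake-server"; allow_invalidation = false)
```

The trade-off is explicit: with invalidation disabled, the process keeps running on possibly wrong CPU info instead of recompiling dependents.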

 fast_half() = False()

-@noinline function setfeaturefalse(s)
+@inline function setfeaturefalse(s)
Member

why the inlining switch?

Author

This was to improve inferrability: has_feature(Val(s)) is only inferrable if the literal value of s is available in the function, which is true in all of the callers (the argument is always a literal Symbol).
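
A small experiment along these lines can show the effect. The types and functions below are hypothetical stand-ins, not the package's definitions, and the inferred types printed at the end may vary between Julia versions, so no expected output is given.

```julia
# Hypothetical stand-ins for static true/false types and a feature query.
struct True end
struct False end
has_feature(::Val{:avx2}) = True()
has_feature(::Val) = False()

# Same body, differing only in the inlining hint.
@inline query_inline(s::Symbol) = has_feature(Val(s))
@noinline query_noinline(s::Symbol) = has_feature(Val(s))

# The callers always pass a literal Symbol, as described in the comment above.
caller_inline() = query_inline(:avx2)
caller_noinline() = query_noinline(:avx2)

# Both callers return the same value at runtime; what may differ is the
# return type that inference can prove for each of them.
println(Base.return_types(caller_inline, Tuple{}))
println(Base.return_types(caller_noinline, Tuple{}))
```

`Val(s)` has a concrete type only when the compiler knows the value of `s`, so the question is whether the literal from the call site reaches the function body during inference.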

@codecov
codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 7.14286% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.50%. Comparing base (8ebc3d2) to head (723446a).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cpu_info.jl | 0.00% | 9 Missing ⚠️ |
| src/cpu_info_aarch64.jl | 0.00% | 7 Missing ⚠️ |
| src/cpu_info_x86.jl | 0.00% | 6 Missing ⚠️ |
| src/HostCPUFeatures.jl | 33.33% | 4 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (8ebc3d2) and HEAD (723446a).

HEAD has 5 uploads less than BASE

| Flag | BASE (8ebc3d2) | HEAD (723446a) |
|---|---|---|
| | 15 | 10 |

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #22       +/-   ##
===========================================
- Coverage   36.12%   18.50%   -17.63%     
===========================================
  Files           6        6               
  Lines         191      200        +9     
===========================================
- Hits           69       37       -32     
- Misses        122      163       +41     

@debug "Defining $(has ? "presence" : "absence") of feature $feature."
set_feature(feature, has)
else
@warn "Runtime invalidation was disabled, but the CPU info is out-of-date.\nWill continue with incorrect CPU feature flag: $ext."

Aside from this change, I expect that feature_string will not always return the same set of features. That would leave certain features stuck in the wrong state just because LLVM does not report them for that CPU.

For reference, here is the info I collected from real and virtual machines with:

@ccall jl_get_cpu_name()::String
@ccall jl_get_cpu_features()::String
* "broadwell"
  "+prfchw,+avx,+aes,+sahf,+pclmul,+crc32,+sse4.1,+xsave,+sse4.2,+invpcid,+64bit,+cmov,+movbe,+rtm,+adx,+avx2,+bmi,+sse,+xsaveopt,+rdrnd,+cx8,+sse3,+fsgsbase,+lzcnt,+ssse3,+cx16,+bmi2,+fma,+popcnt,+f16c,+mmx,+sse2,+rdseed,+fxsr,-cldemote,-xop,-xsaves,-avx512fp16,-usermsr,-sm4,-avx512ifma,-avx512pf,-tsxldtrk,-ptwrite,-widekl,-sm3,-xsavec,-avx10.1-512,-avx512vpopcntdq,-avx512vp2intersect,-avx512cd,-avxvnniint8,-avx512er,-amx-int8,-kl,-avx10.1-256,-sha512,-avxvnni,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-amx-tile,-gfni,-avxvnniint16,-amx-fp16,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,-avx512bw,-pku,-clzero,-mwaitx,-lwp,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,-avxifma,-avx512bitalg,-rdpru,-clwb,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,-avx512dq,-sse4a"
* "cascadelake"
  "+prfchw,+avx,+aes,+sahf,+pclmul,+crc32,+xsaves,+sse4.1,+xsave,+sse4.2,+invpcid,+64bit,+xsavec,+cmov,+avx512cd,+movbe,+evex512,+adx,+avx2,+avx512vl,+clflushopt,+bmi,+sse,+xsaveopt,+rdrnd,+avx512f,+avx512vnni,+cx8,+avx512bw,+sse3,+pku,+fsgsbase,+lzcnt,+ssse3,+cx16,+bmi2,+fma,+popcnt,+f16c,+clwb,+mmx,+sse2,+rdseed,+fxsr,+avx512dq,-cldemote,-xop,-avx512fp16,-usermsr,-sm4,-avx512ifma,-avx512pf,-tsxldtrk,-ptwrite,-widekl,-sm3,-avx10.1-512,-avx512vpopcntdq,-avx512vp2intersect,-avxvnniint8,-avx512er,-amx-int8,-kl,-avx10.1-256,-avxvnni,-rtm,-hreset,-movdiri,-serialize,-sha512,-vpclmulqdq,-uintr,-raoint,-cmpccxadd,-amx-tile,-gfni,-avxvnniint16,-amx-fp16,-amx-bf16,-avx512bf16,-clzero,-mwaitx,-lwp,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,-avxifma,-avx512bitalg,-rdpru,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,-sse4a"
* "skylake-avx512"
  "+cx16,+sahf,+crc32,+prfchw,+bmi2,+fsgsbase,+popcnt,+aes,+xsaves,+clwb,+avx512f,+xsavec,+pku,+mmx,+rdseed,+avx512bw,+clflushopt,+xsave,+64bit,+avx512vl,+invpcid,+avx512cd,+avx,+cx8,+fma,+bmi,+rdrnd,+sse4.1,+sse4.2,+avx2,+fxsr,+sse,+lzcnt,+pclmul,+f16c,+ssse3,+cmov,+movbe,+xsaveopt,+avx512dq,+sse2,+adx,+sse3,-avx512pf,-tsxldtrk,-tbm,-avx512ifma,-sha,-fma4,-vpclmulqdq,-cldemote,-avx512bf16,-amx-tile,-raoint,-uintr,-gfni,-ptwrite,-avx512bitalg,-movdiri,-widekl,-avx512er,-avxvnni,-avx512fp16,-avx512vnni,-amx-bf16,-avxvnniint8,-avx512vpopcntdq,-pconfig,-cmpccxadd,-clzero,-amx-fp16,-lwp,-rdpid,-xop,-waitpkg,-prefetchi,-kl,-movdir64b,-sse4a,-avxneconvert,-avx512vbmi2,-serialize,-hreset,-vaes,-amx-int8,-rtm,-enqcmd,-mwaitx,-wbnoinvd,-rdpru,-avxifma,-sgx,-prefetchwt1,-avx512vbmi,-shstk,-avx512vp2intersect"
* "skylake-avx512"
  "+prfchw,+avx,+aes,+sahf,+pclmul,+crc32,+sse4.1,+xsave,+sse4.2,+64bit,+cmov,+movbe,+sse,+rdrnd,+cx8,+sse3,+fsgsbase,+ssse3,+cx16,+fma,+popcnt,+f16c,+mmx,+sse2,+fxsr,-cldemote,-xop,-xsaves,-avx512fp16,-usermsr,-sm4,-avx512ifma,-avx512pf,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,-xsavec,-avx10.1-512,-avx512vpopcntdq,-avx512vp2intersect,-avx512cd,-avxvnniint8,-avx512er,-amx-int8,-kl,-avx10.1-256,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,-gfni,-avxvnniint16,-amx-fp16,-xsaveopt,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,-avx512bw,-pku,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,-bmi2,-avxifma,-avx512bitalg,-rdpru,-clwb,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,-avx512dq,-sse4a"
* "icelake-server"
  "+prfchw,+avx,+aes,+sahf,+pclmul,+crc32,+sse4.1,+xsave,+sse4.2,+64bit,+cmov,+movbe,+sse,+rdrnd,+cx8,+sse3,+fsgsbase,+ssse3,+cx16,+fma,+popcnt,+f16c,+mmx,+sse2,+fxsr,-cldemote,-xop,-xsaves,-avx512fp16,-usermsr,-sm4,-avx512ifma,-avx512pf,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,-xsavec,-avx10.1-512,-avx512vpopcntdq,-avx512vp2intersect,-avx512cd,-avxvnniint8,-avx512er,-amx-int8,-kl,-avx10.1-256,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,-gfni,-avxvnniint16,-amx-fp16,-xsaveopt,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,-avx512bw,-pku,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,-bmi2,-avxifma,-avx512bitalg,-rdpru,-clwb,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,-avx512dq,-sse4a"  
* "sandybridge"
  "+avx,+aes,+sahf,+pclmul,+crc32,+sse4.1,+xsave,+sse4.2,+64bit,+cmov,+sse,+xsaveopt,+cx8,+sse3,+ssse3,+cx16,+popcnt,+mmx,+sse2,+fxsr,-prfchw,-cldemote,-xop,-xsaves,-avx512fp16,-usermsr,-sm4,-avx512ifma,-avx512pf,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,-xsavec,-avx10.1-512,-avx512vpopcntdq,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-avx10.1-256,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,-gfni,-avxvnniint16,-amx-fp16,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,-avx512bw,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,-bmi2,-fma,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,-avx512dq,-sse4a"

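The concern above can be made concrete by parsing two such strings and comparing them. The sketch below uses short excerpts from the first and second "skylake-avx512" entries in the list, not the full strings, and `parse_features` is a hypothetical helper rather than part of the package.

```julia
# Hypothetical helper: turn "+avx2,-sha" into Dict("avx2" => true, "sha" => false).
function parse_features(s::AbstractString)
    flags = Dict{String,Bool}()
    for item in split(s, ',')
        isempty(item) && continue
        flags[String(item[2:end])] = item[1] == '+'
    end
    return flags
end

# Abbreviated excerpts from the two "skylake-avx512" reports listed above.
a = parse_features("+avx2,+avx512f,-sha")
b = parse_features("-avx2,-avx512f,-usermsr,-sha")

# Features reported by both machines but with different signs.
changed = sort([k for k in intersect(keys(a), keys(b)) if a[k] != b[k]])

# Features that only one of the reports mentions at all.
only_in_b = sort(collect(setdiff(keys(b), keys(a))))

println("changed:   ", changed)
println("only in b: ", only_in_b)
```

A feature that is absent from a report (here `usermsr` in the first excerpt) is the problematic case: code that only updates the flags it sees would leave that feature in whatever state it had at build time.
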
Author

That's a good point. We probably need to tighten that up.

I'm curious to have @gbaraldi's input on this.

This is just a huge mess. Some VMs hide features, for example.

To be clear, the mess I mean is the situation as a whole. I think this is mergeable without too many issues. I wish LLVM exposed something nicer, but it currently doesn't.

reset_extra_features!()
end
const BASELINE_CPU_NAME = get_cpu_name()
const allow_eval = @load_preference("allow_runtime_invalidation", false)

Why not check whether --trim is enabled in JLOptions?

Otherwise, everyone who uses this with --trim will have to track down this culprit themselves.

Author

The main thinking is that this preference is useful beyond JuliaC.jl, since some users may wish to opt in to an error or warning on invalidation storms from this package. JuliaC.jl can set this preference automatically, so the end-user experience is the same.

Also, technically, checking JLOptions at precompilation time won't detect --trim properly, but something like JuliaLang/JuliaC.jl#31 would work.
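
For users who want to set the preference by hand, one option is a `LocalPreferences.toml` entry, which is where Preferences.jl reads per-package settings. The sketch below generates such a file with the `TOML` standard library. The preference name and its `false` default come from the `@load_preference` line quoted above; writing to a temporary directory is only for illustration, since in a real project the file sits next to `Project.toml`.

```julia
using TOML

# Assumption: name and value mirror the quoted
# `@load_preference("allow_runtime_invalidation", false)` line.
prefs = Dict("HostCPUFeatures" => Dict("allow_runtime_invalidation" => false))

# Written to a temporary directory here to avoid touching a real project.
path = joinpath(mktempdir(), "LocalPreferences.toml")
open(path, "w") do io
    TOML.print(io, prefs)
end

# Show the generated file contents.
print(read(path, String))
```

Because the quoted code loads the preference into a `const`, its value is fixed when the package is precompiled, so a change only takes effect after the package is recompiled.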

@tz-lom
tz-lom commented Nov 20, 2025

As a source of the problem I understand the approach, but it actually downgrades package HostCPUFeatures to MaybeSomeOtherHostCPUFeatures, while what we really need is to use some CompileTargetCPUFeatures in LoopVectorization etc.

Pragmatically, it is "easier" to upgrade the standard interface of HostCPUFeatures to return features for the compile target, and to provide an API for retrieving the real host features if that is what the user really wants (for example, to print statistics about the CPU we are currently running on).

@topolarity
Author

> As a source of the problem I understand the approach, but it actually downgrades package HostCPUFeatures to MaybeSomeOtherHostCPUFeatures, while what we really need is to use some CompileTargetCPUFeatures in LoopVectorization etc.

That's already true if "native" is missing from your JULIA_CPU_TARGET string, though. I agree that it might be a good idea to require that we are specifically operating in the non-native case (where we already under-approximate feature flags).
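
A check for the non-native case could look something like the sketch below. The helper `mentions_native` is hypothetical and not part of this PR. It assumes the usual `JULIA_CPU_TARGET` layout, where targets are separated by `;` and each target is a CPU name followed by comma-separated options.

```julia
# Hypothetical helper: does any target in a JULIA_CPU_TARGET-style string
# name the "native" CPU?
function mentions_native(target::AbstractString)
    any(split(target, ';')) do t
        # The CPU name is the part before the first comma of each target.
        strip(first(split(t, ','))) == "native"
    end
end

println(mentions_native("native"))
println(mentions_native("generic;sandybridge,-xsaveopt,clone_all"))
println(mentions_native("generic;native"))
```

Matching on the parsed target name rather than a plain substring search avoids false positives from option text that happens to contain the word.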


4 participants