-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[build][cmake]: Bundle ACL dynlibs and torch libgomp for CPU extension builds #28059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
cc: @fadara01 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces changes to bundle Arm Compute Library (ACL) dynamic libraries for CPU extension builds on AArch64 and correctly configures the build to use PyTorch's libgomp. The approach is sound and the new logic for handling dynamic libraries in cmake/utils.cmake is a good, generic addition. However, I've identified two high-severity issues in the implementation. One is a potential security risk in the INSTALL_RPATH setting, and the other is a build robustness issue with file(GLOB) that could lead to incomplete installations. Addressing these will make the build process more secure and reliable.
cmake/utils.cmake
Outdated
| if(UNIX AND NOT APPLE) | ||
| set_target_properties(${GPU_MOD_NAME} PROPERTIES | ||
| BUILD_RPATH "\$ORIGIN;\$ORIGIN/.libs" | ||
| INSTALL_RPATH "\$ORIGIN;\$ORIGIN/.libs;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The INSTALL_RPATH has a trailing semicolon. This will result in an empty path component in the RPATH of the installed library (e.g., ...:$ORIGIN:$ORIGIN/.libs:). On some systems, an empty RPATH component is interpreted as the current working directory (.), which can be a security risk due to potential SO/DLL hijacking. It's best to remove the trailing semicolon to avoid this ambiguity and potential vulnerability.
INSTALL_RPATH "\$ORIGIN;\$ORIGIN/.libs"
| foreach(_dyn_lib_path ${_dyn_install_paths}) | ||
| get_filename_component(_dir "${_dyn_lib_path}" DIRECTORY) | ||
| get_filename_component(_base "${_dyn_lib_path}" NAME) | ||
| file(GLOB _selected LIST_DIRECTORIES FALSE "${_dir}/${_base}*") | ||
| list(REMOVE_DUPLICATES _selected) | ||
| install(FILES "${_selected}" | ||
| DESTINATION "${GPU_DESTINATION}/.libs" | ||
| COMPONENT ${GPU_MOD_NAME}) | ||
| endforeach() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using file(GLOB) at configure time to find build artifacts for installation can lead to an incomplete package. If the scons build (which generates these libraries) changes its output files (e.g., due to a version bump), the list of files to be installed will become stale. A developer running make install without re-running cmake would then install an incorrect set of files.
A more robust approach is to perform the file discovery at install time. This can be achieved by generating a script with file(GENERATE ...) and executing it at install time with install(SCRIPT ...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # Add ACL libraries to be linked as dynamic libraries | ||
| find_library(ACL_COMPUTE arm_compute PATHS "${ACL_LIB_DIR}" REQUIRED NO_DEFAULT_PATH) | ||
| find_library(ACL_COMPUTE_GRAPH arm_compute_graph PATHS "${ACL_LIB_DIR}" REQUIRED NO_DEFAULT_PATH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize ACL_LIB_DIR when ACL_ROOT_DIR is provided
The new find_library calls expect ACL_LIB_DIR to point to the directory that contains libarm_compute*.so. However, ACL_LIB_DIR is only assigned inside the branch where ACL is fetched from GitHub. When a caller provides ACL_ROOT_DIR via the environment (the documented override supported before this commit), the variable stays empty and the REQUIRED find_library invocations fail because PATHS "" with NO_DEFAULT_PATH searches nothing. This breaks builds that rely on a pre-downloaded or prebuilt ACL tree. Make sure ACL_LIB_DIR is derived from $ENV{ACL_ROOT_DIR} in both code paths before resolving the libraries.
Useful? React with 👍 / 👎.
| # Build oneDNN for GEMM kernels (only for x86-AVX512 /ARM platforms) | ||
| if ((AVX512_FOUND AND NOT AVX512_DISABLED) OR (ASIMD_FOUND AND NOT APPLE_SILICON_FOUND) OR POWER9_FOUND OR POWER10_FOUND OR POWER11_FOUND) | ||
| # Set number of parallel build processes | ||
| include(ProcessorCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's put these changes under ASIMD_FOUND as x86 might not use libgomp in pytorch (i.e. they might be using libiomp or something else).
| endmacro() | ||
|
|
||
| # | ||
| # Extract unique CUDA architectures from a list of compute capabilities codes in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of getting rid of white space changes in this PR
cmake/cpu_extension.cmake
Outdated
| # and create a local shim dir with it | ||
| vllm_prepare_torch_gomp_shim(VLLM_TORCH_GOMP_SHIM_DIR) | ||
|
|
||
| find_library(OPEN_MP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate on why any libgomp related changes are needed here?
what problem do they address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid any undefined behaviour from using OMP, oneDNN and ACL both should use the same omp version as PyTorch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atm ACL will load pytorch's libgomp as per #27415
oneDNN has to be already linking against pytorch's libgomp atm, because that's the only libgomp that I see being loaded.
Please let me know if that's not already the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is mostly true.
You get oneDNN static library (libdnnl.a) compiled against compile time omp version which now we don't set it to pytorch's version. If runtime OMP is not ABI-compatible with compile time OMP, you can get link failures etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If runtime OMP is not ABI-compatible with compile time OMP
Yeah, we should make sure that pytorch's libgomp is used for both compile time and runtime, is that not already happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any omp specific variable being set for oneDNN which implies that it will use the system path. We specify pytorch omp solely for ACL. By adding it to LD_LIBRARY_PATH we make sure that oneDNN is going to pick up the version from VLLM_TORCH_GOMP_SHIM_DIR before checking system paths as well.
…n builds Signed-off-by: Radu Salavat <[email protected]>
f721fed to
2181080
Compare
|
@bigPYJ1151 Is support for dynamic linking of libraries something of interest for vLLM? |
Hi @Radu2k You want to link OpenMP in vllm |
|
Hi @bigPYJ1151 There are 2 key points here:
|
[build][cmake]: Bundle ACL dynlibs and torch libgomp for CPU extension builds
Purpose
Bundle ACL dynamic libs
libarm_compute.soandlibarm_compute_graph.soon AArch64 into.libspackage folder. Use the same libgomp as PyTorch package for oneDNN and ACL build by adding path toLD_LIBRARY_PATH.Fixes: #28056
Test Plan
.libsdirectory<env>/lib/python<version>/site-packages/vllm/.libscontainslibarm_compute.soandlibarm_compute_graph.soreadelf -d _C.abi3.sorunpath points to$ORIGIN:$ORIGIN/.libsTest Result
.libsfolder is part of the package.libs/$ORIGIN:$ORIGIN/.libsare part of the rpathWARNING 11-04 18:30:32 [interface.py:171] Failed to import from vllm._C: ImportError('libarm_compute.so: cannot open shared object file: No such file or directory')Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.