
Commit 5741c99

Venkata-Durga-Raokiriti-pendyala authored and committed
[ZENTORCH CORE] Update LICENCE, TPN, README.md
-- Update LICENCE, TPN, README.md for ZenDNN v5.0 release

Signed-off-by: durga <[email protected]>
Change-Id: Ib677fba7a4ecdd88f5819da1a6e81eca5350c7c6
Signed-off-by: durga <[email protected]>
1 parent 4c8ae49 commit 5741c99

File tree

3 files changed (+78 / -37 lines)


LICENCE

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-LICENSE for ZenDNN v5.0 Beta + Pytorch v2.3.0 Source - AMD copyrighted code
+LICENSE for ZenDNN v5.0 + Pytorch Source - AMD copyrighted code
````

README.md

Lines changed: 23 additions & 24 deletions
````diff
@@ -33,19 +33,19 @@ Table of Contents
 ## 1.1. Overview
 
-**EARLY ACCESS:** The ZenDNN PyTorch* Plugin (zentorch) extends PyTorch* with an innovative upgrade that's set to revolutionize performance on AMD hardware.
+The latest ZenDNN Plugin for PyTorch* (zentorch) 5.0 is here!
 
-As of version 5.0, AMD is unveiling a game-changing upgrade to ZenDNN, introducing a cutting-edge plug-in mechanism and an enhanced architecture under the hood. This isn't just about extensions; ZenDNN's aggressive AMD-specific optimizations operate at every level. It delves into comprehensive graph optimizations, including pattern identification, graph reordering, and seeking opportunities for graph fusions. At the operator level, ZenDNN boasts enhancements with microkernels, mempool optimizations, and efficient multi-threading on the large number of AMD EPYC cores. Microkernel optimizations further exploit all possible low-level math libraries, including AOCL BLIS.
+This powerful upgrade continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and industry-leading support for modern workloads.
 
-The result? Enhanced performance with respect to baseline PyTorch*. zentorch leverages torch.compile, the latest PyTorch enhancement for accelerated performance. torch.compile makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, all while requiring minimal code changes and unlocking unprecedented speed and efficiency.
+zentorch 5.0 takes deep learning to new heights with significant enhancements for bfloat16 performance, expanded support for cutting-edge models such as Llama 3.1 and 3.2 and Microsoft Phi, and support for the INT4 quantized datatype. This includes the advanced Activation-Aware Weight Quantization (AWQ) algorithm, driving remarkable accuracy in low-precision computations.
 
-The _zentorch_ extension to PyTorch enables inference optimizations for deep learning workloads on AMD EPYC&trade; CPUs. It uses the ZenDNN library, which contains deep learning operators tailored for high performance on AMD EPYC&trade; CPUs. Multiple passes of graph level optimizations run on the torch.fx graph provide further performance acceleration. All _zentorch_ optimizations can be enabled by a call to torch.compile with zentorch as the backend.
+Combined with PyTorch's torch.compile, zentorch transforms deep learning pipelines into finely tuned, AMD-specific engines, delivering unparalleled efficiency and speed for large-scale inference workloads.
 
-The ZenDNN PyTorch plugin is compatible with PyTorch version 2.3.0.
+zentorch 5.0 plugs seamlessly into PyTorch version 2.4.0, offering a high-performance experience for deep learning on AMD EPYC™ platforms.
 
 ## Support
 
-Please note that zentorch is currently in “Early Access” mode. We welcome feedback, suggestions, and bug reports. Should you have any of the these, please contact us on zendnn.maintainers@amd.com
+We welcome feedback, suggestions, and bug reports. Should you have any of these, please file an issue on the ZenDNN Plugin for PyTorch GitHub page [here](https://github.com/amd/ZenDNN-pytorch-plugin/issues).
 
 ## License
````
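All of these optimizations are enabled through a single torch.compile call with the zentorch backend, as the README text above states; a minimal sketch (the model and input below are placeholders):

```python
import torch
import zentorch  # importing the plugin registers the "zentorch" backend

# Placeholder model and input; any eager PyTorch inference model works the same way.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).eval()
inp = torch.randn(8, 512)

compiled_model = torch.compile(model, backend="zentorch")
with torch.no_grad():
    out = compiled_model(inp)  # first call triggers JIT compilation; later calls reuse it
```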

````diff
@@ -58,13 +58,15 @@ _zentorch_ consists of three parts. They are
 - Build System
 
 ### 1.2.1. ZenDNN Integration Code
-ZenDNN is integrated into _zentorch_ using CPP code which interfaces ATen API to ZenDNN's API. This code exports torch compatible API similar to ATen IR of PyTorch. The exported API is made available for usage from python code using TORCH_LIBRARY and TORCH_LIBRARY_IMPL. Integration code is linked and compiled into _zentorch_ using CppExtension provided by PyTorch.
+ZenDNN is integrated into _zentorch_ using CPP code which interfaces the ATen API to ZenDNN's API. This code exports a torch-compatible API similar to the ATen IR of PyTorch. The exported API is made available for use from Python code using TORCH_LIBRARY, TORCH_LIBRARY_FRAGMENT and TORCH_LIBRARY_IMPL. Integration code is linked and compiled into _zentorch_ using the CppExtension provided by PyTorch.
 
 The following ops are integrated as of now:
 - Embedding bag op
 - Embedding op
 - Matmul ops
 - Custom Fusion ops
+- Rope op
+- MHA op
````
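Because the ops are exported through these TORCH_LIBRARY macros, they surface on the Python side under the `torch.ops` tree; a small sketch, assuming `zentorch` is the registration namespace (individual op names are not listed here):

```python
import torch
import zentorch  # loading the extension registers the exported ops with torch

# Ops defined via TORCH_LIBRARY(namespace, ...) appear as attributes of
# torch.ops.<namespace>; "zentorch" as the namespace is an assumption here.
ns = torch.ops.zentorch
print(ns)  # an op namespace object; the integrated ops hang off it as callables
```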

### 1.2.2. The _zentorch_ custom backend to torch.compile
We have registered a custom backend to torch.compile called _zentorch_. This backend integrates ZenDNN optimizations after AOTAutograd through a function called optimize. This function operates on the FX-based graph at the ATen IR level to produce an optimized FX-based graph as the output.
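For intuition, an FX-level pass of this kind consumes a GraphModule and emits a rewritten one. The toy pass below is purely illustrative (it is not zentorch's actual optimize function) and fuses a relu following an add into one hypothetical fused call:

```python
import torch
import torch.fx as fx

def fused_add_relu(a, b):
    # Stand-in for a fused kernel; a real backend would dispatch to a library op.
    return torch.relu(a + b)

def toy_optimize(gm: fx.GraphModule) -> fx.GraphModule:
    """Illustrative FX pass: rewrite relu(add(a, b)) into fused_add_relu(a, b)."""
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is torch.relu:
            (src,) = node.args
            if isinstance(src, fx.Node) and src.target is torch.add and len(src.users) == 1:
                with gm.graph.inserting_after(node):
                    fused = gm.graph.call_function(fused_add_relu, src.args)
                node.replace_all_uses_with(fused)
                gm.graph.erase_node(node)  # relu is now unused
                gm.graph.erase_node(src)   # add's only user was the erased relu
    gm.graph.lint()
    gm.recompile()
    return gm

gm = fx.symbolic_trace(lambda a, b: torch.relu(torch.add(a, b)))
gm = toy_optimize(gm)
print(gm.code)  # now calls fused_add_relu instead of separate add and relu
```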
````diff
@@ -74,17 +76,16 @@ We have registered a custom backend to torch.compile called _zentorch_. This bac
 The static libraries for ZenDNN, AOCL BLIS and the cpp Extension modules that bind the ZenDNN operators with Python are built using the `setup.py` script.
 
 #### 1.2.3.1. CMake Based Build: ZenDNN, AOCL BLIS and FBGEMM
-CMake downloads the ZenDNN , AOCL BLIS , FBGEMM and LIBXSMM during configure stage. It generates a config.h with GIT hashes of ZenDNN , AOCL BLIS , FBGEMM and LIBXSMM. It builds ZenDNN , AOCL BLIS , FBGEMM and LIBXSMM as static libraries.
+CMake downloads ZenDNN, AOCL BLIS and FBGEMM during the configure stage. It generates a config.h with the GIT hashes of ZenDNN, AOCL BLIS and FBGEMM, and builds all three as static libraries.
 
 #### 1.2.3.2. Packaging into a Wheel File
-The CPP code, being an extension module, is built through CppExtension. It takes static libraries of the ZenDNN , AOCL BLIS , FBGEMM and LIBXSMM libraries. `setup.py` also adds in various attributes to the _zentorch_ for debugging and providing additional information.
+The CPP code, being an extension module, is built through CppExtension. It takes the static libraries of ZenDNN, AOCL BLIS and FBGEMM. `setup.py` also adds various attributes to _zentorch_ for debugging and for providing additional information.
 
 ## 1.3. Third Party Libraries
 _zentorch_ uses the following libraries for its functionality.
 * [ZenDNN](https://github.com/amd/ZenDNN)
 * [AOCL BLIS](https://github.com/amd/blis)
 * [FBGEMM](https://github.com/pytorch/FBGEMM)
-* [LIBXSMM](https://github.com/libxsmm/libxsmm.git)
 
 # 2. Installation
````
````diff
@@ -96,9 +97,9 @@ _zentorch_ can be installed using binary wheel file or can be built from source
 ```bash
 pip uninstall zentorch
 ```
-* Install Pytorch v2.3.0
+* Install Pytorch v2.4.0
 ```bash
-conda install pytorch==2.3.0 cpuonly -c pytorch
+conda install pytorch==2.4.0 cpuonly -c pytorch
 ```
 * Use one of two methods to install zentorch:
````
````diff
@@ -120,12 +121,9 @@ cd ZENTORCH_v5.0.0_Python_v3.8/
 pip install zentorch-5.0.0-cp38-cp38-manylinux_2_28_x86_64.whl
 ```
 >Note:
-* While importing zentorch, if you get an undefined symbol error such as:
-ImportError: <anaconda_install_path>/envs/<your-env>/lib/python3.x/site-packages/
-zentorch/_C.cpython-3x-x86_64-linux-gnu.so : undefined symbol: <some string>,
-it could be due to version differences with PyTorch. Verify that you are using PyTorch version
-which is similar to the PyTorch version which was used to build the wheel file.
 * Dependent packages 'numpy' and 'torch' will be installed by '_zentorch_' if not already present.
+* If you get the error: ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_.a.b.cc' not found (required by <path_to_conda>/envs/<env_name>/lib/python<py_version>/site-packages/zentorch-5.0.0-pyx.y-linux-x86_64.egg/zentorch/_C.cpython-xy-x86_64-linux-gnu.so), export LD_PRELOAD as:
+* export LD_PRELOAD=<path_to_conda>/envs/<env_name>/lib/libstdc++.so.6:$LD_PRELOAD
````
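A quick post-install sanity check; a clean import means the bundled C++ extension loaded and its ops were registered:

```python
# Run as: python -c "import torch, zentorch; print(torch.__version__)"
import torch
import zentorch  # raises ImportError (see the notes above) if the extension cannot load

print(torch.__version__)  # should match the PyTorch the wheel was built for (2.4.0 here)
```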
## 2.2. From Source
Run the following commands:
````diff
@@ -140,23 +138,23 @@ git checkout r5.0
 ### 2.2.1. Preparing third party repositories
 
-Build setup downloads the ZenDNN, AOCL BLIS , FBGEMM and LIBXSMM repos into `third_party` folder.
+Build setup downloads the ZenDNN, AOCL BLIS and FBGEMM repos into the `third_party` folder.
 
 ### 2.2.2. Linux build
 #### 2.2.2.1. Create conda environment for the build
 ```bash
-conda create -n pt-zentorch python=3.8
+conda create -n pt-zentorch python=3.8 -y
 conda activate pt-zentorch
 ```
 #### 2.2.2.2. You can install torch using 'conda' or 'pip'
 ```bash
 # Pip command
-pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cpu
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cpu
 ```
 or
 ```bash
 # Conda command
-conda install pytorch==2.3.0 cpuonly -c pytorch
+conda install pytorch==2.4.0 cpuonly -c pytorch -y
 ```
 
 >Note: The CPU version of torch/pytorch only supports CPU version of torchvision.
````
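To double-check that the CPU-only build was picked up, torch's CUDA component can be inspected; it reports None on CPU wheels:

```python
import torch

print(torch.__version__)   # expected: 2.4.0 for this release
print(torch.version.cuda)  # None for the CPU-only builds installed above
```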
````diff
@@ -194,7 +192,7 @@ with torch.no_grad():
     output = compiled_model(input)
 ```
 
->Note: If same model is optimized with `torch.compile` for multiple backends within single script, it is recommended to use `torch._dynamo.reset()` before calling the `torch.compile` on that model.
+>Note: If the same model is optimized with `torch.compile` for multiple backends within a single script, it is recommended to call `torch._dynamo.reset()` before calling `torch.compile` on that model. This is applicable if the torch version is less than 2.3.
 
 >Note: _zentorch_ is able to do the zentorch op replacements in both non-inference and inference modes. But some of the _zentorch_ optimizations are only supported for the inference mode, so it is recommended to use `torch.no_grad()` if you are running the model for inference only.
````
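Taken together, the two notes above translate into a pattern like the following sketch (placeholder model and input):

```python
import torch
import zentorch

model = torch.nn.Linear(64, 64).eval()
inp = torch.randn(4, 64)

with torch.no_grad():  # recommended for inference-only runs (see note above)
    inductor_model = torch.compile(model, backend="inductor")
    _ = inductor_model(inp)

    # Reset dynamo state before re-compiling the same model for another
    # backend; per the note above this is needed for torch < 2.3.
    torch._dynamo.reset()

    zentorch_model = torch.compile(model, backend="zentorch")
    _ = zentorch_model(inp)
```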
````diff
@@ -215,7 +213,7 @@ with torch.no_grad():
 ```
 
 ## 3.4 HuggingFace Generative LLM models
-For HuggingFace Generative LLM models, usage of zentorch.llm.optimize is recommended. All the optimizations included in this API are specifically targeted for Generative Large Language Models from HuggingFace. If a model which is not a valid Generative Large Language Model from HuggingFace, the following warning will be displayed and zentorch.llm.optimize will act as a dummy with no optimizations being applied to the model that is passed: “Cannot detect the model transformers family by model.config.architectures. Please pass a valid HuggingFace LLM model to the zentorch.llm.optimize API.” This check confirms the presence of the "config" and "architectures" attributes of the model to get the model id. Considering the check, two scenarios the zentorch.llm.optimize can still act as a dummy function:
+For HuggingFace Generative LLM models, usage of zentorch.llm.optimize is recommended. All the optimizations included in this API are specifically targeted for Generative Large Language Models from HuggingFace. If a model is not a valid Generative Large Language Model from HuggingFace, the following warning will be displayed and zentorch.llm.optimize will act as a dummy function, with no optimizations being applied to the model that is passed: “Cannot detect the model transformers family by model.config.architectures. Please pass a valid HuggingFace LLM model to the zentorch.llm.optimize API.” This check confirms the presence of the "config" and "architectures" attributes of the model to get the model id. Given this check, there are two scenarios in which zentorch.llm.optimize can still act as a dummy function:
 1. HuggingFace has a plethora of models, of which Generative LLMs are a subset. So, even if the model has the "config" and "architectures" attributes, the model id might not yet be present in the supported-models list from zentorch. In this case zentorch.llm.optimize will act as a dummy function.
 2. A model, whether or not it is a valid generative LLM from HuggingFace, might be missing the "config" and "architectures" attributes. In this case also, the zentorch.llm.optimize API will act as a dummy function.
````
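Before the cases below, a hedged sketch of the overall flow; the model id is only an example, and zentorch.llm.optimize is assumed here to follow the usual optimize(model) -> model convention (the documented call patterns are in Case #1 and Case #2 below):

```python
import torch
import zentorch
from transformers import AutoModelForCausalLM

# Example model id; valid HuggingFace Generative LLMs expose the "config" and
# "architectures" attributes that the check described above relies on.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
model.eval()

# Assumed optimize(model) -> model convention; unsupported models fall through
# as a no-op with the warning quoted above.
model = zentorch.llm.optimize(model)
```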

````diff
@@ -227,6 +225,7 @@ python -c 'import zentorch; print("\n".join([f"{i+1:3}. {item}" for i, item in e
 
 If a model id other than those listed above is passed, zentorch.llm.optimize will not apply the above specific optimizations to the model and a warning will be displayed as follows: “Complete set of optimizations are currently unavailable for this model.” Control will pass to the zentorch custom backend to torch.compile for applying optimizations.
 
+For leveraging the best performance of zentorch.llm.optimize, the user has to install IPEX corresponding to the PyTorch version that is installed in the environment.
 The PyTorch version for performant execution of supported LLMs should be greater than or equal to 2.3.0. The recommended version for optimal performance is PyTorch 2.4.
 
 ### Case #1. If output is generated through a call to direct `model`, optimize it as below:
````
````diff
@@ -297,7 +296,7 @@ export ZENTORCH_PY_LOG_LEVEL=DEBUG
 The default level of logs is **WARNING** for both cpp and python sources but can be overridden as discussed above.
 >NOTE: The log levels are the same as those provided by the python logging module.
->INFO: Since all OPs implemented in _zentorch_ are registered with torch using the TORCH_LIBRARY() and TORCH_LIBRARY_IMPL() macros in bindings, the PyTorch profiler can be used without any modifications to measure the op level performance.
+>INFO: Since all OPs implemented in _zentorch_ are registered with torch using the TORCH_LIBRARY(), TORCH_LIBRARY_FRAGMENT() and TORCH_LIBRARY_IMPL() macros in bindings, the PyTorch profiler can be used without any modifications to measure the op level performance.
````
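Accordingly, op-level timings for a zentorch-compiled model can be collected with the standard profiler; a minimal sketch:

```python
import torch
import zentorch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Linear(256, 256).eval()
inp = torch.randn(16, 256)
compiled_model = torch.compile(model, backend="zentorch")

with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
    compiled_model(inp)

# zentorch ops appear as ordinary rows in the profiler summary.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```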
## 4.3 Support for `TORCH_COMPILE_DEBUG`
PyTorch offers a debugging toolbox that comprises a built-in stats and trace function. This functionality facilitates the display of the time spent by each compilation phase, output code, output graph visualization, and IR dump. `TORCH_COMPILE_DEBUG` invokes this debugging tool that allows for better problem-solving while troubleshooting the internal issues of TorchDynamo and TorchInductor. This functionality works for the models optimized using _zentorch_, so it can be leveraged to debug these models as well. To enable this functionality, users can either set the environment variable `TORCH_COMPILE_DEBUG=1` or specify the environment variable with the runnable file (e.g., test.py) as input.
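For instance, the flag can be prefixed to the runnable file (`TORCH_COMPILE_DEBUG=1 python test.py`) or set from inside the script before torch is imported; a minimal sketch:

```python
import os

# Must be set before torch is imported and torch.compile runs.
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch
import zentorch

model = torch.nn.Linear(32, 32).eval()
compiled_model = torch.compile(model, backend="zentorch")
compiled_model(torch.randn(2, 32))  # traces and IR dumps land under ./torch_compile_debug/
```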
