README.md: 15 additions, 6 deletions
@@ -11,14 +11,17 @@ The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are
## Introduction
-The builds enable CPU optimizations such as `SSE4`, `AVX2`, and `FMA`. If you have a CPU released after ~2013 then you'll benefit from them. Note that you will benefit from these even if you do all your training on GPU due to i/o pipeline optimizations. I think I've gained about 10-15% performance boost even on most straightforward supervised learning tasks. And of course in CPU only setting they give significant improvement, sometimes matching GPU speeds on smaller neural networks (especially true for laptops where even in higher end models GPUs tend to lag behind).
+The builds enable various performance flags targeting modern CPUs, including the SIMD instruction set extensions SSE4, AVX2, and FMA.
+If your CPU was released after ~2013, you'll likely benefit from these, e.g. in data pre-processing.
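To check whether a given machine actually has the extensions these builds target, one option on Linux is to read the feature flags the kernel reports in `/proc/cpuinfo`. The sketch below assumes x86 Linux and the flag names `sse4_1`, `sse4_2`, `avx2`, and `fma` as they appear in that file; it is not part of this repository, and the helper names are hypothetical.

```python
"""Check whether this CPU reports the SIMD extensions the builds target.

Assumes x86 Linux, where the kernel lists CPU feature flags in /proc/cpuinfo.
"""
from __future__ import annotations

from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux (assumption).
REQUIRED_FLAGS = ("sse4_1", "sse4_2", "avx2", "fma")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str) -> list[str]:
    """List the required flags that the CPU does not report, in a fixed order."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in REQUIRED_FLAGS if flag not in flags]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        print("all required flags present" if not missing else f"missing: {missing}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

If any flag is missing, a wheel compiled with these extensions will likely crash on that machine with an "Illegal instruction" error, and a stock build is the safer choice.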
-Additionally, build enables [XLA](https://www.tensorflow.org/xla/) - an Accelerated Linear Algebra domain-specific just-in-time compiler, and [MPI](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/mpi) - a faster way to run distributed TensorFlow than what is offered built-in.
+The build also enables [XLA](https://www.tensorflow.org/xla/), an Accelerated Linear Algebra domain-specific just-in-time compiler.
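Building with XLA support does not by itself switch JIT compilation on. TensorFlow's XLA guide describes enabling auto-clustering by setting the `TF_XLA_FLAGS` environment variable to `--tf_xla_auto_jit=2` before TensorFlow starts. The sketch below does that from Python while keeping any flags already set; whether your TensorFlow version reads this flag is an assumption to verify against its documentation, and the helper name is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import MutableMapping

# Auto-clustering flag described in TensorFlow's XLA guide (version support is an assumption).
AUTO_JIT_FLAG = "--tf_xla_auto_jit=2"


def enable_xla_auto_jit(env: MutableMapping[str, str]) -> None:
    """Append the auto-clustering flag to TF_XLA_FLAGS, keeping existing flags."""
    flags = env.get("TF_XLA_FLAGS", "").split()
    if AUTO_JIT_FLAG not in flags:  # avoid adding the same flag twice
        flags.append(AUTO_JIT_FLAG)
    env["TF_XLA_FLAGS"] = " ".join(flags)


if __name__ == "__main__":
    # To be safe, set the variable before `import tensorflow` runs in this process.
    enable_xla_auto_jit(os.environ)
    print(os.environ["TF_XLA_FLAGS"])
```

Setting the variable in the shell that launches training has the same effect and avoids depending on import order.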
+
+Finally, support for additional CUDA compute capabilities (5.0, 6.1, 7.0) is enabled, so these wheels should also work on older GPUs (7xx-9xx families).
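The compute-capability list matters because of CUDA's binary compatibility rule: a cubin compiled for compute capability X.Y runs on GPUs with the same major version X and a minor version of at least Y. The sketch below applies that rule to the three capabilities named above only. The wheels may contain other targets too, and PTX just-in-time compilation is not modeled, so a `False` means "not covered by these three", not proof that the wheel won't run. The function name is hypothetical.

```python
from __future__ import annotations

# Only the capabilities this README says were added; other targets may exist.
ADDED_CAPABILITIES = ((5, 0), (6, 1), (7, 0))


def has_compatible_binary(
    device: tuple[int, int],
    compiled: tuple[tuple[int, int], ...] = ADDED_CAPABILITIES,
) -> bool:
    """Return True if some compiled cubin can run on a GPU of capability `device`.

    CUDA rule: a cubin for sm_XY runs on GPUs with major == X and minor >= Y.
    """
    major, minor = device
    return any(c_major == major and c_minor <= minor for c_major, c_minor in compiled)
```

For example, a GTX 970 has compute capability 5.2, which the 5.0 binary covers, so `has_compatible_binary((5, 2))` returns `True`.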