@@ -132,7 +132,74 @@ Platform Info:
132132Environment:
133133 JULIA_NUM_THREADS = 8
134134```
135+ Single-threaded benchmarks on an M1 Mac:
136+ ``` julia
137+ julia> N = 100;
138+
139+ julia> A = rand(N,N); B = rand(N,N); C = similar(A);
140+
141+ julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
142+ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
143+ Range (min … max): 21.416 μs … 34.458 μs ┊ GC (min … max): 0.00% … 0.00%
144+ Time (median): 21.624 μs ┊ GC (median): 0.00%
145+ Time (mean ± σ): 21.767 μs ± 491.788 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
146+
147+ ▃ ▆██ ▆▄ ▁ ▃▄ ▄▂ ▁ ▂▃▁ ▂
148+ ▃▇█▁███▁██▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▁███▁▆▃▁▁▆▇▁██▁█▆▅▁▄▃▁▃▃▇▁███ █
149+ 21.4 μs Histogram: log(frequency) by time 23.2 μs <
150+
151+ Memory estimate: 0 bytes, allocs estimate: 0.
152+
153+ julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
154+ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
155+ Range (min … max): 39.124 μs … 57.749 μs ┊ GC (min … max): 0.00% … 0.00%
156+ Time (median): 46.166 μs ┊ GC (median): 0.00%
157+ Time (mean ± σ): 46.274 μs ± 1.766 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
158+
159+ ▁▁▄▂▆▃█▅▇▄▇▅▃▃▁▃▁▂
160+ ▂▁▁▂▂▂▂▂▁▂▂▂▂▂▂▃▃▃▃▃▄▄▅▅▆▅▇▇████████████████████▆▇▆▆▅▆▅▅▄▃▃ ▅
161+ 39.1 μs Histogram: frequency by time 50.2 μs <
135162
163+ Memory estimate: 0 bytes, allocs estimate: 0.
164+
165+ julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
166+ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
167+ Range (min … max): 48.291 μs … 57.833 μs ┊ GC (min … max): 0.00% … 0.00%
168+ Time (median): 49.124 μs ┊ GC (median): 0.00%
169+ Time (mean ± σ): 49.306 μs ± 802.143 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
170+
171+ ▁▃▅▆▇██▇██▇▇▆▅▄▂▂▁▁▁▂▁▁▁▁▁▁▁ ▁▁▁ ▃
172+ ▃████████████████████████████████████▇▆▄▂▄▃▂▃▃▄▄▃▆▅▇▇▇██▇█▇▇ █
173+ 48.3 μs Histogram: log(frequency) by time 53 μs <
174+
175+ Memory estimate: 0 bytes, allocs estimate: 0.
176+
177+ julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
178+ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
179+ Range (min … max): 34.249 μs … 40.208 μs ┊ GC (min … max): 0.00% … 0.00%
180+ Time (median): 34.375 μs ┊ GC (median): 0.00%
181+ Time (mean ± σ): 34.748 μs ± 774.675 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
182+
183+ ▆██▆▃▄▅▃ ▁▁▄▅▅▃▂▁ ▂▃▂ ▁▂ ▂
184+ ████████▁▁▃▁▁▁▁▁▃▄▃▁▁▃██████████▇▅▄▅▅▆▄▄▄▄▄▅▄▄▃▅▃▄▃▅█████▇██ █
185+ 34.2 μs Histogram: log(frequency) by time 37.1 μs <
186+
187+ Memory estimate: 0 bytes, allocs estimate: 0.
188+ ```
189+ Or, from a separate run of the same `TriangularSolve.ldiv!` benchmark:
190+ ``` julia
191+ julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
192+ BenchmarkTools.Trial: 10000 samples with 1 evaluation.
193+ Range (min … max): 23.750 μs … 30.541 μs ┊ GC (min … max): 0.00% … 0.00%
194+ Time (median): 23.875 μs ┊ GC (median): 0.00%
195+ Time (mean ± σ): 23.948 μs ± 316.293 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
196+
197+ ▃▁▆ █ ▇▆▆ ▄ ▁ ▁ ▁ ▁ ▁ ▂
198+ ▅███▆█▁███▄█▁██▇▁▄▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▆▁▇▆█▁█▁▇▆▅▁▅▁▇▆█▁█ █
199+ 23.8 μs Histogram: log(frequency) by time 25 μs <
200+
201+ Memory estimate: 0 bytes, allocs estimate: 0.
202+ ```
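
As a sanity check on what these benchmarks compute, here is a minimal sketch using only the `LinearAlgebra` standard library: `rdiv!` solves `X * U = A` for `X` and `ldiv!` solves `L * X = A`. The names `U`, `L`, `Cr`, and `Cl` are illustrative and not from the README, and `B` is shifted by `N*I` (an assumption made here, not part of the benchmark) so the triangular factors are well conditioned and the residual checks are meaningful. The four-argument `TriangularSolve` calls above presumably produce the same results, but that is not exercised here.

```julia
using LinearAlgebra

N = 100
A = rand(N, N)
# Assumption: add N*I so the triangles of B are diagonally dominant.
# The benchmarks above use plain rand(N, N), which can be badly conditioned.
B = rand(N, N) + N * I

U = UpperTriangular(B)   # upper triangle of B (a view, no copy)
L = LowerTriangular(B)   # lower triangle of B (a view, no copy)

# Right division: copy A into Cr, then overwrite Cr in place with A / U.
Cr = similar(A)
rdiv!(copyto!(Cr, A), U)

# Left division: write L \ A into Cl, leaving A untouched.
Cl = similar(A)
ldiv!(Cl, L, A)

# Residual checks: each result should reproduce A when multiplied back.
@assert Cr * U ≈ A
@assert L * Cl ≈ A
```

Checking the residual rather than comparing against `A / U` or `L \ A` keeps the check independent of which code path those operators dispatch to.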
136203
137204For editing convenience, the benchmarks are repeated below without prompts or output. You can copy/paste the above into a REPL, which should automatically strip the `julia> ` prompts and the outputs, but that form is less convenient to edit if you want to try changing the benchmarks:
138205``` julia