First of all, I love Tullio. It is very magical. I’ve reduced my codebase by 3×, it works on both CPU and GPU, and it also made it faster!
However, I’m still struggling to wrap my head around Tullio’s multi-line syntax.
How does one fuse multiple Tullio statements? This matters when writing for a GPU, because every `@tullio` line is a separate call from the CPU, i.e. a separate kernel launch and a separate pass over the data.
For example, the code below calculates the means and standard deviations as two separate calls over the same window w, but ideally this could be done entirely in a single call and output a matrix:
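```julia
using CUDA, KernelAbstractions, Tullio  # Tullio emits GPU kernels only when KernelAbstractions is loaded

T = CUDA.rand(1000)
w = 100
# Rolling mean over windows of length w; Tullio infers i ∈ 1:length(T)-w+1
means(T, w) = @tullio μ[i] := T[i + k - 1] / w (k in 1:w)
# Rolling (population) std; a second kernel re-reads the same windows
stds(T, w, μ) = @tullio σ[i] := sqrt <| (T[i + k - 1] - μ[i])^2 / w (k in 1:w)
```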
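One possible way to get both statistics from a single `@tullio` call, sketched under assumptions: `rolling_mean_std` is a hypothetical helper, and the sketch relies on Tullio inferring the range of `i` from `T[i + k - 1]` (as the code above already does) and on an index symbol (`p`) being usable as a value in the expression. The extra output index `p in 1:2` makes one kernel accumulate both the window sums and the window sums of squares into a matrix; mean and std then follow from cheap broadcasts. It runs on a plain `Array` here; for the GPU the same call should accept a `CuArray` once `CUDA` and `KernelAbstractions` are loaded.

```julia
using Tullio

# Hypothetical helper (not part of Tullio): rolling mean and population std
# from ONE @tullio reduction. S[p, i] = Σ_k T[i+k-1]^p for p = 1, 2, so
# row 1 holds window sums and row 2 holds window sums of squares.
function rolling_mean_std(T::AbstractVector, w::Integer)
    @tullio S[p, i] := T[i + k - 1]^p  (p in 1:2, k in 1:w)
    μ = S[1, :] ./ w
    # One-pass variance E[x²] - E[x]²; clamp tiny negative round-off to zero
    σ = sqrt.(max.(S[2, :] ./ w .- μ .^ 2, 0))
    return hcat(μ, σ)  # n×2 matrix: column 1 = means, column 2 = stds
end
```

The design trade-off: the one-pass formula `E[x²] - E[x]²` can lose precision when the window mean is large relative to its spread, which the original two-pass `stds` avoids by subtracting `μ[i]` first.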