-
|
I'm using PySpark on AWS EMR and I'm interested in running our clusters with GPUs. Do Python UDFs run on the GPU or on the CPU? |
Replies: 2 comments
-
|
Not currently: Python UDFs do not run on the GPU yet. We are working on speeding up the transfer of data from Spark to Python (#610), specifically for pandas UDFs. We are also working on scheduling, so that if you want to use cudf or some other GPU-accelerated Python library inside your pandas UDFs to speed things up, you can. Both efforts are still at a very early stage. Some of the work might get into the 0.2 release, but it will probably be off by default. Hopefully the 0.3 release will have a full version of the changes, though fully supporting it will probably require a few changes to pyspark. We have also looked at using numba to compile a UDF into PTX code that we could run on the GPU, but we have not done much with that yet. |
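To make the "use cudf inside a pandas UDF" idea concrete, here is a rough sketch of what such a UDF could look like on the user side. It is not the plugin's mechanism and it does not cover the scheduling work mentioned above: it simply assumes that cuDF is installed on the executors and that a GPU is visible to the Python worker, and it falls back to plain pandas otherwise. The names `scale_batch` and `make_scale_udf` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

try:
    # Assumption: RAPIDS cuDF is installed and a GPU is visible to this process.
    import cudf
except Exception:  # cuDF missing, or it failed to initialize without a GPU
    cudf = None


def scale_batch(values: pd.Series) -> pd.Series:
    """Double every value in one batch, on the GPU when cuDF is usable."""
    if cudf is None:
        # CPU fallback: ordinary pandas arithmetic.
        return values * 2.0
    gpu_values = cudf.from_pandas(values)   # copy the batch host -> device
    return (gpu_values * 2.0).to_pandas()   # compute on GPU, copy back


def make_scale_udf():
    """Wrap scale_batch as a scalar pandas UDF (hypothetical helper)."""
    # Imported lazily so the batch function can be tried without Spark.
    from pyspark.sql.functions import pandas_udf
    return pandas_udf(scale_batch, returnType="double")
```

With a running session, something like `df.withColumn("y", make_scale_udf()("x"))` would apply it per batch. Whether this is actually faster depends on the batch size and on the host-to-device copy cost, which is part of what the data-transfer work in the answer is trying to address.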
-
|
Closing as answered. |