-
What is your question?
Replies: 5 comments
-
Note that Spark normally executes in a row-by-row fashion, while the RAPIDS Accelerator operates on columnar batches. Can you elaborate more on how you isolated the timing for these operations? It's easy to accidentally measure more than what was intended (e.g.: also the cost of the operations producing the input). Also, the scale factor of the data is fairly low. GPUs do not excel at processing small amounts of data. You will probably see better performance by increasing the amount of data each task sees (e.g.: increasing spark.sql.files.maxPartitionBytes). As to which operations will perform particularly well on GPUs relative to CPUs, here's an incomplete list:
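One way to give each task more data, sketched with PySpark. The 512m value and app name are illustrative assumptions, not recommendations from the thread; tune for your own cluster:

```python
from pyspark.sql import SparkSession

# Illustrative sketch: raise the per-task input size so each columnar
# batch the GPU processes is larger. The default is 128m.
spark = (
    SparkSession.builder
    .appName("rapids-batch-size-sketch")  # hypothetical app name
    .config("spark.sql.files.maxPartitionBytes", "512m")
    .getOrCreate()
)
```

Larger input partitions mean fewer, bigger tasks, which generally suits GPU execution better than many small ones.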
Have you hit the maximum number of GPUs available in the cluster? If so, then you would be running with the same number of executors, which explains the identical performance. You could try running more than 4 cores per executor, as some queries can benefit from more CPU cores on the executor even when running on the GPU, especially if significant parts of the query are not translated for the GPU. If you are seeing more executors (and therefore more GPUs) being used than before, then you may be I/O bound. Try running with just CPU executors and see if the performance changes as you scale the number of executors similarly (keeping the cores per executor constant).
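A minimal sketch of the suggested CPU-only baseline, assuming PySpark with the RAPIDS Accelerator plugin already on the classpath; the core count is an illustrative assumption:

```python
from pyspark.sql import SparkSession

# Illustrative CPU-only baseline: disable the RAPIDS SQL plugin so the
# same query runs entirely on CPU executors, keeping cores-per-executor
# constant for an apples-to-apples scaling comparison.
spark = (
    SparkSession.builder
    .appName("cpu-baseline-sketch")               # hypothetical app name
    .config("spark.executor.cores", "8")          # try more than 4 cores
    .config("spark.rapids.sql.enabled", "false")  # run the query on the CPU
    .getOrCreate()
)
```

If CPU-only runs show the same plateau as you add executors, the bottleneck is likely I/O rather than compute.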
-
> Can you elaborate more on how you isolated the timing for these operations?

OK. For example: is this way correct? If not, how should I get the execution time of a specific operation?

> Have you hit the maximum number of GPUs available in the cluster?

Even with the above setting, the execution time of the query in GPU mode is almost unchanged. Perhaps improving I/O is a direction worth exploring.
-
Spark executes in a lazy fashion, so this snippet does not measure the time of a join. No join was actually performed during the time measured, because nothing forced Spark to manifest the result of the join in any way (e.g.: writing the results of the join somewhere, or collecting the results back to the driver via a collect()). Note that even if you changed the snippet to force an action, that would make the join execute during the section being timed, but it won't measure just the join. It will also measure the time it took to construct the join's inputs.
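Spark specifics aside, the lazy-evaluation pitfall can be sketched in plain Python with generators: timing the construction of a lazy pipeline measures essentially nothing, because no work happens until an "action" consumes the result. The names here are illustrative stand-ins, not Spark APIs:

```python
import time

def load_rows(n, counter):
    # Stands in for a Spark scan: yields rows lazily and counts real work done.
    for i in range(n):
        counter[0] += 1
        yield i

work_done = [0]

t0 = time.perf_counter()
# The "transformation": builds the pipeline but executes nothing.
pipeline = (row * 2 for row in load_rows(1_000_000, work_done))
build_time = time.perf_counter() - t0

assert work_done[0] == 0  # timing the build measured no actual work

t0 = time.perf_counter()
total = sum(pipeline)  # the "action": forces the whole pipeline to run
run_time = time.perf_counter() - t0

assert work_done[0] == 1_000_000  # all the work happened inside the action
```

The same shape applies in Spark: wrap the timer around the action (a write, count, or collect), and remember that the measurement then includes every upstream operation the action forces, not just the one you care about.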
-
Thanks, you are so nice!
-
Thanks, @YeahNew! I'm closing this as answered. Please reopen or file a new question if there's more along these lines you'd like to discuss.