You can configure the buffers you mention to use pinned memory. To do so, pass the execution policy an allocator backed by thrust::cuda::universal_host_pinned_memory_resource (see e.g. thrust/examples/cuda/custom_temporary_allocation.cu). I'm not sure this solves your issue, though. thrust::reduce returns its result by value, so I think it will still copy the result from these buffers to the host stack and synchronize afterwards. I would also expect poor performance from putting the device scratch space in pinned memory, because that space holds more than the final result (e.g. the per-block partial results of the reduction). As mentioned on Discord, cub::DeviceReduce (e.g. cub::DeviceReduce::Sum or cub::DeviceReduce::Reduce) is the right choice in this situation. It writes the result to memory you provide and enqueues the work on a stream without blocking the host.
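A minimal sketch of the cub::DeviceReduce route follows. It assumes a CUDA toolkit that ships CUB (11.0 or newer) and a GPU to run on. CUDA_CHECK and device_sum are hypothetical helper names used only for illustration. The two-call pattern is CUB's convention: the first call only reports how much scratch space is needed, and the second enqueues the reduction on the given stream. The result ends up in device memory (d_out) rather than on the host stack, so the host only waits when it actually needs the value.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort on any CUDA error (not part of CUB).
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical helper: sums h_in on the GPU with cub::DeviceReduce::Sum.
int device_sum(const std::vector<int>& h_in) {
  const int n = static_cast<int>(h_in.size());
  int *d_in = nullptr, *d_out = nullptr, *h_out = nullptr;
  CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_out, sizeof(int)));
  CUDA_CHECK(cudaMallocHost(&h_out, sizeof(int)));  // pinned host buffer

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));
  CUDA_CHECK(cudaMemcpyAsync(d_in, h_in.data(), n * sizeof(int),
                             cudaMemcpyHostToDevice, stream));

  // First call with d_temp == nullptr only queries the scratch size.
  void* d_temp = nullptr;
  size_t temp_bytes = 0;
  CUDA_CHECK(cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n, stream));
  // Scratch space stays in ordinary device memory; real code would
  // allocate it once and reuse it across calls.
  CUDA_CHECK(cudaMalloc(&d_temp, temp_bytes));

  // Second call enqueues the reduction and returns without waiting;
  // the result is written to d_out, not returned by value.
  CUDA_CHECK(cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n, stream));

  // Later kernels on `stream` could read d_out directly. Here the value is
  // copied to pinned memory asynchronously, and the host synchronizes only
  // at the point where it needs the number.
  CUDA_CHECK(cudaMemcpyAsync(h_out, d_out, sizeof(int),
                             cudaMemcpyDeviceToHost, stream));
  // ... independent host work could overlap with the GPU here ...
  CUDA_CHECK(cudaStreamSynchronize(stream));
  const int result = *h_out;

  CUDA_CHECK(cudaFree(d_temp));
  CUDA_CHECK(cudaFree(d_in));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFreeHost(h_out));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return result;
}

int main() {
  std::vector<int> h_in(1 << 20, 1);
  std::printf("sum = %d\n", device_sum(h_in));
  return 0;
}
```

The key design choice is that the caller decides where the result lives and when to wait. With thrust::reduce, both decisions are fixed by its return-by-value interface. On systems with unified addressing, d_out could also point directly at the cudaMallocHost buffer, but the host would still need a stream synchronization (or an event) before reading it.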

Answer selected by JigaoLuo
Category: Thrust
Participants: @pauleonix, @JigaoLuo