FFT Poisson Solver: Neumann and Dirichlet Boundaries #4202
|
Notes on the implementation of the cosine and sine transforms are available at https://www.overleaf.com/read/krjbcfhfgvmj#f7c9e1. |
|
I once made an optimized FFT Poisson solver for GPU in HiPACE++: https://github.com/Hi-PACE/hipace/blob/development/src/fields/fft_poisson_solver/FFTPoissonSolverDirichletFast.cpp. It does a single-rank 2D DST-I using the Fast Sine Transform algorithm from page 238 of Computational Frameworks for the Fast Fourier Transform by Charles Van Loan. This does not require expanding the domain by 2x or 4x, as is currently done in this PR for the R2R FFTs. The pages that follow it give similar algorithms for DST-II, DST-III, DCT-II and DCT-III. I also found that it was better to implement the R2R FFT directly in the Poisson solver instead of in the FFT wrapper, so that the pre- and post-processing GPU kernels can be combined with the transposes (here ParallelCopy). |
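For readers unfamiliar with the "expand the domain by 2x" approach mentioned above, here is a minimal sketch of a DST-I computed through an odd extension of length 2(N+1) followed by a complex DFT. It is an illustration only, not the code in this PR or in HiPACE++: the function names (`dst1_direct`, `dft`, `dst1_via_odd_extension`) are hypothetical, the unnormalized convention X[k] = sum_n x[n] sin(pi (n+1)(k+1)/(N+1)) is an assumption, and a naive O(N^2) DFT stands in for a library FFT so the example stays self-contained.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Reference DST-I by direct summation (assumed unnormalized convention):
// X[k] = sum_n x[n] * sin(pi * (n+1) * (k+1) / (N+1)),  k = 0..N-1.
std::vector<double> dst1_direct(const std::vector<double>& x)
{
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            X[k] += x[n] * std::sin(pi * double((n + 1) * (k + 1)) / double(N + 1));
        }
    }
    return X;
}

// Naive O(M^2) DFT of real input; a stand-in for a library FFT call.
std::vector<std::complex<double>> dft(const std::vector<double>& y)
{
    const double pi = std::acos(-1.0);
    const std::size_t M = y.size();
    std::vector<std::complex<double>> Y(M, std::complex<double>(0.0, 0.0));
    for (std::size_t k = 0; k < M; ++k) {
        for (std::size_t j = 0; j < M; ++j) {
            Y[k] += y[j] * std::polar(1.0, -2.0 * pi * double(j * k) / double(M));
        }
    }
    return Y;
}

// DST-I via an odd extension: the N inputs are embedded in a buffer of
// length M = 2(N+1), which is the roughly 2x domain expansion.
std::vector<double> dst1_via_odd_extension(const std::vector<double>& x)
{
    const std::size_t N = x.size();
    const std::size_t M = 2 * (N + 1);
    std::vector<double> y(M, 0.0);          // y[0] and y[N+1] stay zero
    for (std::size_t n = 0; n < N; ++n) {
        y[n + 1]     =  x[n];
        y[M - 1 - n] = -x[n];               // odd symmetry: y[M-j] = -y[j]
    }
    const auto Y = dft(y);
    std::vector<double> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        X[k] = -0.5 * Y[k + 1].imag();      // DFT of an odd real signal is purely imaginary
    }
    return X;
}
```

Both routines return the same values up to round-off. The sketch makes the cost visible: the transform length and the scratch buffer are about twice the input size, which is the overhead that an in-place fast sine transform avoids.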
|
That's good to know. What we need is batched 1D DST and DCT. I guess that might be even easier than the 2D FFT you have implemented. |
Add support for Neumann and Dirichlet boundaries in the FFT-based Poisson solver. This requires cosine and sine transforms. For CPU builds, we use FFTW for these transforms. For GPU builds, we have implemented cosine and sine transforms using the real-to-complex transform provided by cuFFT, rocFFT and oneMKL.
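To show why a sine transform is the natural tool for Dirichlet boundaries, here is a minimal 1D sketch. It assumes a node-centered second-order Laplacian with homogeneous Dirichlet values at both ends, which is a simplification and not necessarily the discretization used in this PR. The names `dst1` and `solve_poisson_dirichlet_1d` are hypothetical, and a direct O(N^2) sine transform replaces the FFTW or GPU-based transform to keep the example self-contained.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalized DST-I by direct summation (stand-in for a fast transform):
// X[k] = sum_n x[n] * sin(pi * (n+1) * (k+1) / (N+1)).
std::vector<double> dst1(const std::vector<double>& x)
{
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            X[k] += x[n] * std::sin(pi * double((n + 1) * (k + 1)) / double(N + 1));
        }
    }
    return X;
}

// Solve (u[i-1] - 2 u[i] + u[i+1]) / h^2 = f[i] for the N interior nodes,
// with u = 0 at both boundary nodes (assumed setup).
std::vector<double> solve_poisson_dirichlet_1d(const std::vector<double>& f, double h)
{
    const double pi = std::acos(-1.0);
    const std::size_t N = f.size();

    // 1. Forward sine transform of the right-hand side.
    std::vector<double> fhat = dst1(f);

    // 2. Divide by the eigenvalues of the discrete Laplacian; the sine modes
    //    are its eigenvectors under homogeneous Dirichlet conditions.
    for (std::size_t k = 0; k < N; ++k) {
        const double lambda =
            (2.0 * std::cos(pi * double(k + 1) / double(N + 1)) - 2.0) / (h * h);
        fhat[k] /= lambda;                   // lambda is never zero here
    }

    // 3. Inverse transform: DST-I is its own inverse up to a factor 2/(N+1).
    std::vector<double> u = dst1(fhat);
    for (double& v : u) {
        v *= 2.0 / double(N + 1);
    }
    return u;
}
```

The Neumann case follows the same three steps with a cosine transform and the matching eigenvalues, with the extra care that the constant mode has a zero eigenvalue and must be handled separately.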
|
Ready for review, but let's not merge it until after the monthly release. |