Update fft3d.cpp: add timer and FFT3D GFLOPS #729
jngkim wants to merge 1 commit into oneapi-src:main
Conversation
```cpp
if (comm_rank == 0) {
  std::size_t volume = x * y * z;
  std::size_t fft_flops =
```
Would scaling the flops by the number of GPUs be the proper way to do a weak-scaling experiment?
I think we should use the total flops for both.
I've modified mhp/fft3d.cpp to run multiple times with --reps and to compute the FFT3D flops, and was planning a PR. I think we can discard this one.
```cpp
std::size_t volume = x * y * z;
std::size_t fft_flops =
    2 * static_cast<std::size_t>(5. * volume * std::log2(static_cast<double>(volume)));
Stats stats(state, 2 * sizeof(real_t) * volume, 4 * sizeof(real_t) * volume, fft_flops);
distributed_fft<real_t> fft3d(x, y, z);
fft3d.compute();
for (auto _ : state) {
  for (std::size_t i = 0; i < default_repetitions; i++) {
    stats.rep();
    fft3d.compute();
  }
}
```
Sorry, my answer was off. The size of interest is in the BW-limited regime, so a fixed volume is fine. In any case, weak scaling does not make a lot of sense because FFT performance varies widely with size.
I see, so you don't expect the small data size to limit scalability.
Added a timer and GFLOPS reporting to the stand-alone binary for performance analysis.