Use OpenACC multithreading in pre and post process #755

wilfonba · 2024-12-13T22:19:42Z

CPU multithreading can be accomplished easily by adding !$acc directives to loops and compiling with the -ta=multicore command line option. Since no host-device memory transfers are required, no update device (and maybe even no declare create) directives are needed, so this should be a relatively simple task. It would also require requesting multiple cores per task, which SLURM already supports. This feature would be particularly useful for simulations that use unified memory on GH200 and MI300A chips, where pre_process and post_process can take a significant amount of time if run on only one core. It could also be useful for problems that involve STLs, which require a ray tracing step in pre_process, and when derived quantities like vorticity or Q-criterion are needed in post_process. I know this works with NVHPC, but I haven't tried it with CCE yet.
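As a rough illustration of the idea, here is a minimal sketch in C (the `#pragma acc` form is the C counterpart of the Fortran `!$acc` sentinel). The function `vorticity_z`, its arguments, and the row-major grid layout are hypothetical and not taken from the code base; the loop computes a z-vorticity on interior points with central differences. When targeting the host cores, for example with `nvc -acc=multicore` in recent NVHPC releases, the arrays stay in host memory, so the sketch carries no data clauses. A GPU build of the same loop would need them.

```c
#include <stddef.h>

/* Hypothetical helper (not from the code base): z-vorticity
 * w = dv/dx - du/dy on the interior points of an nx-by-ny uniform grid.
 * Arrays are row-major, with x index i and y index j at offset i*ny + j.
 * Boundary entries of w are left untouched. */
void vorticity_z(const double *u, const double *v, double *w,
                 int nx, int ny, double dx, double dy)
{
    /* With a multicore target the iterations are spread over CPU threads.
     * No copyin/copyout clauses appear because the data never leaves host
     * memory; this is an assumption that only holds for host execution.
     * Without OpenACC enabled, the pragma is ignored and the loop is serial. */
    #pragma acc parallel loop collapse(2)
    for (int i = 1; i < nx - 1; ++i) {
        for (int j = 1; j < ny - 1; ++j) {
            double dvdx = (v[(i + 1) * ny + j] - v[(i - 1) * ny + j]) / (2.0 * dx);
            double dudy = (u[i * ny + (j + 1)] - u[i * ny + (j - 1)]) / (2.0 * dy);
            w[i * ny + j] = dvdx - dudy;
        }
    }
}
```

Each iteration writes a distinct element of `w` and only reads `u` and `v`, so the loop has no cross-iteration dependence and is safe to parallelize as written.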

sbryngelson · 2024-12-13T22:58:35Z

You can also use all of the cores on a CPU die via MPI. It's unclear whether OpenACC gives much advantage here, no? Historically, OpenMP has been used for such multithreading, but those advantages over the latest MPI implementations have mostly gone away.

wilfonba · 2024-12-15T00:08:33Z

The benefit of using multithreading over MPI is that file_per_process can be used, and domain decomposition doesn't have to be performed twice. OpenACC's multithreading showed decent speedups on the course project I finished recently.

sbryngelson · 2024-12-15T14:02:46Z

> The benefit of using multithreading over MPI is that file_per_process can be used, and domain decomposition doesn't have to be performed twice. OpenACC's multithreading showed decent speedups on the course project I finished recently.

It seems reasonable... did you compare it against MPI?

wilfonba · 2024-12-16T16:55:16Z

I haven't dug that deep yet.

wilfonba added the "enhancement" (New feature or request) and "good first issue" (Good for newcomers) labels on Dec 13, 2024
