Use OpenACC multithreading in pre and post process #755

Open
wilfonba opened this issue Dec 13, 2024 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@wilfonba
Contributor

CPU multithreading can be easily accomplished by adding !$acc directives to loops and compiling with the -ta=multicore command line option. Since no host-device memory transfers are needed, no update device (and maybe even no declare create) directives are required, so this should be a relatively simple task. It would also require requesting multiple cores per task, but SLURM already supports this. This feature would be particularly useful for simulations that use unified memory on GH200 and MI300A chips, where pre_process and post_process can take a significant amount of time if run on only one core. It would also potentially be useful for problems that involve STLs, which require a ray tracing step in pre_process, and when derived quantities like vorticity or Q-criterion are needed in post_process. I know this works with NVHPC, but I haven't tried it with CCE yet.
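For illustration only, here is a minimal sketch of the kind of loop annotation described above. It is not taken from pre_process or post_process: the program name, array names, and problem size are all hypothetical, and the loops are deliberately trivial. It assumes an OpenACC-capable Fortran compiler; with NVHPC the multicore target is selected with the -ta=multicore option mentioned above (newer NVHPC releases spell it -acc=multicore). Without any OpenACC flag the directives are treated as comments and the program runs serially with the same result.

```fortran
program acc_multicore_sketch
    implicit none

    ! Hypothetical problem size and arrays; not taken from MFC.
    integer, parameter :: dp = kind(0d0)
    integer, parameter :: n = 1000000
    real(dp), allocatable :: x(:), y(:)
    real(dp) :: total, expected
    integer :: i

    allocate (x(n), y(n))

    ! Independent iterations: each i writes only its own elements,
    ! so the loop can be split across CPU cores as-is. No data
    ! directives are used; on the multicore target the arrays
    ! simply stay in host memory.
    !$acc parallel loop
    do i = 1, n
        x(i) = real(i, dp)
        y(i) = 2.0_dp*x(i) + 1.0_dp
    end do

    ! This loop accumulates into one scalar, so it needs a
    ! reduction clause to be parallelized safely.
    total = 0.0_dp
    !$acc parallel loop reduction(+:total)
    do i = 1, n
        total = total + y(i)
    end do

    ! Closed form for the check: sum_{i=1..n} (2i + 1) = n(n + 2)
    expected = real(n, dp)*real(n + 2, dp)
    if (abs(total - expected) > 1.0e-3_dp) error stop 'unexpected sum'

    print *, 'sum = ', total
end program acc_multicore_sketch
```

The two loops show the two cases most likely to come up: a loop with fully independent iterations, which only needs the parallel loop directive, and a loop with a scalar accumulation, which also needs a reduction clause. Real loops in pre_process and post_process may carry other dependencies that need to be checked case by case.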

@wilfonba added the enhancement and good first issue labels on Dec 13, 2024
@sbryngelson
Member

You can also use all of the cores on a CPU die via MPI. It's unclear whether OpenACC gives much advantage here, no? Historically, OpenMP has been used for such multithreading, but those advantages over the latest MPI implementations have mostly gone away.

@wilfonba
Contributor Author

wilfonba commented Dec 15, 2024

The benefit of using multithreading over MPI is that file_per_process can be used, and domain decomposition doesn't have to be performed twice. OpenACC's multithreading showed decent speedups on the course project I finished recently.

@sbryngelson
Member

> The benefit of using multithreading over MPI is that file_per_process can be used, and domain decomposition doesn't have to be performed twice. OpenACC's multithreading showed decent speedups on the course project I finished recently.

It seems reasonable... did you compare it against MPI?

@wilfonba
Contributor Author

I haven't dug that deep yet.
