Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix support for external MPI #2970

Merged
merged 4 commits into from
Aug 16, 2024
Merged

Conversation

mathomp4
Copy link
Member

@mathomp4 mathomp4 commented Aug 16, 2024

Types of change(s)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Trivial change (affects only documentation or cleanup)
  • Refactor (no functional changes, no api changes)

Checklist

  • Tested this change with a run of GEOSgcm
  • Ran the Unit Tests (make tests)

Description

It was found by @DusanJovic-NOAA (see below) that MAPL has broken externally-initialized MPI. The logic pretty much says "No you didn't" even if you did and, well, MPI stacks fail. Here we try and revive that functionality, though it is hard for us to test it.

The code change enforces "You better have initialized at MPI_THREAD_MULTIPLE" since, well, the current code seems to enforce that. I'm not sure if it is truly required or not. I'll consult @aoloso and @tclune to see.

Note that also I am not sure what ESMF_InitializePreMPI() does other than what the docs say and if it should/can be called in both cases.

Current GEOS-MAPL is:

  1. ESMF_InitializePreMPI
  2. MPI_Init_thread
  3. ESMF_Initialize

but the NOAA-MAPL case here will be be:

  1. MPI_Init_thread
  2. MPI_Query_thread
  3. ESMF_Initialize

The ESMF docs say:

This method is only needed for cases where MPI is initialized explicitly by user code. In most typical cases ESMF_Initialize() is called before MPI is initialized, and takes care of all the internal initialization, including MPI.

There are circumstances where it is necessary or convenient to initialize MPI before calling into ESMF_Initialize(). This option is supported by ESMF, and for most cases no special action is required on the user side. However, for cases where ESMF_CompSetVM() methods are used to move processing elements (PEs), i.e. CPU cores, between persistent execution threads (PETs), ESMF uses POSIX signals between PETs. In order to do so safely, the proper signal handlers must be installed before MPI is initialized. This is accomplished by calling ESMF_InitializePreMPI() from the user code prior to the MPI initialization.

So, I mean, it seems wrong to call ESMF_InitializePreMPI() in the NOAA case because MPI_Init_thread has already been called. But I'm not sure what (if anything?) breaks in MAPL if one does not call ESMF_InitializePreMPI...

Related Issue

ufs-community/ufs-weather-model#2399 (comment)

@mathomp4 mathomp4 added the 0 Diff The changes in this pull request have verified to be zero-diff with the target branch. label Aug 16, 2024
@mathomp4
Copy link
Member Author

In my testing, this seems to work. With the help of @bena-nasa I managed a cheeky hack in Cap to pre-initialize MPI and it seemed good as well. Release time.

@mathomp4 mathomp4 marked this pull request as ready for review August 16, 2024 14:52
@mathomp4 mathomp4 requested review from a team as code owners August 16, 2024 14:52
@mathomp4 mathomp4 merged commit e8b8e5e into release/v2.46 Aug 16, 2024
10 of 23 checks passed
@mathomp4 mathomp4 deleted the hotfix/mathomp4/mpi-init-2.46 branch August 16, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 Diff The changes in this pull request have verified to be zero-diff with the target branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant