
Implemented allreduce & Allreduce for Communicator #92

Merged: 22 commits into develop on Jan 7, 2025

Conversation

@gmao-ckung (Collaborator) commented Dec 13, 2024

Description

  • Adds MPI allreduce and Allreduce functions to the NDSL Communicator class, enabling the reduction of one-, two-, and three-dimensional Quantities across ranks into a single Quantity that is synced across all ranks, with an operation (SUM, MAX, etc.) applied during the sync. See the usage sketch below.
  • This branch also forbids passing anything but a derivative of the ndsl.communicator.comm_abc.Comm object to a Communicator, enforcing a layer between the actual MPI implementation and the run code. Numerous fixes throughout the codebase undo the practice of passing mpi4py directly to Communicator.

⚠️ The above point corrects a widely used bad behavior and might break people's systems ⚠️

Fixes # (issue)
Adds MPI allreduce sum functionality to the Communicator class.
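
For illustration, a sketch of the intended usage. The wrapper class, import path, and surrounding setup are assumptions, not copied from the diff:

```python
from ndsl.comm.mpi import MPIComm  # mpi4py wrapper; import path assumed

# Communicator now rejects a raw mpi4py communicator, so it has to be
# wrapped in the comm_abc.Comm abstraction first.
comm = MPIComm()
communicator = Communicator(comm, partitioner)  # partitioner setup omitted

# Sum a Quantity across all ranks; every rank receives the reduced result.
summed = communicator.all_reduce_sum(quantity)
```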

How Has This Been Tested?
A unit test, test_mpi_allreduce_sum.py, has been included under tests/mpi. It creates one-, two-, and three-dimensional Quantities of integers on all ranks, passes each Quantity to the allreduce sum function, and checks whether the resulting Quantity equals the original Quantity multiplied by the number of ranks.
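
A minimal sketch of the shape of that check; the fixture and helper names here are assumptions, and the real test lives in tests/mpi/test_mpi_allreduce_sum.py:

```python
import numpy as np

def test_mpi_allreduce_sum(communicator, make_quantity):
    # Build an identical integer Quantity on every rank (helper assumed).
    quantity = make_quantity(np.ones((3, 4, 5), dtype=np.int64))
    n_ranks = communicator.comm.Get_size()

    result = communicator.all_reduce_sum(quantity)

    # After the reduction, every rank holds the original data multiplied
    # by the number of participating ranks.
    np.testing.assert_array_equal(result.data, quantity.data * n_ranks)
```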

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New checks and tests, if applicable, are included

Review comment on the diff around all_reduce_sum:

```python
        )
        return all_reduce_quantity

    def all_reduce_sum(self, quantity: Quantity):
```
Collaborator:
Do we want a new quantity here? The MPI spec is to have send & recv buffers, so mpi4py is already stepping out of that. In-place would be weird, but temp memory floating around like this means we can't control it either (buffer it or otherwise).

I see a few alternatives (sketched after this comment):
1/ Pass an optional recv buffer that can be controlled from outside
2/ Have an option to be in-place
3/ Mark this for a future NDSL-wide memory pool system (which we are going to get at some point)

@twicki thoughts?
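
For the record, a sketch of what option 1 might look like; the signature and the `_allocate_like` helper are illustrative only, not the merged code:

```python
from typing import Optional

def all_reduce_sum(
    self, quantity: Quantity, output_quantity: Optional[Quantity] = None
) -> Quantity:
    """Sum `quantity` across all ranks.

    If `output_quantity` is given, the result lands in its buffer and the
    caller controls the temporary memory; otherwise a new Quantity is
    allocated, as the current code does.
    """
    if output_quantity is None:
        output_quantity = self._allocate_like(quantity)  # helper assumed
    # mpi4py's buffer-based Allreduce defaults to MPI.SUM.
    self.comm.Allreduce(quantity.data, output_quantity.data)
    return output_quantity
```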

@gmao-ckung (Collaborator, author):

For context, since I saw that gather and scatter created their own Quantities in their operation, I figured that was the way to go about it. My sense is that an in-place version, where the allreduce_sum data replaces the data of the input Quantity, is more elegant in that it doesn't require a memory allocation. It looks like a setter for Quantity is needed for this to happen, since Python complains about a missing setter when quantity.data = reduced_quantity_data is run, and I'm not sure whether that would cause other issues.
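
For reference, the complaint comes from `data` being a read-only `@property`; the assignment only becomes legal once a setter exists, roughly like this (a sketch, not the PR's actual change):

```python
import numpy as np

class Quantity:
    def __init__(self, data: np.ndarray):
        self._data = data

    @property
    def data(self) -> np.ndarray:
        return self._data

    # Without this setter, `quantity.data = new_array` raises
    # "AttributeError: can't set attribute".
    @data.setter
    def data(self, new_array: np.ndarray) -> None:
        self._data = new_array
```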

Collaborator:

Yes, I think gather and scatter are missing this too. I'll let Tobias weigh in, but we should probably at least offer the opportunity to pass a temporary buffer by hand so it can be managed by the user.

As for the in-place option, once you access .data you should be in numpy/cupy land and it should be good to go, so that error is odd.

@gmao-ckung (Collaborator, author):

I modified the Quantity object to allow setters for the various Quantity metadata and data properties. all_reduce_sum now accepts an output Quantity that will contain the result of the all_reduce being performed.

@gmao-ckung (Collaborator, author):

Instead of adding setters on Quantity that modify properties of QuantityMetadata, I created a routine, duplicate_metadata, that copies the QuantityMetadata properties from one instance to another.
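
A rough sketch of what such a routine could look like; the field names are assumptions based on typical Quantity metadata, not the merged implementation:

```python
def duplicate_metadata(
    source: QuantityMetadata, destination: QuantityMetadata
) -> None:
    # Copy metadata fields across in place, so Quantity itself does not
    # need public setters (field names assumed).
    destination.origin = source.origin
    destination.extent = source.extent
    destination.dims = source.dims
    destination.units = source.units
    destination.dtype = source.dtype
```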

@FlorianDeconinck (Collaborator) left a comment:

Looking good overall. We need to make sure we are consistent about whether we return a new Quantity, and we need to debate memory management.

@FlorianDeconinck changed the title from "MPI Allreduce Sum function for Quantities" to "Implemented allreduce & Allreduce for Communicator" on Dec 22, 2024
@FlorianDeconinck self-requested a review on January 7, 2025 14:04
@FlorianDeconinck merged commit 9f5e50c into develop on Jan 7, 2025
5 checks passed
@FlorianDeconinck deleted the feature/mpi_allreduce_sum branch on January 7, 2025 15:42