
MeshAdaptor performance #265

Open
barche opened this issue May 8, 2014 · 8 comments

Comments

@barche
Member

barche commented May 8, 2014

Partitioning a 3D tetrahedral mesh with 14M nodes on 64 cores, the step

MeshAdaptor: rebuild glb_to_loc maps in dictionaries

is taking almost a day so far. We should look into improving the performance.

@wdeconinck
Member

That's terrible. 14M nodes is not even that much...

@wdeconinck
Member

So is the following piece of code the culprit?
https://github.com/coolfluid/coolfluid3/blob/master/cf3/mesh/Dictionary.cpp#L413-L420
It uses cf3::common::Map() internally, and should be fast...
It first allocates memory in a vector, then fills in all elements in the given order.
Then sort_keys() just does a single sort over the 14M points... That should be fast and involves no MPI communication.
Is there perhaps another piece of code that is slowing it down?
Do you have a test?
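
For reference, a minimal sketch of the pattern described above (reserve, fill in order, sort once, then binary-search lookups), assuming cf3::common::Map behaves like a sorted vector of (key, value) pairs; the names below are hypothetical and not the actual coolfluid interface:

```cpp
// Hypothetical sketch, not the cf3::common::Map implementation:
// build is O(N log N) (one sort), each lookup is O(log N), no MPI involved.
#include <algorithm>
#include <cstdint>
#include <vector>

struct GlbToLoc
{
  using Entry = std::pair<std::uint64_t, std::uint32_t>; // (global id, local index)
  std::vector<Entry> data;

  void reserve(std::size_t n) { data.reserve(n); }

  void push_back(std::uint64_t glb, std::uint32_t loc) { data.emplace_back(glb, loc); }

  void sort_keys() // single sort after all entries have been filled in
  {
    std::sort(data.begin(), data.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
  }

  std::uint32_t find(std::uint64_t glb) const // binary search, assumes the key is present
  {
    auto it = std::lower_bound(data.begin(), data.end(), glb,
                               [](const Entry& e, std::uint64_t key) { return e.first < key; });
    return it->second;
  }
};
```

With this kind of structure, even 14M entries should only take seconds to build and sort, which is why the slowdown is surprising.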

@barche
Member Author

barche commented Jun 25, 2014

I'm not sure. I did a quick profiling run, but map operations didn't stand out there. The code you mention should run in O(N log N), so I doubt that this is the cause.

As a test, you could use any sufficiently large 3D Gmsh .msh file and load it in parallel.

@barche
Member Author

barche commented Jun 25, 2014

After adding some more debug output, it became clear that most of the time is actually spent in fix_node_ranks, i.e.:
https://github.com/coolfluid/coolfluid3/blob/master/cf3/mesh/MeshAdaptor.cpp#L1052-L1096

@wdeconinck
Member

Okay, I will soon have a look


@wdeconinck
Member

This routine indeed looks quite expensive and is a candidate for optimisation. Did it matter how many cores you threw at it?

@barche
Member Author

barche commented Jun 30, 2014

I didn't really check with the large case, but it seemed to take longer as the number of cores increased. I think that would be consistent with the way the communication happens in that part of the code.
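
For illustration only (this is not the coolfluid fix_node_ranks code): one common way to keep node-ownership resolution from scaling with the core count is to route each global node id to a "home" rank and decide the owner there, so the whole step costs a fixed number of collective exchanges regardless of how many ranks share a node. A minimal sketch, assuming the owner is simply the lowest rank that holds the node:

```cpp
// Illustrative directory-based ownership resolution, NOT the coolfluid implementation.
// Each rank sends its global node ids to a home rank (id % nproc); the home rank
// picks the lowest requesting rank as owner and sends the decision back.
// Two MPI_Alltoallv rounds in total, independent of the number of cores.
#include <mpi.h>
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// For each global node id held locally, return the rank chosen as owner.
std::vector<int> resolve_owners(const std::vector<std::uint64_t>& my_nodes, MPI_Comm comm)
{
  int rank, nproc;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &nproc);

  // 1. Bucket each global id by its home rank.
  std::vector<std::vector<std::uint64_t>> outgoing(nproc);
  for (std::uint64_t id : my_nodes)
    outgoing[id % nproc].push_back(id);

  std::vector<int> send_counts(nproc), recv_counts(nproc);
  for (int p = 0; p < nproc; ++p)
    send_counts[p] = static_cast<int>(outgoing[p].size());
  MPI_Alltoall(send_counts.data(), 1, MPI_INT, recv_counts.data(), 1, MPI_INT, comm);

  std::vector<int> send_displs(nproc, 0), recv_displs(nproc, 0);
  for (int p = 1; p < nproc; ++p)
  {
    send_displs[p] = send_displs[p-1] + send_counts[p-1];
    recv_displs[p] = recv_displs[p-1] + recv_counts[p-1];
  }

  std::vector<std::uint64_t> send_ids(send_displs[nproc-1] + send_counts[nproc-1]);
  for (int p = 0; p < nproc; ++p)
    std::copy(outgoing[p].begin(), outgoing[p].end(), send_ids.begin() + send_displs[p]);

  std::vector<std::uint64_t> recv_ids(recv_displs[nproc-1] + recv_counts[nproc-1]);
  MPI_Alltoallv(send_ids.data(), send_counts.data(), send_displs.data(), MPI_UINT64_T,
                recv_ids.data(), recv_counts.data(), recv_displs.data(), MPI_UINT64_T, comm);

  // 2. The home rank picks the lowest requesting rank as owner for each id.
  std::map<std::uint64_t, int> owner;
  for (int p = 0; p < nproc; ++p)
    for (int i = recv_displs[p]; i < recv_displs[p] + recv_counts[p]; ++i)
      if (owner.find(recv_ids[i]) == owner.end())
        owner[recv_ids[i]] = p; // p iterates in increasing order, so the first hit is the lowest rank

  // 3. Send the decisions back along the reverse routes.
  std::vector<int> reply(recv_ids.size());
  for (std::size_t i = 0; i < recv_ids.size(); ++i)
    reply[i] = owner[recv_ids[i]];

  std::vector<int> answer(send_ids.size());
  MPI_Alltoallv(reply.data(), recv_counts.data(), recv_displs.data(), MPI_INT,
                answer.data(), send_counts.data(), send_displs.data(), MPI_INT, comm);

  // 4. Map the answers back to the original ordering of my_nodes.
  std::map<std::uint64_t, int> result;
  for (std::size_t i = 0; i < send_ids.size(); ++i)
    result[send_ids[i]] = answer[i];

  std::vector<int> owners;
  owners.reserve(my_nodes.size());
  for (std::uint64_t id : my_nodes)
    owners.push_back(result[id]);
  return owners;
}
```

With a pattern like this the number of communication rounds stays constant as more cores are added; only the per-rank message sizes change, which would avoid the slowdown observed when increasing the core count.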
