Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ArrayPartition from RecursiveArrayTools.jl #118

Merged
merged 24 commits into from
Jun 21, 2024
Merged

Conversation

JoshuaLampert
Copy link
Owner

This simplifies the code as no wrapping between different data structures is necessary anymore.
I opted for a VectorOfArray instead of an ArrayPartition because, as I understand, VectorOfArrays allow for something like q[i, :] or view(q, i, :), i.e. picking the values at one node for all variables, which would be more cumbersome for an ArrayPartition if I understand correctly. Are there any drawbacks of a VectorOfArray compared to ArrayPartition, @ranocha?
Unfortunately, there is still a huge performance bottleneck. Interestingly, this is outside of rhs!, see the timings below

On main
julia> trixi_include("examples/bbm_bbm_1d/bbm_bbm_1d_basic.jl", tspan = (0.0, 10.0));
[...]
 ──────────────────────────────────────────────────────────────────────────────
      DispersiveSWE                 Time                    Allocations      
                           ───────────────────────   ────────────────────────
    Tot / % measured:           202ms /  97.7%            210MiB /  99.6%    

Section             ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────
rhs!                5.60k    185ms   93.7%  33.0μs    204MiB   97.5%  37.3KiB
 dv hyperbolic      5.60k   48.8ms   24.8%  8.72μs    113MiB   54.0%  20.6KiB
 deta elliptic      5.60k   46.0ms   23.3%  8.21μs     0.00B    0.0%    0.00B
 dv elliptic        5.60k   44.3ms   22.5%  7.92μs     0.00B    0.0%    0.00B
 deta hyperbolic    5.60k   43.1ms   21.9%  7.69μs   90.3MiB   43.2%  16.5KiB
 ~rhs!~             5.60k   2.39ms    1.2%   427ns    879KiB    0.4%     161B
 source terms       5.60k   98.0μs    0.0%  17.5ns     0.00B    0.0%    0.00B
analyze solution        94   12.4ms    6.3%   132μs   5.18MiB    2.5%  56.5KiB
──────────────────────────────────────────────────────────────────────────────
On this branch
julia> trixi_include("examples/bbm_bbm_1d/bbm_bbm_1d_basic.jl", tspan = (0.0, 10.0));
[...]
  ──────────────────────────────────────────────────────────────────────────────
      DispersiveSWE                 Time                    Allocations      
                           ───────────────────────   ────────────────────────
    Tot / % measured:           5.64s /   4.3%           5.42GiB /   4.5%    

Section             ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────
rhs!                5.60k    225ms   93.2%  40.2μs    248MiB   98.3%  45.4KiB
 deta hyperbolic    5.60k   62.2ms   25.7%  11.1μs   90.3MiB   35.7%  16.5KiB
 dv hyperbolic      5.60k   60.6ms   25.1%  10.8μs    113MiB   44.7%  20.6KiB
 deta elliptic      5.60k   54.7ms   22.7%  9.77μs   22.6MiB    8.9%  4.12KiB
 dv elliptic        5.60k   42.9ms   17.8%  7.65μs   22.6MiB    8.9%  4.12KiB
 ~rhs!~             5.60k   4.64ms    1.9%   829ns   3.67KiB    0.0%    0.67B
 source terms       5.60k    117μs    0.0%  20.8ns     0.00B    0.0%    0.00B
analyze solution        94   16.4ms    6.8%   174μs   4.38MiB    1.7%  47.7KiB
──────────────────────────────────────────────────────────────────────────────
Do you have any idea where this might come from, @ranocha?

I'm not sure whether or not to consider this PR as breaking since it deletes the wrap_array, which was more or less internally. However, it would, e.g., break https://github.com/JoshuaLampert/2024_dispersive_shallow_water/blob/f62c32de647c2f9d6425130302d6ce7ae1d7de0a/code/traveling_wave/solution_k_0_8.jl#L38.

Closes #111.

Copy link

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 99.23077% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.93%. Comparing base (3e35932) to head (0f5182c).

Files Patch % Lines
src/equations/svaerd_kalisch_1d.jl 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #118      +/-   ##
==========================================
+ Coverage   96.75%   96.93%   +0.18%     
==========================================
  Files          17       17              
  Lines        1233     1176      -57     
==========================================
- Hits         1193     1140      -53     
+ Misses         40       36       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ranocha
Copy link
Collaborator

ranocha commented Jun 18, 2024

This simplifies the code as no wrapping between different data structures is necessary anymore.

Thanks!

I opted for a VectorOfArray instead of an ArrayPartition because, as I understand, VectorOfArrays allow for something like q[i, :] or view(q, i, :), i.e. picking the values at one node for all variables, which would be more cumbersome for an ArrayPartition if I understand correctly. Are there any drawbacks of a VectorOfArray compared to ArrayPartition, @ranocha?

How often do you use this functionality? I would expect that this kind of slicing returns a Vector, i.e., it allocates. In Trixi.jl, we have our own accessor functions like get_node_vars to get SVectors instead without allocations.

Unfortunately, there is still a huge performance bottleneck. Interestingly, this is outside of rhs!, see the timings below
Do you have any idea where this might come from, @ranocha?

Maybe something weird is going on in the ODE solver? Could you maybe post a profile of the solve call?

I'm not sure whether or not to consider this PR as breaking since it deletes the wrap_array, which was more or less internally.

While it's technically not breaking since no public API was changed, the change is big enough (and some people like you may have used this internal API). Since I don't really see a big cost of making a breaking release, I would label it as breaking.

@JoshuaLampert
Copy link
Owner Author

Unfortunately, there is still a huge performance bottleneck. Interestingly, this is outside of rhs!, see the timings below
Do you have any idea where this might come from, @ranocha?

Maybe something weird is going on in the ODE solver? Could you maybe post a profile of the solve call?

The profile shows that the time is spent within broadcasting of vector_of_array.jl. I think I'm hitting SciML/RecursiveArrayTools.jl#199. I will try to use an ArrayPartition instead and define a get_node_vars function.

Since I don't really see a big cost of making a breaking release, I would label it as breaking.

👍

@JoshuaLampert
Copy link
Owner Author

I switched to an ArrayPartition in 6044989 and everything looks fine 👍.

@JoshuaLampert JoshuaLampert changed the title Use VectorOfArray from RecursiveArrayTools.jl Use ArrayPartition from RecursiveArrayTools.jl Jun 18, 2024
@JoshuaLampert JoshuaLampert requested a review from ranocha June 18, 2024 09:39
Copy link
Collaborator

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me 👍

I have not looked at everything in detail - should I?

I think it would be nice to add some tests for allocations of a full solve call to catch possible regressions like the one for VectorOfArrays

@ranocha
Copy link
Collaborator

ranocha commented Jun 18, 2024

Do you have a timing similar to the one in your first post for the latest version?

@JoshuaLampert
Copy link
Owner Author

I have not looked at everything in detail - should I?

No, I think you don't need to look at everything in detail. Maybe a brief look in src/solver.jl and src/semidiscretization.jl would be good since there are the most notable changes.

I think it would be nice to add some tests for allocations of a full solve call to catch possible regressions like the one for VectorOfArrays

I've added such tests in e94e833. Maybe something along those lines could also be interesting for a potential future package TrixiTest.jl, cf. trixi-framework/TrixiBase.jl#9?

Do you have a timing similar to the one in your first post for the latest version?

Sure!

this branch with ArrayPartition
 ──────────────────────────────────────────────────────────────────────────────
      DispersiveSWE                 Time                    Allocations      
                           ───────────────────────   ────────────────────────
    Tot / % measured:           191ms /  94.6%            254MiB /  99.6%    

Section             ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────
rhs!                5.60k    169ms   93.7%  30.2μs    248MiB   98.3%  45.4KiB
 deta hyperbolic    5.60k   42.8ms   23.7%  7.64μs   90.3MiB   35.7%  16.5KiB
 deta elliptic      5.60k   42.1ms   23.3%  7.52μs   22.6MiB    8.9%  4.12KiB
 dv elliptic        5.60k   41.6ms   23.0%  7.42μs   22.6MiB    8.9%  4.12KiB
 dv hyperbolic      5.60k   40.9ms   22.7%  7.31μs    113MiB   44.7%  20.6KiB
 ~rhs!~             5.60k   1.74ms    1.0%   311ns   3.67KiB    0.0%    0.67B
 source terms       5.60k   87.9μs    0.0%  15.7ns     0.00B    0.0%    0.00B
analyze solution        94   11.5ms    6.3%   122μs   4.38MiB    1.7%  47.7KiB
──────────────────────────────────────────────────────────────────────────────

@JoshuaLampert
Copy link
Owner Author

The failing test on mac looks pretty weird and unrelated - but it failed consistently in this PR.

@JoshuaLampert JoshuaLampert requested a review from ranocha June 21, 2024 08:30
Copy link
Collaborator

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@ranocha
Copy link
Collaborator

ranocha commented Jun 21, 2024

The CI failure is some homebrew failing stuff, so it's not your fault. I would just go ahead and merge this PR 👍

@JoshuaLampert JoshuaLampert merged commit 109fa71 into main Jun 21, 2024
9 of 10 checks passed
@JoshuaLampert JoshuaLampert deleted the vectorofarray branch June 21, 2024 11:33
@ranocha ranocha mentioned this pull request Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use ArrayPartition from RecursiveArrayTools.jl
2 participants