Data Sizes #158
Hi @ax3l - yes, as you have noticed, this code really is intended to work on data that can be loaded into memory. A workaround would be to load and process data one 'chunk' at a time - this is already possible. Early on, I considered whether I should try to build a more abstract framework for much larger datasets, and I had a look at polars instead of pandas, which could have enabled 'lazy' evaluation. But ultimately, all the data I work with easily fits into memory (albeit my workstation has 128 GB of RAM :-P) and I was sort of trying to solve problems I didn't have, so I just decided to keep it simple... Dask looks interesting. At first glance, it seems more geared towards parallelizing operations than memory management?
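For illustration, a minimal sketch of the chunked-processing idea with pandas (the file name and column `x` are hypothetical placeholders, not this package's actual data schema): partial results are accumulated per chunk, so the full file never has to sit in memory at once.

```python
import pandas as pd

# Hypothetical file and column names - the real ParticlePhaseSpace schema may differ.
total, count = 0.0, 0

# Stream the file in fixed-size chunks instead of loading it all at once.
for chunk in pd.read_csv("phase_space.csv", chunksize=1_000_000):
    total += chunk["x"].sum()
    count += len(chunk)

print("mean x over the whole file:", total / count)
```

Polars' lazy API (`pl.scan_csv(...).select(...).collect()`) would be another route to a similar effect, deferring the actual reading until the reduced result is requested.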
Thank you for the details! Yes, I think with pandas you might already have some support for chunked operations, and upgrades to the mentioned backends could enable this in the future. As for Dask: parallelization includes memory management; limited shared memory per node is often the driving reason why one parallelizes in the first place :)
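As a rough sketch of that point with `dask.dataframe` (file pattern and column name are placeholders): Dask builds a lazy task graph over partitions and only streams them through memory when the small, reduced result is computed, which covers both the parallelization and the memory side.

```python
import dask.dataframe as dd

# Hypothetical file pattern and column name; each partition is ~blocksize bytes of input.
ddf = dd.read_csv("phase_space_*.csv", blocksize="256MB")

# Nothing is loaded yet - this only builds a task graph of per-partition work
# plus a final reduction.
stats = ddf["x"].describe()

# Partitions are read, reduced, and released one by one; only the small result is kept.
print(stats.compute())
```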
This is perfect and great scoping guidance for users and potential future directions! Thanks a lot. I am closing this as part of the JOSS review, but feel free to reopen it if you would like to keep it as an issue for tracking potential future developments/contributions.
Thank you for the JOSS submission in openjournals/joss-reviews#5375.
This is a follow-up question to #156.
In the design of this package, what are the envisioned data sizes for phase space data to be processed? Up to the size of a laptop's RAM / a single node?
I was looking at
https://bwheelz36.github.io/ParticlePhaseSpace/new_data_loader.html
and am wondering whether most of the operations here are map-reduce operations that could be implemented to stream over arbitrary data sizes, e.g., when large simulation data is being processed?
I did some experiments on processing such data with Dask:
openPMD/openPMD-api#963 (comment)
and wonder if something similar could be used as the backend here to scale up? 🚀
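To make the map-reduce point concrete, here is a rough sketch under assumed names (`iter_chunks` is a hypothetical chunk generator, not an existing loader in this package): a running mean and spread of some per-particle quantity can be reduced chunk by chunk, so the processable data size is bounded by disk rather than RAM.

```python
import numpy as np

def iter_chunks(n_chunks=10, chunk_size=1_000_000):
    """Hypothetical stand-in for a chunked loader: yields one NumPy array per chunk."""
    rng = np.random.default_rng(0)
    for _ in range(n_chunks):
        yield rng.normal(loc=10.0, scale=1.0, size=chunk_size)

# Map step: per-chunk partial sums. Reduce step: combine into a global mean/std.
n, s, s2 = 0, 0.0, 0.0
for values in iter_chunks():
    n += values.size
    s += values.sum()
    s2 += np.square(values).sum()

mean = s / n
std = np.sqrt(s2 / n - mean**2)
print(f"streamed mean = {mean:.3f}, std = {std:.3f}")
```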