[Feature] Add number of samples to read per chunk #308
Hi, if you're referring to reading channel data in slices, you can do something like this:

```python
channel = tdms_file[group_name][channel_name]
chunk_size = 1024
for chunk_start in range(0, len(channel), chunk_size):
    chunk_data = channel[chunk_start:chunk_start + chunk_size]
```

Although under the hood this still reads a full chunk of raw data from the required segments and then trims any extra values off. Does this work for you, or if not, can you explain your use case some more?
Hi! I'm trying to read per chunk too, but as you wrote, reading a slice may still need to read a full chunk. I have about 7.2 billion (7.2k³) samples, close to 54 GB of raw float64 data. Every chunk uses close to 54 GB of RAM; maybe it's just a coincidence that this is close to the file size. If I try to read the full file Python crashes, but chunk by chunk works, as long as I set up 30 GB of swap alongside my 33 GB of RAM.

I noticed there must be something that "must" be read, because getting smaller slices always uses the same amount of RAM. If you don't have enough RAM, numpy stops and says so; that's how I know npTDMS was always trying to allocate the same length even when I changed the slice (still lower than the total data size). But using 54 GB of RAM is too much, and not very efficient. I want to do some statistics on the data: get distributions per minute, per 5 minutes, or find where the data is more or less persistent. Right now that's hard to actually do.
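For a use case like per-interval statistics, npTDMS's streaming mode can help by reading one chunk at a time instead of materialising the whole file. Below is a minimal sketch of per-chunk statistics, assuming a file at `data.tdms` and the group/channel names used above (both made up here); note, as the next comment explains, this only reduces memory if the file actually contains more than one chunk:

```python
import numpy as np
from nptdms import TdmsFile

# Streaming mode: data is read lazily, chunk by chunk, rather than
# loading the entire file into memory up front.
with TdmsFile.open("data.tdms") as tdms_file:  # hypothetical path
    channel = tdms_file["group_name"]["channel_name"]  # hypothetical names
    counts, means = [], []
    for chunk in channel.data_chunks():
        data = chunk[:]  # numpy array holding just this chunk's samples
        counts.append(len(data))
        means.append(data.mean())
    # Combine per-chunk means into an overall mean, weighted by chunk length
    counts = np.array(counts)
    overall_mean = (counts * np.array(means)).sum() / counts.sum()
    print(overall_mean)
```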
Right, it sounds like your file might just have one big chunk. And if the data is interleaved this is even worse, as all channel data is read rather than skipping over data for other channels. So rather than always reading a full chunk, it sounds like we need to add the ability to read subsets of data from chunks, which isn't a trivial change. The place to start would be … I don't have a lot of time to spend on npTDMS at the moment, but I would be happy to accept a PR for this if you wanted to implement it.
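If your npTDMS version supports the `offset` and `length` parameters of `TdmsChannel.read_data` in streaming mode, that is essentially this kind of subset read; a hedged sketch, with the same made-up names as above:

```python
from nptdms import TdmsFile

with TdmsFile.open("data.tdms") as tdms_file:  # hypothetical path
    channel = tdms_file["group_name"]["channel_name"]  # hypothetical names
    # Read 1024 samples starting at sample 500000 without allocating
    # an array for the whole channel.
    subset = channel.read_data(offset=500_000, length=1024)
```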
Hi, first, sorry, I was writing the issue and for some reason it was posted before I finished it!
And actually there is already code to read data in chunks; I think it would be great to be able to pass the number of samples we want to read per chunk.
thx!
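Until a samples-per-chunk option exists, one workaround is to re-slice each native chunk as it arrives from the streaming API. This doesn't reduce peak memory for a file with one huge chunk (which is exactly why the feature is being requested), but it does give downstream code fixed-size blocks. A sketch, with made-up file/group/channel names:

```python
from nptdms import TdmsFile

def fixed_size_blocks(channel, n_samples):
    """Yield arrays of at most n_samples, re-slicing the file's
    native chunks (which may be much larger)."""
    for chunk in channel.data_chunks():
        data = chunk[:]  # this still loads the whole native chunk
        for start in range(0, len(data), n_samples):
            yield data[start:start + n_samples]

with TdmsFile.open("data.tdms") as tdms_file:  # hypothetical path
    channel = tdms_file["group_name"]["channel_name"]  # hypothetical names
    for block in fixed_size_blocks(channel, 1024):
        print(block.mean())  # e.g. per-block statistics
```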