Reading huge amount of data takes a lot of time #249
Comments
Hi,

The time it takes to read data depends a lot on the TDMS file structure and the type of data being read. For example, reading timestamp data is more complicated than reading plain floats, and data split into many small segments takes longer to read. Interleaved data also takes longer than non-interleaved data, as the data points for a single channel are not contiguous.

Without an example file to use for profiling, it's hard to say why your specific files would take a long time to read. For example, if I read data from a file with a similar number of data points but only a single channel of floats, this takes < 1 second on my machine. Are you able to provide one of your files for testing?

There isn't a way to get the positions of all data points, and I'm doubtful that this would allow you to read the data any faster. numpy/numpy#13319 indicates that using |
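As a rough illustration of the point about interleaved data, here is a minimal sketch using only the standard library. It is not the TDMS format and not how npTDMS reads files; the layout and the helper names (`build_interleaved`, `build_contiguous`, `read_interleaved`, `read_contiguous`) are hypothetical and only show why one channel's values need a strided lookup per value when channels are interleaved, but a single block read when they are contiguous.

```python
import struct

FLOAT_SIZE = 8  # bytes in one little-endian float64


def build_interleaved(channels):
    """Pack equal-length channels so values alternate: a0 b0 a1 b1 ..."""
    fmt = "<%dd" % len(channels)
    return b"".join(struct.pack(fmt, *row) for row in zip(*channels))


def build_contiguous(channels):
    """Pack each channel as one contiguous block: a0 a1 ... b0 b1 ..."""
    return b"".join(struct.pack("<%dd" % len(ch), *ch) for ch in channels)


def read_interleaved(buf, n_channels, index):
    """Read one channel with one strided lookup per value."""
    row_size = FLOAT_SIZE * n_channels
    n_rows = len(buf) // row_size
    return [
        struct.unpack_from("<d", buf, row * row_size + FLOAT_SIZE * index)[0]
        for row in range(n_rows)
    ]


def read_contiguous(buf, n_values, index):
    """Read one channel with a single block unpack."""
    offset = FLOAT_SIZE * index * n_values
    return list(struct.unpack_from("<%dd" % n_values, buf, offset))


channels = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
second_interleaved = read_interleaved(build_interleaved(channels), 2, 1)
second_contiguous = read_contiguous(build_contiguous(channels), 3, 1)
```

Both reads return the same values for the second channel; the interleaved one just has to touch the buffer once per data point.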
I've pushed a change to the |
Hi,
file: larger.tdms
file: smaller.tdms
If you don't see any downside to this feature branch, then the numbers are definitely in favor and would be much appreciated. |
Hi @axel-kah, thanks for testing this. I also found some reasonable speedups in my tests and don't see a reason not to make this change, so I will merge that branch. |
A comment that might help the original poster: I have used 2 GB TDMS files without any problems with npTDMS. |
I have the same issue with a 260 MB file containing one group and 499 channels, each holding 86400 rows of floats. On my HP Elite x2 laptop it takes ~90 sec per channel. It's probably a "bad" file issue.
I also don't use the built-in as_dataframe() method. I create the DataFrame from the list returned by my read_TDMS_Data_Parallel() function:
|
I have ~1 GB TDMS files that I would like to read in as DataFrames. I am currently using the built-in as_dataframe() method like so to read in only certain channels:
I would like to read them in parallel using the multiprocessing package. What does the "read_TDMS_Data" function look like in your "read_TDMS_Data_Parallel" function? |
@spri902 sorry for the late reply. I had to define tdms_data_path and gr_name as global because There are 499 channels in my TDMS file and I read only the ones that are relevant. That is why I'm using predefined
|
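The poster's own functions are not shown in this thread, so the following is only a minimal sketch of the general idea: reading a chosen subset of channels in parallel, with each worker opening the file itself. The names `read_channel` and `read_channels_parallel` and their parameters are hypothetical. It assumes npTDMS's streaming mode (`TdmsFile.open` used as a context manager, with `channel[:]` reading that channel's data). Passing the path and group name as arguments is one way to avoid module-level globals.

```python
import concurrent.futures
from functools import partial


def read_channel(path, group_name, channel_name):
    """Open the file in streaming mode and read one whole channel."""
    # Imported here so that each worker process loads npTDMS itself.
    from nptdms import TdmsFile

    with TdmsFile.open(path) as tdms_file:
        return channel_name, tdms_file[group_name][channel_name][:]


def read_channels_parallel(path, group_name, channel_names,
                           reader=read_channel,
                           executor_cls=concurrent.futures.ProcessPoolExecutor,
                           max_workers=None):
    """Read the named channels concurrently and return {name: data}.

    `reader` and `executor_cls` are hypothetical hooks: they allow swapping
    in a thread pool or a stand-in reader, for example when experimenting
    without a real TDMS file.
    """
    # Bind the arguments shared by every task; only the channel name varies.
    task = partial(reader, path, group_name)
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(task, channel_names))
```

With a process pool on platforms that spawn workers (Windows, macOS), the call should sit under an `if __name__ == "__main__":` guard. If all selected channels have the same length, the returned dictionary can be passed straight to `pandas.DataFrame`. Whether this is actually faster depends on the file: for heavily fragmented files, each worker still has to read the file's metadata when it opens the file.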
My TDMS files are indeed very fragmented. |
Hello,
I don't know why, but when I try to read a huge amount of data, it takes several minutes.
```
from nptdms import TdmsFile
tdms_file = TdmsFile.open("file.tdms", "r")
# This takes a lot of time:
l_data = tdms_file[group_name][channel_name][0:5000000]
```
This takes a lot of time as well:
```
import tempfile
with tempfile.TemporaryDirectory() as temp_memmap_dir:
    tdms_file = TdmsFile.read("file.tdms", memmap_dir=temp_memmap_dir)
```
Is there a way to get the offset (the address of the pointer) of every element in the file?
Something like:
p_element = tdms_file[group_name][channel_name][0].tell()