-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data read #85
base: main
Are you sure you want to change the base?
Add data read #85
Conversation
@stephprince when you get a chance ,could you please do a first code review of this PR to make sure this is heading in the right direction. I now have a first outline of one possible solution for how we might implement read. There is still a lot more work to be done before this PR is ready, but it would be useful if you could take a look before I go any further with this approach. I would start by looking at:
|
@stephprince I just added a documentation page as well, which hopefully helps explain the current proposed design for read so we can review and discuss. |
We decide not to do this and to continue testing only for the merged version. We decided to add a note in the developer docs to clarify this behavior. |
Add read for neurodata_types, e.g., Container, TimeSeries
@stephprince I synced the branch with the main branch. However, Windows tests are currently failing due to |
Windows tests are working again. I had to add boost multi-array to the windows Action to fix the include error for |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #85 +/- ##
==========================================
+ Coverage 90.32% 90.87% +0.55%
==========================================
Files 39 53 +14
Lines 1830 3289 +1459
Branches 126 229 +103
==========================================
+ Hits 1653 2989 +1336
- Misses 167 290 +123
Partials 10 10 ☔ View full report in Codecov by Sentry. |
* Add unit test for HDFIO::getH5ObjectType * Fix behavior of HDF5IO::getH5ObjectType when object does not exist * Add unit tests for HDF5IO::getNativeType * Add unit test for HDF5IO::getH5Type * Add unit test for HDF5IO::objectExists * Add unit tests for HDF5IO::attributeExists * add unit test for HDF5IO::getGroupObjects * Add unit test for HDF5IO::getStorageObjectType * Fix reading of arrays and select dtypes for HDF5IO::readAttribute * Add initial unit tests for HDF5IO::readAttribute * Fix reading of string array attributes via HDF5IO::readAttribute * Fix reading of reference attributes * Add HDF5IO::readAttribute unit test for invalid and double attribute * Add HDF5IO::readAttriute unit test for reading attribute from a dataset * Fix handling of non-existent file ReadOnly and ReadWrite mode * Add unit tests for HDF5IO::open for file modes * Add additonal unit tests for HDF5IO::readDataset * Fix element selection in HDF5IO::readDataset * Add unit test for reading hyperslap with HDF5IO::readDataset * Fix hyperslap selection with block in HDF5IO::readDataset * Expand unit tests for HDF5IO::readDataset with hyperslaps * Add note with TODO item to the tests * Improve coverage of numeric types in readDataset * Fix determining string size on read * Set string size when creating the dataset * Fix error in commmented unit test * Fix writing of fixed length string datasets * Add test writing fixed length string data * Fix reading of fixed length strings * Fix writing of variable length string datasets * Fix writing of variable length string datasets * Fix reading of variable length string dataset * Fix reading of empty string datasets * Minor enhancement to check for address santizer * Create RecordingData::writeStrinData to simplify writing of strings * Update tests/testHDF5IO.cpp Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com> * Update tests/testHDF5IO.cpp Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com> * Fix formatting --------- Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com>
This PR is to try and implement the proposed approach for data read from #83 to see if this approach is viable. This PR is experimental right now and should not be merged.
2. Proposed Implementation for reading data arrays
BaseReadData
ReadDatasetWrapper
andReadAttributeWrapper
classes for reading data. This is modified from the proposal, which suggested a singleBaseReadData
class for reading any array (datasets, attributes) from a file.BaseRecordingData
to inherit fromReadDataWrapper
because the two are not compatible right now.BaseReadData
uses the io and the path to access the data, whereasBaseRecordingData
leaves that choice to the I/O backend and stores the references to the dataset. We may or may not want to change this.BaseIO
BaseIO
(or more accurately I remove them) because: 1) I wanted to use the shared_ptr to the I/O rather than a raw pointer, which I can't get from BaseIO, and 2) with the ReadDatasetWrapper this is more approbriately done in the Container class directly.HDF5IO
BaseReadData
forHDF5
but left read logic toHDF5IO
itself so that theReadDatasetWrapper
can remain generic. To make this more manageable, I definedReadAttributeWrapper
separatelyHDF5ReadDataSet
andHDF5ReadAttribute
wrappers can call can call for readgetObjectType
method for getting the storage object type (Group, Dataset, Attribute) for a given pathContainer
io
object on theContainer
so that we can callio->readDataset
andio->readAttribute
in the read methodsNWB types:
TimeSeries
,ElectricalSeries
etc.Container
classes and replace them with access methods that returnBaseReadData
objects instead. This allows for reading in both read and write mode and avoids keeping data in memory that we have already written to disk. For example, inTimeSeries
, these variables would need to change to properties:aqnwb/src/nwb/base/TimeSeries.hpp
Lines 91 to 140 in e873d95
BaseReadData
for missing fields3. Proposed implementation for reading whole
Containers
(e.g., to read anElectricalSeries
)Container
that owns the respective objects, e.g.,NWBFile
owningElectricalSeries
objects to retrieve the objectContainer
to create an instance of the specificContainer
type using only theio
andpath
for theContainer
as input. The specificContainer
classes, such asTimeSeries
will then need to implement a corresponding constructor that usesio
andpath
as input.Step 1: Define the Template Factory Method in
Container
Step 2: Implement the constructors on the specific
Container
classes (e.g.,TimeSeries
)4. Proposed implementation for reading untyped groups (e.g.,
/acquisition
)I'm not sure we'll need do this, since a group by itself does not define data. To access the contents we could define access methods on the parent
Container
class (e.g.,NWBFile
) that owns the untyped group to access its contents.TODO
ReadDatasetWrapper
,ReadAttributeWrapper
(See Add read for neurodata_types, e.g., Container, TimeSeries #91)ReadDatasetWrapper
andReadAttributeWrapper
(I think this could be useful)Items moved to new issues
REGISTER_SUBCLASS_WITH_TYPENAME
#115Next steps
DEFINE_FIELDS
specified for read can be read. As part of writing the tests, we should also double-check for each type, that all fields that AqNWB writes, that they are also exposed for read.TimeSeries
See Add read tests forTimeSeries
#124ElectricalSeries
SpikeEventSeries
Device
NWBFile
. NOTE: For reading the NWBFile container, there is the special case thatNWBFile
requires the path to the "/" so this may require some additional logic in the tests.aqnwb/src/nwb/NWBFile.cpp
Lines 39 to 44 in bfa740a
ElectrodeTable
. NOTE: ElectrodeTable is currently not aneurodata_type
and usesREGISTER_SUBCLASS_WITH_TYPENAME
. I.e., on read this will return aDynamicTable
instead of anElectrodeTable
right now. See Support read with classes usingREGISTER_SUBCLASS_WITH_TYPENAME
#115ElectrodeGroup
DynamicTable
Data
VectorData
ElementIdentifiers
src/Utils.cpp , mergePaths()
functionBaseIO::findTypes
(possibly as part of the HDF5IO tests)HDF5IO
-specific functions (i.e., not inherited fromBaseIO
:HDF5IO::getH5ObjectType
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::getNativeType
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::getH5Type
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO
functions (which are core functions new inBaseIO
). While some of these are partially covered by theDEFINE_FIELDS
read tests, it will be useful to more fully test these separatelyHDF5IO::readDataset
(incl. with start, count, stride, and block parameters set). This is to test that reading subsets of larger arrays works as expected. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::open(FileMode mode)
test that file modes are being set correctly. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::readAttribute
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::getStorageObjectType
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::getGroupObjects
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::attributeExists
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::objectExists
. Implemented in Add HDF5IO tests for add_read PR #118HDF5IO::readDataset
to make sure it supports all supported types of BaseDataType, in partiuclar also strings and update theTEST_CASE("HDF5IO; read dataset", "[hdf5io]")
andTEST_CASE("HDF5IO; read dataset subset", "[hdf5io]")
(See Add HDF5IO tests for add_read PR #118 )HDF5IO
internal functions. These may not require separate dedicated tested, but may be covered via the tests of the public core interfaces functions. However, we should check that they are indeed covered:HDF5IO:: readDataHelper
(note, there are two variants of this function)HDF5IO::readStringDataHelper
(note there are two variants of this function)HDF5IO::getAttribute
cd build/dev/tests/data
and runnwbinspector . --threshold BEST_PRACTICE_VIOLATION
)colNames
fromDynamicTable
constructor toinitialize
RegisteredType
. This should be largely covered by the other tests, but we should check.ReadIO
. This should be largely covered by the other tests, but we should check.DEFINE_FIELD
create virtual functions and resolve this comment https://github.com/NeurodataWithoutBorders/aqnwb/pull/85/files#r1894223365 NOTE: I created Allow virtual function for DEFINE_FIELD #117 for this and added a note to the docs to example this. I think it is fine to address this later.DynamicTable
may require additional discussion on how to read columns. I think right now we would read individualVectorData
columns manually'boost/multi_array.hpp': No such file or directory