A DataHerb Core Service to Bundle the Datasets into Flora.
DataHerb is an open data initiative to make open datasets easier to access.
- A DataHerb or Herb is a dataset. A dataset comes with its data files and the metadata of those files.
- A DataHerb Leaf or Leaf is a data file in the DataHerb.
- A Flora is the combination of all the DataHerbs.
In many data projects, finding the right datasets to enhance your data is one of the most time-consuming parts. DataHerb adds flavor to your data project.
We designed the following workflow to share and index datasets.
This repository is used to list datasets (the listings in the DataHerb flora repository).
Simply create a `yml` file in the `flora` folder to link to your dataset repository. Your dataset repository should have a `.dataherb` folder and a `metadata.yml` file in it.
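As a minimal sketch of the repository layout described above, the following Python snippet checks whether a local copy of a dataset repository has the expected files. It assumes that `metadata.yml` lives inside the `.dataherb` folder; the function name `has_dataherb_metadata` is hypothetical and is not part of any DataHerb package.

```python
from pathlib import Path


def has_dataherb_metadata(repo_root):
    """Return True if repo_root contains .dataherb/metadata.yml.

    Assumption: metadata.yml is stored inside the .dataherb folder.
    """
    metadata_file = Path(repo_root) / ".dataherb" / "metadata.yml"
    return metadata_file.is_file()
```

Running `has_dataherb_metadata(".")` from the root of your dataset repository is a quick way to confirm the layout before adding a listing.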
The indexing part will be done by GitHub Actions.
There are three components to build the dataset index.
- dataherb-flora: Indexes datasets using yml files.
- dataherb-metadata-aggregator: Aggregates all information about the datasets and creates a database.
- dataherb.github.io: Builds the website using the database.
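
To illustrate the first two steps, here is a rough sketch that scans a `flora` folder for `yml` listing files and writes their names to a JSON file. This is not the actual aggregator: the function names, the JSON layout, and the choice of JSON as the "database" are all assumptions made for illustration, and the contents of the listing files are not parsed.

```python
import json
from pathlib import Path


def collect_listings(flora_dir):
    """Return the sorted names (without extension) of the yml listing files."""
    return sorted(p.stem for p in Path(flora_dir).glob("*.yml"))


def write_index(flora_dir, index_path):
    """Write the listing names to a JSON file and return them.

    Assumption: a flat JSON file stands in for the real database.
    """
    names = collect_listings(flora_dir)
    Path(index_path).write_text(json.dumps({"listings": names}, indent=2))
    return names
```

A real aggregator would also need to read each listing, fetch the linked repository's metadata, and merge it into the database; those steps are left out here because their formats are not described in this document.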
Some packages are also available to make accessing and creating datasets easier. Refer to the website for details.