This is your new Kedro project with Kedro-Viz and PySpark setup, which was generated using kedro 0.19.6.
Take a look at the Kedro documentation to get started.
In order to get the best out of the template:
- Don't remove any lines from the .gitignore file we provide
- Make sure your results can be reproduced by following a data engineering convention
- Don't commit data to your repository
- Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in conf/local/
Declare any dependencies in requirements.txt for pip installation.
To install them, run:
pip install -r requirements.txt
You can run your Kedro project with:
kedro run
Have a look at the files src/tests/test_run.py and src/tests/pipelines/data_science/test_pipeline.py for instructions on how to write your tests. Run the tests as follows:
pytest
To configure the coverage threshold, look at the .coveragerc file.
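As a rough illustration of the kind of test pytest will collect, here is a minimal sketch. The node function split_rows and its test are hypothetical and are not part of the generated project; a real node would live in your pipeline package and be imported into the test module.

```python
# Hypothetical node function, shown inline so the example is self-contained.
# In a real project it would be imported from your pipeline's nodes module.
def split_rows(rows, test_fraction):
    """Split a list of rows into a training part and a test part."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = int(len(rows) * test_fraction)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


# pytest collects functions whose names start with "test_".
def test_split_rows_keeps_every_row():
    rows = list(range(10))
    train, test = split_rows(rows, 0.2)
    assert len(train) == 8
    assert len(test) == 2
    # No row is lost or duplicated by the split.
    assert train + test == rows
```

Testing node functions as plain Python functions keeps the tests fast, because they need neither a Kedro session nor a Spark context.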
To see and update the dependency requirements for your project use requirements.txt. Install the project requirements with pip install -r requirements.txt.
Further information about project dependencies
Note: Using kedro jupyter or kedro ipython to run your notebook provides these variables in scope: catalog, context, pipelines and session.
Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run pip install -r requirements.txt you will not need to take any extra steps before you use them.
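If you are unsure whether a notebook was started through Kedro, a small check like the following can tell you which of the four variables are missing. This is only a sketch: the helper missing_kedro_variables is a hypothetical name, not part of Kedro, and it only inspects the namespace you pass in.

```python
# Names that the note above says are provided in scope.
EXPECTED_VARIABLES = ("catalog", "context", "pipelines", "session")


def missing_kedro_variables(namespace):
    """Return the expected variable names that are absent from a namespace dict."""
    return [name for name in EXPECTED_VARIABLES if name not in namespace]


# In a notebook cell you could pass globals(); a plain dict is used here
# so the example runs outside a Kedro session as well.
print(missing_kedro_variables({"catalog": object(), "session": object()}))
```

An empty list means all four variables are available; a non-empty list suggests the notebook was started without kedro jupyter or kedro ipython.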
To use Jupyter notebooks in your Kedro project, Jupyter must be installed. If it is not already present in your environment, install it:
pip install jupyter
After installing Jupyter, you can start a local notebook server:
kedro jupyter notebook
To use JupyterLab, it must be installed. If it is not already present in your environment, install it:
pip install jupyterlab
You can also start JupyterLab:
kedro jupyter lab
And if you want to run an IPython session:
kedro ipython
To automatically strip out all output cell contents before committing to git, you can use tools like nbstripout. For example, you can add a hook in .git/config with nbstripout --install. This will run nbstripout before anything is committed to git.
Note: Your output cells will be retained locally.
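To make concrete what "stripping output cell contents" means, the sketch below clears the outputs of code cells in a notebook represented as a Python dict. It is an illustration only, not nbstripout's implementation; the function name strip_outputs and the example notebook are made up for this purpose, and the sketch assumes the version 4 notebook layout with cells, outputs and execution_count keys.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny made-up notebook with one markdown cell and one executed code cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

cleaned = strip_outputs(example)
```

The source of each cell is preserved; only the results of running it are dropped, which is why the committed notebook stays small and free of data.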
Further information about building project documentation and packaging your project