Jupytext is a Python package that provides two-way conversion between Jupyter notebooks and several other text-based formats like Markdown documents or scripts.
The text representation contains only the part of the notebook that you wrote (not the outputs), so you get a cleaner diff history. Thanks to the two-way conversion, you can also edit the text file and then propagate the changes to the original .ipynb file. Refactor your code or merge multiple contributions easily!
Open the notebook that you want to version control. Pair the notebook to a script or a Markdown file using the Jupytext Commands in JupyterLab.
Save the notebook, and you get two copies of the notebook: the original *.ipynb file, together with its paired text representation.
Notebooks that contain more text than code are best represented as Markdown documents. These are conveniently edited in IDEs and are also well rendered on GitHub.
Saving notebooks as scripts is an appropriate choice when you want to act on the code (refactor the code, import it in another script or notebook, etc.). Use the percent format if you prefer explicit cell markers (compatible with VS Code, PyCharm, Spyder, Hydrogen...). If you prefer a minimal number of cell markers, go for the light format.
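To make the idea of explicit cell markers concrete, here is a minimal sketch of how a percent-format script can be split back into cells. It assumes the usual percent conventions (`# %%` opens a code cell, `# %% [markdown]` opens a Markdown cell). The sample script and the `split_percent_cells` helper are hypothetical and only illustrate the format. This is not Jupytext's own parser.

```python
# Simplified illustration only: this is NOT Jupytext's parser.
SAMPLE_SCRIPT = """\
# %% [markdown]
# # World population
# A short introduction written as a Markdown cell.

# %%
import math

# %%
print(math.sqrt(16))
"""


def split_percent_cells(text):
    """Split a percent-format script into (cell_type, source) pairs."""
    cells = []
    cell_type, lines = None, []
    for line in text.splitlines():
        if line.startswith("# %%"):
            # A new marker closes the previous cell, if any.
            if cell_type is not None:
                cells.append((cell_type, "\n".join(lines).strip()))
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif cell_type is not None:
            lines.append(line)
    # Close the last cell of the file.
    if cell_type is not None:
        cells.append((cell_type, "\n".join(lines).strip()))
    return cells


cells = split_percent_cells(SAMPLE_SCRIPT)
for cell_type, source in cells:
    print(cell_type, "->", repr(source))
```

Markdown cells keep their `# ` comment prefix in this sketch. A real converter would strip it when rebuilding the notebook.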
Go to our demo folder and see how our sample World population notebook is represented in each format.
Yes! When you're done, reload the notebook in Jupyter. There, you will see the updated input cells combined with the matching output cells from the .ipynb file.
Closing the notebook in Jupyter while you refactor it in another editor will help you avoid the message "Untitled.ipynb has changed on disk". However, you don't really need to close the notebook. You can simply use Reload Notebook from disk to load the latest edits once you're done with the other editor. When you reload the notebook, the kernel variables are preserved (and the outputs too if the notebook is paired to an .ipynb file), so you can continue your work where you left off.
The .ipynb file contains the full notebook. The paired text file only contains the input cells and selected metadata. When the notebook is loaded by Jupyter, input cells are loaded from the text file, while the output cells and the filtered metadata are restored using the .ipynb file. When the notebook is saved in Jupyter, the two files are updated to match the current content of the notebook.
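The loading step described above can be pictured with a small sketch: inputs come from the text file, and outputs are reused from the .ipynb file for cells whose input is unchanged. The cell contents and the `combine` helper are hypothetical, and matching cells by exact source text is an assumption made for illustration. Jupytext's actual matching logic may differ.

```python
# Hypothetical cells from the paired .ipynb file (trimmed to the relevant keys).
ipynb_cells = [
    {"cell_type": "code", "source": "x = 1 + 1", "outputs": []},
    {"cell_type": "code", "source": "print(x)",
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}]},
]

# Hypothetical input cells read from the text file after an edit in another
# editor: the first cell changed, the second one did not.
text_sources = ["x = 2 + 2", "print(x)"]


def combine(text_sources, ipynb_cells):
    """Take inputs from the text file; reuse outputs of cells whose source is unchanged."""
    # Assumption: cells are matched by exact source text.
    outputs_by_source = {c["source"]: c["outputs"] for c in ipynb_cells}
    return [
        {"cell_type": "code", "source": src, "outputs": outputs_by_source.get(src, [])}
        for src in text_sources
    ]


notebook_cells = combine(text_sources, ipynb_cells)
for cell in notebook_cells:
    print(repr(cell["source"]), "->", len(cell["outputs"]), "output(s)")
```

In this sketch the edited cell loses its output until it is executed again, while the untouched cell keeps the output stored in the .ipynb file.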
Certainly. Open your pre-existing scripts or Markdown files as notebooks with the Open as Notebook menu in JupyterLab.
Output cells appear in the browser when you execute the notebook, but they are not written to disk when you save the notebook. The output cells are lost when you reload the notebook. If you want to avoid this, just pair the text file to an .ipynb file.
If you want to convert text formats to notebooks programmatically, use one of the following commands:
jupytext --to ipynb *.md # convert all .md files to notebooks with no outputs
jupytext --to ipynb --execute *.md # convert all .md files to notebooks and execute them
jupytext --set-formats ipynb,md --execute *.md # convert all .md files to paired notebooks and execute them
Conversions in the other direction use a similar syntax:
jupytext --to md *.ipynb # convert all .ipynb files to .md files
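If these conversions need to be driven from a Python script rather than from a shell, one option is to build the same command line and hand it to `subprocess`. The sketch below is only an illustration: the helper names `jupytext_args` and `run_jupytext` are hypothetical, and only the flags shown in the commands above are used.

```python
import subprocess


def jupytext_args(paths, to=None, set_formats=None, execute=False, sync=False):
    """Build an argument list using only the flags shown in this FAQ."""
    args = ["jupytext"]
    if to:
        args += ["--to", to]
    if set_formats:
        args += ["--set-formats", set_formats]
    if execute:
        args.append("--execute")
    if sync:
        args.append("--sync")
    return args + list(paths)


def run_jupytext(paths, **options):
    """Run the command; raises CalledProcessError if jupytext exits with an error."""
    return subprocess.run(jupytext_args(paths, **options), check=True)


# Building the command does not require Jupytext to be installed.
print(jupytext_args(["notebook.md"], to="ipynb", execute=True))
```

Because no shell is involved, patterns such as `*.md` are not expanded automatically. Expand them first, for example with `glob.glob("*.md")`, and pass the resulting list to `run_jupytext`.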
That's possible! See how to activate or deactivate cells.
Unless you want to version the outputs, you should version only the text representation. The paired .ipynb file can safely be deleted. It will be recreated locally the next time you open the notebook (from the text file) and save it.
Note that if you version both the .md and .ipynb files, you can configure git diff to ignore the diffs on the .ipynb files.
The synchronization between the two files happens when you reload and save the notebook in Jupyter, or when you explicitly run jupytext --sync. If you want to force the synchronization on every commit, you could use jupytext as a pre-commit hook.
By default, Jupyter tries to save your notebooks every 2 minutes. If you have edited the text representation in another editor, it will detect that and ask whether you want to overwrite the file or reload the notebook from disk.
You should simply click on Reload.
Note that you can deactivate Jupyter's autosave function with the Autosave Document setting in JupyterLab (search for autosave in the advanced settings editor).
That happens if you have edited both the notebook and the paired text file at the same time. If you know which version you want to keep, save it and reload the other. If you want to compare and merge both versions, back up the text file (e.g. with git stash), save the notebook, and merge the updated paired file with the backup (e.g. with git stash pop). Then, refresh the notebook in Jupyter.
This happens if you have edited the .ipynb file outside of Jupyter. This is a safeguard to avoid overwriting the notebook with an outdated text file. In this case, manual action is required. Remove the paired .md or .py file if it is outdated; otherwise, edit and save it to update the file timestamp.
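The timestamp comparison behind this safeguard can be sketched with the standard library alone. The helper `ipynb_is_newer` and the file names are hypothetical, and the strict "newer than" rule is an assumption. Jupytext's real check may apply a tolerance or further conditions.

```python
import os
import tempfile


def ipynb_is_newer(ipynb_path, text_path):
    """Return True when the .ipynb file was modified after the paired text file."""
    return os.path.getmtime(ipynb_path) > os.path.getmtime(text_path)


# Demonstration with two temporary files and explicit timestamps.
with tempfile.TemporaryDirectory() as folder:
    ipynb_path = os.path.join(folder, "notebook.ipynb")
    text_path = os.path.join(folder, "notebook.py")
    for path in (ipynb_path, text_path):
        with open(path, "w") as handle:
            handle.write("")

    os.utime(text_path, (1_000, 1_000))   # text file saved first
    os.utime(ipynb_path, (2_000, 2_000))  # .ipynb edited later, outside Jupyter
    before_touch = ipynb_is_newer(ipynb_path, text_path)

    os.utime(text_path, (3_000, 3_000))   # "edit and save" the text file
    after_touch = ipynb_is_newer(ipynb_path, text_path)

print(before_touch, after_touch)
```

Saving the text file again moves its timestamp past that of the .ipynb file, which is why the manual fix described above resolves the situation.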
Jupytext is compatible with JupyterHub (execute pip install jupytext --user to install it in user mode) and with Binder (add jupytext to the project requirements).
If you use an editor other than JupyterLab, you probably can't get Jupytext there. However, you can still use Jupytext at the command line to manually sync the two representations of the notebook:
jupytext --set-formats ipynb,py:light notebook.ipynb # Pair a notebook to a light script
jupytext --sync notebook.ipynb # Sync the two representations
Indeed, you could substitute every .ipynb file in the project history with its Jupytext Markdown representation.
Technically, this can be done with a single command, which results in a complete rewrite of the history. Please experiment with that in a branch, and think twice before pushing the result...
git filter-branch --tree-filter 'jupytext --to md */*.ipynb && rm -f */*.ipynb' HEAD
See the result and the cleaner diff history in the case of the Python Data Science Handbook.