The snooty parser has the following key parts:
- Drivers
  - main.py
  - language_server.py
- Parser
  - parser.py
  - rstparser.py
  - gizaparser/
- Types & Tools
  - flutter.py
  - types.py
  - util.py
Snooty drivers instantiate a parser and use it to interact with input reStructuredText and YAML files, and to create output artifacts and report diagnostics.
These drivers instantiate a parser.Project object.
main.py defines the main command-line Snooty interface. See CLI usage at head of file.
language_server.py defines a Language Server for use with IDEs such as Visual Studio Code.
The parser.Project class is the main driver-agnostic interface to Snooty. It reads a snooty.toml file to configure the project, and parses each file with rstparser.Parser. The general expected format of the parsed directory is as follows:
reStructuredText directory
┣snooty.toml
┣source
┃ ┗images
┃   ┗image files here
┃ ┗includes
┃   ┗fact-some-content.rst
┃ ┗page-name
┃   ┗sub-page-name.txt
┃ ┗index.txt
┃ ┗page-name.txt
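As a rough illustration of this layout, the sketch below walks a project's source directory and labels each file by extension. The `classify_sources` helper and the extension-to-kind mapping are assumptions inferred from this README (pages as .txt, includes as .rst, Giza files as .yaml); they are not part of Snooty's code.

```python
from pathlib import Path
from typing import Dict

# Assumption: mapping inferred from this README, not taken from Snooty itself.
KINDS = {".txt": "page", ".rst": "include", ".yaml": "giza"}


def classify_sources(project_root: Path) -> Dict[str, str]:
    """Map each file under <project_root>/source to a rough kind."""
    source = project_root / "source"
    result: Dict[str, str] = {}
    for path in sorted(source.rglob("*")):
        if path.is_file():
            # Anything with an unrecognized extension (e.g. images) is an asset.
            kind = KINDS.get(path.suffix, "asset")
            result[path.relative_to(source).as_posix()] = kind
    return result
```

Calling `classify_sources(Path("path/to/project"))` on a directory shaped like the tree above would return a dictionary keyed by paths relative to `source`.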
rstparser.Parser is responsible for transforming input reStructuredText artifacts (.rst & .txt) into our JSON AST format. It instantiates a visitor object (unnecessarily parameterized; it's always parser.JSONVisitor); creates a docutils parser; passes the markup into it; and uses the visitor to create the AST. The parent parser.Project then calls the configured callbacks to notify the backend of the parsed page.
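The visitor step can be pictured with a toy example. The sketch below is not parser.JSONVisitor and does not use docutils; `Node` and `ToyJSONVisitor` are hypothetical stand-ins that only show the general idea of walking a parsed tree and emitting a JSON-serializable AST.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a parsed document node."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class ToyJSONVisitor:
    """Builds a nested dict while walking the tree depth-first."""

    def visit(self, node: Node) -> Dict[str, Any]:
        out: Dict[str, Any] = {"type": node.kind}
        if node.text:
            out["value"] = node.text
        if node.children:
            out["children"] = [self.visit(child) for child in node.children]
        return out


# A tiny document: a root containing one paragraph with one text node.
doc = Node("root", children=[Node("paragraph", children=[Node("text", "Hello")])])
ast = ToyJSONVisitor().visit(doc)
print(json.dumps(ast, indent=2))
```

A real docutils-style visitor usually dispatches on node type with separate enter and leave hooks; the single recursive method here is a simplification.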
The parser transforms Giza-style YAML files using the gizaparser package. This uses the flutter library to deserialize the YAML files into Python classes, and check types to ensure there are no errors.
docutils-interfacing components of the parser.
Each module in this package contains the infrastructure to parse a category
of Giza YAML file. The gizaparser.nodes
module contains generally-applicable
helper classes.
Tool to load arbitrary JSON/YAML/TOML data into a class and report schema violations.
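To make the idea concrete, here is a minimal sketch of loading a plain dictionary into a class while collecting schema violations. It does not use flutter's actual API; the `Step` class and the `load` function are hypothetical, and the check only handles plain classes such as str and int as field types.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass
class Step:
    """Hypothetical schema; not a real Snooty or Giza type."""
    title: str
    level: int


def load(cls: Type[T], data: Dict[str, Any]) -> Tuple[Optional[T], List[str]]:
    """Return (instance, errors); instance is None if any check fails."""
    errors: List[str] = []
    known = {f.name: f.type for f in fields(cls)}
    # Check every declared field for presence and type.
    for name, expected in known.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            got = type(data[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {got}")
    # Report keys that the schema does not declare.
    for name in data:
        if name not in known:
            errors.append(f"unexpected field: {name}")
    if errors:
        return None, errors
    return cls(**data), errors
```

For instance, `load(Step, {"title": "Install", "level": 1})` yields a `Step` and an empty error list, while a dictionary with a wrong type or a stray key yields `None` plus one message per violation.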
Definitions of high-level configuration components that make up the parser, e.g. ProjectConfig.
Node type definitions that the parser generates for front-end consumption.
Helper functions that help the parser navigate the reStructuredText. Also includes global helper classes, e.g. logger and cacher.
- Install Poetry.
- Set up the project's dependencies:
  poetry install
- Make your changes to the source code.
- Run make test and make format to check that the tests pass and fix your formatting. This will also install the prerequisites defined in pyproject.toml.
- You can activate a shell where the snooty command is available by running:
  poetry shell
  snooty build <docs_property_path>
To run tests for a specific file:
poetry run pytest snooty/test_<file>.py
Install Coverage. After running tests via make format test, run:
coverage html
This will generate an HTML representation of code coverage throughout the repo that can be viewed in the browser.
To run all linting, use make lint. To format source code, use make format.
To release snooty, do the following:
- Make sure you are on the main branch.
- Ensure that the "Unreleased" section of CHANGELOG.md is up-to-date and commit any changes you've made.
- Run make cut-release BUMP_TO_VERSION=<new_version>. The new version number should follow semantic versioning: MAJOR.MINOR.PATCH. For example, make cut-release BUMP_TO_VERSION=0.1.2. Refer to snooty/__init__.py for the current version number.
  This will create commit(s) that update the version across changelogs and project config files.
  This will also create a new tag named v<new_version> and push it to your origin, causing GitHub Actions to trigger the release process. After several minutes (you can monitor its progress at https://github.com/mongodb/snooty-parser/actions), a new release should be created with binaries for supported platforms.
  You can instruct the cut-release target to avoid pushing the tag by passing the PUSH_TO="" option. For example, make cut-release BUMP_TO_VERSION=0.1.2 PUSH_TO="".
- Push your branch.
- Go to https://github.com/mongodb/snooty-parser/releases/ to locate the newly-created release.
- Copy the appropriate section from CHANGELOG.md into the release description, check the This is a pre-release checkbox, and create the release.
If there is an error, use git reset --hard <previous_commit_hash> to revert any commits that might have been made, and git tag --delete v<version>; git push --delete origin v<version> to remove the tag if it was created.
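Before running the cut-release target, it can help to sanity-check the version you are about to pass. The sketch below is a hypothetical helper, not part of the Makefile or of Snooty: it validates the MAJOR.MINOR.PATCH shape and checks that the new version is greater than the current one.

```python
import re
from typing import Tuple

# Strict MAJOR.MINOR.PATCH; pre-release and build suffixes are deliberately not handled.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_valid_bump(current: str, new: str) -> bool:
    """True if `new` is strictly greater than `current`."""
    return parse_version(new) > parse_version(current)
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.9.9" sorts after "0.10.0", which would get the ordering wrong.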
- Transforming docutils nodes into our AST (parser.JSONVisitor) is currently a wretched mess.
- Flutter is currently a fork to add support for line numbers. We need to figure out a cleaner way of doing this so we can merge it into the upstream codebase.
- reStructuredText
  - A markup language identified with the Python ecosystem.
- Abstract Syntax Tree (AST)
  - A tree of nodes which reflect the syntactic structure of an unparsed textual document.
- Postprocessor
  - The component of snooty which performs global link analysis and other forms of processing that spans multiple files.
- docutils
  - The canonical parsing library for reStructuredText. We use our own vendored fork called tinydocutils, modified to be statically typed and less reliant on method dispatch through string manipulation.
- Giza
  - The primary entry point of the docs' original tech stack: it would download assets, generate reStructuredText from YAML files (hence the gizaparser/ directory for compatibility), and invoke Sphinx. Giza is no longer used, and only lives on in our support for some of its .yaml files.
- Sphinx
  - The primary unofficially official documentation toolchain for reStructuredText. The Snooty parser is effectively a from-scratch blackbox reimplementation.
- Project
  - A directory with a snooty.toml file and a source directory containing reStructuredText source files. A project typically corresponds to a distinct site.
- Page
  - A page is a full self-contained document, typically corresponding to a .txt file.
- Include
  - A document fragment meant to be included in a Page or in other include files. Typically these are created by .rst or Giza .yaml files.
- Intersphinx
  - A protocol by which project data can be shared with other projects. This data is encoded in an objects.inv file, and facilitates cross-project links.
- Role
  - A reStructuredText syntactic construct for custom inline behavior, roughly equivalent to a <span> in HTML. For example: :ref:`A Link `
- Directive
  - A reStructuredText syntactic construct for custom block behavior, roughly equivalent to a <div> in HTML. For example: .. note:: A note about something.
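To see how roles and directives differ on the page, the sketch below spots both with two deliberately simplified regular expressions. This is only an illustration, not how Snooty or docutils parses reStructuredText; the patterns ignore nesting, escaping, and domain-prefixed roles, and the `<target>` label in the sample is a made-up placeholder.

```python
import re

# Simplified: an inline role looks like :name:`content`.
ROLE = re.compile(r":([\w-]+):`([^`]*)`")
# Simplified: a directive starts a line with ".. name::" plus an optional argument.
DIRECTIVE = re.compile(r"^\.\. ([\w-]+)::[ \t]*(.*)$", re.MULTILINE)

sample = (
    "See :ref:`A Link <target>` for details.\n"
    "\n"
    ".. note:: A note about something.\n"
)

roles = ROLE.findall(sample)
directives = DIRECTIVE.findall(sample)
print(roles)
print(directives)
```

The role match sits inside a sentence while the directive match owns its line, which mirrors the span versus div comparison in the glossary entries.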