- Python 3.7
- For parity with Debian GNU/Linux 10 (buster)
- Django 3.2
Both versions are specified in the Pipfile.
This project is not intended to serve the license and deed pages directly. It could do so if deployed on a public server, but performance would probably not be acceptable.
Instead, a command line tool can be used to save all the rendered HTML pages for licenses and deeds as files. Then those files are used as part of the real creativecommons.org site, just served as static files. See details farther down.
The creativecommons/cc-licenses-data project repository should be cloned into a directory adjacent to this one:
PARENT_DIR
├── cc-licenses
└── cc-licenses-data
If it is not cloned into the default location, the DATA_REPOSITORY_DIR Django configuration setting or the DATA_REPOSITORY_DIR environment variable can be used to configure its location.
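As a minimal sketch of how such a lookup could behave, assuming the environment variable takes precedence and the default is the adjacent cc-licenses-data directory. The helper name resolve_data_repository_dir and the precedence order are illustrative assumptions, not the project's actual settings code:

```python
import os
from pathlib import Path


def resolve_data_repository_dir(project_dir, environ=None):
    """Return the data repository path (hypothetical helper).

    Assumption: DATA_REPOSITORY_DIR in the environment wins; otherwise
    fall back to the cc-licenses-data directory adjacent to project_dir.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("DATA_REPOSITORY_DIR")
    if override:
        return Path(override)
    # Default layout: PARENT_DIR/cc-licenses and PARENT_DIR/cc-licenses-data
    return Path(project_dir).parent / "cc-licenses-data"
```

Passing an explicit dictionary for environ keeps the function easy to exercise without touching the real environment.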
Use the following instructions to start the project with Docker Compose.
- Initial Setup
- Ensure the Data Repository, above, is in place
- Install Docker (Install Docker Engine | Docker Documentation)
- Create Django local settings file
cp cc_licenses/settings/local.example.py cc_licenses/settings/local.py
- Build the containers
docker-compose build
- Run database migrations
docker-compose exec app ./manage.py migrate
- Clear data in the database
docker-compose exec app ./manage.py clear_license_data
- Load legacy HTML in the database
docker-compose exec app ./manage.py load_html_files
- Run the containers
docker-compose up
The commands above will create 3 docker containers:
- app (127.0.0.1:8000): this Django application
- Any changes made to Python will be detected and rebuilt transparently as long as the development server is running.
- db: PostgreSQL database backend for this Django application
- static (127.0.0.1:8080): a static web server serving creativecommons/cc-licenses-data/docs.
- Development Environment
- Ensure the Data Repository, above, is in place
- Install dependencies
- Linux:
sudo apt-get install pandoc postgresql postgresql-contrib python3.7 python3.7-dev python3-pip
pip3 install pipenv
- macOS: via Homebrew:
brew install pandoc pipenv postgresql python@3.7
- Install Python 3.7 environment and modules via pipenv to create a
virtualenv
- Linux:
pipenv install --dev --python /usr/bin/python3.7
- macOS: via Homebrew:
pipenv install --dev --python /usr/local/opt/python@3.7/libexec/bin/python
- Install pre-commit hooks
pipenv run pre-commit install
- Configure Django and PostgreSQL
- Create Django local settings file
cp cc_licenses/settings/local.example.py cc_licenses/settings/local.py
- Start PostgreSQL server
- It's completely fine not to create a dedicated PostgreSQL account. If you do wish to create a separate user account for the project, please refer to PostgreSQL: Documentation: Installation
- Linux:
sudo service postgresql start
- macOS:
brew services run postgres
- Create project database
- Linux:
sudo createdb -E UTF-8 cc_licenses
- macOS:
createdb -E UTF-8 cc_licenses
- Load database schema
pipenv run ./manage.py migrate
- Run development server (127.0.0.1:8000)
pipenv run ./manage.py runserver
- Any changes made to Python will be detected and rebuilt transparently as long as the development server is running.
NOTE: The rest of the documentation assumes Docker. If you are using a manual setup, use pipenv run instead of docker-compose exec app for the commands below.
- Python Guidelines — Creative Commons Open Source
- Black: the uncompromising Python code formatter
- Coverage.py: Code coverage measurement for Python
- flake8: a python tool that glues together pep8, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
- isort: A Python utility / library to sort imports.
- pre-commit: A framework for managing and maintaining multi-language pre-commit hooks.
The coverage tests and report are run as part of pre-commit and as a GitHub Action. To run it manually:
- Ensure the Data Repository, above, is in place
- Ensure Docker Compose Setup, above, is complete
- Coverage test
docker-compose exec app coverage run manage.py test --noinput --keepdb
- Coverage report
docker-compose exec app coverage report
If you encounter an "error: Error building trees" error from pre-commit when you commit, try adding your files (git add <FILES>) prior to committing them.
The license metadata is in a database. The metadata tracks which licenses exist, their translations, their ports, and their characteristics like what they permit, require, and prohibit.
The metadata can be downloaded by visiting the URL path: 127.0.0.1:8000/licenses/metadata.yaml
There are two main models (Django terminology for tables) in licenses/models.py:
- LegalCode
- License
A License can be identified by a unit (ex. by, by-nc-sa, devnations), which is a proxy for the complete set of permissions, requirements, and prohibitions; a version (ex. 4.0, 3.0); and an optional jurisdiction for ports. So we might refer to a license by its identifier "BY 3.0 AM", which would be the 3.0 version of the BY license terms as ported to the Armenia jurisdiction. For additional information see: Legal Tools Namespace - creativecommons/cc-licenses-data: CC Licenses data (static HTML, language files, etc.).
There are three places legal code text could be:
- gettext files (.po and .mo) in the creativecommons/cc-licenses-data repository (legal tools with full translation support):
  - 4.0 Licenses
  - CC0
- Django template (legalcode_licenses_3.0_unported.html):
  - Unported 3.0 Licenses (English-only)
- html field (in the LegalCode model):
  - Everything else
The text that's in gettext files can be translated via Transifex at Creative Commons localization. For additional information on the Django translation domains / Transifex resources, see How the license translation is implemented, below.
Documentation:
The process of getting the text into the site varies by license.
Note that once the site is up and running in production, the data in the site will become the canonical source, and the process described here should not need to be repeated after that.
The implementation is the Django management command load_html_files, which reads from the legacy HTML legal code files in the creativecommons/cc-licenses-data repository, and populates the database records and translation files.
load_html_files uses BeautifulSoup4 to parse the legacy HTML legal code:
- import_zero_license_html for CC0 Public Domain tool
  - HTML is handled specifically (using tag ids and classes) to populate translation strings and to be used with specific HTML formatting when displayed via a template
- import_by_40_license_html for 4.0 License tools
  - HTML is handled specifically (using tag ids and classes) to populate translation strings and to be used with specific HTML formatting when displayed via a template
- import_by_30_unported_license_html for unported 3.0 License tools (English-only)
  - HTML is handled specifically to be used with specific HTML formatting when displayed via a template
- simple_import_license_html for everything else
  - HTML is handled generically; only the title and license body are identified. The body is stored in the html field of the LegalCode model
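The selection among these importers can be sketched as a small dispatch function. The importer names come from the list above; the function choose_importer and its selection rules (unit "zero" for CC0, an empty jurisdiction meaning unported) are assumptions inferred from that list, not the actual load_html_files code:

```python
def choose_importer(unit, version, jurisdiction=""):
    """Return the name of the importer for one legal code (sketch only).

    Assumptions: CC0 uses the unit "zero", and an empty jurisdiction
    means the legal code is unported.
    """
    if unit == "zero":
        return "import_zero_license_html"
    if version == "4.0":
        return "import_by_40_license_html"
    if version == "3.0" and not jurisdiction:
        return "import_by_30_unported_license_html"
    # Everything else is imported generically (title and body only)
    return "simple_import_license_html"
```

For example, under these assumptions a ported 3.0 license such as BY 3.0 AM would fall through to the generic importer.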
This process will read the HTML files from the specified directory, populate LegalCode and License models, and create .po files in creativecommons/cc-licenses-data.
- Ensure the Data Repository, above, is in place
- Ensure Docker Compose Setup, above, is complete
- Clear data in the database
docker-compose exec app ./manage.py clear_license_data
- Load legacy HTML in the database
docker-compose exec app ./manage.py load_html_files
- Optionally (and only as appropriate):
  - commit .po file changes in creativecommons/cc-licenses-data
  - Translation Update Process, below
  - Generate Static Files, below
- Beautiful Soup Documentation — Beautiful Soup 4.9.0 documentation
- Quick start guide — polib 1.1.1 documentation
To upload/download translation files to/from Transifex, you'll need an account there with access to these translations. Then follow Authentication | Introduction to the Transifex API | Transifex Documentation to get an API token, and set TRANSIFEX["API_TOKEN"] in your environment with its value.
The creativecommons/cc-licenses-data repository should be cloned next to this cc-licenses repository. (It can be elsewhere, but then you need to set DATA_REPOSITORY_DIR to its location.) Be sure to clone using a URL that starts with git@github... and not https://github..., or you won't be able to push to it.
In production, the check_for_translation_updates management command should be run hourly. See Check for Translation Updates, below.
Also see Publishing changes to git repo, below.
Babel is used for localization information.
Documentation:
Django Translation uses two sets of files in the creativecommons/cc-licenses-data repository (the Data Repository, above):
- legalcode/
  - .po and .mo internationalization and localization files for Legal Codes
  - The file names and corresponding Transifex resource are different for each tool.
    - Formula:
      - unit + _ + version + _ + jurisdiction
      - strip out any periods (.)
    - Examples:
      - by-nd_40
      - by-nc-sa_30_es
      - zero_10
- locale/
  - .po and .mo internationalization and localization files for Deeds and UX
  - The file names and corresponding Transifex resource slug are all deeds_ux (DEEDS_UX_RESOURCE_SLUG in the settings).
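The legalcode/ naming formula above can be sketched in a few lines. The function name legalcode_resource_name is hypothetical, and the sketch assumes the jurisdiction part (with its underscore) is simply omitted for unported tools, as the examples suggest:

```python
def legalcode_resource_name(unit, version, jurisdiction=""):
    """Build a legal code file/resource name (illustrative helper).

    Formula: unit + "_" + version + "_" + jurisdiction, with periods
    stripped. Assumption: no trailing "_" when there is no jurisdiction.
    """
    parts = [unit, version]
    if jurisdiction:
        parts.append(jurisdiction)
    return "_".join(parts).replace(".", "")


print(legalcode_resource_name("by-nd", "4.0"))           # by-nd_40
print(legalcode_resource_name("by-nc-sa", "3.0", "es"))  # by-nc-sa_30_es
print(legalcode_resource_name("zero", "1.0"))            # zero_10
```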
The internationalization and localization file details:
- .mo machine object files
  - generated by the compilemessages command (see Translation Update Process, below)
  - ingested by this application and used by the publish command (see Generate Static Files, below)
- .po portable object files
  - generated by the check_for_translation_updates command (see Check for Translation Updates, below)
    - legalcode/: initially generated by the load_html_files command (see Import Process, above)
    - locale/: initially generated by the makemessages command
  - ingested by the compilemessages command (see Translation Update Process, below)
Documentation:
The language codes used within this application and for the internationalization and localization directory structure are Django language codes.
Definitions:
- Django language codes are lowercase IETF language tags
  - Examples: de-at, oc-aranes, sr-latn, zh-hant
- Transifex language codes are POSIX locales
  - Examples: de_AT, oc@aranes, sr@latin, zh_Hant
- Legacy language codes include:
  - POSIX locales
    - Examples (see above)
  - conventional IETF language tags
    - Examples: sr-Latn, zh-Hant
Mappings:
- Legacy language codes are mapped to Django language codes by the load_html_files command (see Import Process, above).
- Django language codes are mapped to Transifex language codes by the check_for_translation_updates command (see Check for Translation Updates, below).
- Django language codes are mapped to Legacy language codes by the publish command (see Generate Static Files, below) to create redirects.
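To illustrate the Django-to-Transifex direction, here is a minimal sketch that reproduces the four example pairs listed under Definitions. The function name, the override table, and the region/script heuristics are assumptions for illustration; the project's actual mapping may be implemented differently:

```python
# Hypothetical overrides for codes that do not follow the simple
# language_REGION / language_Script pattern
TRANSIFEX_OVERRIDES = {
    "oc-aranes": "oc@aranes",
    "sr-latn": "sr@latin",
}


def django_to_transifex(code):
    """Convert a Django language code to a POSIX-style code (sketch)."""
    code = code.lower()
    if code in TRANSIFEX_OVERRIDES:
        return TRANSIFEX_OVERRIDES[code]
    language, _, subtag = code.partition("-")
    if not subtag:
        return language
    if len(subtag) == 2:
        # Two-letter subtag: treat as a region (de-at becomes de_AT)
        return f"{language}_{subtag.upper()}"
    # Longer subtag: treat as a script (zh-hant becomes zh_Hant)
    return f"{language}_{subtag.title()}"
```

An explicit override table keeps the irregular cases visible rather than hiding them in string manipulation.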
Documentation:
- Django Language Codes:
  - 127.0.0.1:8000/dev/status/ Translation Status
  - django/django: django/conf/global_settings.p: Lines 50-148
- Transifex Language Codes:
  - 127.0.0.1:8000/dev/status/ Translation Status
  - Languages on Transifex
- References:
  - IETF language tag - Wikipedia
    - RFC5646 Tags for Identifying Languages
      - ISO 639-1 - Wikipedia Codes for the representation of names of languages - Part 1: Alpha-2 code
      - ISO 639-2 - Wikipedia Codes for the representation of names of languages - Part 2: Alpha-3 code
      - ISO 3166-1 - Wikipedia Codes for the representation of names of countries and their subdivisions - Part 1: Country codes
      - ISO 15924 - Wikipedia Codes for the representation of names of scripts
  - POSIX platforms - Locale (computer software) - Wikipedia (POSIX Locale)
The hourly run of check_for_translation_updates looks to see if any of the translation files in Transifex have newer last modification times than we know about. It performs the following process (which can also be done manually):
- Ensure the Data Repository, above, is in place
- Within the creativecommons/cc-licenses-data (the Data Repository):
  - Checkout or create the appropriate branch.
    - For example, if a French translation file for BY 4.0 has changed, the branch name will be cc4-fr.
  - Download the updated .po file from Transifex
  - Do the Translation Update Process (below)
    - This is important and easy to forget, but without it, Django will keep using the old translations
  - Commit that change and push it upstream.
- Within this cc-licenses repository:
  - For each branch that has been updated, Generate Static Files (below). Use the options to update git and push the changes.
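The "newer last modification time" comparison that starts this process can be sketched as follows. The function name, the (resource slug, language code) keys, and the choice to treat previously unseen translations as updated are assumptions for illustration, not the command's actual logic:

```python
from datetime import datetime, timezone


def find_updated_translations(known, remote):
    """Return keys whose remote modification time is newer (sketch).

    known and remote map (resource_slug, language_code) tuples to
    timezone-aware datetimes. Assumption: a translation that is not
    yet known locally counts as updated.
    """
    updated = []
    for key, remote_time in remote.items():
        known_time = known.get(key)
        if known_time is None or remote_time > known_time:
            updated.append(key)
    return sorted(updated)


known = {("by_40", "fr"): datetime(2021, 1, 1, tzinfo=timezone.utc)}
remote = {("by_40", "fr"): datetime(2021, 6, 1, tzinfo=timezone.utc)}
print(find_updated_translations(known, remote))  # [('by_40', 'fr')]
```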
- GitPython Documentation — GitPython 3.1.18 documentation
- Requests: HTTP for Humans™ — Requests 2.26.0 documentation
This Django Admin command must be run any time the .po files are created or changed.
- Ensure the Data Repository, above, is in place
- Ensure Docker Compose Setup, above, is complete
- Compile translation messages (update .mo files)
docker-compose exec app ./manage.py compilemessages
We've been calling this process "publishing", but that's a little misleading, since this process does nothing to make its results visible on the Internet. It only updates the static files in the docs directory of the creativecommons/cc-licenses-data repository (the Data Repository, above).
This process will write the HTML files in the cc-licenses-data clone directory under docs/. It will not commit the changes (--nogit) and will not push any commits (--nopush is implied by --nogit).
- Ensure the Data Repository, above, is in place
- Ensure Docker Compose Setup, above, is complete
- Generate static files
docker-compose exec app ./manage.py publish --nogit --branch=main
When the site is deployed, to enable pushing and pulling the licenses data repo with GitHub, create an ssh deploy key for the cc-licenses-data repo with write permissions, and put the private key file (not password protected) somewhere safe (owned by www-data if on a server), and readable only by its owner (0o400). Then in settings, make TRANSLATION_REPOSITORY_DEPLOY_KEY be the full path to that deploy key file.
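The permission requirement can be verified with a short check like the one below. This is a sketch assuming a POSIX file system; the helper name deploy_key_is_private is hypothetical and is not part of the project:

```python
import os
import stat


def deploy_key_is_private(path):
    """Return True if path is a regular file with mode exactly 0o400.

    Hypothetical helper: confirms the key is readable only by its
    owner, as described above. It does not check file ownership.
    """
    info = os.stat(path)
    is_regular_file = stat.S_ISREG(info.st_mode)
    mode = stat.S_IMODE(info.st_mode)
    return is_regular_file and mode == 0o400
```

A check like this could be run once after deployment, before pointing TRANSLATION_REPOSITORY_DEPLOY_KEY at the file.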