This repo contains the backend code for:
- https://analytics.pulpproject.org/ - The production site, deployed from the main branch
- https://dev.analytics.pulpproject.org/ - The dev site, deployed from the dev branch
At a high level, the metrics data flows like this:
- Each Pulpcore installation gathers analytics data and posts it daily
- The analytics site receives and stores the data without summarization
- Once a day, the data is summarized by a Django command run from a cron job. The same command also cleans up old raw data after some time.
- The charts on the site are visualized from the summary data
Pulpcore installations gather the metrics and submit them to either the dev or prod site, depending on the version strings of the installed Pulp components. If all version strings are GA releases, the data is sent to the production site; otherwise it is sent to the dev site. See the get_analytics_posting_url() code.
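The routing rule above can be sketched as follows. This is only an illustration, not the pulpcore implementation: the function name choose_posting_url and the rule "a GA release is a purely numeric X.Y.Z string" are assumptions made for this example.

```python
import re

# URLs as listed at the top of this document.
PRODUCTION_URL = "https://analytics.pulpproject.org/"
DEV_URL = "https://dev.analytics.pulpproject.org/"

# Assumption: a GA release is a purely numeric X.Y.Z version string.
GA_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def choose_posting_url(component_versions):
    """Return the production URL only if every component version looks GA.

    `component_versions` maps a component name to its version string.
    This is a hypothetical helper, not pulpcore's get_analytics_posting_url().
    """
    if all(GA_VERSION.match(v) for v in component_versions.values()):
        return PRODUCTION_URL
    return DEV_URL


print(choose_posting_url({"core": "3.21.0", "file": "1.11.1"}))
print(choose_posting_url({"core": "3.22.0.dev", "file": "1.11.1"}))
```

A single non-GA component is enough to route the whole payload to the dev site, which keeps pre-release installations out of the production data.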
The analytics payload is submitted to the server as a Protocol Buffer message, whose definition is here. The pulpcore code gathers the analytics data and constructs the payload in this module.
The protocol buffer definition is compiled locally with the commands below and checked in here in this repo and here in pulpcore.
sudo dnf install protobuf # Install it any way you want
cd analytics.pulpproject.org # The command below assumes you are in the root dir
protoc --python_out pulpanalytics/ ./analytics.proto # Copy this to pulpcore also
The analytics data POST is handled here using the protobuf object. The pieces are then saved as model instances, all of which have a foreign key to a single System object that stores the datetime of submission.
Summarization occurs when an OpenShift cron job in the dev or prod site calls the following command every 24 hours: ./manage.py summarize. This executes this code.
The summarize command uses a separate protobuf definition, which is compiled with the commands below and stored here.
sudo dnf install protobuf # Install it any way you want
cd analytics.pulpproject.org # The command below assumes you are in the root dir
protoc --python_out pulpanalytics/ ./summary.proto # This only lives on the server side (this repo)
A summary is produced for each 24-hour period and stored as JSON data in a DailySummary instance. How each analytics metric is summarized is beyond the scope of this document; look at the code and the proposals for each analytics metric (which should outline summarization).
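As a rough illustration of per-day summarization, the sketch below groups submissions by date and emits one JSON document per completed day. It does not reflect the real summarize command: the function name, the (timestamp, version) input shape, and the choice of "count per version" as the metric are all assumptions for this example. It skips the current day, matching the note later in this document that data posted "today" is not summarized.

```python
import json
from collections import defaultdict
from datetime import date, datetime, timezone


def summarize_by_day(submissions, today):
    """Count submitted versions per day, skipping days that are not complete.

    `submissions` is an iterable of (submitted_at, version) pairs; this shape
    is hypothetical and only stands in for the stored raw data.
    """
    per_day = defaultdict(lambda: defaultdict(int))
    for submitted_at, version in submissions:
        day = submitted_at.date()
        if day >= today:
            continue  # the current day is not a full 24-hour period yet
        per_day[day][version] += 1
    # One JSON string per day, similar in spirit to one DailySummary row.
    return {
        day.isoformat(): json.dumps(counts, sort_keys=True)
        for day, counts in sorted(per_day.items())
    }


raw = [
    (datetime(2022, 10, 1, 8, 0, tzinfo=timezone.utc), "3.21.0"),
    (datetime(2022, 10, 1, 9, 30, tzinfo=timezone.utc), "3.21.0"),
    (datetime(2022, 10, 2, 1, 0, tzinfo=timezone.utc), "3.20.0"),
]
print(summarize_by_day(raw, today=date(2022, 10, 2)))
```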
Visualization is done using Chart.js and is handled by this GET view, which uses this template. The goal of this code is to read all summary data and collate it into Chart.js data structures.
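To make the collation step concrete, here is a minimal sketch that turns per-day values into the "labels" plus "datasets" shape that Chart.js charts consume. The helper name and the input mapping of ISO date to a single number are assumptions; the actual view may build its structures differently.

```python
import json


def to_chartjs_line_data(daily_values, label):
    """Collate {iso_date: value} into a Chart.js-style data dict.

    Hypothetical helper: the real view reads DailySummary rows instead of
    a plain dict.
    """
    days = sorted(daily_values)  # ISO dates sort chronologically as strings
    return {
        "labels": days,
        "datasets": [{"label": label, "data": [daily_values[d] for d in days]}],
    }


summaries = {"2022-10-02": 14, "2022-10-01": 12}
print(json.dumps(to_chartjs_line_data(summaries, "Systems reporting")))
```

The resulting dict can be serialized to JSON and handed to the template, where it would be passed as the data option of a chart.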
- Create (or activate) a virtualenv for your work to live in:
python3 -m venv analytics.pulpproject.org
source analytics.pulpproject.org/bin/activate
- Clone and Install Dependencies
git clone https://github.com/pulp/analytics.pulpproject.org.git
cd analytics.pulpproject.org
pip install -r requirements.txt
- Start the database
I typically use the official postgres container with podman to provide the database locally with the commands below (taken from this article).
Fetch the container with: podman pull docker.io/library/postgres. Afterwards you can see it listed with podman images.
Start the container with: podman run -dt --name my-postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres
Connect to the db with psql using podman exec -it my-postgres bash. Then connect with user postgres, which is the default of the postgres container. Here's a full example:
[bmbouter@localhost analytics.pulpproject.org]$ podman exec -it my-postgres bash
root@f70daa2ab15f:/# psql --user postgres
psql (14.5 (Debian 14.5-1.pgdg110+1))
Type "help" for help.
postgres=# \dt
Did not find any relations.
postgres=#
- Set the APP_KEY
The app uses an environment variable, APP_KEY, to specify the Django SECRET_KEY here. You need to set a random string as the APP_KEY.
export APP_KEY="ceb0c58c-5789-499a-881f-410aec5e1003"
Note: The APP_KEY is just a random string here.
The database connection info can optionally be specified through environment variables as well, which are read here. This isn't needed if you use the default values of the postgres image. If you do want to set them, you can do it like this:
export DB_DATABASE="postgres"
export DB_USERNAME="postgres"
export DB_PASSWORD="postgres"
export DB_HOST="localhost"
- Apply Migrations
Apply migrations with ./manage.py migrate
- Create a superuser (if you want to use the Admin site)
./manage.py createsuperuser
- Run the server
./manage.py runserver 0.0.0.0:8000
You can then load the page at http://127.0.0.1:8000/ or the Admin site at http://127.0.0.1:8000/admin/.
Note, the 0.0.0.0:8000 is optional if you only want to receive requests on localhost, but with pulp typically running in an oci_env environment, you likely want to have it listen on all interfaces.
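The APP_KEY and DB_* environment variables from the setup steps above could be read in a Django settings module roughly like this. It is a sketch under assumptions, not a copy of this repo's settings: the helper names are hypothetical, and the fallback values are simply the postgres image defaults shown in the export example.

```python
import os


def database_settings(environ=os.environ):
    """Build a Django-style DATABASES dict from DB_* variables.

    Fallbacks are assumed to match the defaults of the postgres container.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": environ.get("DB_DATABASE", "postgres"),
            "USER": environ.get("DB_USERNAME", "postgres"),
            "PASSWORD": environ.get("DB_PASSWORD", "postgres"),
            "HOST": environ.get("DB_HOST", "localhost"),
        }
    }


def secret_key(environ=os.environ):
    """Return APP_KEY, failing loudly if it was never exported."""
    key = environ.get("APP_KEY")
    if not key:
        raise RuntimeError("APP_KEY must be set to a random string")
    return key
```

In a settings file this would be used as DATABASES = database_settings() and SECRET_KEY = secret_key(). Failing early on a missing APP_KEY makes a forgotten export obvious at startup.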
Currently, to test with a local Pulp installation, you need to modify the Pulp code so that it posts data to your local telemetry installation. This is done by applying this diff:
diff --git a/pulpcore/app/tasks/telemetry.py b/pulpcore/app/tasks/telemetry.py
index 3ca9c0fb4..e4c2c30f1 100644
--- a/pulpcore/app/tasks/telemetry.py
+++ b/pulpcore/app/tasks/telemetry.py
@@ -19,7 +19,8 @@ logger = logging.getLogger(__name__)
PRODUCTION_URL = "https://analytics.pulpproject.org/"
-DEV_URL = "https://dev.analytics.pulpproject.org/"
+# DEV_URL = "https://dev.analytics.pulpproject.org/"
+DEV_URL = "http://host.containers.internal:8000/"
def get_telemetry_posting_url():
Additionally, ensure your telemetry environment is listening on all interfaces by having 0.0.0.0:8000 in your runserver command, e.g. ./manage.py runserver 0.0.0.0:8000.
Summarize data by calling ./manage.py summarize.
This will not summarize data posted "today" because the day is not yet complete, so for testing it can be helpful to backdate data.
To reset the database, stop and delete the postgres container with the commands below. Then restart the container and reapply migrations.
podman stop my-postgres
podman rm my-postgres
The normal workflow is:
- Develop your changes locally and open a PR against the dev branch.
- Merge the PR. After about 5 minutes, your changes should show up at https://dev.analytics.pulpproject.org/.
- Test your changes on the https://dev.analytics.pulpproject.org/ site.
- Open a PR that merges dev into main. About 5 minutes after it is merged, your changes should show up on https://analytics.pulpproject.org/.
It can be useful to export data from the production or development sites into a local development environment. This is especially useful when developing summarization against production raw data, or when developing visualizations against production summary data. This is a two-step process: 1) export the data from the production site, 2) import it into your local dev environment.
This will work for either analytics.pulpproject.org (prod) or dev.analytics.pulpproject.org (dev).
You will need OpenShift access to the environment where ./manage.py runs to be able to do this.
- Log in to OpenShift with the oc client.
- Select the site you want to use, e.g. production, by running:
oc project prod-analytics-pulpproject-org
- Login to the production pod with oc using
oc exec dc/pulpanalytics-app -ti -- bash
- Export the database using
./manage.py dumpdata --output /tmp/data.json pulpanalytics
- Move the file to your local machine by using something like
oc rsync pulpanalytics-app-12-kxttd:/tmp/data.json /tmp/
Note, the pod name changes each time, so you'll need to get it from OpenShift when you run this command.
- Apply migrations to the same point as the remote DB using
./manage.py migrate
- Import the data using:
./manage.py loaddata /tmp/data.json
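Before loading an exported file, it can help to check what it contains. The sketch below counts records per model in a dumpdata JSON fixture, relying on the standard fixture layout in which each record carries a "model" key. The helper name is hypothetical, and the model labels you see depend on the models in the pulpanalytics app.

```python
import json
from collections import Counter


def count_fixture_models(path):
    """Return a Counter of record counts per model label in a JSON fixture."""
    with open(path) as f:
        records = json.load(f)
    return Counter(record["model"] for record in records)


# Example usage (assumes the export was copied to /tmp/data.json):
# for model, count in count_fixture_models("/tmp/data.json").most_common():
#     print(model, count)
```

An unexpectedly small or empty count is a quick hint that the export or the copy from the pod did not complete.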
If testing summarization, you might want to go into the admin interface and delete some recent DailySummary objects to cause your ./manage.py summarize to run your local summarization code.