For Airflow Summit 2022, by @marcosmarcm and @evantahler from Airbyte
This project configures a sample data stack orchestrated by Airflow, using Airbyte to Extract and Load data, and dbt to Transform it.
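The orchestration pattern is sketched below: an Airflow DAG triggers the Airbyte sync (Extract and Load), then runs dbt (Transform). This is an illustrative sketch only, using the official Airflow Airbyte provider; the DAG id, connection ids, and dbt project path are placeholders, not the values used by this repository's actual DAG.

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

# Illustrative Airbyte -> dbt pattern; ids and paths below are placeholders.
with DAG(
    dag_id="elt_example",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Extract and Load: trigger an existing Airbyte connection via its API.
    extract_and_load = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_default",          # Airflow connection pointing at the Airbyte API
        connection_id="<airbyte-connection-uuid>",  # placeholder Airbyte connection id
    )

    # Transform: run the dbt project against the warehouse Airbyte loaded into.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /path/to/dbt/project",  # placeholder path
    )

    extract_and_load >> transform
```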
- Install Docker Desktop and Python 3 (if you are on MacOS, you already have Python 3).
- Create `{HOME}/.octavia` and add the following credentials for using a local postgres database managed by Docker:

  ```
  POSTGRES_HOST=host.docker.internal
  POSTGRES_PASSWORD=password
  POSTGRES_USERNAME=demo_user
  POSTGRES_DATABASE=postgres
  ```
- Create the dbt profile in `{HOME}/.dbt/profiles.yml`:
  ```yaml
  config:
    partial_parse: true
    printer_width: 120
    send_anonymous_usage_stats: false
    use_colors: true

  normalize:
    outputs:
      prod:
        dbname: postgres
        host: host.docker.internal
        pass: password
        port: 5432
        schema: public
        threads: 8
        type: postgres
        user: demo_user
    target: prod
  ```
- Run the whole data stack using `./tools/start.sh`. This will install local requirements (PyYAML) and run everything through Docker. The script will exit when complete, but the Docker containers will remain running.
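Once `start.sh` finishes, you can confirm the services came up before opening the UIs. This is a minimal sketch using only the Python standard library; the ports match the URLs listed below, and the Airflow check uses the webserver's unauthenticated `/health` endpoint.

```python
import json
import socket
import urllib.request

# The Airflow webserver exposes an unauthenticated /health endpoint.
with urllib.request.urlopen("http://localhost:8080/health", timeout=10) as resp:
    print("airflow:", json.load(resp))

# For the Airbyte UI (8000) and Postgres (5432), just confirm the ports accept connections.
for name, port in [("airbyte", 8000), ("postgres", 5432)]:
    with socket.create_connection(("localhost", port), timeout=10):
        print(f"{name}: port {port} is open")
```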
In your browser:
- Visit http://localhost:8080/ to see the Airflow UI (user: `airflow`, password: `airflow`) and your completed DAG.
- Visit http://localhost:8000/ to see the Airbyte UI and your completed Sync.
- Visit your local postgres database (`localhost:5432`) with `username=demo_user` and `password=password` to see the staged and transformed data.
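If you prefer a programmatic check over a SQL client, the sketch below connects with the credentials from the setup steps and lists the tables created in the `public` schema. It assumes `psycopg2` is installed locally; it is not one of this project's requirements, and table names are queried rather than assumed.

```python
import psycopg2

# Connect with the demo credentials defined during setup.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="demo_user",
    password="password",
)
with conn, conn.cursor() as cur:
    # List the tables Airbyte staged and dbt transformed in the public schema.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```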
Run `./tools/stop.sh` to stop the Docker containers.
This repository is tested using GitHub Actions.