Update project structure #25

Merged (5 commits, Apr 9, 2024)
5 changes: 2 additions & 3 deletions .github/workflows/ci.yml
@@ -1,10 +1,9 @@
name: Run dbt Cloud CI job

name: dbt Cloud CI
on:
  pull_request:
    branches:
      - main

      - staging
jobs:
  run_snowflake:
    name: dbt Cloud PR CI Snowflake
55 changes: 55 additions & 0 deletions .github/workflows/prod_cd.yml
@@ -0,0 +1,55 @@
name: dbt Cloud Deploy

on:
  push:
    branches:
      - main

jobs:
  run_snowflake:
    name: dbt Cloud Deploy Prod Snowflake
    runs-on: macos-latest

    env:
      DBT_ACCOUNT_ID: 188483
      DBT_PROJECT_ID: 283328
      DBT_PR_JOB_ID: 409009
      DBT_API_KEY: ${{ secrets.DBT_CLOUD_API_KEY }}
      DBT_JOB_CAUSE: "GitHub Actions Request"
      DBT_JOB_BRANCH: main

    steps:
      - uses: "actions/checkout@v4"
      - uses: "actions/setup-python@v5"
        with:
          python-version: "3.12"
      - name: Install uv
        run: python3 -m pip install uv
      - name: Install deps
        run: uv pip install -r requirements.txt --system
      - name: Run dbt Cloud job
        run: python3 .github/workflows/scripts/dbt_cloud_run_job.py

  run_bigquery:
    name: dbt Cloud Deploy Prod BigQuery
    runs-on: macos-latest

    env:
      DBT_ACCOUNT_ID: 188483
      DBT_PROJECT_ID: 275557
      DBT_PR_JOB_ID: 553247
      DBT_API_KEY: ${{ secrets.DBT_CLOUD_API_KEY }}
      DBT_JOB_CAUSE: "GitHub Actions Request"
      DBT_JOB_BRANCH: main

    steps:
      - uses: "actions/checkout@v4"
      - uses: "actions/setup-python@v5"
        with:
          python-version: "3.12"
      - name: Install uv
        run: python3 -m pip install uv
      - name: Install deps
        run: uv pip install -r requirements.txt --system
      - name: Run dbt Cloud job
        run: python3 .github/workflows/scripts/dbt_cloud_run_job.py
55 changes: 55 additions & 0 deletions .github/workflows/staging_cd.yml
@@ -0,0 +1,55 @@
name: dbt Cloud Deploy Staging

on:
  push:
    branches:
      - staging

jobs:
  run_snowflake:
    name: dbt Cloud Deploy Staging Snowflake
    runs-on: macos-latest

    env:
      DBT_ACCOUNT_ID: 188483
      DBT_PROJECT_ID: 283328
      DBT_PR_JOB_ID: 565266
      DBT_API_KEY: ${{ secrets.DBT_CLOUD_API_KEY }}
      DBT_JOB_CAUSE: "GitHub Actions Request"
      DBT_JOB_BRANCH: main

    steps:
      - uses: "actions/checkout@v4"
      - uses: "actions/setup-python@v5"
        with:
          python-version: "3.12"
      - name: Install uv
        run: python3 -m pip install uv
      - name: Install deps
        run: uv pip install -r requirements.txt --system
      - name: Run dbt Cloud job
        run: python3 .github/workflows/scripts/dbt_cloud_run_job.py

  run_bigquery:
    name: dbt Cloud Deploy Staging BigQuery
    runs-on: macos-latest

    env:
      DBT_ACCOUNT_ID: 188483
      DBT_PROJECT_ID: 275557
      DBT_PR_JOB_ID: 560539
      DBT_API_KEY: ${{ secrets.DBT_CLOUD_API_KEY }}
      DBT_JOB_CAUSE: "GitHub Actions Request"
      DBT_JOB_BRANCH: main

    steps:
      - uses: "actions/checkout@v4"
      - uses: "actions/setup-python@v5"
        with:
          python-version: "3.12"
      - name: Install uv
        run: python3 -m pip install uv
      - name: Install deps
        run: uv pip install -r requirements.txt --system
      - name: Run dbt Cloud job
        run: python3 .github/workflows/scripts/dbt_cloud_run_job.py
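
Both deploy workflows delegate the real work to `.github/workflows/scripts/dbt_cloud_run_job.py` (not shown in this diff), which uses the environment variables above to trigger the matching dbt Cloud job. A minimal shell sketch of that trigger, assuming the v2 Administrative API on the default `cloud.getdbt.com` host (the actual script may differ, and a production script would typically also poll the run until it finishes):

```shell
# hypothetical stand-in for dbt_cloud_run_job.py: kick off a dbt Cloud job run
# using the same environment variables the workflows set above
curl --request POST \
  --url "https://cloud.getdbt.com/api/v2/accounts/${DBT_ACCOUNT_ID}/jobs/${DBT_PR_JOB_ID}/run/" \
  --header "Authorization: Token ${DBT_API_KEY}" \
  --header "Content-Type: application/json" \
  --data "{\"cause\": \"${DBT_JOB_CAUSE}\", \"git_branch\": \"${DBT_JOB_BRANCH}\"}"
```
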
9 changes: 0 additions & 9 deletions .pre-commit-config.yaml
@@ -12,12 +12,3 @@ repos:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: "3.0.3"
    hooks:
      - id: sqlfluff-fix
        additional_dependencies:
          [
            "dbt-metricflow[snowflake,bigquery,postgres]~=0.6.0",
            "sqlfluff-templater-dbt~=3.0.3",
          ]
113 changes: 89 additions & 24 deletions README.md
@@ -4,6 +4,19 @@

This is a sandbox project for exploring the basic functionality and latest features of dbt. It's based on a fictional restaurant called the Jaffle Shop that serves [jaffles](https://en.wikipedia.org/wiki/Pie_iron). Enjoy!

## Table of contents

1. [Create new repo from template](#create-new-repo-from-template)
2. [Platform setup](#platform-setup)
   1. [dbt Cloud IDE](#dbt-cloud-ide-most-beginner-friendly)
   2. [dbt Cloud CLI](#dbt-cloud-cli-if-you-prefer-to-work-locally)
3. [Project setup](#project-setup)
   1. [With `task`](#with-task)
   2. [Manually](#manually)
4. [Advanced options](#advanced-options)
   1. [Working with a larger dataset](#working-with-a-larger-dataset)
   2. [Pre-commit and SQLFluff](#pre-commit-and-sqlfluff)

## Create new repo from template

1. <details>
@@ -12,11 +12,11 @@ This is a sandbox project for exploring the basic functionality and latest featu
![Click 'Use this template'](/.github/static/use-template.gif)
</details>

2. Follow the steps to create a new repository.
2. Follow the steps to create a new repository. You should choose the option to copy all branches. The project is set up with `staging` as the default branch, a best practice we want to model for you. In a Write-Audit-Publish (WAP) flow, a `main` branch serves production data (like downstream dashboards) and is tied to a Production Environment in dbt Cloud, while a `staging` branch serves a clone of that data and is tied to a Staging Environment in dbt Cloud. You branch off of `staging` to add new features or fix bugs, and merge back into `staging` when you're done. When you're ready to deploy to production, you merge `staging` into `main` (the flow is sketched below). Staging is meant to be more or less a mirror of production, but safe for breaking changes, so you can verify changes in a production-like environment before deploying them fully.
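
If you're newer to this branching pattern, the day-to-day flow looks roughly like the sketch below. Plain `git` commands are shown for illustration only; in practice you'd usually open pull requests on GitHub rather than merging locally, and the branch name is just an example.

```shell
# start a feature branch off of staging
git switch staging
git pull
git switch -c feature/my-new-model

# ...do your work and commit, then merge back into staging (typically via a PR)
git switch staging
git merge feature/my-new-model

# when staging looks good, promote it to production
git switch main
git merge staging
git push origin main
```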

## Platform setup

1. Set up a dbt Cloud account and follow Step 4 in the [Quickstart instructions for your data platform](https://docs.getdbt.com/quickstarts), to connect your platform to dbt Cloud, then follow one of the two paths below to set up your development environment.
1. Set up a dbt Cloud account if you don't have one already (if you do, just create a new project) and follow Step 4 in the [Quickstart instructions for your data platform](https://docs.getdbt.com/quickstarts) to connect your platform to dbt Cloud, then follow one of the two paths below to set up your development environment.

### dbt Cloud IDE (most beginner friendly)

@@ -27,51 +40,103 @@ This is a sandbox project for exploring the basic functionality and latest featu
### dbt Cloud CLI (if you prefer to work locally)

> [!NOTE]
> If you'd like to use the dbt Cloud CLI, but are a little intimidated by the terminal, we've included a task runner called, fittingly, `task`. It's a simple way to run the commands you need to get started with dbt. You can install it by following the instructions [here](https://taskfile.dev/#/installation). We'll call out the `task` based alternative to each command below. You can also run `task setup` to perform all the setup commands at once.
> If you'd like to use the dbt Cloud CLI, but are a little intimidated by the terminal, we've included configuration for a _task runner_ called, fittingly, `task`. It's a simple way to run the commands you need to get started with dbt. You can install it by following the instructions [here](https://taskfile.dev/#/installation). We'll call out the `task`-based alternative to each command below.

1. Run `git clone [new repo name]` (or `gh repo clone [repo owner]/[new repo name]` if you prefer GitHub's excellent CLI) to clone your new repo from the first step to your local machine.

1. Run `git clone [new repo name]` to clone your new repo to your local machine.
2. [Follow the steps on this page](https://cloud.getdbt.com/cloud-cli) to install and set up a dbt Cloud connection with the dbt Cloud CLI.

2. [Follow Step 1 on this page](https://cloud.getdbt.com/cloud-cli) to install the dbt Cloud CLI; we'll do the other steps in a second.
> [!TIP]
> If you're using `task`, once you have the dbt Cloud CLI set up, you can run `task setup` to skip the rest of these steps and run all the setup commands in one go. We recommend it!

3. Set up a virtual environment and activate it. I like to call my virtual environment `.venv` and add it to my `.gitignore` file (we've already done this if you name your virtual environment '.venv') so that I don't accidentally commit it to the repository, but you can call it whatever you want.
3. Set up a virtual environment and activate it. I like to call my virtual environment `.venv` and add it to my `.gitignore` file so that I don't accidentally commit it to the repository (we've already done this for you if you name it `.venv`). You can call it whatever you want; just make sure you `.gitignore` it.

```shell
python3 -m venv .venv # create a virtual environment
# create a virtual environment
python3 -m venv .venv
# activate the virtual environment
source .venv/bin/activate
OR
task venv # create a virtual environment

source .venv/bin/activate # activate the virtual environment
# create a virtual environment
task venv
```

4. Install the project's requirements into your virtual environment.

```shell
python3 -m pip install -r requirements.txt # install the project's requirements
# upgrade pip (always a good idea to do first!)
python3 -m pip install --upgrade pip
# install the project's requirements
python3 -m pip install -r requirements.txt
OR
task install # install the project's requirements
# install the project's requirements
task install
```

5. [Follow steps 2 and 3 on this page](https://cloud.getdbt.com/cloud-cli) to setup dbt Cloud CLI's connection to dbt Cloud, only if you haven't already done so (we handled step 1 above and will do step 4 together next).

6. Double check that your `dbt_project.yml` is set up correctly by running `dbt list`. You should get back a list of models and tests in your project.
5. Double check that your `dbt_project.yml` is set up correctly by running `dbt list`. You should get back a list of models and tests in your project.

## Project setup

Once your development platform of choice is set up, use the following steps to get the project ready for whatever you'd like to do with it.
Once your development platform of choice and dependencies are set up, use the following steps to get the project ready for whatever you'd like to do with it.

1. Run `dbt build` to load the sample data into your raw schema, build your models, and test your project.
### With `task`

2. Delete the `jaffle-data` directory now that the raw data is loaded into the warehouse. It's loaded into a `raw_jaffle_shop` schema that both the `dev` and `prod` targets are set up to use. Take a look at the `generate_schema_name` macro in the `macros` directory if you're curious how this is done.
1. Run `task gen` to generate a year of synthetic data for the Jaffle Shop.

2. Run `task build` to seed the generated data into your warehouse and build the project.

3. Run `task clean` to delete the generated data to avoid re-seeding the same data repeatedly for no reason.

#### OR

1. Run `task build`.
### Manually

## Pre-commit and linting with SQLFluff
> [!NOTE]
> dbt Cloud CLI has a limit on the size of seed files that can be uploaded to your data warehouse. Seeds are _not_ meant for loading data in production; they're meant for small reference tables, and we just use them here for convenience. If you want to generate more than the default 1 year of `jafgen` data, you'll need to use dbt Core to seed the data. We cover how to do that in [Working with a larger dataset](#working-with-a-larger-dataset) below.

This project uses a tool called [pre-commit](https://pre-commit.com/) to automatically run a suite of processes on your code, like linters and formatters, when you commit. If it finds an issue and updates a file, you'll need to stage the changes and commit them again (the first commit will not have gone through because pre-commit found and fixed an issue). The outcome of this is that your code will be more consistent automatically, and everybody's changes will be running through the same set of processes. We recommend it for any project. You can see the configuration for pre-commit in the `.pre-commit-config.yaml` file. You can run the checks manually with `pre-commit run --all-files` to see what it does without making a commit.
1. In your activated virtual environment with dependencies installed, run `jafgen` to generate a year of synthetic data for the Jaffle Shop; no arguments are necessary for the defaults.

The most important pre-commit hook that runs in this project is [SQLFluff](https://sqlfluff.com/), which will lint your SQL code. It's configured with the `.sqlfluff` file in the root of the project. You can also run this manually, either to lint your code or to fix it automatically (which also functions loosely as a fairly relaxed formatter), with `pre-commit run sqlfluff-lint` or `pre-commit run sqlfluff-fix` respectively, but if you don't, it will still run whenever you commit to ensure the committed code is consistent.
2. Run `dbt deps` to install the dbt packages configured in the `packages.yml` file.

> [!NOTE]
> SQLFluff's dbt templater relies on dbt Core, which conflicts with dbt Cloud CLI for the time being. Thankfully, pre-commit installs its hooks into isolated environments, so you can still use SQLFluff with dbt Cloud CLI via pre-commit, but you can't call SQLFluff directly. The dbt Labs team is actively working on a solution for this issue.
3. Run `dbt seed` to seed the generated data into your warehouse.

4. Delete the generated data to avoid re-seeding the same data repeatedly for no reason, slowing down your build process.

```shell
rm -rf jaffle-data
```

5. Run `dbt build` to build and test the project. Make sure you deleted the generated data first, or you'll re-seed the same data. The whole manual sequence is collected in the sketch below.
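
Taken together, the manual path boils down to this sequence (assuming the default single year of `jafgen` data and the stock project layout):

```shell
jafgen              # generate a year of synthetic data into jaffle-data/
dbt deps            # install the dbt packages from packages.yml
dbt seed            # load the generated CSVs into your warehouse
rm -rf jaffle-data  # remove the local CSVs so they aren't re-seeded later
dbt build           # build and test the project
```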

## Advanced options

### Working with a larger dataset

[`jafgen`](https://github.com/dbt-labs/jaffle-shop-generator) is a simple tool for generating synthetic Jaffle Shop data, maintained on a volunteer basis by dbt Labs employees. This project is more interesting with a larger dataset generated and uploaded to your warehouse. 6 years is a nice amount to fully observe trends like growth, seasonality, and buyer personas that exist in the data. Uploading this amount of data requires a few extra steps, but we'll walk you through them. If you have a preferred way of loading CSVs into your warehouse or an S3 bucket, that will also work just fine; the generated data is just CSV files.

1. Make sure your virtual environment is activated and you have the dependencies installed; installing the project requirements also installs the `jafgen` CLI tool.
2. Run `pip install dbt-core dbt-[your warehouse adapter]`. For example, if you're using BigQuery, you would run `pip install dbt-core dbt-bigquery`. dbt Core is required temporarily to seed the larger files; we'll uninstall it in the final step to avoid conflicts over the `dbt` command.
3. Because you have an active virtual environment, this new install of `dbt` should take precedence in your `$PATH`. If you're not familiar with the `PATH` environment variable, just think of this as the order in which your computer looks for commands to run. What's important is that it will look in your active virtual environment first, so when you run `dbt`, it will use the `dbt` you just installed in your virtual environment.
4. Create a `profiles.yml` file in the root of your project. This file is already `.gitignore`d so you can keep your credentials safe. If you'd prefer, you can also set up your `profiles.yml` at the `~/.dbt/profiles.yml` path instead for extra security.
5. [Add a profile for your warehouse connection in this file](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles#connecting-to-your-warehouse-using-the-command-line) and add this configuration to your `dbt_project.yml` file as a top-level key called `profile` e.g. `profile: my-profile-name`.
6. Run `jafgen [integer of years to generate]`, e.g. `jafgen 4`, then run `dbt seed`. Depending on how much data you choose to generate, this might take several minutes. We don't recommend generating more than 10 years of data, as this is untested and may take a _really_ long time to generate and seed.
7. Run `pip uninstall dbt-core dbt-[your warehouse adapter]` to remove the dbt Core installation. It was a temporary install to let you seed the data; you don't need it for the rest of the project, which uses the dbt Cloud CLI. You can then delete your `profiles.yml` file and the `profile` configuration in your `dbt_project.yml` file. If you want to keep your dbt Core installation, you can, but you'll need to be mindful of conflicts between the two installations, which both use the `dbt` command. The whole run-through is sketched below.
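
Here's roughly what that run-through looks like end to end, using BigQuery as the example adapter and 6 years of data. Swap in your own adapter, and note this assumes you've already added a matching profile to `profiles.yml` and a `profile:` key to `dbt_project.yml` (steps 4 and 5).

```shell
source .venv/bin/activate
python3 -m pip install dbt-core dbt-bigquery    # temporary dbt Core install for seeding

jafgen 6            # generate 6 years of synthetic data
dbt seed            # load the CSVs using the connection in profiles.yml
rm -rf jaffle-data  # optional: remove the generated CSVs once they're loaded

python3 -m pip uninstall dbt-core dbt-bigquery  # hand the dbt command back to the dbt Cloud CLI
```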

### Pre-commit and SQLFluff

There's an optional tool included with the project called `pre-commit`.

[pre-commit](https://pre-commit.com/) automatically runs a suite of processes on your code, like linters and formatters, when you commit. If it finds an issue and updates a file, you'll need to stage the changes and commit them again (the first commit will not have gone through because pre-commit found and fixed an issue). The outcome of this is that your code will be more consistent automatically, and everybody's changes will be running through the same set of processes. We recommend it for any project.

You can see the configuration for pre-commit in the `.pre-commit-config.yaml` file. It's installed as part of the project's `requirements.txt`, but you'll need to opt in to using it by running `pre-commit install`. This will install _git hooks_ which run when you commit. You can also run the checks manually with `pre-commit run --all-files` to see what it does without making a commit, as shown below.
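
Opting in and trying it out looks like this; `ruff` is just one of the hook ids listed below, and any id from `.pre-commit-config.yaml` works the same way.

```shell
pre-commit install               # install the git hooks into .git/hooks
pre-commit run --all-files       # run every configured hook against the whole repo
pre-commit run ruff --all-files  # run a single hook by its id
```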

At present the following checks are run:

- `ruff` - an incredibly fast linter and formatter for Python, in case you add any Python models
- `check-yaml` - which validates YAML files
- `end-of-file-fixer` - which ensures all files end with a newline
- `trailing-whitespace` - which trims trailing whitespace from files

At present, the popular SQL linter and formatter SQLFluff doesn't play nicely with the dbt Cloud CLI, so we've omitted it from this project _for now_. If you'd like auto-formatting and linting for SQL, check out the dbt Cloud IDE!

We have kept a `.sqlfluff` config file to show what that looks like, and to future-proof the repo for when the Cloud CLI supports linting and formatting.
19 changes: 16 additions & 3 deletions Taskfile.yml
@@ -4,22 +4,35 @@ tasks:
  venv:
    cmds:
      - python3 -m venv .venv
    silent: true

  install:
    cmds:
      - source .venv/bin/activate && python3 -m pip install --upgrade pip
      - source .venv/bin/activate && python3 -m pip install -r requirements.txt --progress-bar off
      - source .venv/bin/activate && python3 -m pip install --upgrade pip --progress-bar off > /dev/null
      - source .venv/bin/activate && python3 -m pip install -r requirements.txt --progress-bar off > /dev/null
    silent: true

  gen:
    cmds:
      - source .venv/bin/activate && jafgen
    silent: true

  build:
    cmds:
      - dbt deps
      - dbt seed
      - rm -rf jaffle-data
      - dbt run
      - dbt test

  clean:
    cmds:
      - rm -rf jaffle-data
    silent: true

  setup:
    cmds:
      - task: venv
      - task: install
      - task: gen
      - task: build
      - task: clean
9 changes: 3 additions & 6 deletions dbt_project.yml
@@ -5,16 +5,14 @@ version: "3.0.0"
require-dbt-version: ">=1.5.0"

dbt-cloud:
  project-id: 283328 # Put your project id here
  project-id: 275557 # Put your project id here

# If you want to run SQLFluff pre-commit hooks you'll need
# to set up a working profile it can use and list it below
profile: default

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["data-tests"]
seed-paths: ["jaffle-data"]
seed-paths: ["jaffle-data", "seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

@@ -29,8 +27,7 @@ vars:

seeds:
  jaffle_shop:
    +enabled: "{{ target.name != 'prod' }}"
    +schema: jaffle_shop_raw
    +schema: raw

models:
  jaffle_shop: