Merge pull request #74 from dbt-labs/feature/generate_model_ctes
Feature/generate_model_import_ctes
joellabes authored Oct 6, 2022
2 parents caa657d + 8dd5091 commit 0d7ce15
Showing 11 changed files with 665 additions and 6 deletions.
110 changes: 107 additions & 3 deletions README.md
@@ -3,9 +3,22 @@
Macros that generate dbt code, and log it to the command line.

# Contents
* [generate_source](#generate_source-source)
* [generate_base_model](#generate_base_model-source)
* [generate_model_yaml](#generate_model_yaml-source)
- [dbt-codegen](#dbt-codegen)
- [Contents](#contents)
- [Installation instructions](#installation-instructions)
- [Macros](#macros)
  - [generate_source (source)](#generate_source-source)
    - [Arguments](#arguments)
    - [Usage:](#usage)
  - [generate_base_model (source)](#generate_base_model-source)
    - [Arguments:](#arguments-1)
    - [Usage:](#usage-1)
  - [generate_model_yaml (source)](#generate_model_yaml-source)
    - [Arguments:](#arguments-2)
    - [Usage:](#usage-2)
  - [generate_model_import_ctes (source)](#generate_model_import_ctes-source)
    - [Arguments:](#arguments-3)
    - [Usage:](#usage-3)

# Installation instructions
New to dbt packages? Read more about them [here](https://docs.getdbt.com/docs/building-a-dbt-project/package-management/).
@@ -164,3 +177,94 @@ models:
```

4. Paste the output into a schema.yml file, and refactor as required.

## generate_model_import_ctes ([source](macros/generate_model_import_ctes.sql))
This macro generates the SQL for a given model with all references (`ref`, `source` and `var` calls, as well as hard-coded relation names) pulled up into import CTEs, which you can then paste back into the model.
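To make the idea concrete, here is a minimal Python sketch of the same refactor. It is not the macro's implementation (the macro itself is written in Jinja), and it assumes a simplified model: only `{{ ref('...') }}` and `{{ source('...', '...') }}` calls are handled, and the query starts with either `with` or a plain `select` (no leading comments or `config()` block). The function name `pull_up_import_ctes` is hypothetical. The naming follows what this commit's integration test expects: a `ref` becomes a CTE named after the referenced model, a `source` becomes `source_<table_name>`, and import CTEs are emitted in alphabetical order.

```python
import re

# Hypothetical sketch, not dbt-codegen's implementation: only {{ ref('x') }}
# and {{ source('schema', 'table') }} calls are pulled up; var() calls and
# hard-coded relation names are left untouched.
REF_RE = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")
SOURCE_RE = re.compile(
    r"\{\{\s*source\(\s*['\"](\w+)['\"]\s*,\s*['\"](\w+)['\"]\s*\)\s*\}\}"
)
LEADING_WITH_RE = re.compile(r"^\s*with\s+", re.IGNORECASE)


def pull_up_import_ctes(sql: str) -> str:
    """Replace each ref()/source() call with an import CTE declared up front."""
    imports: dict[str, str] = {}  # import CTE name -> original Jinja call

    def replace_ref(match: re.Match) -> str:
        cte_name = match.group(1)  # {{ ref('stg_orders') }} -> stg_orders
        imports[cte_name] = match.group(0)
        return cte_name

    def replace_source(match: re.Match) -> str:
        cte_name = f"source_{match.group(2)}"  # source('raw', 'payments') -> source_payments
        imports[cte_name] = match.group(0)
        return cte_name

    body = SOURCE_RE.sub(replace_source, REF_RE.sub(replace_ref, sql))
    if not imports:
        return sql  # nothing to pull up

    # One import CTE per distinct reference, in alphabetical order.
    import_ctes = ",\n".join(
        f"{name} as (\n    select * from {call}\n)"
        for name, call in sorted(imports.items())
    )
    if LEADING_WITH_RE.match(body):
        # The model already has CTEs: put the import CTEs first in the same with-clause.
        return f"with {import_ctes},\n{LEADING_WITH_RE.sub('', body, count=1)}"
    # No existing CTEs: the import CTEs simply precede the original query.
    return f"with {import_ctes}\n{body.strip()}"


model_sql = """with customer_orders as (
    select customer_id, count(order_id) as number_of_orders
    from {{ ref('stg_orders') }}
    group by customer_id
)
select * from customer_orders"""

print(pull_up_import_ctes(model_sql))
```

Because each reference is replaced by a plain CTE name and tracked in a dictionary, repeated references to the same model (as with `data__a_relation` in the test fixture) collapse into a single import CTE.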

### Arguments:
* `model_name` (required): The model you wish to generate SQL with import CTEs for.
* `leading_commas` (optional, default = false): Whether you want your commas to be leading (vs trailing).
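For instance, a call that also sets the optional argument might look like this (`my_dbt_model` is a placeholder model name):

```
{{ codegen.generate_model_import_ctes(
    model_name = 'my_dbt_model',
    leading_commas = true
) }}
```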

### Usage:
1. Create a model with your original SQL query
2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code

```
{{ codegen.generate_model_import_ctes(
    model_name = 'my_dbt_model'
) }}
```

Alternatively, call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
$ dbt run-operation generate_model_import_ctes --args '{"model_name": "my_dbt_model"}'
```

3. The new SQL, with all references pulled up into import CTEs, will be logged to the command line

```
with customers as (
    select * from {{ ref('stg_customers') }}
),
orders as (
    select * from {{ ref('stg_orders') }}
),
payments as (
    select * from {{ ref('stg_payments') }}
),
customer_orders as (
    select
        customer_id,
        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders
    group by customer_id
),
customer_payments as (
    select
        orders.customer_id,
        sum(amount) as total_amount
    from payments
    left join orders on
        payments.order_id = orders.order_id
    group by orders.customer_id
),
final as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value
    from customers
    left join customer_orders
        on customers.customer_id = customer_orders.customer_id
    left join customer_payments
        on customers.customer_id = customer_payments.customer_id
)
select * from final
```

4. Replace the contents of the model's current SQL file with the compiled or logged code
3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -18,3 +18,6 @@ clean-targets:
seeds:
  +schema: raw_data
  +quote_columns: false

vars:
  my_table_reference: table_c
4 changes: 4 additions & 0 deletions integration_tests/models/model_without_any_ctes.sql
@@ -0,0 +1,4 @@
select *, 2 as col2
from {{ ref('model_without_import_ctes') }} as m
left join (select 2 as col_a from {{ ref('data__a_relation') }}) as a on a.col_a = m.id
where id = 1
55 changes: 55 additions & 0 deletions integration_tests/models/model_without_import_ctes.sql
@@ -0,0 +1,55 @@
/*
This is my model!
*/

{{ config(
materialized='table',
) }}

-- I love this cte
with my_first_cte as (
select
a.col_a,
b.col_b
from {{ ref('data__a_relation') }} as a
left join {{ ref("data__b_relation") }} as b
on a.col_a = b.col_a
left join {{ ref('data__a_relation') }} as aa
on a.col_a = aa.col_a
),
my_second_cte as (
select
1 as id
from codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
union all
select
2 as id
from {{ source('codegen_integration_tests__data_source_schema', 'codegen_integration_tests__data_source_table') }}
-- union all
-- select
-- 3 as id
-- from development.codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- union all
-- select
-- 4 as id
-- from {{ var("my_table_reference") }}
-- union all
-- select
-- 5 as id
-- from {{ var("my_other_table_reference", "table_d") }}
)
-- my_third_cte as (
-- select
-- a.col_a,
-- b.col_b
-- from `raw_relation_1` as a
-- left join "raw_relation_2" as b
-- on a.col_a = b.col_b
-- left join [raw_relation_3] as aa
-- on a.col_a = aa.col_b
-- left join 'raw_relation_4' as ab
-- on a.col_a = ab.col_b
-- left join 'my_schema'.'raw_relation_5' as ac
-- on a.col_a = ac.col_b
-- )
select * from my_second_cte
146 changes: 146 additions & 0 deletions integration_tests/tests/test_generate_model_import_ctes.sql
@@ -0,0 +1,146 @@
{% set actual_model_with_import_ctes = codegen.generate_model_import_ctes(
model_name = 'model_without_import_ctes'
)
%}

{% set expected_model_with_import_ctes %}
/*
This is my model!
*/

{% raw %}{{ config(
materialized='table',
) }}{% endraw %}

with codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table as (

select * from codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

data__a_relation as (

select * from {% raw %}{{ ref('data__a_relation') }}{% endraw %}

),

data__b_relation as (

select * from {% raw %}{{ ref("data__b_relation") }}{% endraw %}

),

development_codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table as (

select * from development.codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

my_other_table_reference as (

select * from {% raw %}{{ var("my_other_table_reference", "table_d") }}{% endraw %}
-- CAUTION: It's best practice to use the ref or source function instead of a var

),

my_schema_raw_relation_5 as (

select * from 'my_schema'.'raw_relation_5'
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

my_table_reference as (

select * from {% raw %}{{ var("my_table_reference") }}{% endraw %}
-- CAUTION: It's best practice to use the ref or source function instead of a var

),

raw_relation_1 as (

select * from `raw_relation_1`
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_2 as (

select * from "raw_relation_2"
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_3 as (

select * from [raw_relation_3]
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_4 as (

select * from 'raw_relation_4'
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

source_codegen_integration_tests__data_source_table as (

select * from {% raw %}{{ source('codegen_integration_tests__data_source_schema', 'codegen_integration_tests__data_source_table') }}{% endraw %}
-- CAUTION: It's best practice to create staging layer for raw sources

),

-- I love this cte
my_first_cte as (
select
a.col_a,
b.col_b
from data__a_relation as a
left join data__b_relation as b
on a.col_a = b.col_a
left join data__a_relation as aa
on a.col_a = aa.col_a
),
my_second_cte as (
select
1 as id
from codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table
union all
select
2 as id
from source_codegen_integration_tests__data_source_table
-- union all
-- select
-- 3 as id
-- from development_codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table
-- union all
-- select
-- 4 as id
-- from my_table_reference
-- union all
-- select
-- 5 as id
-- from my_other_table_reference
)
-- my_third_cte as (
-- select
-- a.col_a,
-- b.col_b
-- from raw_relation_1 as a
-- left join raw_relation_2 as b
-- on a.col_a = b.col_b
-- left join raw_relation_3 as aa
-- on a.col_a = aa.col_b
-- left join raw_relation_4 as ab
-- on a.col_a = ab.col_b
-- left join my_schema_raw_relation_5 as ac
-- on a.col_a = ac.col_b
-- )
select * from my_second_cte
{% endset %}

{{ assert_equal (actual_model_with_import_ctes | trim, expected_model_with_import_ctes | trim) }}