Feature/generate_model_import_ctes #74

Merged
merged 21 commits into from
Oct 6, 2022
571b3ed
add generate_model_import_ctes macro and integration test
graciegoheen Aug 11, 2022
3a78210
update readme
graciegoheen Aug 11, 2022
538a46d
Update README.md
graciegoheen Aug 11, 2022
e9c54f4
adjusted test example for postgres
graciegoheen Aug 11, 2022
dd4dfc6
added support for config blocks
graciegoheen Aug 12, 2022
c07edba
added support for comments
graciegoheen Aug 12, 2022
c563e23
added comma handling for sql with and without any CTEs
graciegoheen Aug 12, 2022
8cbd058
Use unique instead of set to be compatible with more versions of dbt
graciegoheen Aug 12, 2022
0963596
change CTE name to avoid duplicates
graciegoheen Aug 12, 2022
45a844b
fix for replacing without reliance on abc
graciegoheen Aug 12, 2022
df7c512
option to use raw_sql or raw_code depending on dbt version
graciegoheen Aug 12, 2022
01dac05
commented out database example for integration tests
graciegoheen Aug 12, 2022
66bddbb
Merge branch 'feature/generate_model_ctes' of https://github.com/dbt-…
graciegoheen Aug 12, 2022
0df6010
Added regex matching for from var()
graciegoheen Aug 15, 2022
b0d4e33
Pickup raw references enclosed by single quote
graciegoheen Aug 15, 2022
9b7a33e
Fixed missing commas issue
graciegoheen Sep 9, 2022
58ad058
comma fix
graciegoheen Sep 9, 2022
ca67cea
Added integration test to check for commas on sql without import ctes
graciegoheen Sep 28, 2022
f536083
update readme
graciegoheen Sep 29, 2022
2ff64be
Added space between import CTEs
graciegoheen Sep 30, 2022
df54f84
update README
graciegoheen Oct 5, 2022
110 changes: 107 additions & 3 deletions README.md
@@ -3,9 +3,22 @@
Macros that generate dbt code, and log it to the command line.

# Contents
* [generate_source](#generate_source-source)
* [generate_base_model](#generate_base_model-source)
* [generate_model_yaml](#generate_model_yaml-source)
- [dbt-codegen](#dbt-codegen)
- [Contents](#contents)
- [Installation instructions](#installation-instructions)
- [Macros](#macros)
- [generate_source (source)](#generate_source-source)
- [Arguments](#arguments)
- [Usage:](#usage)
- [generate_base_model (source)](#generate_base_model-source)
- [Arguments:](#arguments-1)
- [Usage:](#usage-1)
- [generate_model_yaml (source)](#generate_model_yaml-source)
- [Arguments:](#arguments-2)
- [Usage:](#usage-2)
- [generate_model_import_ctes (source)](#generate_model_import_ctes-source)
- [Arguments:](#arguments-3)
- [Usage:](#usage-3)

# Installation instructions
New to dbt packages? Read more about them [here](https://docs.getdbt.com/docs/building-a-dbt-project/package-management/).
@@ -164,3 +177,94 @@ models:
```

4. Paste the output into a schema.yml file, and refactor as required.

## generate_model_import_ctes ([source](macros/generate_model_import_ctes.sql))
This macro generates the SQL for a given model with all references pulled up into import CTEs, which you can then paste back into the model.

### Arguments:
* `model_name` (required): The model you wish to generate SQL with import CTEs for.
* `leading_commas` (optional, default = false): Whether you want your commas to be leading (vs trailing).
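
To illustrate the optional argument, the call below is a minimal sketch that passes `leading_commas` alongside `model_name`. It assumes the package is installed under the name `codegen`, as in the usage examples in this README, and `my_dbt_model` is a placeholder for one of your own models.

```
{# Sketch only: my_dbt_model is a placeholder model name #}
{{ codegen.generate_model_import_ctes(
    model_name = 'my_dbt_model',
    leading_commas = true
) }}
```

Leaving `leading_commas` out keeps the default of false, which gives trailing commas.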

### Usage:
1. Create a model with your original SQL query
2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code

```
{{ codegen.generate_model_import_ctes(
model_name = 'my_dbt_model'
) }}
```

Alternatively, call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
$ dbt run-operation generate_model_import_ctes --args '{"model_name": "my_dbt_model"}'
```

3. The new SQL - with all references pulled up into import CTEs - will be logged to the command line

```
with customers as (

select * from {{ ref('stg_customers') }}

),

orders as (

select * from {{ ref('stg_orders') }}

),

payments as (

select * from {{ ref('stg_payments') }}

),

customer_orders as (

select
customer_id,
min(order_date) as first_order,
max(order_date) as most_recent_order,
count(order_id) as number_of_orders
from orders
group by customer_id

),

customer_payments as (

select
orders.customer_id,
sum(amount) as total_amount
from payments
left join orders on
payments.order_id = orders.order_id
group by orders.customer_id

),

final as (

select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order,
customer_orders.most_recent_order,
customer_orders.number_of_orders,
customer_payments.total_amount as customer_lifetime_value
from customers
left join customer_orders
on customers.customer_id = customer_orders.customer_id
left join customer_payments
on customers.customer_id = customer_payments.customer_id

)

select * from final
```

4. Replace the contents of the model's current SQL file with the compiled or logged code
3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -18,3 +18,6 @@ clean-targets:
seeds:
+schema: raw_data
+quote_columns: false

vars:
my_table_reference: table_c
4 changes: 4 additions & 0 deletions integration_tests/models/model_without_any_ctes.sql
@@ -0,0 +1,4 @@
select *, 2 as col2
from {{ ref('model_without_import_ctes') }} as m
left join (select 2 as col_a from {{ ref('data__a_relation') }}) as a on a.col_a = m.id
where id = 1
55 changes: 55 additions & 0 deletions integration_tests/models/model_without_import_ctes.sql
@@ -0,0 +1,55 @@
/*
This is my model!
*/

{{ config(
materialized='table',
) }}

-- I love this cte
with my_first_cte as (
select
a.col_a,
b.col_b
from {{ ref('data__a_relation') }} as a
left join {{ ref("data__b_relation") }} as b
on a.col_a = b.col_a
left join {{ ref('data__a_relation') }} as aa
on a.col_a = aa.col_a
),
my_second_cte as (
select
1 as id
from codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
union all
select
2 as id
from {{ source('codegen_integration_tests__data_source_schema', 'codegen_integration_tests__data_source_table') }}
-- union all
-- select
-- 3 as id
-- from development.codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- union all
-- select
-- 4 as id
-- from {{ var("my_table_reference") }}
-- union all
-- select
-- 5 as id
-- from {{ var("my_other_table_reference", "table_d") }}
)
-- my_third_cte as (
-- select
-- a.col_a,
-- b.col_b
-- from `raw_relation_1` as a
-- left join "raw_relation_2" as b
-- on a.col_a = b.col_b
-- left join [raw_relation_3] as aa
-- on a.col_a = aa.col_b
-- left join 'raw_relation_4' as ab
-- on a.col_a = ab.col_b
-- left join 'my_schema'.'raw_relation_5' as ac
-- on a.col_a = ac.col_b
-- )
select * from my_second_cte
146 changes: 146 additions & 0 deletions integration_tests/tests/test_generate_model_import_ctes.sql
@@ -0,0 +1,146 @@
{% set actual_model_with_import_ctes = codegen.generate_model_import_ctes(
model_name = 'model_without_import_ctes'
)
%}

{% set expected_model_with_import_ctes %}
/*
This is my model!
*/

{% raw %}{{ config(
materialized='table',
) }}{% endraw %}

with codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table as (

select * from codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

data__a_relation as (

select * from {% raw %}{{ ref('data__a_relation') }}{% endraw %}

),

data__b_relation as (

select * from {% raw %}{{ ref("data__b_relation") }}{% endraw %}

),

development_codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table as (

select * from development.codegen_integration_tests__data_source_schema.codegen_integration_tests__data_source_table
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

my_other_table_reference as (

select * from {% raw %}{{ var("my_other_table_reference", "table_d") }}{% endraw %}
-- CAUTION: It's best practice to use the ref or source function instead of a var

),

my_schema_raw_relation_5 as (

select * from 'my_schema'.'raw_relation_5'
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

my_table_reference as (

select * from {% raw %}{{ var("my_table_reference") }}{% endraw %}
-- CAUTION: It's best practice to use the ref or source function instead of a var

),

raw_relation_1 as (

select * from `raw_relation_1`
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_2 as (

select * from "raw_relation_2"
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_3 as (

select * from [raw_relation_3]
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

raw_relation_4 as (

select * from 'raw_relation_4'
-- CAUTION: It's best practice to use the ref or source function instead of a direct reference

),

source_codegen_integration_tests__data_source_table as (

select * from {% raw %}{{ source('codegen_integration_tests__data_source_schema', 'codegen_integration_tests__data_source_table') }}{% endraw %}
-- CAUTION: It's best practice to create staging layer for raw sources

),

-- I love this cte
my_first_cte as (
select
a.col_a,
b.col_b
from data__a_relation as a
left join data__b_relation as b
on a.col_a = b.col_a
left join data__a_relation as aa
on a.col_a = aa.col_a
),
my_second_cte as (
select
1 as id
from codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table
union all
select
2 as id
from source_codegen_integration_tests__data_source_table
-- union all
-- select
-- 3 as id
-- from development_codegen_integration_tests__data_source_schema_codegen_integration_tests__data_source_table
-- union all
-- select
-- 4 as id
-- from my_table_reference
-- union all
-- select
-- 5 as id
-- from my_other_table_reference
)
-- my_third_cte as (
-- select
-- a.col_a,
-- b.col_b
-- from raw_relation_1 as a
-- left join raw_relation_2 as b
-- on a.col_a = b.col_b
-- left join raw_relation_3 as aa
-- on a.col_a = aa.col_b
-- left join raw_relation_4 as ab
-- on a.col_a = ab.col_b
-- left join my_schema_raw_relation_5 as ac
-- on a.col_a = ac.col_b
-- )
select * from my_second_cte
{% endset %}

{{ assert_equal (actual_model_with_import_ctes | trim, expected_model_with_import_ctes | trim) }}