Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spike] [Snowflake] Support complex types unit testing #9131

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Nov 21, 2023

To get unit tests working on Snowflake at all, we need case-insensitive comparisons in a few places.

To get "complex data types" working on Snowflake in particular, I needed to:

  • escape single quotes in strings
  • hack extend snowflake__safe_cast to also do JSON-parsing for me, as a treat:
{% macro snowflake__safe_cast(field, type) %}
  {#-- Hack for now --#}
  {% if type|lower in ['array', 'variant'] %}
    cast(parse_json({{field}}) as {{type}})
  {% else %}
    try_cast({{field}} as {{type}})
  {% endif %}
{% endmacro %}

Then, this "just works":

-- models/my_upstream_model.sql
select
  ['a','b','c'] as tested_column

-- models/my_model.sql
select
    tested_column
from {{ ref('my_upstream_model')}}
unit_tests:
  - name: test_array
    model: my_model
    given:
      - input: ref('my_upstream_model')
        rows:
          - tested_column: "['a','b','c']"
    expect:
      rows:
        - tested_column: "['a','b','c']"
  - name: test_object
    model: my_model
    given:
      - input: ref('my_upstream_model')
        rows:
          - tested_column: "{'a': 'b', 'c': 123}"
    expect:
      rows:
        - tested_column: "{'a': 'b', 'c': 123}"
  - name: test_combined
    model: my_model
    given:
      - input: ref('my_upstream_model')
        rows:
          - tested_column: "[{'a': 'b', 'c': [1,2,3]}]"
    expect:
      rows:
        - tested_column: "[{'a': 'b', 'c': [1,2,3]}]"

@jtcohen6 jtcohen6 requested a review from MichelleArk November 21, 2023 22:54
@cla-bot cla-bot bot added the cla:yes label Nov 21, 2023
@jtcohen6 jtcohen6 changed the title Case-insensitive comparisons [spike] [Snowflake] Support complex types unit testing Nov 21, 2023
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@@ -22,15 +22,15 @@
-- Build actual result given inputs
with dbt_internal_unit_test_actual AS (
select
{% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%},{% endif %}{%- endfor -%}, {{ dbt.string_literal("actual") }} as actual_or_expected
{% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%},{% endif %}{%- endfor -%}, {{ dbt.string_literal("actual") }} as {{ adapter.quote("actual_or_expected") }}
Copy link
Contributor Author

@jtcohen6 jtcohen6 Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a column name is quoted, Snowflake will preserve its case. The alternative is to lowercase column names in the agate result before we access it here:

lambda row: row["actual_or_expected"] == actual_or_expected

Otherwise, we get one of these:

22:45:30  Unhandled error while executing target/run/my_dbt_project/models/my_model.yml/models/my_model__test_my_model.sql
'actual_or_expected'
22:45:30  1 of 1 ERROR my_model__test_my_model ........................................... [ERROR in 24.16s]
22:45:30
22:45:30  Finished running 1 unit_test in 0 hours 0 minutes and 24.17 seconds (24.17s).
22:45:30
22:45:30  Completed with 1 error and 0 warnings:
22:45:30
22:45:30    'actual_or_expected'
22:45:30

@@ -10,7 +10,7 @@
{% set columns_in_relation = adapter.get_column_schema_from_query(get_empty_subquery_sql(sql)) %}
{%- set column_name_to_data_types = {} -%}
{%- for column in columns_in_relation -%}
{%- do column_name_to_data_types.update({column.name: column.data_type}) -%}
{%- do column_name_to_data_types.update({column.name|lower: column.data_type}) -%}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is naively assuming that the user has provided their columns in all lowercase. We'd obviously want to lower on both sides of the comparison.

Copy link

codecov bot commented Nov 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8197fa7) 86.92% compared to head (e2be7ba) 86.87%.

Additional details and impacted files
@@                          Coverage Diff                           @@
##           support-complex-types-unit-testing    #9131      +/-   ##
======================================================================
- Coverage                               86.92%   86.87%   -0.05%     
======================================================================
  Files                                     181      181              
  Lines                                   27156    27156              
======================================================================
- Hits                                    23604    23592      -12     
- Misses                                   3552     3564      +12     
Flag Coverage Δ
integration 83.81% <ø> (-0.10%) ⬇️
unit 64.72% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -67,7 +68,7 @@ union all
{%- for column_name, column_value in row.items() -%}
{% set row_update = {column_name: column_value} %}
{%- if column_value is string -%}
{%- set row_update = {column_name: safe_cast(dbt.string_literal(column_value), column_name_to_data_types[column_name]) } -%}
{%- set row_update = {column_name: safe_cast(dbt.string_literal(dbt.escape_single_quotes(column_value)), column_name_to_data_types[column_name]) } -%}
Copy link
Contributor Author

@jtcohen6 jtcohen6 Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that it's:

select cast(parse_json('[\'a\',\'b\',\'c\']') as ARRAY) AS tested_column

Rather than:

select cast(parse_json('['a','b','c']') as ARRAY) AS tested_column

@jtcohen6
Copy link
Contributor Author

Closing since @MichelleArk has pulled these commits into #9102

@jtcohen6 jtcohen6 closed this Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant