Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

Closed
1 of 5 tasks
b-per opened this issue Jul 2, 2020 · 1 comment · Fixed by #2712
Closed
1 of 5 tasks

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

b-per opened this issue Jul 2, 2020 · 1 comment · Fixed by #2712
Labels
adapter_plugins Issues relating to third-party adapter plugins bug Something isn't working

Comments

@b-per
Copy link
Contributor

b-per commented Jul 2, 2020

Describe the bug

Using CTEs in data tests while using the dbt-sqlserver adapter raises SQL errors. This is due to the fact that data tests are wrapped around a SELECT COUNT(*) https://github.com/fishtown-analytics/dbt/blob/3af8a22d1319431115e7394f1235294613d36b35/core/dbt/task/test.py#L35 and SQL Server doesn't allow the CTEs inside other requests (it needs to start with WITH).

I has also been reported in the dbt-sqlserver repo here but this seems to be actually due to the different behaviour of data tests compared to schema tests in dbt.

Steps To Reproduce

  • Configure dbt to run with dbt-sqlserver
  • Create a test in the folder tests with a CTE (example: WITH src AS (SELECT COUNT(*) AS cnt FROM {{ ref('my table') }}) ) SELECT * FROM src )
  • Run dbt test --data
  • A database error is raised (see logs below)

Expected behavior

The test should run without any SQL error

Screenshots and log output

From dbt.log

020-07-02 08:39:12,208505 (Thread-1): On test.qwdv.cte: select count(*) as errors from (

WITH src AS (
  <my_select_query>
)

SELECT * FROM src
) sbq
2020-07-02 08:39:12,213714 (Thread-1): Database error: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'WITH'. (156) (SQLExecDirectW)")
2020-07-02 08:39:12,213930 (Thread-1): On test.qwdv.cte: ROLLBACK

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: sqlserver)

The output of dbt --version:

installed version: 0.15.2
   latest version: 0.17.0

The operating system you're using:
Mac OS

The output of python --version:
Python 3.8.3

Additional context

Is there any reason why the data tests are wrapped around a SELECT COUNT(*) and could the behavior be changed to the same as schema tests?

@b-per b-per added bug Something isn't working triage labels Jul 2, 2020
@jtcohen6 jtcohen6 removed the triage label Jul 6, 2020
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 6, 2020

Thanks for raising this issue here, @camfrout!

From some brief searching, it appears that SQLServer doesn't support nested with statements, i.e. CTEs within CTEs. (Spark and Presto don't, either.) This would mean that, in addition to data tests that have CTEs, SQLServer isn't happy about ephemeral models that have CTEs—is that right?

In any case, it sounds like we need to enable different implementations on different databases. We should move this SQL, currently defined as a python string (!), into a Jinja adapter macro that can be implemented differently on different plugins. We should do the same with built-in schema tests (see #2415).

In dbt:

{% macro get_count_from_test(test) %}
  {{ adapter_macro('get_count_from_test', test) }}
{% endmacro %}

{% macro default__get_count_from_test(test) %}
  select count(*) from ({{ test.injected_sql }}) sbq
{% endmacro %}

In dbt-sqlserver, which doesn't support nested with in CTEs or subqueries, I could imagine needing to implement this very differently. Here's some pseudo-code:

{% macro sqlserver__get_count_from_test(test) %}

  create temp table {{ test.name }}__dbt_test_tmp as (
      {{ test.injected_sql }}
  );

  select count(*) from {{ test.name }}__dbt_test_tmp;

{% endmacro %}

Addenda

Rationalizing and reconciling schema + data tests is on the 1.0.0 roadmap. Here's the trade-off we're facing:

  • It is beneficial to have users encode all logic (including select count(*)) directly in the test definition, as they do for schema tests today, because we avoid the syntactical differences that crop up on different databases (= this very issue).
  • It is also beneficial to have tests return a set of records (select *), as they do for data tests today, so that we can develop tooling to store those test results in the database (see Storing test results in the database #2593).

I'm leaning toward the latter. The adapter macro above would be a prerequisite for that change as well.

Also, we might take this opportunity to change the name of test.injected_sql.

@jtcohen6 jtcohen6 added this to the Marian Anderson milestone Jul 29, 2020
@jtcohen6 jtcohen6 added the adapter_plugins Issues relating to third-party adapter plugins label Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapter_plugins Issues relating to third-party adapter plugins bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants