Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

b-per · 2020-07-02T08:54:36Z

Describe the bug

Using CTEs in data tests while using the dbt-sqlserver adapter raises SQL errors. This is due to the fact that data tests are wrapped around a SELECT COUNT(*) https://github.com/fishtown-analytics/dbt/blob/3af8a22d1319431115e7394f1235294613d36b35/core/dbt/task/test.py#L35 and SQL Server doesn't allow the CTEs inside other requests (it needs to start with WITH).

I has also been reported in the dbt-sqlserver repo here but this seems to be actually due to the different behaviour of data tests compared to schema tests in dbt.

Steps To Reproduce

Configure dbt to run with dbt-sqlserver
Create a test in the folder tests with a CTE (example: WITH src AS (SELECT COUNT(*) AS cnt FROM {{ ref('my table') }}) ) SELECT * FROM src )
Run dbt test --data
A database error is raised (see logs below)

Expected behavior

The test should run without any SQL error

Screenshots and log output

From dbt.log

020-07-02 08:39:12,208505 (Thread-1): On test.qwdv.cte: select count(*) as errors from (

WITH src AS (
  <my_select_query>
)

SELECT * FROM src
) sbq
2020-07-02 08:39:12,213714 (Thread-1): Database error: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'WITH'. (156) (SQLExecDirectW)")
2020-07-02 08:39:12,213930 (Thread-1): On test.qwdv.cte: ROLLBACK

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.15.2
   latest version: 0.17.0

The operating system you're using:
Mac OS

The output of python --version:
Python 3.8.3

Additional context

Is there any reason why the data tests are wrapped around a SELECT COUNT(*) and could the behavior be changed to the same as schema tests?

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2020-07-06T16:38:56Z

Thanks for raising this issue here, @camfrout!

From some brief searching, it appears that SQLServer doesn't support nested with statements, i.e. CTEs within CTEs. (Spark and Presto don't, either.) This would mean that, in addition to data tests that have CTEs, SQLServer isn't happy about ephemeral models that have CTEs—is that right?

In any case, it sounds like we need to enable different implementations on different databases. We should move this SQL, currently defined as a python string (!), into a Jinja adapter macro that can be implemented differently on different plugins. We should do the same with built-in schema tests (see #2415).

In dbt:

{% macro get_count_from_test(test) %}
  {{ adapter_macro('get_count_from_test', test) }}
{% endmacro %}

{% macro default__get_count_from_test(test) %}
  select count(*) from ({{ test.injected_sql }}) sbq
{% endmacro %}

In dbt-sqlserver, which doesn't support nested with in CTEs or subqueries, I could imagine needing to implement this very differently. Here's some pseudo-code:

{% macro sqlserver__get_count_from_test(test) %}

  create temp table {{ test.name }}__dbt_test_tmp as (
      {{ test.injected_sql }}
  );

  select count(*) from {{ test.name }}__dbt_test_tmp;

{% endmacro %}

Addenda

Rationalizing and reconciling schema + data tests is on the 1.0.0 roadmap. Here's the trade-off we're facing:

It is beneficial to have users encode all logic (including select count(*)) directly in the test definition, as they do for schema tests today, because we avoid the syntactical differences that crop up on different databases (= this very issue).
It is also beneficial to have tests return a set of records (select *), as they do for data tests today, so that we can develop tooling to store those test results in the database (see Storing test results in the database #2593).

I'm leaning toward the latter. The adapter macro above would be a prerequisite for that change as well.

Also, we might take this opportunity to change the name of test.injected_sql.

b-per added bug Something isn't working triage labels Jul 2, 2020

jtcohen6 removed the triage label Jul 6, 2020

jtcohen6 added this to the Marian Anderson milestone Jul 29, 2020

jtcohen6 mentioned this issue Jul 29, 2020

Remove double underscore in epheremal model generated CTE (__dbt__CTE__...) #2660

Closed

5 tasks

jtcohen6 added the adapter_plugins Issues relating to third-party adapter plugins label Jul 29, 2020

beckjake mentioned this issue Aug 18, 2020

Feature: adapter cte generation #2712

Merged

4 tasks

beckjake closed this as completed in #2712 Aug 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

b-per commented Jul 2, 2020

jtcohen6 commented Jul 6, 2020

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

Can't use CTEs in data test with dbt-sqlserver - Database Error #2609

Comments

b-per commented Jul 2, 2020

Describe the bug

Steps To Reproduce

Expected behavior

Screenshots and log output

System information

Additional context

jtcohen6 commented Jul 6, 2020

Addenda