-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-86] [Feature] Write All Store Test Failures Into One Table with an Unstructured JSON/VARIANT/SUPER Column #4613
Comments
Another possible related thought on this. What if the table into which test results are stored actually became a node in the DAG? Given that tests are already nodes in the DAG, these tables/models would become nodes downstream of each test. And then it would be possible to have models which in turn are built on top of the these failure storage tables which would be very helpful for the purposes of data quality management. That idea could be combined with my original proposal in this ticket to allow larger combined tables, with the semi-structured format suggested, to then be used by downstream analytics. If the failure storage tables were treated as similar to snapshots, they also then wouldn't be truncated/dropped-and-recreated on the next dbt run. Which would also make it easier to see, if when dbt is re-run, the same error is repeated again or whether it no longer exists. |
Hi @codigo-ergo-sum! A few ideas:
Would either or those first two ideas be suitable (or close to suitable) for your use case? |
Oh, and regarding adding the stored failures as a node in the DAG-- I love the idea! We'll have to see where #4624 takes us |
FYI we are getting around this by setting up the returned failed test record columns to be all nearly identical, with only minor variations between them (also using BQ and batched runs). That way we can just simply run a SELECT * FROM |
I'm going to try using on-run-end post-hook combined with a macro, and then we also just got JSON support enabled for our BQ environment today (it's a special thing that has to be turned on by request.) So I'll report back :). |
@codigo-ergo-sum how'd you go on your test for this? We're looking to start cleaning up our schemas and having something like this would be incredibly valuable :) I started to store all our testing failures but we sparsely use it so it's getting hard to traverse the schema it stores it to :( |
So I did this initial take for BigQuery, basing off of code in this issue (which was for Snowflake): #4099:
A few thoughts:
So we'd have the Time permitting, I'll try to poke at this more next week. |
+1 this is correct way of storing test results instead current one.
|
Have a look at https://docs.getdbt.com/reference/resource-configs/limit - I think you'll find it does what you need
Do you mean printing to the console, or just accessing the compiled code in general? All compiled tests are stored in your target directory where you can look at them - https://docs.getdbt.com/docs/faqs/checking-logs. If you wanted the whole compiled code to be printed to the console, I'd suggest opening a separate issue for that. |
Also see my comment on this closed ticket that is definitely related: #2593 (comment) |
@codigo-ergo-sum Have you made any more progress with this using on-run-end hook? |
+1 is there any progress on this? |
Given the planned introduction of unit testing in 1.8, it'd be great to see this put on the roadmap for 1.9! |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Still interested in this. |
same |
same
|
Is there an existing feature request for this?
Describe the Feature
Right now store-failures in dbt writes out the rows returned for each failed test into a separate table if
--store-failures
is specified. This feature is definitely helpful from a developer perspective to be able to immediately see what rows failed in a test by querying the returned table.It is somewhat less useful in its current incarnation in a production environment where jobs and tests may run many times per day and the operations team would like to have a unified, easily queryable database-persisted record of all tests. Even better perhaps would be a more automated way of pushing test failures back to users.
One way of doing this would be to create one table where all test failures are written to. The big obstacle to this on some platforms, such as BigQuery, has been the lack of a semi-structured native table/column format. Snowflake already has VARIANT and Redshift now has SUPER. This is important because different failed tests on different tables are naturally going to have different schemas that can't easily be reconciled in one table where columns have to be declared ahead of time.
BigQuery has now just added a JSON datatype: https://cloud.google.com/bigquery/docs/reference/standard-sql/json-data
What if we now had a table with a set of columns where all test failures would be stored:
Maybe even more interesting would be the ability to group different failed tests into different long-term tables for storage. Something like this in dbt_project.yml:
Then I could go back and write a report which issues a query to the database like this:
SELECT * FROM not_so_sensitive_test_failures_table WHERE owner = 'data_curation@mycompany.com' AND failed_test_timestamp > (CURRENT_TIMESTAMP - INTERVAL 24 HOURS)
What say ye?
Describe alternatives you've considered
Writing our own macros to parse INFORMATION_SCHEMA to find failed tests in the schema that dbt creates for storing failures and then unifying everything that dbt had put in there into one JSON-style table. But doing it right in dbt would be a lot more elegant.
Who will this benefit?
Anybody trying to store test failures over time and then push out responsibility for correction outside of the data engineering team.
Are you interested in contributing this feature?
Yes - I poked around in the code for 10 minutes in dbt-core and didn't obviously see where to start editing but definitely interested
Anything else?
No response
The text was updated successfully, but these errors were encountered: