Unit testing in dbt

How to set up a unit testing framework for your dbt projects

Betsy Varghese
Published in Cognizant Servian
4 min read · Jun 16, 2021


dbt (data build tool) is a powerful tool in the ELT pipeline that allows you to perform transformations on data loaded in your warehouse. In this article, we’ll discuss custom testing within dbt. We’ll be using GCP services such as BigQuery, Cloud Build and Cloud Source Repositories.

So you’ve set up your dbt project and created your models. How do you test them? In other words, how do you ensure that your models are performing exactly the right kind of transformations that you intend? This is where unit testing comes in.

The idea is to provide input files that resemble the type of data your model will be using and output files that resemble the results of your models. You want to tell dbt to choose this input file as the source for your model and then compare the results of that model with your provided output file. For that comparison to be performed, we’re going to have to define tests inside of our schema.yml. We’ll walk through an example.

The simplest dataset

My project, dbt-unit-testing, is set up with a connection to my BigQuery dataset, into which I’ve loaded the Iris dataset available from Kaggle, in a table named iris (creative!). My model does something very basic: it creates a new column that sums all the measurements in a row and stores the result in a column named “Sum” (even more creative!). This is what it looks like:

Simple model

Suuuper simple. Here’s how we test this model.
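In SQL, it might look roughly like this (a sketch: the model name iris_modifier comes from later in the article, the column names follow the Kaggle Iris CSV headers, and the source name iris_source is an assumption):

```sql
-- models/iris_modifier.sql (illustrative sketch)
select
    *,
    -- add up the four measurement columns into a new "Sum" column
    SepalLengthCm + SepalWidthCm + PetalLengthCm + PetalWidthCm as Sum
from {{ source('iris_source', 'iris') }}
```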

Seed your files

Create input and expected output CSV files. Name them using the following format:

  1. Input file: test_<model_name>.csv
  2. Output file: <model_name>_expected.csv

As an example, here are my input and output files:

Input file
Output file
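With the Iris columns, the pair might look like this (illustrative rows, not the article’s actual files):

```csv
Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
```

and the expected output, with the extra “Sum” column:

```csv
Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species,Sum
1,5.1,3.5,1.4,0.2,Iris-setosa,10.2
```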

Place these files under the “data” folder in your project directory. When you seed files in dbt, it is always a safe bet to provide the column data types for your files. You can do this inside of your dbt_project.yml, like so:

Add a section containing seed configs to your dbt_project.yml

This helps ensure that dbt parses seed files correctly. Set the seed schema to a dataset separate from your models, or you’ll find that your dataset gets cluttered very quickly. If you like, you can place your testing seeds in a folder (say, UnitTesting) so that you can control the schema of all the seeds within that folder.
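The seed config could look along these lines (a sketch: the project name, folder name, and column types are assumptions about the setup):

```yaml
# dbt_project.yml (fragment)
seeds:
  dbt_unit_testing:            # your project name
    UnitTesting:               # folder under data/ holding the test seeds
      +schema: iris_testing    # keep test seeds out of your main dataset
      test_iris_modifier:
        +column_types:
          Id: int64
          SepalLengthCm: float64
          SepalWidthCm: float64
          PetalLengthCm: float64
          PetalWidthCm: float64
          Species: string
```

The expected-output seed gets the same treatment, with an extra float64 entry for the “Sum” column.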

Targets for tests

Think of targets inside of dbt as separate environments. You might have a dev and prod target already defined inside of your profiles.yml. We’re going to add another one called “unit-test”:

unit-test target

Notice the dataset defined inside of the “unit-test” target. This sets the default schema to “iris_testing” (BigQuery dataset names allow only letters, numbers and underscores) whenever the “unit-test” target is specified.
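Put together, the profile might look like this (a sketch: the profile name, GCP project and auth method are placeholders, not values from the article):

```yaml
# profiles.yml (fragment)
dbt-unit-testing:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project   # placeholder GCP project
      dataset: iris
      threads: 4
    unit-test:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: iris_testing     # default schema when --target unit-test is used
      threads: 4
```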

Macros for tests

dbt provides ref and source macros to help create dependencies between models and establish lineage. We will rewrite these to accept different sources based on the target flag we pass through when we run the test command. I’ve named them ref_for_test and source_for_test, but you can name them anything really. Add these to your macros folder.

ref_for_test
source_for_test

Here’s what these macros do:

  1. Create a relation containing the model name originally passed to the macro (normal_source_relation, normal_ref_relation)
  2. Check to see if the target is “unit-test”. If it is, then “test_” is appended to the identifier (think model name) of the relation calling the macro. This means that if iris_modifier calls the ref_for_test macro in the unit-test environment, the macro returns a relation for test_iris_modifier
  3. If the target is not “unit-test”, the original relation (normal_source_relation, normal_ref_relation) is passed back to the model
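Pieced together from the steps above, the two macros might look something like this (a sketch: the use of this.identifier to pick up the calling model’s name, and target.schema to locate the seed dataset, are assumptions about the setup rather than code from the article):

```sql
-- macros/ref_for_test.sql (illustrative sketch)
{% macro ref_for_test(model_name) %}
    {# 1. Build the relation the model would normally reference #}
    {% set normal_ref_relation = ref(model_name) %}
    {% if target.name == 'unit-test' %}
        {# 2. Point at the seeded input: test_<calling model's name> #}
        {{ return(normal_ref_relation.incorporate(
            path={"schema": target.schema,
                  "identifier": 'test_' ~ this.identifier})) }}
    {% else %}
        {# 3. Outside unit-test, behave exactly like ref() #}
        {{ return(normal_ref_relation) }}
    {% endif %}
{% endmacro %}

-- macros/source_for_test.sql (illustrative sketch)
{% macro source_for_test(source_name, table_name) %}
    {% set normal_source_relation = source(source_name, table_name) %}
    {% if target.name == 'unit-test' %}
        {{ return(normal_source_relation.incorporate(
            path={"schema": target.schema,
                  "identifier": 'test_' ~ this.identifier})) }}
    {% else %}
        {{ return(normal_source_relation) }}
    {% endif %}
{% endmacro %}
```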

Replace the ref and source macros in your models with these. When the target is “unit-test”, they’ll point towards the input files that we’ve seeded in.

Defining the test

dbt-utils is a package provided by Fishtown Analytics that contains useful macros that we will use in our unit testing adventure. Add it to your packages.yml:
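A minimal packages.yml entry could look like this (the version pin is an assumption; pick one compatible with your dbt version, and note the hub namespace has since moved from fishtown-analytics to dbt-labs):

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.0   # assumed version; check the dbt hub for a compatible pin
```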

We’re interested in the equality schema test it provides, as we’re going to use it to compare our expected output with the results of our model. This test is defined within our schema.yml:

schema test
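Following the naming conventions above, the schema.yml entry might look like this (a sketch; the model and seed names are the ones assumed throughout):

```yaml
# models/schema.yml (fragment)
version: 2

models:
  - name: iris_modifier
    tests:
      # compare the model's output row-for-row against the expected seed
      - dbt_utils.equality:
          compare_model: ref('iris_modifier_expected')
```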

Execution

Now that we’ve set up all of the different elements required to perform our unit tests, we can look at the actual dbt commands/actions involved.

  1. Run dbt deps to install packages specified inside of your packages.yml
    dbt deps
  2. Run dbt seed to load the test files into the testing dataset
    dbt seed
  3. Run the model to be tested against the unit-test target
    dbt run --target unit-test --models <model_name>
  4. Run the test against the unit-test target
    dbt test --target unit-test --models <model_name>

Running this locally, I get the following result:

Test results

The test has passed and all the testing models are neatly placed within a testing dataset. Success!

Wrapping up

It’s easy to use this framework to build a solid unit-testing foundation for your dbt project. In the next part, we’ll try our hand at building a CI/CD pipeline for unit-testing, along with a look at how we can test incremental models.
