Understanding DBT Runtime Environment
We’ve all been there. Our models are running. All tests pass. Yet, we keep seeing these bizarre log messages every time we run DBT. The more macros and Jinja we use, the more messages show up, even from models we aren’t running. Sometimes, the same message appears twice. How about when the messages don’t appear in our local environment but show up in DBT Cloud or Kubernetes. What’s going on? Why so chatty DBT?
To better understand why we see unexpected log messages, we need to understand the three stages of a DBT run. In this post, we will look at several examples of logging in code and the results printed to the console. Then, we will learn what happens when DBT runs and how we can manipulate the runtime environment with {% if execute %}. So get VS Code up and running so you can follow along.
Confusing DBT F’ing circumstance 1: Log messages from models we are not running
On the right, we have as simple a model as exists in DBT. Just a select statement. No refs, no sources, no from clause. Easy peasy, right? On the left, another simple model with a few log statements in places you might use a macro. It’s clear that with_log_message should print log messages to the console. No_log_message should not print anything. What happens when we compile no_log_message?
Wait, what? I compiled no_log_message, why are we seeing output from with_log_message? That’s strange. Let’s try again.
That’s better, right? I mean, the log messages are gone. But why did they appear in the first place? Why didn’t they appear the second time I compiled the model? Now, run it one more time, but first delete the target directory. Can you guess what will happen?
Confusing DBT F’ing circumstance 2: Duplicate log messages
Let’s take a step back and just run with_log_message after deleting the target directory to clear any caching. Starts off good. We see our log messages. DBT found two models.
1 of 1 Start…
Hold on. Wait a minute. The model logs our messages before they run? And now they are logging again. The same damn messages. This is too much. I’m going back to SSIS! Just kidding, I’d never do that.
Go ahead and run the same command again. By now, I think you know what will happen. We’ll only see the log messages once. There is a pattern. It’s almost like DBT does some sort of pre processing before running the models. It does in fact do some pre processing. We’ll get to the why in a minute, but let’s look at one more what before we do.
Enlightening DBT circumstance 1: {% if execute %}
This model contains two log messages inside an if/else block. Notice {% if execute %}. Now, look at the output. Both messages are logged to the console, but at different times. One before the model starts compiling and one after. Looks like we have solved the pattern. There is some phase called execute and some pre-phase that occurs before execute. Except for a few specific use cases, we do not want our jinja code running in this pre-phase.
I know you’re all like, “Just get on with it and tell me what the hell is happening. I’m a busy analytics engineer, you know.” And I’m all like, “Stop yelling at me. These examples are important.” Yes, DBT does go through three primary stages and two execute phases when we compile, run or build models.
Enlightening DBT Running in Three Stages
To better understand these stages, take a peek in the target directory. You’ll see a few files and folders.
The most important file for this topic is manifest.json. This is the file that holds all the information about your DAG, including models, snapshots, seeds, macros, tests, etc. DBT builds this file during what I call the inspection phase. It inspects your project and collects information about everything including dependencies between all the resources. It needs this information before compiling and running, so it can determine the order to run things.
Phase 1 — Inspect
- Inspects the entire DBT project saving relevant metadata to target/manifest.json
- Evaluates refs and sources (Requires some Jinja compilation, so don’t put these inside {% if execute %} blocks)
- Determines dependencies between resources
- execute = false
Step 2 — Compile
- Compiles our Jinja code to SQL (/target/compiled directory)
- Probably some other stuff I don’t know about
- execute = true
Step 3 — Run
- Applies materialization logic to the compiled code (/target/run directory)
- Applies dependencies from manifest.json to run our models, snapshots, seeds and tests in the required order.
- Stores results in run_results.json
- execute = true
Enlightening DBT Conclusion
For such a simple tool, DBT can be maddening at times, but with a little understanding, it all starts to make sense. DBT follows three stages, inspect, compile, and run. Knowing how to use the execute variable is key to running our code at the right time in these stages. Wrapping our jinja code inside {% if execute %} blocks will not only clean up the console output, but save a world of headaches when our code runs before it should. So stop asking “Why so chatty DBT?” and start asking, “Why so awesome DBT?”