Are You Using Elementary for DBT?
Elementary is a dbt package and CLI that provides a lot of infrastructure most of us would need to implement on our own. Project metadata mart? Check. Test-level Slack integration. Yup. Test and run dashboard. You betcha. Think of Elementary as the other stuff needed to maintain a data pipeline. When someone asks, “How do I manage all this stuff?” Channel your inner Sherlock and say, “It’s Elementary my dear.”
Here are some of my favorite features of Elementary.
Artifact Metadata (Data Mart About Your Dbt Project)
At the heart of Elementary is a data mart containing all the information about your dbt project. If it is a thing in dbt, then Elementary creates a dimension table for it. Models, tests, runs, test results, exposures, metrics, etc. It’s all there, automatically maintained, and included in a single dbt package.
While I’d love to create a tutorial on setting up the Elementary dbt package, it’s unnecessary. Add it to packages.yml, set the DB and schema in dbt_project.yml, then run “dbt run -s elementary”. That’s it. From there, it just automatically works. Take a look at the docs to see which artifacts are stored. To pique your curiosity, here is a list.
Having this metadata store makes operational tasks trivial. Let’s say I want to see if my model run time is changing significantly. A simple query will tell me. Even this SQL92-trained data engineer can figure that out.
with stats as (
name as model_name,
avg(execution_time) over(partition by name) as mean_execution_time,
stddev_pop(execution_time) over(partition by name) as stddev_execution_time
where name = 'accounts'
(execution_time - mean_execution_time) / stddev_execution_time as stddev_from_mean
order by execute_started_at
The Elementary dashboard provides data faceted over multiple views. Funny story. I never really used the dashboard when first introduced to Elementary. It seemed rudimentary. We would create our own, better, dashboard. Yeah, of course we would. Well, as is often the case, a VP was looking at it every day and asking questions I could not answer.
Fast forward a bit, and the Elementary dashboard is the first thing I look at in the morning right after checking Airflow to ensure our nightly jobs ran. The folks at Elementary Data cut all the cruft and provided a dashboard with the most important information needed to monitor a dbt project. Everything is simple and intuitive.
A Few Dashboard Examples
Look at this. Latest test results, including the test query and failing rows! This is a game changer when we get to Slack alerts and interacting with business partners.
Test results over time for a specific model and test
As you can see, there is quite a lot there and I skipped a couple views. Why create our own dashboard if this one can be created and shared to Azure, Slack or S3 with a simple command. “edr report”, by the way.
Slack Alerts — Tying It All Together
Much of Elementary helps data/analytics engineers manage their dbt pipeline. Slack alerts help you communicate with your team, management and upstream partners. Just like the rest of Elementary’s features, these alerts offer a ton of configurability at the individual test level.
Imagine dealing with account data that is managed by your customer success team. You often find data quality issues when a nightly job runs. Wouldn’t it be great to run data-quality tests on the source data before you run the pipeline. With Elementary, you can run these tests and send results to the Slack channel for customer success. It could include enough information to determine the failing rows and tag key people.
This is what the alert might look like. It’s not perfect, but for such little effort it is pretty darn good. Putting a link to the Elementary dashboard in the description would allow them to see the failed rows in table format.
I know. You’re ready for this thing to wrap up. So, instead of writing about every feature, how about just a list of a few more things Elementary does.
- Several different anomaly tests
- Advanced schema tests
- End-to-end pipeline view (Airflow, GitHub Actions, Looker and Tableau)
- Python tests
- Probably more I’ve forgetting about
- Test lineage
- Fantastic support through Slack or GitHub issues..
It’s Elementary my dear. If I had to choose a single dbt package, Elementary would be the clear choice. Combined with the CLI, this tool provides most of what is needed to manage the operational aspects of a dbt pipeline. In short, Elementary makes your life easier with minimal setup and configuration.