Welcome to our new article! 👋 We will show you how to quickly and effectively integrate dbt with GoodData using a series of Python scripts. In the previous post, How To Build a Modern Data Pipeline, we provided a guide on how to build a solid data pipeline that solves the typical problems analytics engineers face. This new article describes a more in-depth integration with dbt because, as we wrote in the article GoodData and dbt Metrics, we think dbt metrics are fine for simple use cases, but for advanced analytics you need a more solid tool like GoodData.
Even though our solution is tightly coupled with GoodData, we would like to provide a general guide on how to integrate with dbt! Let's start 🚀.
First things first: why would you want to integrate with dbt? Before you start writing your own code, it is a good idea to research existing dbt plugins first. It is a well-known fact that dbt has a very strong community with a lot of data professionals. If your use case is not very exotic or proprietary to your solution, I would bet that a similar plugin already exists.
One example is worth a thousand words. A few months ago, we were developing our first prototype with dbt and ran into a problem with referential integrity constraints. We basically had two options:
- Write custom code to solve the problem.
- Find a plugin that solves the problem.
Fortunately, we found the dbt Constraints package, and the solution was then quite simple:
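As a rough sketch, adding such a package typically means listing it in packages.yml and annotating models in schema.yml. The package and test names below follow the dbt Constraints README, but the version pin and the model and column names are assumptions for illustration:

```yaml
# packages.yml -- pull in the dbt Constraints package (version pin is illustrative)
packages:
  - package: Snowflake-Labs/dbt_constraints
    version: [">=0.6.0", "<0.7.0"]

# models/schema.yml -- declare primary/foreign keys as tests (model names are made up)
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - dbt_constraints.primary_key
      - name: customer_id
        tests:
          - dbt_constraints.foreign_key:
              pk_table_name: ref('customers')
              pk_column_name: customer_id
```

After `dbt deps` and `dbt build`, the package creates the corresponding constraints in the warehouse for models whose tests pass.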
Lesson learned: search for an existing solution before writing any code. If you still want to integrate with dbt, let's move on to the next section.
Implementation: How To Integrate With dbt?
In the following sections, we cover the most important aspects of the integration with dbt. If you want to explore the whole implementation, check out the repository.
Before we start writing custom code, we need to do some setup. The first important step is to create a profile file:
It is basically a configuration file with the database connection details. An interesting detail here is the split between dev and prod. If you explore the repository, you will find that there is a CI/CD pipeline (described in How To Build a Modern Data Pipeline). The dev and prod environments make sure that each stage in the pipeline runs against the right database.
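A minimal sketch of what such a profile file can look like, shown here for a Postgres warehouse; the profile name, database details, and environment variable names are assumptions, not the repository's actual values:

```yaml
# profiles.yml -- illustrative dev/prod profile; names and env vars are assumptions
data_pipeline:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_DEV_HOST') }}"
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: output_stage
    prod:
      type: postgres
      host: "{{ env_var('DBT_PROD_HOST') }}"
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: output_stage
```

Switching between environments is then just a matter of running `dbt run --target dev` or `dbt run --target prod`.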
The next step is to create a standard Python package. It allows us to run the proprietary code within the dbt environment.
The whole dbt-gooddata package is in GitLab. Within the package, we can then run commands like:
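To illustrate, a package like this usually exposes a small command-line interface. The sketch below is a hypothetical entry point built with argparse; the subcommand names mirror the steps described later in this article and are not the actual dbt-gooddata API:

```python
# Hypothetical CLI entry point for a dbt-gooddata-like package.
# Subcommand names are illustrative, not the package's real commands.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dbt-gooddata")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # One subcommand per integration step described in the article
    subparsers.add_parser("deploy_models", help="Generate the GoodData LDM from dbt models")
    subparsers.add_parser("deploy_metrics", help="Convert dbt metrics to GoodData metrics")
    subparsers.add_parser("store_analytics", help="Store declarative analytics for versioning in git")
    return parser


def main(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    return args.command
```

Each subcommand would then dispatch to the corresponding piece of integration logic.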
Transformation was crucial for our use case. The output of dbt is materialized tables in the so-called output stage schema. The output stage schema is the point where GoodData connects, but in order to successfully start creating analytics (metrics, reports, dashboards), we need to do a few things first, such as connecting to the data source (the output stage schema) or, the most interesting part, converting dbt metrics to GoodData metrics.
Let's start with the basics. In GoodData, we have a concept called the Physical Data Model (PDM) that describes the tables of your database and represents how the actual data is organized and stored in the database. Based on the PDM, we also create a Logical Data Model (LDM), which is an abstract view of your data in GoodData. The LDM is a set of logical objects and their relationships that represent the data objects and relationships in your database through the PDM.
To put it in simpler terms that are common in our industry: the PDM is tightly coupled with the database, while the LDM is tightly coupled with analytics (GoodData). Practically everything you do in GoodData (metrics, reports) is based on the LDM. Why do we use the LDM concept? Imagine you change something in your database, for example the name of a column. If GoodData did not have the extra LDM layer, you would need to change the column name in every place (every metric, every report, and so on). With the LDM, you only change one property of the LDM, and the change is automatically propagated throughout your analytics. There are other benefits too, but we will not cover them here; you can read about them in the documentation.
We have covered a bit of theory; now let's look at the more interesting part. How do we create the PDM, LDM, metrics, etc. from the dbt-generated output stage schemas? First of all, the schema description is the ultimate source of truth for us:
You can see that we use standard dbt properties like date_type, but we also introduced metadata that helps us convert things from dbt to GoodData. For this metadata, we created data classes that guide us in the application code:
The data classes can then be used in methods where we create LDM objects (for example, date datasets):
You can see that we work with the metadata, which helps us convert everything correctly. We use the result of the method make_date_datasets, together with other results, to create an LDM in GoodData through its API, or more precisely with the help of the GoodData Python SDK.
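To make this concrete, here is a simplified, self-contained sketch of what such metadata classes and a date-dataset conversion might look like. All class, field, and function names here are illustrative assumptions, not the actual dbt-gooddata implementation:

```python
# Illustrative sketch: read GoodData-specific metadata from a dbt column
# definition and collect date datasets from it. Names are assumptions.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GoodDataColumnMetadata:
    """Metadata attached to a dbt column under meta.gooddata (hypothetical keys)."""
    ldm_type: Optional[str] = None  # e.g. "attribute", "fact", "date"
    label_of: Optional[str] = None

    @classmethod
    def from_dbt_column(cls, column: dict[str, Any]) -> "GoodDataColumnMetadata":
        meta = column.get("meta", {}).get("gooddata", {})
        return cls(ldm_type=meta.get("ldm_type"), label_of=meta.get("label_of"))


def make_date_datasets(columns: list[dict[str, Any]]) -> list[str]:
    """Collect columns flagged as dates and derive date-dataset identifiers."""
    datasets = []
    for column in columns:
        metadata = GoodDataColumnMetadata.from_dbt_column(column)
        if metadata.ldm_type == "date":
            datasets.append(f"date.{column['name']}")
    return datasets
```

The real implementation builds full LDM declaration objects instead of plain identifier strings, but the pattern of driving the conversion from column metadata is the same.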
For those who also want to explore how we convert dbt metrics to GoodData metrics, you can check the full implementation.
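For a flavor of what such a conversion involves: GoodData metrics are written in MAQL, so the converter essentially maps a dbt metric's aggregation onto a MAQL expression. The sketch below is a heavily simplified assumption of that mapping; it handles only a few aggregation types, and the identifier scheme is made up:

```python
def dbt_metric_to_maql(metric: dict) -> str:
    """Translate a simple dbt metric definition into a GoodData MAQL expression.

    Only basic aggregations are handled; a real converter also deals with
    filters, expressions, and derived metrics.
    """
    # dbt has used both "calculation_method" and the older "type" for this key
    calc = metric.get("calculation_method") or metric.get("type")
    expr = metric["expression"]
    if calc == "count":
        # counts in MAQL run over attributes (labels), not facts
        return f"SELECT COUNT({{label/{expr}}})"
    aggregations = {"sum": "SUM", "average": "AVG", "min": "MIN", "max": "MAX"}
    return f"SELECT {aggregations[calc]}({{fact/{expr}}})"
```

A metric such as `calculation_method: sum` over `expression: amount` would then come out as `SELECT SUM({fact/amount})`.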
We understand that the previous chapter may be a bit overwhelming. Before the demonstration, let's use one image to show how it all works, for better understanding.
Demonstration: Generate Analytics From dbt
For the demonstration, we skip the extract part and start with the transformation, which means that we need to run dbt:
The result is an output stage schema with the following structure:
Now we need to get this output into GoodData to start analyzing the data. Normally, you would need to perform a few manual steps, either in the UI or using the API / GoodData Python SDK. Thanks to the integration described in the implementation section, only one command needs to be run:
Here are the logs from a successful run:
The final result is a successfully created Logical Data Model (LDM) in GoodData:
The last step is to deploy the dbt metrics as GoodData metrics. The command is similar to the previous one:
Here are the logs from a successful run:
Now we can check how the dbt metric was converted into a GoodData metric:
The most important thing is that you can now use the generated dbt metrics to build more complex metrics in GoodData. You can then build reports and dashboards and, once you are happy with the result, store the whole declarative analytics layer with one command and version it in git:
For those of you who like automation, you can take inspiration from our article where we describe how to automate data analytics using CI/CD.
This article describes our approach to the integration with dbt. It is our very first prototype, and in order to productize it, we would need to finalize a few things and then publish the integration as a standalone plugin. We hope this article can serve as an inspiration for your company if you decide to integrate with dbt. If you take another approach, we would love to hear about it! Thanks for reading!
If you want to try it yourself, you can register for the GoodData trial and play with it on your own.