Martin:
SET TIME ZONE
timestamp(n)
with precision higher than 3 in MySQLUPDATE
privilegeManfred:
SET TIME ZONE
is a feature from our guest Marius from last time!connector.name
from hive-hadoop2
to hive
More info at https://trino.io/docs/current/release/release-359.html.
This week we are going a little out of order from our usual sequence on this show. The question really gets to the heart of the concept of the week. We’ll cover this first then jump into the concept.
This question was asked on StackOverflow:
It seems dbt only works for a single database. If my data is in a different database, will that still work? For example, if my datalake is using delta, but I want to run dbt using Redshift, would dbt still work for this case?
Our guest Victor replied:
You can use Trino with dbt to connect to multiple databases in the same project.
The GitHub example project https://github.com/victorcouste/trino-dbt-demo contains a fully working setup, that you can replicate and adapt to your needs.
dbt is a transformation workflow tool that lets teams quickly and collaboratively deploy analytics code, following software engineering best practices like modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL to build production-grade data pipelines.
When referring to dbt, it can mean two slightly different things. dbt core is the open source framework that provides the SQL compiler and framework to manage your SQL workflow. You can interact with it via a command line interface. In addition, dbtlabs offers the fully managed SaaS product dbt Cloud. You can use it to handle all of your dbt projects from development to deployment in a single browser based tool. It provides useful features like a full IDE to develop and test code, orchestration, logging, and alerting. At the moment, dbt Cloud is not available for Trino users.
The framework allows you to check the quality of results, document the lineage, manage the changes/versions in the SQL scripts and orchestrate the queries, like a CI/CD framework but for your data. dbt is not an extract and load tool. The focus is on transforming what is already in your data warehouse/data lake.
Check out these links to learn more:
Trino is the execution SQL engine and dbt is the framework to manage your SQL statements. dbt won’t execute the SQL itself, rather it pushes all of the compute down to the SQL engine. This SQL engine can be Trino, or an engine included in the data source like the database itself. Using Trino as the SQL execution engine allows you to use the same SQL dialect for all connected data sources. This includes data sources that natively do not support SQL like object storage systems, Kafka, Elasticsearch, and many others.
Transformations done by dbt are in general used to clean and prepare data for analytics purposes. It’s often used to go from the raw data to a ready-to-use data for reporting and analysis. dbt creates database objects like tables or views to be consumed by business users and analytics tools.
On the other hand, even if Trino can also execute SQL to create tables and views, these SQL queries are not managed but just executed. Trino doesn’t have, like dbt, all the framework to version, audit, document and orchestrate SQL script and execution. Trino is more used to execute SQL SELECT statements generated by users or BI tools to analyze data in an interactive way.
Trino and dbt are complementary when you need to access different sources from a single SQL query or when you need to run SQL query with good performance on object storage systems like S3, GCS, ADLS, or HDFS.
It’s where Trino can complement dbt, as dbt can only access a single data warehouse connection in a SQL query. In dbt there is no way to query multiple storage systems at the same time.
Trino is recognized for great performance with object storage/data lake processing. With dbt it can transform and prepare data at scale. Trino also allows you to run dbt on a traditional, on-premise data warehouse where normally dbt only runs on a modern cloud data warehouse like Snowflake, BigQuery, or Redshift.
dbtlabs offers a good tutorial which covers the fundamental topics of dbt for you to learn:
Then you have other resources like seeds, macros, tests, sources, snapshots.
Victor shows us a demo from his blog post that inspired this episode.
If you looked at the code, you may have noticed that the code used an adapter
called db-presto-trino
. This adapter derives from the outdated presto naming and is
still there for interaction with legacy Presto clusters. Although it can work
it uses an outdated python client to interact with Trino and there is an open
issue to create an official dbt-trino
adapter
that uses the updated trino-python-client.
If you want to help with this, reach out on the issue itself and join the
#db-presto-trino
channel on the dbt Slack.
https://community.getdbt.com/
After the show Marius Grama, started work on dbt-trino in his own repository. Thanks for the quick turnaround Marius!
The PR of the week, was committed by Ayush Bilala(Twitter), (LinkedIn), a Staff Software Engineer at Walmart Global Tech.
This fixes issue 8263 by adding
a new configuration for the Big Query connector, bigquery.views-cache-ttl
to allow configuring the cache expiration for BigQuery views.
Thanks Ayush!
News
Trino Meetup groups
Latest training from David, Dain, and Martin(Now with timestamps!):
If you want to learn more about Trino, check out the definitive guide from OReilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.