353 is right around the corner. Last show we said this would be a small release. While there was a correctness issue we resolved, there didn’t seem to be much demand to get it out quick as we initially thought. So it was decided to continue adding more features to 353. It should be coming out shortly!
What is the general data flow from a connected data source?
What clients are provided by Trino project?
What other clients are there?
What happens in the Python world?
Disclaimer: I am not a Pythonista or Pythoneer.
Beyond that, it will vary from application to application.
Let’s find out from our guests how this hangs together in Apache Superset, since it is using Python.
In this week’s pull request https://github.com/apache/superset/pull/13105 that was graciously added by dungdm93.
The first thing we need to understand about this addition is the concept of a database engine in Superset. A database engine handles a lot of the custom interactions between various databases and maps them to the interface that Superset understands. If certain concepts are missing in a certain database, like time granularity or SQL syntax, the database engine for that database indicated to Superset that this is not available. As a result the option does not show in Superset, or a concise error message is reported. By default, database engines use the base.py methods, but each engine, like Trino, add the custom mappings with a specific engine implementation, trino.py.
The pull request adds a few basic custom changes to enable Trino usage with Superset. One change ensures that complex timestamps from Trino are truncated to a format that Superset is able to support during time aggregation operations.
This opens a vast amount of functionality for using Trino and Superset. We wanted to feature this because it goes to show how a small code change, even one that is not in the Trino repository, can have a vast effect on those using Superset and Trino.
Thank you so much to dungdm93 for making this change and further linking Trino into a fantastic project like Apache Superset!
To put this PR to the test, we need to connect Apache Superset to Trino as our datasource.
First, you need to follow these instructions to install Docker (if you don’t already have it installed), and then clone the Superset repository:
$ git clone https://github.com/apache/superset.git
Next, you need to set up the database driver for Trino. Navigate to the root directory of the local Superset repository you just downloaded and run the following.
echo "sqlalchemy-trino" >> ./docker/requirements-local.txt
This tells Superset scripts to install the sqlalchemy-trino library upon startup. We know the name by looking up the Trino driver page for the driver documentation and how to use the connection string. If you were to install these directly on a Superset node, you would refer to this database drivers page.
Now run the following command to start up Superset and make sure you’re in the root folder of the repo.
docker-compose -f docker-compose-non-dev.yml up.
After Superset is running, you need to start Trino as well. We did so using a separate docker-compose app.
As soon as this is done, you can navigate to Superset’s homepage http://localhost:8088 and scroll to the Data > Databases menu.
Click the +Database button.
Set Name to “Trino” and URI to
and click Add.
If you want to allow CTAS, CVAS, or DML operations, you’ll want to edit the Database you just created and click on the SQL LAB SETTINGS tab and select in the operations you want to allow.
Connection settings that allows for creation/manipulation of tables.
You should be able to verify under SQL Lab > SQL Editor and run a SELECT query.
We cover adding charts and creating a dashboard in the show. We linked some blogs from Preset around how to do a lot of this workflow in great detail. Find these blogs linked below! Here’s a taste of what we created in Superset with some BTS On-Time : Reporting Carrier On-Time Performance (1987-present) and Covid Cases reported by the CDC.
COVID-19 and flights data dashboard!
I want to just use the REST API of Trino. Where is the documentation? How do I do that?
Don’t do that. Use a Trino client instead.
The typical desired use case for using the REST API is to run a query and get the result. However that part of the API is not really a traditional REST API (HTTP POST, HTTP GET). That just doesn’t work for large datasets to be returned. Instead, it is a constant open connection and stream of data and interaction between client and Trino.
The clients take care of all this complexity and provide it in standard API for the various platforms (JDBC, …). Use the clients!
And if there is no client, or the existing client is not good enough. Create an open source one or contribute improvements.
There are other simple, pure REST API endpoints that you can use just straight out of the box. Try http://localhost:8080/v1/info or http://localhost:8080/v1/status. You could use those for a liveness/readiness probe in k8s or for cluster status display. By the way, the Web UI uses those and others..
If you really can’t help yourself, here are some docs. https://github.com/trinodb/trino/wiki/HTTP-Protocol
Latest training from David, Dain, and Martin(Now with timestamps!):
If you want to learn more about Trino, check out the definitive guide from OReilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.