Presto 334 introduces the new Pinot Connector which allows Presto to query data stored in Apache Pinot™. Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.
Pinot UDF’s can be directly used by including the Pinot SQL query in quotes, explained below in the Pinot SQL Passthrough section. This enables aggregations and other complex query types to be done directly in Pinot.
This connector supports Pinot 0.3.0 and newer.
Create a properties file in the catalog directory, such as
etc/catalog/pinot.properties which includes at least the
following to get started:
pinot.controller-urls is a comma separated list of controller hosts. If Pinot is deployed via Kubernetes and you expose the
pinot.controller-urls needs to point to the controller Service endpoint. The Pinot broker and server must be accessible
via DNS as Pinot will return hostnames and not ip addresses.
If you have a smaller number of Pinot servers than Presto workers or a relatively small number of rows per Pinot segment, you can minimize the requests to pinot by increasing the number of Pinot segments per split (default is 1 segment per split):
If DNS resolution is slow or you get
Request timed out errors, you can increase the request timeout as follows:
Pinot supports the following data types. Currently null values are not supported. The corresponding Presto datatypes are:
|Pinot Datatype||Presto Datatype|
- The Pinot
bytestype is converted to a hex-encoded varchar. See the Pinot docs for more information.
Pinot SQL Passthrough
If you would like to leverage Pinot’s fast aggregations you can use a “dynamic” table where you specify the Pinot SQL query as the table name and it is passed directly to Pinot:
SELECT * FROM pinot.default."SELECT col3, col4, MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4" WHERE col3 IN ('FOO', 'BAR') AND col4 > 50 LIMIT 30000
The filter in the outer presto query will be pushed down into the Pinot query via Presto’s applyFilter(). These queries are routed to the broker and should not return huge amounts of data as broker queries currently return a single response with all the results. This is more suited to aggregate queries.
Limits are pushed into the “dynamic” Pinot query via Presto’s applyLimit(). The above query would yield the following Pinot PQL query:
Pinot functions such as
PERCENTILEEST can be used in the quoted sql.
SELECT MAX(col1), COUNT(col2) FROM pinot_table WHERE col3 IN('FOO', 'BAR') and col4 > 50 LIMIT 30000
If you are returning a larger dataset you can issue a normal Presto query which will get routed to the Pinot servers which store the Pinot segments. Filters and Limits are pushed down to Pinot for regular queries as well.
As Presto and Pinot continue to evolve the Pinot connector will leverage new features such as aggregation pushdown and more.