Trino metrics with OpenMetrics
Trino supports the metrics standard OpenMetrics, which originated with the open-source systems monitoring and alerting toolkit Prometheus.
Metrics are automatically enabled and available on the coordinator at the /metrics endpoint. The endpoint is protected with the configured authentication, identical to the Web UI and the Client protocol.
For example, you can retrieve metrics data from an unsecured Trino server running on localhost:8080 with the random username foo:
curl -H X-Trino-User:foo localhost:8080/metrics
The result follows the OpenMetrics specification and looks similar to the following example output:
# TYPE io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_Min gauge
io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_Min NaN
# TYPE io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_P25 gauge
io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_P25 NaN
# TYPE io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_Total gauge
io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_Total 0.0
# TYPE io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_P90 gauge
io_airlift_http_client_type_HttpClient_name_ForDiscoveryClient_CurrentResponseProcessTime_P90 NaN
The same data is available in a browser after logging in manually.
The user, foo in the example, must have read permission to system information on a secured deployment, and the URL and port must be adjusted accordingly.
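As a minimal sketch, if your deployment uses file-based access control, rules similar to the following grant a monitoring user read access to system information. The user name foo is only an illustration:
{
  "system_information": [
    {
      "user": "foo",
      "allow": ["read"]
    }
  ]
}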
Each Trino node, meaning the coordinator and all workers, provides its own metrics independently.
Use the property openmetrics.jmx-object-names in Config properties to define the JMX object names to include when retrieving all metrics. Multiple object names must be separated with |. Metric names use the package namespace. Use :* to expose all metrics of a package. Use name to select specific classes, or type to select specific metric types.
Examples:
- trino.plugin.exchange.filesystem:name=FileSystemExchangeStats for metrics from the FileSystemExchangeStats class in the trino.plugin.exchange.filesystem package.
- trino.plugin.exchange.filesystem.s3:name=S3FileSystemExchangeStorageStats for metrics from the S3FileSystemExchangeStorageStats class in the trino.plugin.exchange.filesystem.s3 package.
- io.trino.hdfs:* for all metrics in the io.trino.hdfs package.
- java.lang:type=Memory for all memory metrics in the java.lang package.
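For illustration, the following entry in Config properties combines two of the preceding object names with the | separator:
openmetrics.jmx-object-names=io.trino.hdfs:*|java.lang:type=Memory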
Typically, Prometheus or a similar application is configured to monitor the endpoint. The same application can then be used to inspect the metrics data.
Trino also includes a Prometheus connector that allows you to query Prometheus data using SQL.
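As a minimal sketch, assuming a catalog named prometheus is configured to use the Prometheus connector, you can then query a metric such as the standard up metric with SQL:
SELECT * FROM prometheus.default.up LIMIT 10;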
Examples
The following sections provide tips and tricks for your own usage, illustrated with small examples.
Other configurations with tools such as Grafana Agent or the Grafana Alloy OpenTelemetry collector are also possible, and can use platforms such as Cortex or Grafana Mimir for metrics storage and related monitoring and analysis.
Simple example with Docker and Prometheus
The following steps provide a simple demo setup to run Prometheus and Trino locally in Docker containers.
Create a shared network for both servers called platform:
docker network create platform
Start Trino in the background:
docker run -d \
--name=trino \
--network=platform \
--network-alias=trino \
-p 8080:8080 \
trinodb/trino:latest
The preceding command starts Trino and adds it to the platform network with the hostname trino.
Create a prometheus.yml configuration file with the following content, which points Prometheus at the trino hostname:
scrape_configs:
  - job_name: trino
    basic_auth:
      username: trino-user
    static_configs:
      - targets:
          - trino:8080
Start Prometheus from the same directory as the configuration file:
docker run -d \
--name=prometheus \
--network=platform \
-p 9090:9090 \
--mount type=bind,source=$PWD/prometheus.yml,target=/etc/prometheus/prometheus.yml \
prom/prometheus
The preceding command adds Prometheus to the platform network. It also mounts the configuration file into the container so that metrics from Trino are gathered by Prometheus.
Now everything is running.
Install and run the Trino CLI or any other client application and submit a query such as SHOW CATALOGS; or SELECT * FROM tpch.tiny.nation;.
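If you do not want to install the CLI separately, you can use the copy bundled in the Trino container started earlier, for example:
docker exec -it trino trino --execute "SELECT * FROM tpch.tiny.nation;"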
Optionally, log into the Trino Web UI at http://localhost:8080 with a random username. Press the Finished button and inspect the details for the completed queries.
Access the Prometheus UI at http://localhost:9090/, select Status > Targets and see the configured endpoint for Trino metrics.
To see an example graph, select Graph, add the metric name trino_execution_name_QueryManager_RunningQueries in the input field, and press Execute. Press Table for the raw data or Graph for a visualization.
As a next step, run more queries and inspect the effect on the metrics.
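You can also read the same metric from the command line through the Prometheus HTTP API, for example:
curl 'http://localhost:9090/api/v1/query?query=trino_execution_name_QueryManager_RunningQueries'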
Once you are done, you can stop the containers:
docker stop prometheus
docker stop trino
You can start them again for further testing:
docker start trino
docker start prometheus
Use the following commands to completely remove the network and containers:
docker rm trino
docker rm prometheus
docker network rm platform
Coordinator and worker metrics with Kubernetes
To get a complete picture of the metrics on your cluster, you must access the coordinator and the worker metrics. This section provides tips for setting up this scenario with the Trino Helm chart on Kubernetes.
Add an annotation to flag all cluster nodes for scraping in your values for the Trino Helm chart:
coordinator:
  annotations:
    prometheus.io/trino_scrape: "true"
worker:
  annotations:
    prometheus.io/trino_scrape: "true"
Configure metrics retrieval from the workers in your Prometheus configuration:
- job_name: trino-metrics-worker
  scrape_interval: 10s
  scrape_timeout: 10s
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_trino_scrape]
      action: keep # scrape only pods with the Trino scrape annotation
      regex: true
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: keep # do not try to scrape non-Trino containers
      regex: trino-worker
    - action: hashmod
      modulus: $(SHARDS)
      source_labels:
        - __address__
      target_label: __tmp_hash
    - action: keep
      regex: $(SHARD)
      source_labels:
        - __tmp_hash
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: replace
      target_label: container
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: ".+_FifteenMinute.+|.+_FiveMinute.+|.+IterativeOptimizer.+|.*io_airlift_http_client_type_HttpClient.+"
      action: drop # drop some highly granular metrics
    - source_labels: [__meta_kubernetes_pod_name]
      regex: ".+"
      target_label: pod
      action: replace
    - source_labels: [__meta_kubernetes_pod_container_name]
      regex: ".+"
      target_label: container
      action: replace
  scheme: http
  tls_config:
    insecure_skip_verify: true
  basic_auth:
    username: myuser # replace with a user with system information permission
    # DO NOT ADD PASSWORD
The worker configuration authenticates with a user that has access to the system information, but it does not add a password and uses plain HTTP access.
Configure metrics retrieval from the coordinator in your Prometheus configuration:
- job_name: trino-metrics-coordinator
  scrape_interval: 10s
  scrape_timeout: 10s
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_trino_scrape]
      action: keep # scrape only pods with the Trino scrape annotation
      regex: true
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: keep # do not try to scrape non-Trino containers
      regex: trino-coordinator
    - action: hashmod
      modulus: $(SHARDS)
      source_labels:
        - __address__
      target_label: __tmp_hash
    - action: keep
      regex: $(SHARD)
      source_labels:
        - __tmp_hash
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: replace
      target_label: container
    - action: replace # override the address with the HTTPS ingress address
      target_label: __address__
      replacement: {{ .Values.trinourl }}
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: ".+_FifteenMinute.+|.+_FiveMinute.+|.+IterativeOptimizer.+|.*io_airlift_http_client_type_HttpClient.+"
      action: drop # drop some highly granular metrics
    - source_labels: [__meta_kubernetes_pod_name]
      regex: ".+"
      target_label: pod
      action: replace
    - source_labels: [__meta_kubernetes_pod_container_name]
      regex: ".+"
      target_label: container
      action: replace
  scheme: https
  tls_config:
    insecure_skip_verify: true
  basic_auth:
    username: myuser # replace with a user with system information permission
    password_file: /some/password/file
The coordinator configuration authenticates with a user that has access to the system information, and it requires a password and access via HTTPS.