This is the first episode in a series where we cover the basics, plus just enough advanced Kubernetes features, to understand how to deploy Trino on Kubernetes.
For this concept of the week, we provide a minimal overview of what you need to know about Kubernetes to deploy Trino to a cluster.
Why Kubernetes? Kubernetes is a container orchestration platform that lets you declare how containers should be managed, declaratively, using YAML configuration files. This definition can be tricky to understand without the proper context. To make sure nobody is left behind, it is useful to first cover what containers are:
The traditional way to deploy an application is to take the compiled binary of that application and run it directly on hardware that has an operating system installed. This works, but it ties the application to the underlying hardware and operating system, and it forces multiple applications to share the same resources. If one application fails and takes down any of those shared resources, every application on that machine can fail with it.
To remove these dependencies, engineers created virtual machines (VMs), using a VM manager called a hypervisor that emulates hardware environments to host other operating systems. This was a big step forward, since each application could now be isolated, but it came at a great cost: each virtual machine hosts an entire operating system, making it resource intensive and slow.
Containers are the newest type of deployment. They provide logical isolation of resources while still physically running on shared resources. All resources exist on the host system's hardware and operating system, and the isolation prevents interference from other processes. Containers achieve the goals of virtualization without sacrificing much performance or efficiency.
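For example, running Trino as a single container, with no orchestration yet, takes just one Docker command. The container shares the host kernel but stays isolated from other processes:
docker run -d --name trino -p 8080:8080 trinodb/trino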
Source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
Source: https://www.slideshare.net/devopsdaysaustin/continuously-delivering-microservices-in-kubernetes-using-jenkins
Here we have two services that each sit behind a load balancer provided and mapped by the Kubernetes cluster.
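As a rough sketch of that setup (all names here are hypothetical), a Service of type LoadBalancer maps external traffic onto whichever pods match its selector:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer   # ask the platform for an external load balancer
  selector:
    app: my-app        # route traffic to pods carrying this label
  ports:
    - port: 80
      targetPort: 8080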
Kubernetes components and architecture:
Source: https://kubernetes.io/docs/concepts/overview/components/
kube-proxy - a network proxy that maintains network rules on each node and allows network access to Pods in a cluster
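On most clusters, including minikube, kube-proxy runs as a DaemonSet in the kube-system namespace, so you can see it with:
kubectl get daemonset kube-proxy -n kube-system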
You can scale up multiple pods on a single node until the node has no more resources, at which point a new node needs to be added and pod instances are distributed between the nodes.
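As a sketch, assuming a deployment named my-app (a hypothetical name), scaling up replicas and inspecting which nodes the pods landed on looks like this:
kubectl scale deployment my-app --replicas=5
kubectl get pods -o wide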
Source: https://learnk8s.io/graceful-shutdown
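As a hedged sketch of the ideas in that article (container name and image are illustrative), a pod spec can request a longer grace period and run a preStop hook so in-flight work can drain before the container receives SIGTERM:
spec:
  terminationGracePeriodSeconds: 60  # default is 30 seconds
  containers:
    - name: my-app
      image: my-app:latest
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "15"]  # give load balancers time to stop routing here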
This week's PR of the week comes from a different repo under the trinodb org, trinodb/charts. The PR comes from contributor Valeriano Manassero.
Valeriano maintains a very useful Helm chart that predates the Trino org's own community chart. This pull request effectively merges some useful features Valeriano added to his Trino Helm chart so they can be maintained in the community version.
Valeriano’s Trino Helm Chart: https://artifacthub.io/packages/helm/valeriano-manassero/trino
It hasn't been merged yet, but we are really looking forward to seeing it land. Thanks Valeriano!
For this week's demo, you need to install kubectl, minikube (using the Docker driver), and Helm. You can find the Trino Helm chart on ArtifactHub at this URL.
https://artifacthub.io/packages/helm/trino/trino
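To confirm everything is installed, you can check the version of each tool:
kubectl version --client
minikube version
helm version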
First, start your minikube instance.
minikube start --driver=docker
Now take a quick look at the state of your k8s cluster.
kubectl get all
Add the template for the different Trino catalogs used by coordinators and workers.
kubectl apply -f - <<EOF
# Source: trino/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
EOF
Add the template for a single coordinator configuration.
kubectl apply -f - <<EOF
# Source: trino/templates/configmap-coordinator.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin
  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000
  config.properties: |
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8080
  log.properties: |
    io.trino=INFO
EOF
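You can verify that both ConfigMaps landed in the cluster:
kubectl get configmaps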
Add the tcb-trino service definition to run Trino.
kubectl apply -f - <<EOF
# Source: trino/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
EOF
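To confirm the service was created and received a cluster IP:
kubectl get service tcb-trino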
Add the deployment definition for the service.
kubectl apply -f - <<EOF
# Source: trino/templates/deployment-coordinator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
spec:
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: coordinator
  template:
    metadata:
      labels:
        app: trino
        release: tcb
        component: coordinator
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-coordinator
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-coordinator
          image: "trinodb/trino:latest"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources: {}
EOF
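You can watch the rollout and tail the coordinator logs to confirm Trino started (the pod name suffix will differ in your cluster):
kubectl rollout status deployment/tcb-trino-coordinator
kubectl logs deployment/tcb-trino-coordinator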
Now check the state of the k8s cluster again.
kubectl get all
Run the following command to expose the URL and port on your local machine.
minikube service tcb-trino --url
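To verify Trino is responding, you can hit the same /v1/info endpoint the probes use (the URL printed by minikube will differ per machine):
curl "$(minikube service tcb-trino --url)/v1/info"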
Clean up all the resources.
kubectl delete pod --all
kubectl delete replicaset --all
kubectl delete service tcb-trino
kubectl delete deployment tcb-trino-coordinator
kubectl delete configmap --all
Now you can run the same demo using the Helm chart, which includes all of these templates out of the box. First add the Trino Helm chart repository, check the templates produced by Helm, and run the install.
# HELM DEMO
helm repo add trino https://trinodb.github.io/charts/
helm template tcb trino/trino --version 0.2.0
helm install tcb trino/trino --version 0.2.0
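After the install, you can confirm the release and its pods:
helm list
kubectl get pods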
Now that it's installed, run the same command to expose the URL of the service.
minikube service tcb-trino --url
Clean up all the resources.
minikube delete
helm repo remove trino
Trino Summit is moving to 100% virtual: register here.
Trino Meetup groups
If you want to learn more about Trino, check out the definitive guide from O'Reilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Słowikowski.