Kubernetes Taints and Tolerations - Guide and Examples (2024)

Organizations and teams often need multi-tenant, heterogeneous Kubernetes clusters to meet users’ application needs. They may also need to address certain special constraints on the Kubernetes cluster; for example, some pods may require special hardware, colocation with other specific pods, or isolation from others. There are many options for placing those application containers into different, separate node groups, one of which is through the use of taints and tolerations. In this article, we describe taints and tolerations and then use an example to illustrate how to use them to place pods on specific worker nodes while avoiding the nodes where you don’t want pods to get scheduled.

Taints and Tolerations – Concepts

Taints and tolerations are a mechanism that allows you to ensure that pods are not placed on inappropriate nodes. Taints are added to nodes, while tolerations are defined in the pod specification. When you taint a node, it will repel all the pods except those that have a toleration for that taint. A node can have one or many taints associated with it.

For example, most Kubernetes distributions automatically taint the master nodes so that only the pods that manage the control plane are scheduled onto them, and not the data plane pods deployed by users. This ensures that the master nodes are dedicated to running control plane pods.

A taint can produce three possible effects:

NoSchedule: The Kubernetes scheduler will only place pods on the tainted node if they have a toleration for the taint. Pods already running on the node are not affected.
PreferNoSchedule: The Kubernetes scheduler will try to avoid placing pods that don’t tolerate the taint on the node, but may still do so if no other node is suitable.
NoExecute: Kubernetes will evict running pods from the node if they don’t have a toleration for the taint, and will not schedule new ones that lack it.
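
As a rough illustration of how each effect is tolerated, the sketch below writes a pod manifest with one toleration per effect. The file name, pod name, and the example-* taint keys are hypothetical placeholders rather than values from this article, and the apply step is left as a comment because it needs a live cluster.

```shell
# Write a hypothetical pod manifest with one toleration per taint effect.
cat > tolerations-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tolerations-demo          # hypothetical pod name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  # Hard rule: lets the pod schedule onto nodes tainted example-hard=true:NoSchedule
  - key: example-hard
    operator: Equal
    value: "true"
    effect: NoSchedule
  # Soft rule: the scheduler no longer tries to steer this pod away
  - key: example-soft
    operator: Equal
    value: "true"
    effect: PreferNoSchedule
  # Eviction rule: stay for 60s after the taint appears, then be evicted
  - key: example-evict
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
EOF

# With cluster access, the manifest could be applied with:
#   kubectl apply -f tolerations-demo.yaml
```

Note that tolerationSeconds is only meaningful together with the NoExecute effect.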

Use Cases for Taints and Tolerations

Dedicated Nodes

If you need to dedicate a group of worker nodes for a set of users, you can add a taint to those nodes, such as by using this command:

kubectl taint nodes nodename dedicated=groupName:NoSchedule

Then add a toleration for the taint to that user group’s pods so they can run on those nodes. A toleration only permits scheduling onto the tainted nodes; it does not stop the pods from landing elsewhere. To ensure that the pods are scheduled only on that set of tainted nodes, also add a label to those nodes, e.g., dedicated=groupName, and use nodeSelector in the deployment/pod spec. This binds the user group’s pods to the node group so that they don’t run anywhere else.
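
Putting the taint, the label, and the node selector together, a pod for the dedicated group might look roughly like the sketch below. The pod name, image, and file name are hypothetical, and dedicated=groupName simply mirrors the placeholder used above. The cluster-side commands are shown as comments because they need a live cluster.

```shell
# Label the dedicated nodes first (needs cluster access, so shown as a comment):
#   kubectl label nodes nodename dedicated=groupName

# Write a hypothetical pod manifest that tolerates the taint and selects the label.
cat > dedicated-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-demo            # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                  # placeholder image
  # Toleration: permits scheduling onto nodes tainted dedicated=groupName:NoSchedule
  tolerations:
  - key: dedicated
    operator: Equal
    value: groupName
    effect: NoSchedule
  # Node selector: restricts the pod to nodes labeled dedicated=groupName
  nodeSelector:
    dedicated: groupName
EOF

# With cluster access:
#   kubectl apply -f dedicated-pod.yaml
```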

Nodes with Special Hardware

If there are worker nodes with special hardware, you need to make sure that normal pods that don’t need the special hardware don’t run on those worker nodes. Do this by adding a taint to those nodes as follows:

kubectl taint nodes nodename special=true:NoSchedule

Later on, the pods requiring special hardware can be run on those worker nodes by adding tolerations for the above taint.
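
One possible toleration for the taint above uses the Exists operator, which matches the special key regardless of its value. This is an illustrative choice rather than something the scenario requires, and the pod name, image, and file name are hypothetical.

```shell
# Write a hypothetical pod manifest that tolerates any "special" NoSchedule taint.
cat > special-hardware-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: special-hardware-demo     # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                  # placeholder image
  tolerations:
  # Exists matches the key whatever its value, so no "value" field is given
  - key: special
    operator: Exists
    effect: NoSchedule
EOF

# With cluster access:
#   kubectl apply -f special-hardware-pod.yaml
```

An Equal toleration with value "true" would match the same taint; Exists is simply more permissive if the taint value ever changes.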

Taint-Based Evictions

A taint with the NoExecute effect will evict running pods from the node if they have no toleration for the taint. The Kubernetes node controller automatically adds this kind of taint to a node in some scenarios so that pods are evicted and the node is “drained” (has all of its pods evicted). For example, suppose a network outage causes a node to be unreachable from the controller. In this scenario, it would be best to move all of the pods off the node so that they can be rescheduled onto other nodes. The node controller takes this action automatically to avoid the need for manual intervention.
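
A pod can also bound how long it stays on a node after a NoExecute taint appears by setting tolerationSeconds. The sketch below assumes the built-in node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taint keys; the 60-second value, pod name, and file name are arbitrary illustrations.

```shell
# Write a hypothetical pod manifest that tolerates node-failure taints for 60 seconds.
cat > eviction-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: eviction-demo             # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                  # placeholder image
  tolerations:
  # Stay bound for 60s after the node becomes unreachable, then get evicted
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
  # Same grace period when the node reports not-ready
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
EOF

# With cluster access:
#   kubectl apply -f eviction-demo.yaml
```

A shorter value moves pods off a failed node sooner, at the cost of more rescheduling during brief network blips.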

How to Use Taints and Tolerations

We will now present a scenario to help you better understand taints and tolerations. Let’s start with a Kubernetes cluster that has worker nodes categorized into different groups, such as front-end nodes and back-end nodes. Let’s assume that we need to deploy the front-end application pods so that they are placed only on front-end nodes and not on back-end nodes. We must also ensure that new pods are not scheduled onto master nodes because those nodes run control plane components such as etcd.

Let’s start by getting the list of nodes to see what is already tainted by the Kubernetes default installation. Here we are on a cluster created by the Rancher RKE tool.

kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
NodeName             TaintKey                                                            TaintValue   TaintEffect
cluster01-master-1   node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
cluster01-master-2   node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
cluster01-master-3   node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
cluster01-worker-1   <none>                                                              <none>       <none>

From the output above, we can see that the master nodes are already tainted by the Kubernetes installation, so no user pods will land on them unless the user intentionally adds tolerations for those taints. The output also shows a worker node that has no taints. We will now taint the worker so that only front-end pods can land on it. We can do this by using the kubectl taint command.

kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule
node/cluster01-worker-1 tainted

The above taint has a key named app, with a value of frontend, and has the effect of NoSchedule, which means that no pod will be placed on this node unless the pod has defined a toleration for the taint. We will see what the toleration looks like in later steps.
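
If you want to confirm or undo the taint, the two helper functions below sketch one way to do it. The function names are hypothetical, and the functions are only defined here, not invoked, because running them needs access to the cluster from this walkthrough.

```shell
# Hypothetical helpers; defined but not called, since they need a live cluster.

# Print the taints currently set on the worker node.
show_worker_taints() {
  kubectl get node cluster01-worker-1 -o jsonpath='{.spec.taints}'
}

# Remove the taint again: same key, value, and effect, with a trailing "-".
remove_frontend_taint() {
  kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule-
}
```

Calling show_worker_taints after tainting the node should list the app taint with the NoSchedule effect.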

Let’s try to deploy an app on the cluster without any toleration configured in the app deployment specification.

kubectl create ns frontend
namespace/frontend created

kubectl run nginx --image=nginx --namespace frontend
deployment.apps/nginx created

kubectl get pods -n frontend
NAME                    READY   STATUS    RESTARTS   AGE
nginx-76df748b9-gjbs4   0/1     Pending   0          9s

kubectl get events -n frontend
LAST SEEN   TYPE      REASON             OBJECT                      MESSAGE
<unknown>   Warning   FailedScheduling   pod/nginx-76df748b9-gjbs4   0/4 nodes are available: 1 node(s) had taint {app: frontend}, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate.
<unknown>   Warning   FailedScheduling   pod/nginx-76df748b9-gjbs4   0/4 nodes are available: 1 node(s) had taint {app: frontend}, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate.

We created a namespace and deployed Nginx using the kubectl run command. (This output comes from an older kubectl; in kubectl 1.18 and later, kubectl run creates a bare pod, and kubectl create deployment is the way to create a deployment.) Looking at the pod status and cluster events, we see that the pod can’t be scheduled because there are no appropriate nodes: the three master nodes and the one worker node each have a taint that the pod doesn’t tolerate. To successfully place the pod on the worker node, we need to edit the deployment and add a toleration for the taint we configured earlier on the node.

Let’s see what the current deployment YAML looks like.

kubectl get deployment nginx -n frontend -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2021-08-29T09:39:37Z"
  generation: 1
  labels:
    run: nginx
  name: nginx
  namespace: frontend
  resourceVersion: "13367313"
  selfLink: /apis/apps/v1/namespaces/frontend/deployments/nginx
  uid: e46e026e-3a92-4aac-b985-7110426aa437
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

From the output above, we can see that there is no toleration added in the pod spec. Let’s edit and add one.

kubectl edit deployment nginx -n frontend
deployment.apps/nginx edited

kubectl get deployment nginx -n frontend -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
  creationTimestamp: "2021-08-29T09:39:37Z"
  generation: 3
  labels:
    run: nginx
  name: nginx
  namespace: frontend
  resourceVersion: "13368509"
  selfLink: /apis/apps/v1/namespaces/frontend/deployments/nginx
  uid: e46e026e-3a92-4aac-b985-7110426aa437
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: app
        operator: Equal
        value: frontend

Notice the tolerations section of the pod spec: We have added a toleration for the taint so that the pod can be scheduled on the worker node.
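
The same toleration could also be added without opening an editor. The sketch below writes the change as a patch body in JSON; the file name is hypothetical, and the kubectl patch step is left as a comment because it needs the cluster from this walkthrough.

```shell
# Write the toleration as a patch body (hypothetical file name).
cat > toleration-patch.json <<'EOF'
{
  "spec": {
    "template": {
      "spec": {
        "tolerations": [
          {"key": "app", "operator": "Equal", "value": "frontend", "effect": "NoSchedule"}
        ]
      }
    }
  }
}
EOF

# With cluster access, it could be applied non-interactively:
#   kubectl patch deployment nginx -n frontend -p "$(cat toleration-patch.json)"
```

Patching the pod template triggers a new rollout, just as the interactive edit does.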

Now let’s get the pod’s status and events.

kubectl get events -n frontend
LAST SEEN   TYPE     REASON              OBJECT                       MESSAGE
3m56s       Normal   SuccessfulCreate    replicaset/nginx-9cf9fd78f   Created pod: nginx-9cf9fd78f-khc5z
2s          Normal   SuccessfulDelete    replicaset/nginx-9cf9fd78f   Deleted pod: nginx-9cf9fd78f-khc5z
7m7s        Normal   ScalingReplicaSet   deployment/nginx             Scaled up replica set nginx-76df748b9 to 1
3m56s       Normal   ScalingReplicaSet   deployment/nginx             Scaled up replica set nginx-9cf9fd78f to 1
10s         Normal   ScalingReplicaSet   deployment/nginx             Scaled down replica set nginx-76df748b9 to 0
10s         Normal   ScalingReplicaSet   deployment/nginx             Scaled up replica set nginx-8cb54bccc to 1
2s          Normal   ScalingReplicaSet   deployment/nginx             Scaled down replica set nginx-9cf9fd78f to 0

kubectl get pods -n frontend
NAME                    READY   STATUS    RESTARTS   AGE
nginx-8cb54bccc-g4htt   1/1     Running   0          38s

The pod has now been allowed to run on the tainted node. However, if there are other worker nodes in the cluster that are not tainted, this pod can also land on those free nodes. To make sure that the pod lands only on the nodes dedicated to front-end pods, we need to go beyond the taint and toleration: label the front-end nodes (e.g., app=frontend) and then use nodeSelector in the pod deployment spec so that the pod is scheduled only on front-end nodes.
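
A minimal sketch of that last step follows, assuming the same cluster01-worker-1 node and nginx deployment used in this walkthrough. The patch file name is hypothetical, and the cluster-side commands are shown as comments because they need a live cluster.

```shell
# Label the front-end node (needs cluster access, so shown as a comment):
#   kubectl label nodes cluster01-worker-1 app=frontend

# Write a patch body that pins the deployment's pods to nodes labeled app=frontend.
cat > nodeselector-patch.json <<'EOF'
{
  "spec": {
    "template": {
      "spec": {
        "nodeSelector": {"app": "frontend"}
      }
    }
  }
}
EOF

# With cluster access:
#   kubectl patch deployment nginx -n frontend -p "$(cat nodeselector-patch.json)"
```

The toleration lets the pod onto the tainted node, and the node selector keeps it off every other node; both are needed for strict placement.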

Conclusion

Taints and tolerations provide advanced pod scheduling in which tainted nodes control which pods can be scheduled on them. They are often simpler to manage than other custom scheduling methods such as affinities, and they can be combined with node labels and nodeSelector when pods must also be kept off untainted nodes. Known use cases include nodes with special hardware, nodes dedicated to a group of users, and taint-based pod evictions.