DaemonSets are a fundamental resource in Kubernetes for deploying and managing cluster-wide workloads. Unlike controllers such as Deployments, DaemonSets ensure that a copy of a Pod runs on every node in the cluster, making them ideal for deploying system daemons or agents that need to be present on all nodes.

What is a DaemonSet in Kubernetes?

A DaemonSet is a Kubernetes controller that ensures a specific Pod runs on every node within the cluster. DaemonSets are commonly used for deploying system-level daemons or agents that need to perform tasks on every node, such as log collection, monitoring, or networking.

A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them; as nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet cleans up the Pods it created. In this article, we will deploy Fluentd as a DaemonSet on every node. We will not cover the complete EFK setup here.

How does it work internally?

Internally, DaemonSets leverage the Kubernetes control plane to ensure that a Pod matching the DaemonSet’s specification is running on each node. When a new node is added to the cluster, Kubernetes automatically schedules the DaemonSet Pod on the new node. Similarly, when a node is removed from the cluster, Kubernetes terminates the DaemonSet Pod running on that node.

Daemon Pods are scheduled using node selectors or node affinity/anti-affinity rules specified in the DaemonSet manifest. These rules allow users to define which nodes should run the corresponding Pods based on node labels, node attributes, or other criteria.
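
For example, here is a minimal sketch of restricting a DaemonSet to a subset of nodes with a nodeSelector. The name node-agent and the disktype=ssd label are hypothetical; you would label the target nodes yourself (kubectl label nodes <node-name> disktype=ssd):

YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      name: node-agent
  template:
    metadata:
      labels:
        name: node-agent
    spec:
      # Only nodes carrying the disktype=ssd label will run this Pod
      nodeSelector:
        disktype: ssd
      containers:
        - name: node-agent
          image: busybox
          command: ["sh", "-c", "while true; do sleep 3600; done"]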

Use Cases of DaemonSets:

A classic example is deploying a logging agent like Fluentd or Filebeat as a DaemonSet to collect logs from every node and forward them to a centralized logging system. More broadly, DaemonSets are commonly used for deploying:

  • Monitoring Agents: Deploying monitoring agents (such as Prometheus Node Exporter) on every node to collect metrics and monitor system health.
  • Logging Agents: Deploying logging agents (such as Fluentd or Filebeat) on every node to collect logs and forward them to a centralized logging system.
  • Network Plugins: Deploying network plugins (such as Calico or Flannel) on every node to enable network communication and policy enforcement.
  • Security Agents: Deploying security agents (such as antivirus or intrusion detection systems) on every node to monitor and protect against security threats.
  • Storage Plugins: Running a cluster storage daemon on every node.

Key Features of DaemonSets:

  1. Node-level Deployment: DaemonSets ensure that exactly one instance of a Pod runs on each eligible node in the cluster, regardless of the cluster’s size or changes in the number of nodes.
  2. Automatic Scheduling: DaemonSets automatically schedule Pods on new nodes as they are added to the cluster and terminate Pods on nodes that are removed from the cluster.
  3. Flexible Pod Placement: DaemonSets allow users to specify which nodes should run the corresponding Pods by using node selectors or node affinity/anti-affinity rules.
  4. Update and Rollback: Like other Kubernetes resources, DaemonSets support rolling updates and rollbacks, allowing users to update DaemonSet Pods with minimal disruption to cluster operations.

Node Level Deployment of DaemonSets:

Fluentd is a popular open-source data collector for unified logging layers. It is often used as a logging agent in Kubernetes clusters to collect and aggregate container logs.

To deploy Fluentd as a DaemonSet in Kubernetes, you can create a YAML file that defines the DaemonSet object. Here’s an example YAML file that deploys Fluentd as a DaemonSet:

fluentd-elasticsearch-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch-logging"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            # See the original repo for additional environment variables
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: dockercontainerlogdirectory
              mountPath: /var/log/pods
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainerlogdirectory
          hostPath:
            path: /var/log/pods

This file is adapted from the official fluentd-kubernetes-daemonset repo.


This DaemonSet manifest is for deploying Fluentd as a logging agent in the Kubernetes cluster. Let’s break down the key components and configurations:

  • Namespace: The DaemonSet is deployed in the kube-system namespace, which is commonly used for system-level components in Kubernetes.
  • Labels: The DaemonSet and its Pods are labeled with k8s-app: fluentd-logging and version: v1 for identification and grouping purposes.
  • Node Tolerations: The DaemonSet Pods tolerate scheduling on nodes with specific roles (control-plane and master) using tolerations. This ensures Fluentd Pods can run on control-plane and master nodes if necessary.
  • Container Configuration:
    • Image: The Fluentd container uses the image fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch, which includes Fluentd with Elasticsearch plugin support.
    • Environment Variables: Environment variables are set to configure Fluentd to forward logs to an Elasticsearch cluster (FLUENT_ELASTICSEARCH_HOST, FLUENT_ELASTICSEARCH_PORT, etc.).
    • Resource Limits: Resource limits are set to restrict the memory and CPU usage of the Fluentd container.
    • Volume Mounts: Two volume mounts are specified to access the host’s log directories (/var/log and /var/log/pods). This allows Fluentd to collect both node-level log files and per-pod container logs.
  • Termination Grace Period: Specifies the grace period for terminating the Fluentd Pods.
  • Volumes: Defines two hostPath volumes (varlog and dockercontainerlogdirectory) to mount host directories into the Fluentd Pods for log collection.
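
Assuming the manifest is saved as fluentd-elasticsearch-daemonset.yml, you can apply it and confirm that the DaemonSet has been created:

$ kubectl apply -f fluentd-elasticsearch-daemonset.yml
$ kubectl get daemonset fluentd -n kube-system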

Automatic Scheduling of DaemonSets:

Automatic scheduling of DaemonSets refers to the process by which Kubernetes ensures that a copy of a specific Pod runs on every node within the cluster without manual intervention. This feature is intrinsic to the nature of DaemonSets and is essential for deploying system daemons or agents that need to be present on all nodes for various tasks such as log collection, monitoring, or networking.

When a DaemonSet is created in a Kubernetes cluster, the Kubernetes control plane automatically schedules a Pod matching the DaemonSet’s specification on each node. This ensures that the desired state defined by the DaemonSet is met across the entire cluster, regardless of its size or changes in the number of nodes.

The automatic scheduling of DaemonSets involves the following steps:

  1. Pod Creation: When a DaemonSet is created or updated, Kubernetes creates or updates a Pod for each node in the cluster based on the DaemonSet’s Pod template.
  2. Node Selection: Kubernetes selects nodes in the cluster based on various factors such as node labels, node selectors, or node affinity/anti-affinity rules specified in the DaemonSet manifest.
  3. Pod Deployment: Kubernetes deploys the DaemonSet Pods to the selected nodes, ensuring that each node has a copy of the Pod running.
  4. Dynamic Adaptation: As nodes are added to or removed from the cluster, Kubernetes dynamically adapts the deployment of DaemonSet Pods to maintain the desired state across all nodes.

Automatic scheduling of DaemonSets simplifies the deployment and management of cluster-wide workloads, as it eliminates the need for manual intervention to ensure that system daemons or agents are running on all nodes. It enables seamless scaling of applications across the cluster and ensures consistent behavior regardless of changes in the cluster’s topology. This feature is particularly useful for scenarios where certain tasks need to be performed uniformly across all nodes, such as log collection, monitoring, or network configuration.
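
You can observe this adaptation directly. The DESIRED and CURRENT columns reported by kubectl track the number of eligible nodes and update automatically as nodes join or leave the cluster (the output below is illustrative for a three-node cluster):

$ kubectl get daemonset fluentd -n kube-system
NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd   3         3         3       3            3           <none>          5m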

Flexible Pod Scheduling: Affinity, Taints and Tolerations

In Kubernetes, flexible pod scheduling using taints and tolerations allows users to control which pods can be scheduled onto which nodes within the cluster. This feature is essential when certain nodes must be reserved for specific workloads, or when certain nodes should avoid particular workloads. For example, a node with a GPU can run ML workloads far better than one without, so those workloads should be steered toward GPU nodes and kept off the rest.

a. Affinity Rules

You can also use affinity rules in Kubernetes, which allow you to influence pod scheduling based on node attributes such as labels and names, or on the labels of other pods. Here’s an example of how you can use affinity rules with DaemonSets:

YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "k8s-app"
                operator: In
                values:
                - fluentd
            topologyKey: "kubernetes.io/hostname"
      containers:
        - name: fluentd-elasticsearch
          image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
          securityContext:
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log

In this example, we’re using a podAntiAffinity rule to ensure that two Fluentd pods are never scheduled on the same node.

  • affinity: Specifies the affinity rules for pod scheduling.
  • podAntiAffinity: Specifies pod anti-affinity rules, indicating that DaemonSet pods should not be colocated with pods that match certain criteria.
  • requiredDuringSchedulingIgnoredDuringExecution: Indicates that the specified anti-affinity rules must be satisfied during pod scheduling, but they can be ignored during pod execution.
  • labelSelector: Defines the label selector used to identify pods that DaemonSet pods should not be colocated with. In this example, pods with the label key name and value fluentd are selected.
  • topologyKey: Specifies the topology key used to determine the topology domain. In this case, the hostname of the node (kubernetes.io/hostname) is used as the topology key.

b. Taints:

A taint is a property applied to a node that marks it as unsuitable for certain types of pods. When a node is tainted, it will not accept pods that do not tolerate the taint. Taints are used to repel specific workloads from nodes, for example to reserve nodes for critical system components or to keep high-resource-consuming applications away. Here is the general form for applying a taint to a node:

kubectl taint nodes <node-name> key=value:taint-effect
  • <node-name>: The name of the node to taint.
  • key=value: The key-value pair representing the taint.
  • taint-effect: The effect of the taint, which can be one of NoSchedule, PreferNoSchedule, or NoExecute. This determines whether pods will be scheduled onto the node or evicted from it.
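
For instance, to reserve a node for GPU workloads (the node name gpu-node-1 and the gpu=true key-value pair are hypothetical):

$ kubectl taint nodes gpu-node-1 gpu=true:NoSchedule

After this, only pods that tolerate the gpu=true:NoSchedule taint can be scheduled onto gpu-node-1.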

c. Tolerations:

A toleration is a specification within a pod’s configuration that allows the pod to tolerate the specified taints on nodes. By adding tolerations to pod specifications, users can control which nodes are eligible for scheduling their pods, even if the nodes have been tainted.

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

In the fluentd-elasticsearch-daemonset.yml shown earlier, you may have noticed:

      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule

The tolerations configuration above allows the DaemonSet pods to tolerate taints applied to nodes with specific roles. Specifically, it allows DaemonSet pods to schedule on nodes tainted with the node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master taints, both with the NoSchedule effect.

Here’s what each part of the configuration means:

  • key: Specifies the key of the taint. In this case, the taints are associated with node roles; since these role taints are applied without a value, the tolerations match them without specifying one.
  • effect: Specifies the effect of the taint. The NoSchedule effect prevents new pods from being scheduled onto nodes with the specified taints.

The provided configuration ensures that DaemonSet pods can schedule on nodes even if they are tainted with the control plane or master taints. This can be useful for ensuring that critical system components deployed as DaemonSet pods have the necessary resources and privileges to run on nodes with such taints, regardless of their roles within the cluster.
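
To quickly review which taints exist across the cluster, you can print them as a custom column (the exact output format depends on your kubectl version):

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints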

d. Remove a Taint:

To delete a taint from a node in Kubernetes, you can use the kubectl taint command with a trailing - (minus):

Suppose you have a node named node1 with a taint key1=value1:NoSchedule. To delete this taint, you can run either of the following commands:

$ kubectl taint nodes node1 key1=value1:NoSchedule-

$ kubectl taint nodes node1 key1-

The first command removes the specific taint key1=value1:NoSchedule from node1; the second removes every taint with the key key1, regardless of value and effect. In both cases, the trailing - tells kubectl to remove the taint rather than add it.

Updates and Rollbacks of DaemonSets

You can set the revisionHistoryLimit field to control how many revisions the DaemonSet controller retains for rollback purposes.

fluentd-elasticsearch-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  revisionHistoryLimit: 5
  selector:
  ....
  ....

Let us say you decide to change the Fluentd image. You can either edit the image tag in fluentd-elasticsearch-daemonset.yml and re-apply it, or update it directly:

$ kubectl set image daemonset/fluentd fluentd=quay.io/fluentd_elasticsearch/fluentd:v2.6.0 -n kube-system
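
You can then watch the rolling update progress node by node:

$ kubectl rollout status daemonset/fluentd -n kube-system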

To Rollback:

To roll back to the previous revision of the Fluentd DaemonSet, run:

$ kubectl rollout undo daemonset/fluentd -n kube-system

To roll back to revision 1 of the Fluentd DaemonSet, run:

$ kubectl rollout undo daemonset/fluentd --to-revision=1 -n kube-system

To check history:

You can check the rollout history of the Fluentd DaemonSet using the kubectl rollout history command:

$ kubectl rollout history daemonset/fluentd -n kube-system

DaemonSet Pod Priority – PriorityClass

In the case of a DaemonSet, the priority of the pods is set in the pod template. When you create a DaemonSet, the Kubernetes control plane schedules the pods on each node that matches the node selector. If there are not enough resources on a node to schedule a pod with the specified priority, the Kubernetes scheduler may preempt (evict) some of the existing pods on that node based on their priority.

Therefore, if you want your DaemonSets to be stable and not evicted during a node resource crunch, assign them a higher pod priority class. This way, the Kubernetes scheduler will prioritize the DaemonSet pods over lower-priority pods when scheduling and preempting.

In Kubernetes, you can set the priority of a DaemonSet pod using the priorityClassName field in the pod template. Here’s an example:

YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      priorityClassName: high-priority-pod
      containers:
        - name: fluentd-elasticsearch
          image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2

In this example, the priorityClassName field is set to high-priority-pod, which is the name of a PriorityClass object that you need to create in your cluster. Here’s an example of a PriorityClass object with a priority value of 1000000:

fluentd-priority-class.yml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-pod
value: 1000000
globalDefault: false
description: >-
  (Optional) This priority class should only be used for high-priority pods.

You can create this PriorityClass object using kubectl:

$ kubectl apply -f fluentd-priority-class.yml

$ kubectl get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
high-priority-pod         1000000      false            3m13s

After you have one or more PriorityClasses, you can create Pods that specify one of those PriorityClass names in their specifications. The priority admission controller uses the priorityClassName field and populates the integer value of the priority. If the priority class is not found, the Pod is rejected.
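
As a quick check, the populated value can be read back from a running Pod’s spec (the Pod name below is hypothetical):

$ kubectl get pod fluentd-abc12 -n kube-system -o jsonpath='{.spec.priority}'
1000000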

DaemonSet Troubleshooting

To print the taints of a node, you can use the describe command:

$ kubectl describe node master-node-1

..
Taints:             node-role.kubernetes.io/master:NoSchedule
..

Check the status of the DaemonSet and its pods; you should see the Fluentd pods running on the available worker nodes:

kubectl get daemonset -n kube-system
kubectl get pods -n kube-system -o wide

Here are some useful commands to describe, edit, and get the DaemonSet:

kubectl describe daemonset fluentd -n kube-system
kubectl edit daemonset fluentd -n kube-system
kubectl get ds -n kube-system

Taints and tolerations are a Kubernetes feature that helps ensure pods are not placed on inappropriate nodes. For example, to taint a node (where <Effect> is one of NoSchedule, PreferNoSchedule, or NoExecute):

kubectl taint nodes node1 key1=value1:<Effect>

DaemonSet Best Practices:

  • DaemonSet Pods must have restartPolicy set to Always or left unspecified (it defaults to Always).
  • Separate each DaemonSet into its own namespace to ensure clear isolation and easier resource management.
  • Give DaemonSet Pods a high priority (for example, a PriorityClass value of 10000), since DaemonSet Pods should not be evicted from cluster nodes.
  • Use labels and selectors effectively to ensure DaemonSet Pods are scheduled on the appropriate nodes.
  • Monitor resource utilization to ensure DaemonSet Pods are evenly distributed across the cluster.
  • Regularly update DaemonSet Pods to apply security patches and feature enhancements.
  • Implement health checks and readiness probes to ensure the stability and reliability of DaemonSet Pods; see the sketch below.
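
Here is a minimal sketch of liveness and readiness probes for the Fluentd container; the pgrep-based check is an assumption, so adapt the command to whatever health signal your agent actually exposes:

YAML
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          livenessProbe:
            exec:
              # Hypothetical check: restart the container if the fluentd process dies
              command: ["sh", "-c", "pgrep -f fluentd"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["sh", "-c", "pgrep -f fluentd"]
            initialDelaySeconds: 5
            periodSeconds: 10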

Conclusion:

DaemonSets are a powerful Kubernetes resource for deploying and managing cluster-wide workloads. By ensuring that a specific Pod runs on every node within the cluster, they enable seamless deployment of system daemons, monitoring agents, logging agents, and other essential components. Understanding DaemonSets and their practical applications is essential for Kubernetes administrators and operators looking to effectively manage and scale their clusters.
