In Kubernetes, a Job is a resource used to run batch or one-time tasks. Unlike long-running applications managed by controllers such as Deployments or StatefulSets, Jobs are designed for short-lived tasks that run to completion, for example data processing, backups, or batch processing. In this article, you will learn about Jobs in Kubernetes in detail.

What are Jobs in Kubernetes?

In Kubernetes, a Job is a type of workload that runs one or more Pods to completion. Unlike other workloads like Deployments or StatefulSets, which run Pods continuously, Jobs are designed to run a specific task or batch process and then stop.

  • A Job runs its workload inside one or more Pods.
  • Remember, once the activity is completed, the Pods running as part of the Job are marked as completed, and their assigned resources are released.

Jobs are useful when you need to perform a task that has a start and an end. For example:

  • Running a database migration script
  • Processing a batch of data files
  • Executing a machine learning model training job
  • Performing one-time backup or cleanup operation

When to use Jobs:

In Kubernetes, you use Jobs when you want to run tasks or processes that need to be done only once or a finite number of times. Suppose you have a script or program that needs to run, such as backing up data or performing a maintenance task. A Job makes sure the task gets done reliably, without you having to start it manually each time. Once the task is finished, the Job completes and stops running, just as you move on to the next thing when you finish a task. In simple terms, Jobs are little helpers in Kubernetes that carry out a specific task for you and then go away when they’re done.

How Do Jobs Work in Kubernetes?

When you create a Job in Kubernetes, the Job controller will start one or more Pods based on the Job’s configuration. These Pods will run the specified container(s) and perform the desired task.

Once the task is completed successfully, the Job is considered complete, and the Pods are terminated. If a Pod fails or is terminated for any reason, the Job controller will start a new Pod to replace it, up to a specified maximum number of retries.

Jobs in Kubernetes are managed by the Job controller, which is part of the Kubernetes control plane. The Job controller is responsible for creating, monitoring, and managing the lifecycle of Jobs and their associated Pods. When you create a Job in Kubernetes by defining a Job manifest, the following sequence of events occurs:

  1. Job Creation: The Job manifest is submitted to the Kubernetes API server, which stores the Job object in the cluster’s etcd datastore.
  2. Job Controller Watch: The Job controller continuously watches for new Job objects through the Kubernetes API server (the API server, in turn, reads from and writes to the etcd datastore).
  3. Job Validation: When the Job controller detects a new Job object, it validates the Job specification to ensure that it is well-formed and meets the necessary requirements.
  4. Pod Creation: Based on the Job specification, the Job controller creates one or more Pod objects using the provided Pod template. The number of Pods created initially is determined by the parallelism field in the Job spec.
  5. Pod Scheduling: The Kubernetes scheduler assigns the newly created Pods to available nodes in the cluster, based on various factors such as resource requirements, node affinity/anti-affinity rules, and other constraints defined in the Pod specification.
  6. Pod Execution: The assigned nodes start executing the Pods, which run the specified container(s) and perform the desired task or workload.
  7. Pod Monitoring: The Job controller continuously monitors the status of the Pods associated with the Job. If a Pod completes successfully, the Job controller increments the successful completion count. If a Pod fails or is terminated, the Job controller may create a new Pod to replace it, up to the specified backoffLimit.
  8. Job Completion: When the number of successful Pod completions reaches the value specified in the completions field, the Job is considered complete, and no new Pods will be created.
  9. Job Cleanup (Optional): If the ttlSecondsAfterFinished field is set in the Job spec, the Job controller will automatically delete the Job object and its associated Pods and logs after the specified time-to-live (TTL) duration has elapsed since the Job completed.
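The spec fields referenced in these steps (parallelism, completions, backoffLimit, ttlSecondsAfterFinished) all live together in a single Job manifest. A minimal sketch, with purely illustrative names and values, might look like this:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: lifecycle-demo            # hypothetical name
spec:
  parallelism: 2                  # step 4: Pods created initially
  completions: 4                  # step 8: successful completions required
  backoffLimit: 3                 # step 7: replacement Pods allowed on failure
  ttlSecondsAfterFinished: 300    # step 9: optional cleanup after the Job finishes
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never
```

Each of these properties is covered in more detail later in this article.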

It’s important to note that the Job controller operates independently from the Pods themselves. The Job controller manages the creation, monitoring, and lifecycle of the Pods, but the actual execution of the workload within the Pods is handled by the container runtime (e.g., Docker, containerd) on the respective nodes.

The Job controller also interacts with other Kubernetes components, such as the API server and the scheduler, to create and manage Pods, and with the etcd datastore to store and retrieve Job and Pod objects.

How to Create a Job in Kubernetes?

The examples here are dummy examples, kept simple so you can focus on the syntax for creating a Job.

Example-1

You will need to create a manifest file to create a Job in Kubernetes. Here is a Manifest file defining a Job named pi-job that runs a Perl script to calculate the value of pi to 2000 decimal places.

pi-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-job
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

Let’s break down the Manifest file:

  • apiVersion: The API version to use for this job. In this case, we are using batch/v1.
  • kind: The kind of Kubernetes resource to create. Here, we are creating a Job.
  • metadata: Data that helps uniquely identify the Job, including a name.
  • spec: The specification of the Job, which includes:
      • template: The Pod template that defines the Pods that the Job creates. Its own spec includes:
          • containers: The containers to run in the Pods. Here, we have one container that runs the Perl script to calculate pi.
          • restartPolicy: The restart policy for the Pods. Here, we are using Never, which means the containers are not restarted in place if they exit (on failure, the Job controller creates replacement Pods instead).

To create the Job, save the above Manifest file as pi-job.yaml and run the following command:

kubectl create -f pi-job.yaml

You can check the status of the Job using the following command:

kubectl get jobs

This will display the status of all Jobs in the cluster, including the pi-job we created.

To check the output of the pi-job we created above, you can use the kubectl logs command to view the logs of the container in the Pod created by the Job. Here’s an example command to view the logs of the pi container:

kubectl logs $(kubectl get pods -o jsonpath='{.items[0].metadata.name}' -l job-name=pi-job) -c pi

This prints the value of pi, calculated to 2000 decimal places by the Perl script.

Example-2

Here’s an example of a Kubernetes Job that prints the prime numbers between 2 and 100:

prime-numbers-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prime-numbers-job
spec:
  template:
    spec:
      containers:
      - name: prime-numbers
        image: alpine:latest
        command: ["/bin/sh"]
        args: ["-c", "for i in $(seq 2 100); do p=1; j=2; while [ $j -lt $i ]; do if [ $((i % j)) -eq 0 ]; then p=0; break; fi; j=$((j + 1)); done; if [ $p -eq 1 ]; then echo $i; fi; done"]
      restartPolicy: Never

This Job creates a Pod that runs an Alpine Linux container and executes a shell loop to print the prime numbers between 2 and 100. The restartPolicy is set to Never, so a failed container is not restarted in place; instead, the Job controller creates replacement Pods, up to the backoffLimit.

  • apiVersion: The API version of the Job.
  • kind: The kind of Kubernetes resource to create (in this case, a Job).
  • metadata: Metadata about the Job, such as its name.
  • spec: The specification of the Job.
  • template: The Pod template specification.
  • spec.containers: The container specifications for the Pod.
  • spec.containers.name: The name of the container.
  • spec.containers.image: The Docker image to use for the container.
  • spec.containers.command: The command to execute in the container.
  • spec.containers.args: The arguments to pass to the command.
  • spec.restartPolicy: The restart policy for the Pod.

Various Properties of the Job Spec Manifest File:

Here are various properties of the Job spec manifest file and their possible values:

  1. parallelism: This field specifies the maximum number of Pods that the Job should run in parallel. The default value is 1.
YAML
parallelism: 3 # This will run 3 Pods in parallel for the Job.
  2. completions: This field specifies the desired number of successful Pod completions for the Job. The Job is considered complete when this many Pods have completed successfully. The Job below is complete once 5 Pods have succeeded.
YAML
completions: 5
  3. activeDeadlineSeconds: This field specifies the maximum duration in seconds that the Job may remain active, measured from when the Job starts. If the Job does not complete within this duration, it and its Pods are terminated and the Job is marked as failed. The Job below fails if it does not complete within 1 hour (3600 seconds).
YAML
activeDeadlineSeconds: 3600
  4. backoffLimit: This field specifies the number of retries (failed Pods) allowed before the Job is marked as a permanent failure. The default value is 6. The Job below is retried up to 3 times if its Pods fail.
YAML
backoffLimit: 3
  5. manualSelector: This boolean field tells Kubernetes that you will supply the Pod selector yourself in spec.selector, instead of having the Job controller generate one. This is useful when you want to manage Pod selection directly, but use it with care: a selector that matches unrelated Pods can cause those Pods to be adopted or deleted by the Job.
YAML
manualSelector: true
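Note that when manualSelector is true, the Job spec must also include a selector that matches labels on the Pod template; otherwise the controller cannot associate Pods with the Job. A minimal sketch (the label name here is illustrative, not from the examples above):

```yaml
spec:
  manualSelector: true
  selector:
    matchLabels:
      job-group: my-batch        # hypothetical label
  template:
    metadata:
      labels:
        job-group: my-batch      # must match the selector above
    spec:
      containers:
      - name: my-container
        image: my-image
        command: ["/bin/sh", "-c", "echo 'Hello, Kubernetes!'"]
      restartPolicy: Never
```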
  6. template: This field specifies the Pod template to be used for creating Pods for the Job. It includes fields like containers, volumes, and other Pod-level configurations.
YAML
template:
  spec:
    containers:
    - name: my-container
      image: my-image
      command: ["/bin/sh", "-c", "echo 'Hello, Kubernetes!' && sleep 30"]
    restartPolicy: Never
  7. ttlSecondsAfterFinished: This field specifies a time-to-live (TTL) in seconds after the Job finishes (whether it completes or fails). Once the TTL expires, the Job object and its remaining Pods are automatically deleted.
YAML
ttlSecondsAfterFinished: 600

Here’s an example of a Kubernetes Job manifest that makes use of all the properties we discussed:

YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: my-complex-job
spec:
  parallelism: 3
  completions: 6
  activeDeadlineSeconds: 3600
  backoffLimit: 5
  manualSelector: true
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: my-image
        command: ["/bin/sh", "-c", "echo 'Hello, Kubernetes!' && sleep 30"]
      restartPolicy: OnFailure
  ttlSecondsAfterFinished: 600

Kubectl commands to work with Kubernetes Job

Here are some useful kubectl commands for working with Jobs in Kubernetes:

  1. Create a Job:
   kubectl create job <job_name> --image=<container_image>

Example:

   kubectl create job my-job --image=nginx
  2. List all Jobs:
   kubectl get jobs
  3. Describe a Job:
   kubectl describe job <job_name>

Example:

   kubectl describe job my-job
  4. View logs of a Job:
   kubectl logs job/<job_name>

Example:

   kubectl logs job/my-job
  5. Delete a Job:
   kubectl delete job <job_name>

Example:

   kubectl delete job my-job
  6. Run a one-time task (note: since kubectl v1.18, kubectl run creates a bare Pod rather than a Job object; for a real Job, use kubectl create job instead):
   kubectl run <pod_name> --image=<container_image> --restart=OnFailure -- <command>

Example:

   kubectl run my-one-time-job --image=busybox --restart=OnFailure -- echo "Hello, World!"
  7. Adjust a Job’s parallelism (recent kubectl versions no longer support kubectl scale for Jobs; patch the parallelism field instead):
   kubectl patch job <job_name> -p '{"spec":{"parallelism":<num_pods>}}'

Example:

   kubectl patch job my-job -p '{"spec":{"parallelism":3}}'
  8. Run a Job from a YAML file:
    Save your Job manifest as my-job.yaml and then run:
   kubectl apply -f my-job.yaml

These commands will help you manage Jobs effectively in Kubernetes.

Troubleshooting Common Issues in Jobs

Here are some common troubleshooting issues with Kubernetes Jobs along with commands to diagnose and resolve them:

  1. Job is stuck or not progressing:
  • Check the status of the Job and its pods:
kubectl describe job <job_name>

kubectl get pods --selector=job-name=<job_name>
  • Inspect the logs of the pods to identify any errors:
kubectl logs <pod_name>
  2. Pods are failing to start:
  • Check the events associated with the Job and pods for any errors:
kubectl describe job <job_name>

kubectl describe pod <pod_name>
  • View the logs of the failing pods to identify the root cause:
kubectl logs <pod_name>
  3. Job has finished but is marked as failed:
  • Retrieve the logs of the Job to identify any errors during execution:
kubectl logs job/<job_name>
  • Inspect the status of the Job and pods to understand the reason for failure:
kubectl describe job <job_name>

kubectl describe pod <pod_name>
  4. Job is taking longer than expected:
  • Check the resource constraints and utilization of the nodes:
kubectl describe nodes
  • Monitor the events associated with the Job and pods for any delays:
kubectl describe job <job_name>

kubectl describe pod <pod_name>
  5. Resource constraints leading to Job failures:
  • Check the resource requests and limits specified for the Job and pods:
kubectl describe job <job_name>

kubectl describe pod <pod_name>
  • Adjust the resource requests and limits as needed to match the workload requirements.
  6. Network connectivity issues:
  • Verify that the pods can communicate with necessary services and external resources:
kubectl exec -it <pod_name> -- <command_to_test_network_connectivity>
  • Check the network policies and firewall rules applied to the cluster.
  7. Image pull failures:
  • Inspect the image pull errors associated with the pods:
kubectl describe pod <pod_name>
  • Ensure that the container image specified in the Job is accessible and correctly configured.
  8. Cluster resource exhaustion:
  • Check the overall resource utilization of the cluster:
kubectl top nodes

kubectl top pods
  • Scale the cluster or adjust resource quotas if necessary to alleviate resource constraints.
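For the resource-constraint problems above, requests and limits are declared on the containers inside the Job’s Pod template. A sketch with illustrative values (the container name and image are placeholders):

```yaml
template:
  spec:
    containers:
    - name: my-container
      image: my-image
      resources:
        requests:
          cpu: "250m"       # minimum CPU the scheduler reserves
          memory: "128Mi"
        limits:
          cpu: "500m"       # hard cap; the container is throttled above this
          memory: "256Mi"   # exceeding this gets the container OOM-killed
    restartPolicy: Never
```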

By using these commands and strategies, you can effectively troubleshoot various issues that may arise with Kubernetes Jobs.

Conclusion:

In conclusion, Jobs in Kubernetes provide a flexible and powerful way to run batch processes, data processing tasks, and other workloads that have a defined start and end. By understanding how Jobs work and how to configure them, you can easily automate and manage these types of tasks in your Kubernetes cluster.

Jobs are particularly useful when you need to perform a specific task or process that doesn’t require continuous execution, such as running database migrations, processing data batches, training machine learning models, or performing backups and cleanup operations.

Last Updated: May 8th, 2024 | Categories: Kubernetes