CronJobs in Kubernetes - In-Depth Guide

What is a CronJob?

A CronJob in Kubernetes is a higher-level API object used to run Jobs on a scheduled basis, similar to how the cron utility works in Unix/Linux. It allows you to automate recurring tasks such as database backups, sending reports, clearing caches, or syncing data at defined intervals.

📌 A CronJob creates a new Job resource according to the schedule you define. That Job, in turn, creates one or more Pods to run the actual workload.

Syntax Overview

Here is a minimal example of a CronJob YAML:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

Anatomy of a CronJob

Let’s break down each component in detail:

1. schedule

  • Format: Standard cron format ("minute hour day month day-of-week")
  • Example: "0 0 * * *" means once per day at midnight.
  • You can use special characters:
  • * — every possible value
  • */5 — every 5 units (e.g., every 5 minutes)
  • 1-5 — range (e.g., Mon–Fri)

2. jobTemplate

  • A CronJob doesn’t run containers directly. It creates Jobs, and Jobs create Pods.
  • This section defines the template for those Jobs, just like a regular Job YAML spec.

3. restartPolicy

  • Must be OnFailure or Never. Always is not allowed.
  • It tells Kubernetes what to do if the Pod exits unexpectedly.

Key CronJob Fields

1. startingDeadlineSeconds

  • Maximum time (in seconds) the system has to start a Job if it misses its schedule.
  • Useful when your cluster is under heavy load and a Job start is delayed.
  • Example: If this is set to 200, and the schedule is every 5 minutes, but the controller checks late by 210 seconds, it will skip that run.

2. concurrencyPolicy

Controls what happens if the previous Job hasn’t finished when the next one is scheduled.

  • Allow (default): Runs Jobs concurrently.
  • Forbid: Skips the new Job if the previous one hasn’t finished.
  • Replace: Deletes the currently running Job and replaces it with the new one.

3. suspend

  • Boolean field that disables a CronJob without deleting it.
  • Use case: Temporarily stop scheduling (e.g., for maintenance).

CronJob Execution Flow

  1. At scheduled time, CronJob controller checks if a Job needs to be created.
  2. If conditions are met (not suspended, not too late), a Job is created from the jobTemplate.
  3. That Job runs its Pods as usual.
  4. Success/failure is recorded. If TTL is set, the Job can be auto-deleted.

Real-World Example: Database Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: my-backup-image
            args:
            - "/backup.sh"
          restartPolicy: OnFailure

CronJob Resource Management

1. successfulJobsHistoryLimit

  • Number of successful Jobs to retain in history.
  • Helps avoid clutter while allowing for some audit trail.

2. failedJobsHistoryLimit

  • Number of failed Jobs to retain.
  • Helps with debugging recurring failures.
spec:
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1

CronJobs and Timezones

  • Kubernetes CronJobs use the kube-controller-manager node’s time zone, which is usually UTC.
  • You cannot directly configure time zones per CronJob.
  • If you need timezone control:
  • Adjust schedule times manually.
  • Or run logic inside the container to sleep until the desired local time.

Cleanup and Lifecycle

  • CronJobs don’t clean up completed Jobs automatically unless TTL or history limits are set.
  • Best practices:
  • Use .spec.ttlSecondsAfterFinished inside jobTemplate to auto-delete Jobs.
  • Set successfulJobsHistoryLimit and failedJobsHistoryLimit to control retained Jobs.

Common Pitfalls

1. Time Drift / Missed Schedules

  • If a CronJob is skipped due to node or controller downtime, it won't retroactively catch up unless startingDeadlineSeconds is set.

2. Overlapping Jobs

  • If concurrencyPolicy is not set, multiple overlapping Jobs can cause resource contention.

3. Pod Failure and Retry

  • CronJobs depend on the Job retry logic (backoffLimit) and Pod restartPolicy.
  • Make sure failure handling is correctly configured.

When to Use CronJob vs Other Controllers

Use Case Recommended Resource
One-time task Job
Periodic scheduled task CronJob
Long-running services Deployment / StatefulSet
Real-time triggered tasks Event-based controller (e.g., Argo, Knative)

Best Practices Summary

  • Use restartPolicy: OnFailure.
  • Set backoffLimit to control retries.
  • Limit job history for better performance.
  • Use suspend: true to temporarily disable scheduling.
  • Monitor execution via Job/Pod logs.
  • Validate your cron expression with online tools.