Kubernetes v1.36: Resizing Pod Resources on Suspended Jobs (Beta Guide)

2026-05-01 10:52:29

Overview

Starting with Kubernetes v1.36, the ability to modify container resource requests and limits in the pod template of a suspended Job has been promoted to beta. First introduced as alpha in v1.35, this feature empowers queue controllers and cluster administrators to dynamically adjust CPU, memory, GPU, and extended resource specifications on a Job while it is suspended, before it starts or resumes running. This capability addresses a long-standing pain point for batch and machine learning workloads, where optimal resource allocation often depends on real-time cluster capacity, queue priorities, and hardware availability.

Before this feature, resource requirements in a Job's pod template were immutable once set. If a queue controller like Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job, losing metadata, status, and history. With this new beta feature, you can now adjust resource allocations without destroying the Job, enabling more resilient and efficient scheduling—for example, allowing a specific CronJob instance to progress slowly with reduced resources rather than failing outright under heavy cluster load.

Prerequisites

To take advantage of this feature, ensure your environment meets the following:

- A cluster running Kubernetes v1.36 or later, where this capability is beta. Beta features are typically enabled by default; if your distribution disables beta features, confirm the relevant feature gate is turned on.
- kubectl configured with permissions to create and patch Jobs in the target namespace.
- A Job created with spec.suspend: true, since the pod template's resources are only mutable while the Job is suspended.

Step-by-Step Guide

1. Create a Suspended Job

Start by defining a Job with spec.suspend: true and initial resource requests that may need adjustment later. Below is an example of a machine learning training Job requesting 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply the Job:

kubectl apply -f training-job.yaml

2. Modify Resources While Suspended

Suppose your queue controller determines that only 2 GPUs are currently available. You can update the container resource requests and limits directly on the suspended Job. Use kubectl patch or edit the resource YAML. For example, to adjust CPU to 4, memory to 16Gi, and GPU to 2:

kubectl patch job training-job-example-abcd123 --type='merge' -p='{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "trainer",
          "resources": {
            "requests": {
              "cpu": "4",
              "memory": "16Gi",
              "example-hardware-vendor.com/gpu": "2"
            },
            "limits": {
              "cpu": "4",
              "memory": "16Gi",
              "example-hardware-vendor.com/gpu": "2"
            }
          }
        }]
      }
    }
  }
}'
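In practice, a queue controller issues this change through the API rather than by hand. As a rough sketch (Python standard library only; the helper name build_resize_patch and the values are illustrative, not part of any Kubernetes client library), the merge-patch body above can be constructed and serialized like this:

```python
import json

def build_resize_patch(container_name, cpu, memory, gpu,
                       gpu_key="example-hardware-vendor.com/gpu"):
    """Build a merge-patch body that resizes one container in a Job's
    pod template. Requests and limits are kept in sync here."""
    resources = {
        "cpu": cpu,
        "memory": memory,
        gpu_key: gpu,
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": container_name,  # matched by container name
                        "resources": {
                            "requests": dict(resources),
                            "limits": dict(resources),
                        },
                    }]
                }
            }
        }
    }

patch = build_resize_patch("trainer", "4", "16Gi", "2")
# Serialize for `kubectl patch --type=merge -p "$PATCH"` or for a
# PATCH request sent directly to the Jobs API.
print(json.dumps(patch))
```

Generating the body programmatically avoids hand-editing nested JSON and makes it easy to keep requests and limits consistent.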

The updated YAML would appear as:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

3. Resume the Job

Once the resources are updated to match available capacity, resume the Job by setting spec.suspend to false:

kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'
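When a controller drives this flow, ordering matters: resize first, then resume, because the template is only mutable while spec.suspend is true. A minimal sketch of that sequencing (plain Python; apply_patch is a hypothetical stand-in for whatever client call actually sends the PATCH):

```python
def resize_then_resume(apply_patch, resize_patch):
    """Apply the resource resize while the Job is still suspended,
    then flip spec.suspend to false in a separate patch.
    `apply_patch` stands in for a real API client call."""
    apply_patch(resize_patch)                  # allowed: Job is suspended
    apply_patch({"spec": {"suspend": False}})  # Job controller creates pods

# Exercising the sequencing with a recording stub instead of a cluster:
sent = []
resize_then_resume(sent.append, {"spec": {"template": {}}})
# sent[0] holds the resize patch, sent[1] un-suspends the Job
```

Issuing the two patches separately also makes failures easier to reason about: if the resize is rejected, the Job simply stays suspended.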

The Job controller will then create pods using the adjusted resource specifications. Note that resource modifications are only allowed while the Job is suspended; once the Job resumes, the pod template becomes immutable again until the Job is suspended once more, if ever.

4. Verify the Outcome

Check the Job's pods to confirm the new resource requests:

kubectl get pods -l job-name=training-job-example-abcd123 -o yaml | grep -A 10 resources

You should see the updated CPU, memory, and GPU values.
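The same check can be scripted by reading the resources field from `kubectl get pods ... -o json` output. A small sketch (Python standard library; the pod dict below is a trimmed, illustrative example of what the API returns, not captured cluster output):

```python
import json

def container_resources(pod, container_name):
    """Return the resources dict for a named container in a pod object."""
    for c in pod["spec"]["containers"]:
        if c["name"] == container_name:
            return c["resources"]
    raise KeyError(container_name)

# Trimmed, illustrative shape of `kubectl get pod -o json` output:
pod = {
    "spec": {
        "containers": [{
            "name": "trainer",
            "resources": {
                "requests": {"cpu": "4", "memory": "16Gi",
                             "example-hardware-vendor.com/gpu": "2"},
                "limits": {"cpu": "4", "memory": "16Gi",
                           "example-hardware-vendor.com/gpu": "2"},
            },
        }]
    }
}

print(json.dumps(container_resources(pod, "trainer")["requests"]))
```

This is handy in CI or controller tests, where you want an assertion on the resized values rather than eyeballing grep output.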

Common Mistakes

- Patching resources on a Job that is not suspended: the API server rejects the change, because the pod template is only mutable while spec.suspend is true.
- Updating requests without updating limits (or vice versa), leaving the two inconsistent and unintentionally changing the pods' QoS class.
- Resuming the Job before the resize patch has been applied, which causes pods to be created with the original resource values.

Summary

The beta promotion of mutable pod resources for suspended Jobs in Kubernetes v1.36 brings significant flexibility to batch and ML workloads. By allowing dynamic adjustments to CPU, memory, and extended resources (such as GPUs) before a Job resumes, administrators and queue controllers can optimize resource utilization without the overhead of deleting and recreating Jobs. This feature seamlessly integrates with the existing Job API—no new objects or CRDs required—and works with any controller that respects the spec.suspend field. As the feature matures, expect to see broader adoption in batch scheduling frameworks and improved efficiency in shared cluster environments.
