Add resource limit support to podman exec #28919

Closed
ranjithrajaram wants to merge 1 commit into podman-container-tools:main from ranjithrajaram:exec-resource-limits

ranjithrajaram commented Jun 12, 2026

This commit adds support for constraining resource usage (CPU, memory, cpuset) of processes started via 'podman exec' by placing them in dedicated cgroups with specified limits.

Key changes:

  • Added --cpu-quota, --cpu-period, --cpuset-cpus, and --memory flags to the podman exec command
  • Implemented cgroup setup with proper controller delegation handling
  • Added ExecResourceLimits entity and CgroupPath field to ExecConfig
  • Runtime passes --cgroup flag to OCI runtime (crun/runc)
  • Added comprehensive tests for ExecConfig serialization

Technical implementation:

  • Creates child cgroup under container's scope (e.g., scope/exec-<timestamp>)
  • Enables required controllers in scope's cgroup.subtree_control
  • Applies resource limits to cgroup interface files
  • Passes relative cgroup path (../exec-<timestamp>) to runtime
  • Automatic cleanup when exec session ends
  • Uses nanosecond precision for exec IDs to prevent collisions
  • Validates cgroups v2 availability upfront with clear error messages
  • Proper error handling for controller delegation failures

Cgroup structure:

  • Container scope: machine.slice/libpod-<id>.scope
  • Container cgroup: machine.slice/libpod-<id>.scope/container
  • Exec cgroup: machine.slice/libpod-<id>.scope/exec-<nano-timestamp>
  • Relative path to runtime: ../exec-<timestamp> (from container cgroup)
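
For readers who want to see the layout concretely, the steps listed above can be approximated by hand. This is a shell sketch, not the PR's Go code: the function name `setup_exec_cgroup_by_hand`, its arguments, the `CGROUP_ROOT` variable, and the controller set `+cpu +cpuset +memory` are all assumptions made for illustration. It only creates and configures the cgroup; it does not place any process in it, which the description leaves to the OCI runtime via `--cgroup`.

```shell
#!/bin/sh
# Sketch only: setup_exec_cgroup_by_hand and its arguments are hypothetical,
# not code from the PR. Needs write access to a delegated cgroups v2 tree.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# Usage: setup_exec_cgroup_by_hand SCOPE EXEC_ID MEMORY_BYTES CPU_MAX CPUSET
setup_exec_cgroup_by_hand() {
    scope="$CGROUP_ROOT/$1"
    exec_cg="$scope/exec-$2"

    # Create the exec cgroup next to (not inside) the "container" cgroup.
    mkdir "$exec_cg" || return 1

    # Enable the controllers for the scope's children (assumed set).
    echo "+cpu +cpuset +memory" > "$scope/cgroup.subtree_control" || return 1

    # Apply the limits through the cgroups v2 interface files.
    echo "$3" > "$exec_cg/memory.max" || return 1
    echo "$4" > "$exec_cg/cpu.max" || return 1
    echo "$5" > "$exec_cg/cpuset.cpus" || return 1

    # Path of the exec cgroup as seen from SCOPE/container.
    echo "../exec-$2"
}
```

A call would look like `setup_exec_cgroup_by_hand "machine.slice/libpod-<id>.scope" "<timestamp>" 104857600 "50000 100000" 2-3`, with the placeholders filled in for a real container; it prints the relative path that would be handed to the runtime.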

Tested with crun 1.27.1 on cgroups v2 systems.

Does this PR introduce a user-facing change?

Yes. The PR adds four new flags to podman exec:

  • --cpu-quota: limit CPU CFS quota (microseconds)
  • --cpu-period: set CPU CFS period (microseconds)
  • --memory: limit memory (e.g., 512m, 2g)
  • --cpuset-cpus: restrict to specific CPUs (e.g., 0-3 or 0,2,4)

These flags let users constrain the resource usage of exec'd processes via cgroups v2. They are available in local mode only (hidden in remote mode). Documentation is added in podman-exec.1.md.in.
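
To make the flag units concrete, here is a small shell sketch of how such values map onto the cgroups v2 files `memory.max` and `cpu.max`. The helper names `to_bytes` and `cpu_max` are hypothetical and this is not the PR's parsing code; it assumes binary units for the `k`/`m`/`g` suffixes and a default period of 100000 microseconds when `--cpu-period` is not given, and it does no input validation.

```shell
#!/bin/sh
# Sketch only: to_bytes and cpu_max are hypothetical helpers, not PR code.

# Usage: to_bytes SIZE
# Converts a size such as 512m or 2g to bytes, assuming binary units.
to_bytes() {
    num=${1%[bBkKmMgG]}
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Usage: cpu_max QUOTA [PERIOD]
# Formats "<quota> <period>" as written to cpu.max; the period defaults
# to 100000 microseconds (an assumption matching the kernel default).
cpu_max() {
    echo "$1 ${2:-100000}"
}
```

For example, `to_bytes 100m` prints `104857600` and `cpu_max 50000` prints `50000 100000`, so a quota of 50000 against the default period corresponds to half of one CPU.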

release-note

Podman exec now supports resource limits via --cpu-quota, --cpu-period, --memory, and --cpuset-cpus flags, allowing users to constrain CPU and memory usage of exec'd processes using cgroups v2 (local mode only).

This commit adds support for constraining resource usage (CPU, memory,
cpuset) of processes started via 'podman exec' by placing them in
dedicated cgroups with specified limits.

Key changes:
- Added --cpu-quota, --cpu-period, --cpuset-cpus, and --memory flags
  to podman exec command
- Implemented cgroup setup with proper controller delegation handling
- Added ExecResourceLimits entity and CgroupPath field to ExecConfig
- Runtime passes --cgroup flag to OCI runtime (crun/runc)
- Added comprehensive tests for ExecConfig serialization

Technical implementation:
- Creates child cgroup under container's scope (e.g., scope/exec-<timestamp>)
- Enables required controllers in scope's cgroup.subtree_control
- Applies resource limits to cgroup interface files
- Passes relative cgroup path (../exec-<timestamp>) to runtime
- Automatic cleanup when exec session ends
- Uses nanosecond precision for exec IDs to prevent collisions
- Validates cgroups v2 availability upfront with clear error messages
- Proper error handling for controller delegation failures

Cgroup structure:
- Container scope: machine.slice/libpod-<id>.scope
- Container cgroup: machine.slice/libpod-<id>.scope/container
- Exec cgroup: machine.slice/libpod-<id>.scope/exec-<nano-timestamp>
- Relative path to runtime: ../exec-<timestamp> (from container cgroup)

Fixes address review feedback:
- Exec ID now uses nanosecond precision to prevent concurrent collisions
- Added cgroups v2 validation with clear error messages
- Improved error clarity for controller delegation failures
- Fixed cgroup path to create exec as sibling of container, not child

Tested with crun 1.27.1 on cgroups v2 systems.

Signed-off-by: Ranjith Rajaram <ranjith@redhat.com>
ranjithrajaram changed the title from "Add resource limit support to podman exec with comprehensive tests" to "Add resource limit support to podman exec" Jun 12, 2026
return ctr.ID(), nil
}

// setupExecCgroup creates a sub-cgroup for the exec process and applies resource limits

Contributor

All of this should be in Libpod, in the container exec logic. As written, this only works with local Podman, not remote Podman.

}

// Get container's cgroup path (relative, e.g., "user.slice/user-1000.slice/container_id")
cgroupPath, err := ctr.CgroupPath()

Contributor

When the systemd cgroup driver is in use, we should not be doing this manually, but asking systemd to make the scope for us.

mheon commented Jun 12, 2026
Contributor

Does this actually work? I don't see us actually placing the PID of the exec session in the cgroup anywhere.

I also have some reservations about doing this outside of the OCI runtime. Are we moving the process into a new cgroup after the runtime has already created it? That could be an issue.

Finally, now that I'm thinking about it: since this is a sub-cgroup of the container cgroup, won't the container be able to edit the resource limits in it (under at least some circumstances... thinking about systemd in a container here)?

@giuseppe PTAL

@giuseppe

Contributor

I haven't looked in detail, but we got something similar a few months ago.

No, this won't work because it must be done in the OCI runtime and it must handle/create sub-cgroups.

Have you even tried running it?

@ranjithrajaram

Author

The OCI runtime spec currently has no mechanism to pass cgroup resource limits for exec. This path would require an RFC, I suppose.

We can close this PR. Thanks for reviewing it.

To answer the questions around "does this actually work" and "have you ever tried running it":

Yes, I have tested this implementation on a cgroups v2 system using crun, in both rootless and rootful environments. The limits are applied correctly. Here is a quick demonstration of the run and verification:

  1. Start the exec command with limits in the background:
    podman exec --cpuset-cpus "2,3" --memory 100m --cpu-quota 50000 redhat2 sleep 360 &
    
  2. Verify the generated exec sub-cgroup and applied limits on the host:

EXEC_CGROUP=$(find /sys/fs/cgroup/user.slice/user-10352.slice/user@10352.service/user.slice/libpod-*/exec-* -type d 2>/dev/null | head -1)

echo "Exec cgroup: $EXEC_CGROUP"
echo -n "CPUs: "; cat "$EXEC_CGROUP/cpuset.cpus"
echo -n "Memory: "; cat "$EXEC_CGROUP/memory.max"
echo -n "CPU quota: "; cat "$EXEC_CGROUP/cpu.max"
echo -n "Process in cgroup: "; cat "$EXEC_CGROUP/cgroup.procs"

  3. Output:

Exec cgroup: /sys/fs/cgroup/user.slice/user-10352.slice/user@10352.service/user.slice/libpod-cdd770f86806ed1f3bc620bc8e70cbdfb72b36a1cc44fa5a54627b3f95791613.scope/exec-6b6b72b9a7069d4e
CPUs: 2-3
Memory: 104857600
CPU quota: 50000 100000
Process in cgroup: 66836
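
The `cgroup.procs` listing above can also be cross-checked from the process side by reading the process's own cgroup entry. The sketch below is an illustration only: `cgroup_of` and the `PROC_ROOT` variable are hypothetical names, and it assumes a cgroups v2 host where `/proc/<pid>/cgroup` contains a single `0::<path>` line and that it is run on the host rather than inside the container's cgroup namespace.

```shell
#!/bin/sh
# Sketch only: cgroup_of is a hypothetical helper. PROC_ROOT exists so the
# function can be pointed at a fake /proc; it defaults to the real one.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Usage: cgroup_of PID
# Prints the process's cgroups v2 path (the "0::" entry), relative to the
# cgroup mount point.
cgroup_of() {
    sed -n 's/^0:://p' "$PROC_ROOT/$1/cgroup"
}
```

If the exec'd process really lives in the exec cgroup, `cgroup_of 66836` on the host should print a path ending in the same `exec-` directory that `find` located.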


@giuseppe giuseppe closed this Jun 15, 2026