Add resource limit support to podman exec #28919
This commit adds support for constraining resource usage (CPU, memory, cpuset) of processes started via 'podman exec' by placing them in dedicated cgroups with specified limits.

Key changes:
- Added --cpu-quota, --cpu-period, --cpuset-cpus, and --memory flags to podman exec command
- Implemented cgroup setup with proper controller delegation handling
- Added ExecResourceLimits entity and CgroupPath field to ExecConfig
- Runtime passes --cgroup flag to OCI runtime (crun/runc)
- Added comprehensive tests for ExecConfig serialization

Technical implementation:
- Creates child cgroup under container's scope (e.g., scope/exec-<timestamp>)
- Enables required controllers in scope's cgroup.subtree_control
- Applies resource limits to cgroup interface files
- Passes relative cgroup path (../exec-<timestamp>) to runtime
- Automatic cleanup when exec session ends
- Uses nanosecond precision for exec IDs to prevent collisions
- Validates cgroups v2 availability upfront with clear error messages
- Proper error handling for controller delegation failures

Cgroup structure:
- Container scope: machine.slice/libpod-<id>.scope
- Container cgroup: machine.slice/libpod-<id>.scope/container
- Exec cgroup: machine.slice/libpod-<id>.scope/exec-<nano-timestamp>
- Relative path to runtime: ../exec-<timestamp> (from container cgroup)

Fixes address review feedback:
- Exec ID now uses nanosecond precision to prevent concurrent collisions
- Added cgroups v2 validation with clear error messages
- Improved error clarity for controller delegation failures
- Fixed cgroup path to create exec as sibling of container, not child

Tested with crun 1.27.1 on cgroups v2 systems.

Signed-off-by: Ranjith Rajaram <ranjith@redhat.com>
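The cgroup layout described in the commit message can be sketched as follows. This is a minimal illustration, not the PR's code: the `execCgroupLayout` type, the `layoutFor` function, and the example scope and exec ID are hypothetical names chosen here. It only shows how the sibling exec cgroup and the relative path handed to the runtime relate to the container scope.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// execCgroupLayout mirrors the "Cgroup structure" list: the container
// cgroup, the sibling exec cgroup, and the exec cgroup's path relative to
// the container cgroup. (Hypothetical type, for illustration only.)
type execCgroupLayout struct {
	Container string
	Exec      string
	Relative  string
}

// layoutFor derives the three paths from a container scope and an exec ID.
// The exec cgroup is created as a sibling of the container cgroup, so the
// relative path always climbs one level before descending.
func layoutFor(scope, execID string) (execCgroupLayout, error) {
	container := filepath.Join(scope, "container")
	exec := filepath.Join(scope, "exec-"+execID)
	rel, err := filepath.Rel(container, exec)
	if err != nil {
		return execCgroupLayout{}, err
	}
	return execCgroupLayout{Container: container, Exec: exec, Relative: rel}, nil
}

func main() {
	// The scope name and exec ID below are made-up example values.
	l, err := layoutFor("machine.slice/libpod-abc123.scope", "1700000000000000000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("container cgroup:", l.Container)
	fmt.Println("exec cgroup:     ", l.Exec)
	fmt.Println("relative path:   ", l.Relative)
}
```

Deriving the relative path from the two absolute paths, rather than hard-coding `../`, keeps the sketch correct if the container cgroup were ever nested differently.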
return ctr.ID(), nil
}

// setupExecCgroup creates a sub-cgroup for the exec process and applies resource limits
All of this should be in Libpod, in the container exec logic. As written, this only works with local Podman, not remote Podman.
}

// Get container's cgroup path (relative, e.g., "user.slice/user-1000.slice/container_id")
cgroupPath, err := ctr.CgroupPath()
When the Systemd cgroup driver is in use, we should not be doing this manually, but asking systemd to make the scope for us.
Does this actually work? I don't see us placing the PID of the exec session in the cgroup anywhere. I also have some reservations about doing this outside of the OCI runtime. Are we moving the process into a new cgroup after the runtime has already created it? That could be an issue. Finally, now that I'm thinking about it: since this is a sub-cgroup of the container cgroup, won't the container be able to edit the resource limits in it, under at least some circumstances (thinking about systemd in a container here)? @giuseppe PTAL
I haven't looked in detail, but we got something similar a few months ago. No, this won't work, because it must be done in the OCI runtime, and the runtime must handle and create the sub-cgroups. Have you even tried running it?
The OCI runtime spec currently has no mechanism to pass cgroup resource limits for exec. This path will require an RFC, I suppose. We can close this PR. Thanks for reviewing it.

To answer the questions around "does this actually work" and "have you ever tried running it": Yes, I have tested this implementation in a rootless environment on a cgroups v2 system using
EXEC_CGROUP=$(find /sys/fs/cgroup/user.slice/user-10352.slice/user@10352.service/user.slice/libpod-/exec- -type d 2>/dev/null | head -1)
echo "Exec cgroup: $EXEC_CGROUP"
Exec cgroup: /sys/fs/cgroup/user.slice/user-10352.slice/user@10352.service/user.slice/libpod-cdd770f86806ed1f3bc620bc8e70cbdfb72b36a1cc44fa5a54627b3f95791613.scope/exec-6b6b72b9a7069d4e
Does this PR introduce a user-facing change?
Yes. The PR adds four new flags to podman exec:
--cpu-quota — limit CPU CFS quota (microseconds)
--cpu-period — set CPU CFS period (microseconds)
--memory — limit memory (e.g., 512m, 2g)
--cpuset-cpus — restrict to specific CPUs (e.g., 0-3, 0,2,4)
These allow users to constrain resource usage of exec'd processes via cgroups v2. The flags are local-mode only (hidden in remote mode). Documentation is added in podman-exec.1.md.in.
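As a rough illustration of how these flag values map onto cgroups v2 interface files, the sketch below formats a `cpu.max` value from a quota and period and converts a memory string such as `512m` to bytes for `memory.max`. The helper names `cpuMax` and `parseMemory` are hypothetical and are not taken from the PR; the suffix parser is a simplified stand-in that assumes binary multiples (k, m, g) and is not Podman's own parsing code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultCPUPeriod is the kernel's default cpu.max period in microseconds.
const defaultCPUPeriod = 100000

// cpuMax formats the value written to cpu.max: "<quota> <period>", or
// "max <period>" when no quota is requested. (Hypothetical helper.)
func cpuMax(quota int64, period uint64) string {
	if period == 0 {
		period = defaultCPUPeriod
	}
	if quota <= 0 {
		return fmt.Sprintf("max %d", period)
	}
	return fmt.Sprintf("%d %d", quota, period)
}

// parseMemory converts values such as "512m" or "2g" to bytes for
// memory.max. Assumption: k/m/g are binary multiples; a bare number is bytes.
func parseMemory(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty memory value")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'k':
		mult = 1 << 10
	case 'm':
		mult = 1 << 20
	case 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the unit suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid memory value: %w", err)
	}
	if n < 0 {
		return 0, fmt.Errorf("memory value must not be negative")
	}
	return n * mult, nil
}

func main() {
	// Half of one CPU: 50000us of runtime per 100000us period.
	fmt.Println("cpu.max:", cpuMax(50000, 100000))
	if b, err := parseMemory("512m"); err == nil {
		fmt.Println("memory.max:", b)
	}
}
```

The `--cpuset-cpus` value needs no conversion in this sketch, since `cpuset.cpus` accepts the same list syntax (for example `0-3` or `0,2,4`).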
release-note
Podman exec now supports resource limits via --cpu-quota, --cpu-period, --memory, and --cpuset-cpus flags, allowing users to constrain CPU and memory usage of exec'd processes using cgroups v2 (local mode only).