msft-preview: 3.31.0 rebase#448
Conversation
eb49730 to ac1ca3d
TODO: without ac1ca3d, the UVM build fails with https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1122834&view=logs&j=bbe135f3-541e-58e6-8b02-61d4669cdf5f&t=5095dea9-1b86-57b3-726c-ffe250f063d1&l=1963 . Figure out what changed and whether we need to upstream this.
e84953f to f7bb7a0
f7bb7a0 to 7d0903d
Mystery solved. A shellcheck fix in kata-containers@6471894#diff-9cff33aa4403bdc6b3a82a5fb4abd4aa4c6038938f8d0f63bbf9c5e9456f7695 makes us fail because we were not copying VERSION during package_tools_install.sh. So we'll copy that now. Demo showing change in behaviour.
7d0903d to d87a305
a543c5f to 7977900
DO NOT MERGE: we'll force push to msft-preview instead once approved
7977900 to de5785f
sprt
left a comment
This almost LGTM! A few asks:
- Can you change the first commit message to this:
ci: switch default branch to msft-preview
* Update the default branch to msft-preview in different places for
the CI to work with our fork.
* Add the MSFT-required SECURITY.md and corresponding dictionary entries.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
- Can you link to passing BB/conformance/performance test runs in the PR description?
Re: the actual merge, I actually think it's better to lean even more into this up/3.31.0 branch to have a paper trail. So I removed the WIP label and when this PR is fully ready, let us:
- Ensure the gatekeeper is green.
- Get approvals.
- Merge this PR into up/3.31.0 as is (WITH the last commit with the temporary up/3.31.0 CI branch changes).
- Then we can just do the following to force push msft-preview:
git checkout msft-preview
git reset --hard up/3.31.0 # This clones up/3.31.0 into msft-preview
git reset --hard HEAD^ # This removes the last commit with the temporary up/3.31.0 CI branch changes
git push -f # Push the new msft-preview
de5785f to f5bb7ae
Addressed, and trying to re-require a few more tests (will squash if it works).
ade9dad to c49acbe
c49acbe to 8c8ccbf
* Update the default branch to msft-preview in different places for
the CI to work with our fork.
* Add the MSFT-required SECURITY.md and corresponding dictionary entries.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
8c8ccbf to 89a7d3f
sprt
left a comment
Just one nit as we wait for the test results!
For runtime-go and runtime-rs. See below for details
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
tools: Add initial igvm-builder and node-builder/azure-linux scripting
This branch starts introducing additional scripting to build, deploy
and evaluate the components used in AKS' Pod Sandboxing and
Confidential Containers preview features. This includes the capability
to build the IGVM file and its reference measurement file for remote
attestation.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
tools: Improve igvm-builder and node-builder/azure-linux scripting
- Support for Mariner 3 builds using OS_VERSION variable
- Improvements to IGVM build process and flow as described in README
- Adoption of using only cloud-hypervisor-cvm on CBL-Mariner
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
tools: Add package-tools-install functionality
- Add script to install kata-containers(-cc)-tools bits
- Minor improvements in README.md
- Minor fix in package_install
- Remove echo outputs in package_build
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
tools: Enable setting IGVM SVN
- Allow setting SVN parameter for IGVM build scripting
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: introduce BUILD_TYPE variable
This lets developers build and deploy Kata in debug mode without having to make
manual edits to the build scripts.
With BUILD_TYPE=debug (default is release):
* The agent is built in debug mode.
* The agent is built with a permissive policy (using allow-all.rego).
* The shim debug config file is used, ie. we create the symlink
configuration-clh-snp-debug.toml <- configuration-clh-snp.toml.
For example, building and deploying Kata-CC in debug mode is now as simple as:
make BUILD_TYPE=debug all-confpods deploy-confpods
Also do note that make still lets you override the other variables even after
setting BUILD_TYPE. For example, you can use the production shim config with
BUILD_TYPE=debug:
make BUILD_TYPE=debug SHIM_USE_DEBUG_CONFIG=no all-confpods deploy-confpods
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
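To illustrate the override behaviour described in the commit above, here is a minimal shell sketch. It is only an analogue of the Makefile logic, not the actual node-builder code: the function name resolve_build_vars is hypothetical, and the assumption that a release build defaults to SHIM_USE_DEBUG_CONFIG=no is mine.

```shell
# Hypothetical shell analogue of the BUILD_TYPE defaults.
# The real logic lives in the node-builder Makefile and may differ.
resolve_build_vars() {
    rb_build_type="${BUILD_TYPE:-release}"   # default is release
    if [ "${rb_build_type}" = "debug" ]; then
        rb_default_shim_debug="yes"
    else
        rb_default_shim_debug="no"           # assumption for release builds
    fi
    # An explicitly set SHIM_USE_DEBUG_CONFIG wins over the BUILD_TYPE-derived default.
    rb_shim_debug="${SHIM_USE_DEBUG_CONFIG:-${rb_default_shim_debug}}"
    echo "BUILD_TYPE=${rb_build_type} SHIM_USE_DEBUG_CONFIG=${rb_shim_debug}"
}

# Each call runs in a subshell so the variable settings do not leak.
(unset BUILD_TYPE SHIM_USE_DEBUG_CONFIG; resolve_build_vars)
(BUILD_TYPE=debug; unset SHIM_USE_DEBUG_CONFIG; resolve_build_vars)
(BUILD_TYPE=debug; SHIM_USE_DEBUG_CONFIG=no; resolve_build_vars)
```

The third call mirrors the "production shim config with BUILD_TYPE=debug" case: the explicit setting takes precedence over the derived default.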
node-builder: introduce SHIM_REDEPLOY_CONFIG
See README: when SHIM_REDEPLOY_CONFIG=no, the shim configuration is NOT
redeployed, so that potential config changes made directly on the host
during development aren't lost.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
node-builder: Use img for Pod Sandboxing
Switch from UVM initrd to image format
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Adapt README instructions
- Sanitize containerd config snippet
- Set podOverhead for Kata runtime class
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
tools: Adapt AGENT_POLICY_FILE path
- Adapt path in uvm_build.sh script to comply
with the upstream changes we pulled in
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Use Azure Linux 3 as default path
- update recipe and node-builder scripting
- change default value on rootfs-builder
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Deploy-only for AzL3 VMs
- split deployment sections in node-builder README.md
- install jq, curl dependencies within IGVM script
- add path parameter to UVM install script
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Minor updates to README.md
- no longer install make package, is part of meta package
- remove superfluous popd
- add note on permissive policy for ConfPods UVM builds
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Updates to README.md
- with the latest 3.2.0.azl4 package on PMC, can remove OS_VERSION parameter
and use the make deploy calls instead of copying files by hand for variant
I (now aligned with Variant II)
- with the latest changes on msft-main, set the podOverhead to 600Mi
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: Fix SHIM_USE_DEBUG_CONFIG behavior
Using a symlink would create a cycle after calling this script again when
copying the final configuration at line 74 so we just use cp instead.
Also, I moved this block to the end of the file to properly override the final
config file.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
node-builder: Build and install debug configuration for pod sandboxing
For ease of debugging, install a configuration-clh-debug.toml for pod
sandboxing as we do in Conf pods.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
runtime: remove clh-snp config file usage in makefile
Not needed to build vanilla kata
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
package_tools_install.sh: include nsdax.gpl.c
Include nsdax.gpl.c
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
node-builder: fix typo in string comparison
This also fixes a shellcheck error and lets us require the
shellcheck-required job:
In ./tools/osbuilder/node-builder/azure-linux/uvm_build.sh line 34:
if [ -z "${UVM_KERNEL_HEADER_DIR}}" ]; then
^-- SC2157 (error): Argument to -z is always false due to literal strings.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
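For context on SC2157, a small standalone reproduction of why the check flagged above can never succeed. The variable name is taken from the error output; the "fixed" form is presumably what the commit changes the line to, which is an assumption on my part.

```shell
# Leave the variable empty on purpose.
UVM_KERNEL_HEADER_DIR=""

# Buggy form: the stray "}" is a literal character, so the string under
# test is "}" even when the variable is empty, and -z is always false.
if [ -z "${UVM_KERNEL_HEADER_DIR}}" ]; then
    buggy="empty"
else
    buggy="not empty"
fi

# Fixed form (assumed): only the variable expansion is tested.
if [ -z "${UVM_KERNEL_HEADER_DIR}" ]; then
    fixed="empty"
else
    fixed="not empty"
fi

echo "buggy check says: ${buggy}"
echo "fixed check says: ${fixed}"
```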
docs: node-builder: fix static check error
This fixes the below static check error to follow up on the infra fix from
kata-containers#11646:
2025-07-31T19:32:45.0031829Z time="2025-07-31T19:32:44.990004665Z" level=fatal msg="found 2 parse errors:\nfile=\"tools/osbuilder/node-builder/azure-linux/README.md\": duplicate heading: \"Set up environment\" (heading: {Name:Set up environment MDName:Set up environment LinkName:set-up-environment Level:2})\nfile=\"tools/osbuilder/node-builder/azure-linux/README.md\": duplicate heading: \"Install build dependencies\" (heading: {Name:Install build dependencies MDName:Install build dependencies LinkName:install-build-dependencies Level:2})" commit=1d17f56b1aa7a880468b8e25d14467c92dca8eeb name=kata-check-markdown pid=9075 source=check-markdown version=0.0.1
Note: that is likely flagged because having two headings with the same
name, even under different sections, makes it impossible to create a
canonical heading link in Markdown.
This should eventually be squashed into the node-builder commit.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
docs: node-builder: Remove references to moby-containerd-cc
As we adopted containerd2, we remove references to our prior
forked containerd version.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
node-builder: 2Mb aligned guest image size
Build the mariner guest image using IMAGE_SIZE_ALIGNMENT_MB=2.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
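A rough sketch of what a 2 MB alignment means for the image size, assuming that alignment here means rounding the size up to the next multiple of IMAGE_SIZE_ALIGNMENT_MB. The helper name align_up_mb and the sample sizes are hypothetical; the actual image builder may compute this differently.

```shell
IMAGE_SIZE_ALIGNMENT_MB=2

# align_up_mb <size in MB> <alignment in MB>
# Prints the smallest multiple of the alignment that is >= the size.
align_up_mb() {
    au_size_mb="$1"
    au_align_mb="$2"
    echo $(( (au_size_mb + au_align_mb - 1) / au_align_mb * au_align_mb ))
}

align_up_mb 255 "${IMAGE_SIZE_ALIGNMENT_MB}"   # odd size gets rounded up
align_up_mb 256 "${IMAGE_SIZE_ALIGNMENT_MB}"   # already aligned, unchanged
```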
to-squash: node-builder: add reference to README.md
This is needed to avoid the following static-checks error:
2025-08-05T21:27:20.0028337Z [static-checks.sh:808] ERROR: Document tools/osbuilder/node-builder/azure-linux/README.md is not referenced
This commit is to be squashed into the node-builder commit.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
node-builder: build and install runtime-rs
Build and install both runtime-rs and runtime-go configs and binaries side by side:
- runtime-go:
/usr/local/bin/containerd-shim-kata-v2-go
/usr/local/share/defaults/kata-containers/configuration-clh.toml
/usr/local/share/defaults/kata-containers/configuration-clh-debug.toml
- runtime-rs:
/usr/local/bin/containerd-shim-kata-v2-rs
/usr/local/share/defaults/kata-containers/configuration-cloud-hypervisor.toml
/usr/local/share/defaults/kata-containers/configuration-cloud-hypervisor-debug.toml
Also add USE_RUNTIME_RS variable and default to "yes". This controls which runtime binary and configuration will be installed
to /usr/local/bin/containerd-shim-kata-v2 and /usr/local/share/defaults/kata-containers/configuration.toml respectively.
Also install kata-ctl (runtime-rs equivalent of kata-runtime) so we can exec into the UVM when using runtime-rs
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
- if no limits are specified, assign a default static amount of memory (512Mi) and vcpu (1) to the UVM
- if limits are specified, use those limit values for the UVM resources (don't add any extra)
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
runtime: Resolve high UVM memory footprint
Bug: https://microsoft.visualstudio.com/OS/_workitems/edit/43668151
Rationale: This is a temporary solution for optimizing memory usage for the current mechanism of requesting resources through pod Limit annotations:
- if no Limits are specified and hence WorkloadMemMB is 0, set a default value 'StaticWorkloadDefaultMem' to allocate a default amount of memory for use for containers in the sandbox in addition to the base memory
- if Limits are specified, the base memory and the sum of Limits are allocated. The end user needs to be aware of the minimum memory requirements for their pods, otherwise the pod will be stuck in the ContainerCreating state
Testing: Manual testing, creating pods with Limits and without limits, and with two containers where each container has a limit, tested with integration in a SPEC file where the config variables were set via environment variables via the make command
Adapted by @mfrw from 3.1.0 to apply to 3.2.0
Signed-off-by: Muhammad Falak R Wani <mwani@microsoft.com>
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
runtime: Remove unused VMM options for mem alloc
- We only ever tested these fork changes with CLH+MSHV
- Remove these options as we don't use QEMU/FC
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
runtime: improved memory overhead management
After these changes:
1. The value of the K8s runtime class memory overhead:
- Covers the memory usage from all the Host-side components (mainly the Kata Shim and the VMM).
- Doesn't include the memory usage from any Guest-side components.
2. The value of a pod memory limit specified by the user:
- Is equal to the memory size of the Pod VM.
- Includes the memory usage from all the Guest-side components (mainly user's workload, the Guest kernel, and the Kata Agent)
- Doesn't include the memory usage from any Host-side components.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
runtime: fix `make test`
This addresses the following errors from `make test` to allow us to require that upstream CI:
https://github.com/microsoft/kata-containers/actions/runs/16656407213/job/47142422035?pr=392#step:13:53
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
runtime: Allocate default workload vcpus
- similar to the static_sandbox_default_workload_mem option, assign a default number of vcpus to the VM when no limits are given, 1 vcpu in this case
- similar to commit c7b8ee9, do not allocate additional vcpus when limits are provided
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
- if no limits are specified, assign a default static amount of memory (512Mi) and vcpu (1) to the UVM
- if limits are specified, use those limit values for the UVM resources (don't add any extra)
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
runtime-rs: Resolve high UVM memory footprint
This is a port from b03db3e into runtime-rs
Rationale: This is a temporary solution for optimizing memory usage for the current mechanism of requesting resources through pod Limit annotations:
- if no Limits are specified and hence WorkloadMemMB is 0, set a default value 'StaticWorkloadDefaultMem' to allocate a default amount of memory for use for containers in the sandbox in addition to the base memory
- if Limits are specified, the base memory and the sum of Limits are allocated. The end user needs to be aware of the minimum memory requirements for their pods, otherwise the pod will be stuck in the ContainerCreating state
Testing: Manual testing, creating pods with Limits and without limits, and with two containers where each container has a limit, tested with integration in a SPEC file where the config variables were set via environment variables via the make command
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
runtime-rs: improved memory overhead management
This is a port from 7ddec33 into runtime-rs
After these changes:
1. The value of the K8s runtime class memory overhead:
- Covers the memory usage from all the Host-side components (mainly the Kata Shim and the VMM).
- Doesn't include the memory usage from any Guest-side components.
2. The value of a pod memory limit specified by the user:
- Is equal to the memory size of the Pod VM.
- Includes the memory usage from all the Guest-side components (mainly user's workload, the Guest kernel, and the Kata Agent)
- Doesn't include the memory usage from any Host-side components.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
runtime-rs: Allocate default workload vcpus
This is a port from 9af9844
Plus ports an existing behaviour from runtime-go to also add the vcpus. See https://github.com/fidencio/kata-containers/blob/e2476f587c472d5d217df9c75cdb80193dd85994/src/runtime/pkg/oci/utils.go#L1232
- similar to the static_sandbox_default_workload_mem option, assign a default number of vcpus to the VM when no limits are given, 1 vcpu in this case
- similar to commit c7b8ee9, do not allocate additional vcpus when limits are provided
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
runtime-rs: add test coverage for static resource management
If using static management and initial size manager uses 0 for CPU or memory, we add default static values to the hv config
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
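As a reading aid for the resource-management commits above, a minimal sketch of the sizing rule as summarized in their first two bullets: fall back to 512Mi and 1 vcpu when no limits are given, otherwise use the limits as-is. The function and variable names are hypothetical, and the real logic lives in the Go and Rust runtimes, so treat this as an illustration only.

```shell
# Defaults taken from the commit description; names are hypothetical.
DEFAULT_WORKLOAD_MEM_MIB=512
DEFAULT_WORKLOAD_VCPUS=1

# uvm_size <sum of memory limits in MiB, 0 if none> <sum of vcpu limits, 0 if none>
# Prints "<memory MiB> <vcpus>" for the UVM.
uvm_size() {
    us_mem="${1:-0}"
    us_vcpus="${2:-0}"
    # No limit given: fall back to the static default.
    if [ "${us_mem}" -eq 0 ]; then us_mem="${DEFAULT_WORKLOAD_MEM_MIB}"; fi
    if [ "${us_vcpus}" -eq 0 ]; then us_vcpus="${DEFAULT_WORKLOAD_VCPUS}"; fi
    # Limit given: use it unchanged, nothing extra is added.
    echo "${us_mem} ${us_vcpus}"
}

uvm_size 0 0       # pod without limits
uvm_size 2048 4    # pod with limits
```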
- tests that deploy pods with too small of a memory limit
- tests try to set a minimum memory limit for some containerd tests
- tests that use runners we don't have
- tests that depend on pushing to GHCR
- disable Kata Containers CI / kata-containers-ci-on-push / run-kata-deploy-tests / run-kata-deploy-tests (qemu, k3s)
Also disable these for runtime-rs that fail due to resource management patches:
- run-nerdctl-tests (dragonball)
- run-nydus (active, dragonball)
- run-nydus (lts, dragonball)
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
If memory limit is set and less than minimum, set it to minimum.
This is to account for kata-containers@0ec3403
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
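The clamping rule in this commit can be sketched as follows. The minimum value is not stated in this thread, so it is passed in as a parameter, and the 128 used in the calls is purely illustrative. The helper name clamp_mem_limit is hypothetical and 0 stands for "no limit set".

```shell
# clamp_mem_limit <limit in MiB, 0 meaning "not set"> <minimum in MiB>
# Prints the limit to apply.
clamp_mem_limit() {
    cl_limit="$1"
    cl_min="$2"
    if [ "${cl_limit}" -gt 0 ] && [ "${cl_limit}" -lt "${cl_min}" ]; then
        echo "${cl_min}"      # limit set but too small: raise it to the minimum
    else
        echo "${cl_limit}"    # unset or already large enough: leave it alone
    fi
}

clamp_mem_limit 64 128     # too small
clamp_mem_limit 256 128    # large enough
clamp_mem_limit 0 128      # not set
```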
Temporary patch to test rebase
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
89a7d3f to 288cd70
sprt
left a comment
LGTM, understanding that we have one pending regression that will be prioritized right after this.
https://dev.azure.com/mariner-org/container-runtime/_workitems/edit/20444
I would prefer to have commit SHA values next to each description, like:
runtime: Remove unused VMM options for mem alloc
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
But missing that information is not a big deal if it's relatively difficult to include.
I would prefer to see in the commit description of the "webhook: enforce minimum memory limit" change a more detailed explanation of the goals for this change, what fails if we don't make the change, that the change is specific to testing, etc.
@danmihai1 I'll address these in the next rebase.
Test Methodology
CI: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1127027&view=logs&j=1d103282-d184-539c-6f02-9cecc7887239&t=7581b7d8-55b2-5d9a-91e4-4f9c30e5e89d:
What our current diff (msft-preview) looks like, for comparison: #451
Changes worth noting:
cloud-hypervisor to clh-runtime-rs. This affects our node-builder recipes when copying the configs