fix(instancetypes): Protect instancetype/preference controllerRevision objects
#1556
Conversation
Signed-off-by: Javier Cano Cano <[email protected]>
fix(instancetypes): Protect instancetype/preference `controllerRevision` objects

This protection ensures that instancetype and preference `controllerRevision` objects are not deleted before their `VirtualMachine` objects. In some scenarios, such as when a VM is delete protected, if the `controllerRevision` is deleted while the VM object is still alive, users will not be able to patch or update the VM object, which virtually deadlocks the VM. The VAP ensures that only the Kubernetes garbage collector and kubevirt-controller are able to delete these objects:

* Garbage collector: it deletes objects when the `ownerReference`'d object is removed, e.g., the instancetype "test-it" with `ownerReference: test-vm` is removed when the VM test-vm is removed. Therefore, by only allowing it to drop these objects, we ensure the deletion order.
* kubevirt-controller: it must not be blocked because it is the main owner of these `controllerRevision` objects; it only deletes them in upgrade scenarios, live migration, etc.

Fix: https://issues.redhat.com/browse/CNV-69511

Signed-off-by: Javier Cano Cano <[email protected]>
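The allow-list the commit describes can be illustrated with a plain-Go equivalent of the check the VAP's CEL expression performs on `request.userInfo.username`. The service-account usernames below are assumptions based on common Kubernetes/KubeVirt defaults, not taken from this PR:

```go
package main

import "fmt"

// isAllowedDeleter mirrors, in plain Go, the kind of check the VAP's CEL
// expression performs: only the Kubernetes garbage collector and the
// kubevirt-controller service account may delete the instancetype/preference
// controllerRevision objects. NOTE: the usernames below are assumed defaults,
// not copied from the PR.
func isAllowedDeleter(username, kubevirtNamespace string) bool {
	allowed := []string{
		"system:serviceaccount:kube-system:generic-garbage-collector",
		fmt.Sprintf("system:serviceaccount:%s:kubevirt-controller", kubevirtNamespace),
	}
	for _, a := range allowed {
		if username == a {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAllowedDeleter("system:serviceaccount:kube-system:generic-garbage-collector", "kubevirt")) // true
	fmt.Println(isAllowedDeleter("kubernetes-admin", "kubevirt"))                                            // false
}
```

Any other user hitting the policy is denied, which is what enforces the "CRs outlive nothing but their VM" deletion order.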
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `internal/operands/vm-delete-protection/reconcile.go:65` </location>
<code_context>
func (v *VMDeleteProtection) Name() string { return operandName }
-func reconcileVAP(request *common.Request) (common.ReconcileResult, error) {
+func reconcileVMDeletionProtectionVAP(request *common.Request) (common.ReconcileResult, error) {
return common.CreateOrUpdate(request).
- ClusterResource(newValidatingAdmissionPolicy()).
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the four similar reconciler functions into a single reusable helper to reduce boilerplate and improve maintainability.
Here’s one way to collapse all four of your almost‐identical reconcilers into a single helper plus four one-liners. First, extract the common “CreateOrUpdate→ClusterResource→WithAppLabels→UpdateFunc→[optional StatusFunc]→Reconcile” logic:
```go
// reconcileAdmissionObject handles both Policy and Binding reconciliation in one place.
func reconcileAdmissionObject(
request *common.Request,
obj client.Object,
statusFn func(client.Object) common.ResourceStatus,
) (common.ReconcileResult, error) {
builder := common.CreateOrUpdate(request).
ClusterResource(obj).
WithAppLabels(operandName, operandComponent).
UpdateFunc(func(expected, found client.Object) {
// copy .Spec for both Policy and Binding
switch e := expected.(type) {
case *admissionregistrationv1.ValidatingAdmissionPolicy:
found.(*admissionregistrationv1.ValidatingAdmissionPolicy).Spec = e.Spec
case *admissionregistrationv1.ValidatingAdmissionPolicyBinding:
found.(*admissionregistrationv1.ValidatingAdmissionPolicyBinding).Spec = e.Spec
}
})
if statusFn != nil {
builder = builder.StatusFunc(statusFn)
}
return builder.Reconcile()
}
```
Then each of your four reconcilers turns into:
```go
func reconcileVMDeletionProtectionVAP(request *common.Request) (common.ReconcileResult, error) {
return reconcileAdmissionObject(
request,
newVMDeletionProtectionValidatingAdmissionPolicy(),
func(o client.Object) common.ResourceStatus {
vap := o.(*admissionregistrationv1.ValidatingAdmissionPolicy)
if vap.Status.TypeChecking == nil {
msg := "VM-deletion VAP type checking in progress"
return common.ResourceStatus{NotAvailable: &msg, Degraded: &msg}
}
return common.ResourceStatus{}
},
)
}
func reconcileVMDeletionProtectionVAPB(request *common.Request) (common.ReconcileResult, error) {
return reconcileAdmissionObject(
request,
newVMDeletionProtectionValidatingAdmissionPolicyBinding(),
nil, // no status function on bindings
)
}
func reconcileInstancetypeControllerRevisionsVAP(request *common.Request) (common.ReconcileResult, error) {
return reconcileAdmissionObject(
request,
newInstancetypeControllerRevisionsValidatingAdmissionPolicy(request.Instance.Namespace),
func(o client.Object) common.ResourceStatus {
vap := o.(*admissionregistrationv1.ValidatingAdmissionPolicy)
if vap.Status.TypeChecking == nil {
msg := "Instancetype-controller revisions VAP type checking in progress"
return common.ResourceStatus{Progressing: &msg, NotAvailable: &msg, Degraded: &msg}
}
if len(vap.Status.TypeChecking.ExpressionWarnings) != 0 {
msg := fmt.Sprintf("Incorrect instancetype-controller revisions VAP CEL expression %v", vap.Status.TypeChecking)
return common.ResourceStatus{NotAvailable: &msg, Degraded: &msg}
}
return common.ResourceStatus{}
},
)
}
func reconcileInstancetypeControllerRevisionsVAPB(request *common.Request) (common.ReconcileResult, error) {
return reconcileAdmissionObject(
request,
newInstancetypeControllerRevisionsValidatingAdmissionPolicyBinding(),
nil,
)
}
```
This collapses ~30 lines of boilerplate into one reusable helper and four concise calls, preserving all existing behavior.
</issue_to_address>
### Comment 2
<location> `internal/operands/vm-delete-protection/resources.go:74` </location>
<code_context>
}
}
+
+func newInstancetypeControllerRevisionsValidatingAdmissionPolicy(namespace string) *admissionregistrationv1.ValidatingAdmissionPolicy {
+ return &admissionregistrationv1.ValidatingAdmissionPolicy{
+ ObjectMeta: metav1.ObjectMeta{
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring repeated struct construction into shared builder functions to reduce code duplication and improve readability.
Here’s one way to cut down on the literal noise by introducing two tiny “builder” helpers and then having your two functions just supply the bits that change:
```go
// builder.go
func buildPolicy(
name string,
failure *admissionregistrationv1.FailurePolicyType,
match *admissionregistrationv1.MatchResources,
vals []admissionregistrationv1.Validation,
) *admissionregistrationv1.ValidatingAdmissionPolicy {
return &admissionregistrationv1.ValidatingAdmissionPolicy{
ObjectMeta: metav1.ObjectMeta{Name: name},
Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
FailurePolicy: failure,
MatchConstraints: match,
Validations: vals,
},
}
}
func buildBinding(
name, policyName string,
actions []admissionregistrationv1.ValidationAction,
match *admissionregistrationv1.MatchResources,
) *admissionregistrationv1.ValidatingAdmissionPolicyBinding {
return &admissionregistrationv1.ValidatingAdmissionPolicyBinding{
ObjectMeta: metav1.ObjectMeta{Name: name},
Spec: admissionregistrationv1.ValidatingAdmissionPolicyBindingSpec{
PolicyName: policyName,
ValidationActions: actions,
MatchResources: match,
},
}
}
```
Then your new functions shrink to:
```go
func newInstancetypeControllerRevisionsValidatingAdmissionPolicy(ns string) *admissionregistrationv1.ValidatingAdmissionPolicy {
match := &admissionregistrationv1.MatchResources{
ResourceRules: []admissionregistrationv1.NamedRuleWithOperations{{
RuleWithOperations: admissionregistrationv1.RuleWithOperations{
Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Delete},
Rule: admissionregistrationv1.Rule{
APIGroups: []string{"apps"},
APIVersions: []string{"v1"},
Resources: []string{"controllerrevisions"},
},
},
}},
}
val := admissionregistrationv1.Validation{
Expression: fmt.Sprintf(instancetypeControllerRevisionsCELExpressionTemplate, ns),
MessageExpression: "'Instancetype controller revision deletion is blocked only GC/kubevirt-controller: ' + string(request.userInfo.username)",
}
return buildPolicy(instancetypeControllerRevisionsPolicyName, ptr.To(admissionregistrationv1.Fail), match, []admissionregistrationv1.Validation{val})
}
func newInstancetypeControllerRevisionsValidatingAdmissionPolicyBinding() *admissionregistrationv1.ValidatingAdmissionPolicyBinding {
match := &admissionregistrationv1.MatchResources{
ObjectSelector: &metav1.LabelSelector{MatchExpressions: []metav1.LabelSelectorRequirement{
{Key: "instancetype.kubevirt.io/object-kind", Operator: metav1.LabelSelectorOpIn, Values: []string{
"VirtualMachineClusterInstancetype", "VirtualMachineClusterPreference",
"VirtualMachineInstancetype", "VirtualMachinePreference",
}},
}},
}
return buildBinding(
instancetypeControllerRevisionsPolicyName+"-binding",
instancetypeControllerRevisionsPolicyName,
[]admissionregistrationv1.ValidationAction{admissionregistrationv1.Deny},
match,
)
}
```
This preserves every bit of functionality but moves all the deeply‐nested struct plumbing into two small, shared builders.
</issue_to_address>
lyarwood left a comment:
Reflecting on this I'm not convinced by the approach as I'm worried it's too brittle and will break backup/restore tooling.
What about having SSP itself reconcile all VMs with the delete protection label and then applying or removing it from any additional resource we also need to protect? Possibly using the objectgraph subresource API?
This would keep the VAPs simple (don't delete anything with the delete protection label) while ensuring we programmatically include the resources the VM depends on.
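The reconcile lyarwood proposes could look roughly like the sketch below. The label key and the minimal object type are placeholders (a real implementation would use controller-runtime's `client.Object` and watch VMs), not SSP's actual API:

```go
package main

import "fmt"

// deleteProtectionLabel is a placeholder key; SSP's real label may differ.
const deleteProtectionLabel = "kubevirt.io/vm-delete-protection"

// object is a minimal stand-in for Kubernetes object metadata.
type object struct {
	name   string
	labels map[string]string
}

// syncProtectionLabel copies the VM's delete-protection label onto its
// dependants (e.g. instancetype/preference controllerRevisions) and removes
// it again when the VM is no longer protected, so a simple "deny deletes on
// labelled objects" VAP covers everything the VM depends on.
func syncProtectionLabel(vm *object, dependants []*object) {
	_, protected := vm.labels[deleteProtectionLabel]
	for _, d := range dependants {
		if d.labels == nil {
			d.labels = map[string]string{}
		}
		if protected {
			d.labels[deleteProtectionLabel] = "true"
		} else {
			delete(d.labels, deleteProtectionLabel)
		}
	}
}

func main() {
	vm := &object{name: "test-vm", labels: map[string]string{deleteProtectionLabel: "true"}}
	cr := &object{name: "revision-test-it"}

	syncProtectionLabel(vm, []*object{cr})
	fmt.Println(cr.labels[deleteProtectionLabel]) // true

	delete(vm.labels, deleteProtectionLabel)
	syncProtectionLabel(vm, []*object{cr})
	fmt.Println(len(cr.labels)) // 0
}
```

The dependant set itself could come from the objectgraph subresource, as suggested below in the thread's own words, rather than being hardcoded.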
Name: "containerdisk",
VolumeSource: kubevirtv1.VolumeSource{
    ContainerDisk: &kubevirtv1.ContainerDiskSource{
        Image: "quay.io/containerdisks/debian:latest",
Why is this required?
Will a VM without any OS create the controllerRevision objects? If that's the case, we can drop the usage of any container disk, bootable volume, etc.
Yes, I was wondering if that's why you added this. The CR creation doesn't rely on the guest OS, just the VM control loop in virt-controller.
Nice point. Thanks.
Could you explain how it will break backup/restore? Won't these operations be executed by kubevirt-controller?
Yes, we could apply the same concept. However, what other objects should we protect? AFAIK, we just have
The requirements of this feature are just to protect VMs. In the past we explored what would happen if we overprotected objects like launcher pods, VMIs, etc., and the result was that VMs would not be able to stop, migrate, etc.
No, not always AFAIK.
We could use the ObjectGraph to understand if there are any other objects we would also need to protect.
Reconciled by what? SSP would only watch VMs, adding the labels when a VM is updated with the label and removing them when the label is dropped from the VM.
Yeah understood, we wouldn't need to protect everything listed by the objectgraph but I still think it should be used as a potential source of truth for these objects instead of hardcoding VAPs for all objects in the cluster.
IIUC, the idea is: If a VM is updated with the protection label, we update the associated
Fair point. As you suggested, we can programmatically include just some of them in the VAP and reconcile the dependants to include the protection label. However, IMHO, we should be safe and only include
@lyarwood isn't it a generic problem, independent of the deletion protection, that a controller revision of an instancetype/preference can be deleted before the related VM is deleted? As long as there is a controller revision per VM, might there be a generic solution like having a finalizer on the controller revision, where during VM deletion the finalizer on the controller revision is removed?
No it will not.
Yup that's fine.
Yes it can also be framed as a generic issue but there's lots of hidden complexity and cost to adding finalizers at this stage that I'd like to avoid if possible. If folks are against a solution for this in SSP I can take a deeper look, but it's not going to be as simple as adding and removing a finalizer on create and delete respectively. @0xFelix, any thoughts on this?
I don't see benefit in contrast to the added complexity. We should not be protecting CRs at all. Why is it not enough to clear either? Are there no further admission validations for VMs that could cause the same behavior after deleting a resource on which a VM depends?
This is a valid concern, but it only prevents the deletion of a VM in combination with this external feature. |
We will still see a failure attempting to mutate the labels after this as it's the lookup of the referenced CR that's failing. The workaround would remain the same.
What about going the dependency tree upwards instead of downwards similar to the namespace deletion protection in https://openkruise.io/docs/user-manuals/deletionprotection / https://github.com/openkruise/kruise/blob/master/pkg/webhook/util/deletionprotection/deletion_protection.go#L76 ? |
The only limitation I can see is that it won't be implementable with just a VAP; a webhook will be required. VAPs can't look up objects other than the one you are currently modifying.
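Walking the dependency tree upwards, as the kruise deletion protection does, can be sketched with plain maps standing in for `ownerReference` lookups. That API-server lookup is exactly the part a VAP cannot do and a webhook can; the names and structure below are illustrative, not kruise's or SSP's actual code:

```go
package main

import "fmt"

// deletionBlocked walks up the (single-parent, assumed acyclic) owner chain
// of obj and reports whether any ancestor carries delete protection. The two
// maps stand in for resolving ownerReferences against the API server, which
// is why this check needs a webhook rather than a VAP.
func deletionBlocked(obj string, ownerOf map[string]string, protected map[string]bool) bool {
	for cur := obj; cur != ""; cur = ownerOf[cur] {
		if protected[cur] {
			return true
		}
	}
	return false
}

func main() {
	ownerOf := map[string]string{"revision-test-it": "test-vm"} // CR owned by the VM
	protected := map[string]bool{"test-vm": true}

	fmt.Println(deletionBlocked("revision-test-it", ownerOf, protected)) // true
	fmt.Println(deletionBlocked("unrelated-object", ownerOf, protected)) // false
}
```

With this shape, deleting the controllerRevision is denied while its owning VM is protected, without enumerating protected kinds in a VAP.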
For a good reason, webhooks should not depend on foreign data because there is no strong consistency in Kubernetes. Using finalizers on the controllerRevision brings lots of hidden complexity and cost. Might it solve the initial issue if new controller revisions are created when the old ones are deleted, e.g. in a reconcile loop?
Maybe this is the way to go, IMHO, if the user deletes the controller revision while the VM is still running, it should be recreated. WDYT @lyarwood @0xFelix? |
We should avoid simply recreating the CR, as instancetypes or preferences may have changed since they were captured previously. Doing so would defeat the purpose of using a CR to begin with. |


What this PR does / why we need it:
It refactors the VAP and VAPB VM delete protection function names to be more descriptive.
Moreover, it enables a protection for instancetype and preference
controllerRevisionobjects, which ensures that they are not deleted beforeVirtualMachineobjects. This is due to the fact that in some scenarios, such as when a VM is deleted protected, if thecontrollerRevisionis deleted, but the VM object is still alive, users will not be able to patch or update the VM object. This leads to virtually deadlocking the VM object.The VAP ensures that only the Kubernetes garbage collector and kubevirt-controller are able to delete these objects:
ownerReference'd object is removed, e.g., the instancetype "test-it" withownerReference: test-vmis removed when the VM test-vm is removed. Therefore, by only allowing it to drop these objects, we ensure the deletion order.controllerRevision, it needs to delete them in upgrade scenarios, live migration, etc.Which issue(s) this PR fixes:
Fixes: CNV-69511
Special notes for your reviewer:
Release note: