Skip to content

Conversation

@Ruclo
Copy link
Contributor

@Ruclo Ruclo commented Oct 1, 2025

What this PR does / why we need it:
Creates a separate service monitor targeting the template validator metrics and verifies the metrics service tls certificate when querying prometheus metrics.

Which issue(s) this PR fixes:

Fixes CNV-55455

Release note:

NONE

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Oct 1, 2025
@openshift-ci
Copy link

openshift-ci bot commented Oct 1, 2025

Hi @Ruclo. Thanks for your PR.

I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 1, 2025
return &promv1.SecretOrConfigMap{
ConfigMap: &v1.ConfigMapKeySelector{
LocalObjectReference: v1.LocalObjectReference{
Name: "openshift-service-ca.crt",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make name and key a const? Also, this introduces further hard dependencies to OCP. Is there no way to make it configurable?

},
},
}
tlsConfig.ServerName = ptr.To("virt-template-validator.kubevirt.svc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read that from the cert to be sure? The namespace can vary.

return &promv1.SecretOrConfigMap{
Secret: &v1.SecretKeySelector{
LocalObjectReference: v1.LocalObjectReference{
Name: "ssp-operator-service-cert",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names of objects that are not reconciled by SSP should be configurable.

Comment on lines 34 to 39
// Variable to store OLM deployment info (set from main)
var isOLMDeployment bool

func SetOLMDeployment(isOLM bool) {
isOLMDeployment = isOLM
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a global variable is an antipattern. Can you instead pass the value into functions that need it?


// getCAConfigForServiceMonitor returns the appropriate CA configuration
func getCAConfigForServiceMonitor() *promv1.SecretOrConfigMap {
if isOLMDeployment {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pass this global as a parameter to this function?

}

func newSspServiceMonitor(namespace string) *promv1.ServiceMonitor {
certPath := filepath.Join(DefaultCertsDirectory, CertFilename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding a path like this makes this code hard to unit test. The path will not exist on the machine where tests run. Can you pass the hostname as a parameter to this function?

@Ruclo Ruclo force-pushed the metrics-tls branch 4 times, most recently from 5d3cb20 to 1893326 Compare October 20, 2025 08:09
Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good. I mostly have code style comments.

The CI is failing, because some files need to be regenerated using make bundle.

Can you please write more information into the commit message? A good commit message explains the reason why is the change done, not only what is changed.
It will be useful when someone looks at git history, to find out why a code is written in a certain way.

PrometheusServiceAccountName = "prometheus-k8s"
MetricsPortName = "http-metrics"

TemplateValidatorMetricsServiceName = "template-validator-metrics"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use this constant also in template-validator/resources.go file, so that we are sure that it is the same string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment still applies.

The string template-validator-metrics is used to connect 2 resources in the cluster: Service and ServiceMonitor. It is a good code style if the 2 strings are also the same constant in the code. Then we can be sure, that there is not a typo in one of them.

Can you extract the constant to a different package, so it does not cause import loop?

MetricsPortName = "http-metrics"

TemplateValidatorMetricsServiceName = "template-validator-metrics"
MetricsServiceName = "ssp-operator-metrics"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this constant also in controllers/services_contorller.go.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same reason as in my comment above.

Comment on lines 28 to 31
ServiceCABundle = "openshift-service-ca.crt"
ServiceCABUndleKey = "service-ca.crt"
OLMManagedCert = "ssp-operator-service-cert"
OLMManagedCertKey = "olmCAKey"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 4 constants don't need to be exported, because they are only used in a single function in this file.
Can you move them to the functions where they are used?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still valid.

}
}

func olmManagedCABundle() promv1.SecretOrConfigMap {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you inline this function? It is only used in one place, and it is short.


func olmManagedCABundle() promv1.SecretOrConfigMap {
return promv1.SecretOrConfigMap{
Secret: &v1.SecretKeySelector{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment, that OLM uses a self-signed CA, and that it is not available in a ConfigMap.
So that future developers know why we use a secret.

@Ruclo Ruclo force-pushed the metrics-tls branch 2 times, most recently from 596b681 to f527836 Compare October 29, 2025 12:23
@Ruclo Ruclo marked this pull request as ready for review October 30, 2025 09:11
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2025
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • There’s a typo in ServiceCABUndleKey (should be ServiceCABundleKey) and you may want to correct it to avoid confusion.
  • The TLSConfig construction in newValidatorServiceMonitor and newSspServiceMonitor is almost identical—consider extracting a shared helper to reduce duplication.
  • extractHostnameFromCert references sdkTLSDir which no longer exists—ensure you introduce or update the constant/path so the code compiles and reads the intended cert.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There’s a typo in ServiceCABUndleKey (should be ServiceCABundleKey) and you may want to correct it to avoid confusion.
- The TLSConfig construction in newValidatorServiceMonitor and newSspServiceMonitor is almost identical—consider extracting a shared helper to reduce duplication.
- extractHostnameFromCert references sdkTLSDir which no longer exists—ensure you introduce or update the constant/path so the code compiles and reads the intended cert.

## Individual Comments

### Comment 1
<location> `internal/operands/metrics/resources.go:110` </location>
<code_context>
+	)
+}
+
+func newValidatorServiceMonitor(request common.Request) *promv1.ServiceMonitor {
+	tlsConfig := &promv1.TLSConfig{
+		SafeTLSConfig: promv1.SafeTLSConfig{
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the three ServiceMonitor constructors into a single, parameterized function to reduce duplication and centralize logic.

Here’s one way to collapse all three builders into a single, parameterized constructor.  You keep the same inputs (name, namespace, label sets, port, CA bundle, serverName), but only write the service‐monitor shape once:

```go
// newServiceMonitor constructs any ServiceMonitor.
//  - labels: service labels to merge with the defaults (cluster-monitoring & k8s-app)
//  - selector: the MatchLabels for the ServiceMonitor selector
//  - port: the metrics port name
//  - ca:   the CA bundle source
//  - serverName: optional server name override
func newServiceMonitor(
    name, namespace string,
    labels, selector map[string]string,
    port string,
    ca promv1.SecretOrConfigMap,
    serverName *string,
) *promv1.ServiceMonitor {
    return &promv1.ServiceMonitor{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: namespace,
            Labels:    serviceMonitorLabels(labels),
        },
        Spec: promv1.ServiceMonitorSpec{
            NamespaceSelector: promv1.NamespaceSelector{Any: true},
            Selector:          metav1.LabelSelector{MatchLabels: selector},
            Endpoints: []promv1.Endpoint{{
                Port:   port,
                Scheme: "https",
                TLSConfig: &promv1.TLSConfig{
                    SafeTLSConfig: promv1.SafeTLSConfig{
                        CA:         ca,
                        ServerName: serverName,
                    },
                },
                HonorLabels: true,
            }},
        },
    }
}
```

Then each specialized builder becomes a one‐liner:

```go
func newValidatorServiceMonitor(req common.Request) *promv1.ServiceMonitor {
    svcLabels := template_validator.PrometheusServiceLabels()
    return newServiceMonitor(
        template_validator.MetricsServiceName,
        req.Namespace,
        svcLabels,
        svcLabels,
        template_validator.MetricsPortName,
        serviceCABundle(),
        ptr.To(fmt.Sprintf("%s.%s.svc", template_validator.VirtTemplateValidator, req.Namespace)),
    )
}

func newSspServiceMonitor(req common.Request) *promv1.ServiceMonitor {
    svcLabels := resources.PrometheusServiceLabels()
    return newServiceMonitor(
        resources.MetricsServiceName,
        req.Namespace,
        svcLabels,
        svcLabels,
        resources.MetricsPortName,
        getCAConfigForServiceMonitor(req.OLMDeployment),
        ptr.To(req.SSPServiceHostname),
    )
}
```

And you can drop the old `newServiceMonitorCR`, `TemplateValidatorServiceMonitorLabels` and `SspServiceMonitorLabels` helpers completely—everything funnels through that single `newServiceMonitor`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only done partial review.

Can you please put the refactoring commit that fixes import cycle before the commit that implements new functionality, or can you merge the 2 commits?

A commit should be a code change that compiles and preferably passes tests. So having a second commit that fixes the first commit is unusual.

Can you also add more text to the commit messages? Mainly in the first commit, the message should explain why is the symlink removed.

Comment on lines 76 to 80
func PrometheusServiceLabels(serviceName string) map[string]string {
return map[string]string{
PrometheusLabelKey: serviceName,
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is trivial, and can be inlined. PrometheusLabelKey is exported variable, so it can be accessed from other packages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps exporting only the function would be better? i also renamed the function to MetricsServiceLabels().

Comment on lines 62 to 72
func serviceMonitorLabels(serviceLabels map[string]string) map[string]string {
labels := map[string]string{
"openshift.io/cluster-monitoring": "true",
PrometheusLabelKey: PrometheusLabelValue,
"k8s-app": "kubevirt",
}

for k, v := range serviceLabels {
labels[k] = v
}

return labels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessarily complex. Can you use the same function that was here, but make the PrometheusLabelValue a parameter?

return common.PrometheusServiceLabels(MetricsServiceName)
}

func SspMetricsService(namespace string) *v1.Service {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only used in one file, and a functional test file. I think it would not cause an import cycle, if it is left in the services_controller.go where it is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it causes an import cycle so i had to extract it out of the controllers package.

const (
GoldenImagesNamespace = "kubevirt-os-images"
DefaultOperatorVersion = "devel"
SspDeploymentName = "ssp-operator"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constant is only used in one place and does not need to be here.

This file contains only constants that change value in the build scripts for downstream. The name of SSP deployment is the same in upstream and downstream, so this constant doesn't need to be here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now, it's used in more places. I think this constant could be extracted somewhere else if this file is not adequate.

@Ruclo Ruclo force-pushed the metrics-tls branch 2 times, most recently from a7edeb2 to 884d98e Compare November 5, 2025 09:57

ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

err := createCertificateSymlinks()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a text to the commit message explaining why the symlinks are not needed anymore?
It is not obvious to me, and it would be helpful for others, when they look at commit history.

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/utils/ptr"

"kubevirt.io/ssp-operator/internal/common"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline above this line to split imports into 3 blocks.

PrometheusServiceAccountName = "prometheus-k8s"
MetricsPortName = "http-metrics"

TemplateValidatorMetricsServiceName = "template-validator-metrics"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment still applies.

The string template-validator-metrics is used to connect 2 resources in the cluster: Service and ServiceMonitor. It is a good code style if the 2 strings are also the same constant in the code. Then we can be sure, that there is not a typo in one of them.

Can you extract the constant to a different package, so it does not cause import loop?

MetricsPortName = "http-metrics"

TemplateValidatorMetricsServiceName = "template-validator-metrics"
MetricsServiceName = "ssp-operator-metrics"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same reason as in my comment above.

Comment on lines 28 to 31
ServiceCABundle = "openshift-service-ca.crt"
ServiceCABUndleKey = "service-ca.crt"
OLMManagedCert = "ssp-operator-service-cert"
OLMManagedCertKey = "olmCAKey"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still valid.

}
tlsConfig.ServerName = ptr.To(request.SSPServiceHostname)

serviceMonitor := newServiceMonitor(rules.RuleName, request.Namespace, tlsConfig, metav1.LabelSelector{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rules.RuleName us not a good name for ServiceMonitor. Rules are not related to metrics collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the Service monitor can be named after the service, just as it's done in the template validator.

@Ruclo
Copy link
Contributor Author

Ruclo commented Nov 10, 2025

I tried to extract the related functionality in the third commit that was part of this branch. Was this a bad thing ? Do you mind giving me a review on that ?

@akrejcir
Copy link
Collaborator

I tried to extract the related functionality in the third commit that was part of this branch. Was this a bad thing ? Do you mind giving me a review on that ?

I meant to only extract the constants to a separate file, and use it in multiple files. Like this:
akrejcir@1525fa0

In the previous commit, you've extracted a function, but that was not needed, because it was only used in a single file.

return cert.Subject.CommonName, nil
}

return cert.DNSNames[0], nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommonName and DNSNames[0] should match, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they should match, can this be kept in place as a fallback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a check to verify that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do that? (CommonName == DNSNames[0])

Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only a few code style comments. Otherwise it looks good.

Can you also please add more text to the second commit message?

Comment on lines 26 to 29
ServiceCABundle = "openshift-service-ca.crt"
ServiceCABUndleKey = "service-ca.crt"
OLMManagedCert = "ssp-operator-service-cert"
OLMManagedCertKey = "olmCAKey"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants are only used in one function in this file, so they don't need to be exported.
Can you please inline them?

Comment on lines 8 to 10
SspOperatorMetricsServiceName = "ssp-operator-metrics"
TemplateValidatorMetricsServiceName = "template-validator-metrics"
VirtTemplateValidator = "virt-template-validator"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these constants to a different file.
This file only contains constants that are changed by the build process downstream. We use a simple sed command to change the values, so it would be best if there are small number of constants here.


. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"kubevirt.io/ssp-operator/internal"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this import to the last block.


. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"kubevirt.io/ssp-operator/internal"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above.


. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"kubevirt.io/ssp-operator/internal"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above.

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

return cert.Subject.CommonName, nil
}

return cert.DNSNames[0], nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do that? (CommonName == DNSNames[0])

}
}

func createCertificateSymlinks() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we test CSV deployment in the upstream lanes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rucle Can you please make sure this will not break stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the metrics, both when manually deployed and also OLM-deployed and the metrics were correctly reported in both scenarios.

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 24, 2025
@akrejcir
Copy link
Collaborator

The CI failures are unrelated. Recently golangci-lint either OOMs or takes too long and times out. I will look into it.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2025
The certificates get mounted to the correct mount points in the filesystem,
Symlink creation is unnecessary.
Signed-off-by: Michal Vavrinec <[email protected]>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 8, 2025
@akrejcir
Copy link
Collaborator

akrejcir commented Dec 8, 2025

Can you update the failing functional tests, and include a new test for the second ServiceMonitor?

This commit implements TLS encryption for both SSP operator and template validator metrics services.
The implementation handles both OLM and non-OLM deployments with proper certificate management.

Changes:
- Add --olm-deployment flag to distinguish OLM-managed deployments
- Create separate ServiceMonitors for SSP and template validator metrics
- Configure TLS settings with appropriate CA bundles based on deployment type
- Move common service names to constants package for reusability
- Update ServiceMonitor selector labels to differentiate between services

Technical details:
- Non-OLM deployments use service-ca.crt from ConfigMap
- OLM deployments use ssp-operator-service-cert Secret with olmCAKey

Signed-off-by: Michal Vavrinec <[email protected]>
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 9, 2025

@akrejcir
Copy link
Collaborator

akrejcir commented Dec 9, 2025

Looks good now.

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrejcir

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2025
@akrejcir
Copy link
Collaborator

/cc @0xFelix @jcanocan

@kubevirt-bot kubevirt-bot requested a review from 0xFelix December 12, 2025 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants