Enable admission webhooks for operator with self-signed certificates#3773
Enable admission webhooks for operator with self-signed certificates#3773
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3773 +/- ##
==========================================
+ Coverage 66.57% 66.71% +0.14%
==========================================
Files 432 441 +9
Lines 42212 43577 +1365
==========================================
+ Hits 28102 29073 +971
- Misses 11969 12248 +279
- Partials 2141 2256 +115 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Implements self-signed certificate support for Kubernetes operator admission webhooks (issue #3360, option 2). This enables validation of VirtualMCPServer, VirtualMCPCompositeToolDefinition, and MCPExternalAuthConfig resources at admission time, preventing invalid configurations from being created. Closes: #3360 Enable admission webhooks for operator with self-signed certificates Implements self-signed certificate support for Kubernetes operator admission webhooks (issue #3360, option 2). This enables validation of VirtualMCPServer, VirtualMCPCompositeToolDefinition, and MCPExternalAuthConfig resources at admission time, preventing invalid configurations from being created. - Add self-signed X.509 certificate generation with RSA 2048-bit keys - Generate certificates at operator startup with proper DNS names - Reuse existing certificates across pod restarts - Inject CA bundle into ValidatingWebhookConfiguration automatically - Implement secure file permissions (0600 for keys, 0644 for certs) - Integrate webhook setup in main.go before manager starts - Add environment variable configuration (ENABLE_WEBHOOKS, WEBHOOK_SERVICE_NAME, WEBHOOK_SERVICE_NAMESPACE) - Generate certificates to /tmp/k8s-webhook-server/serving-certs/ - Add webhook Service exposing port 9443 - Add ValidatingWebhookConfiguration for 3 webhook endpoints - Add RBAC (ClusterRole + ClusterRoleBinding) for webhook config access - Update Deployment to expose port 9443 and mount emptyDir volume - Add operator.webhook.enabled (default: false) and operator.webhook.failurePolicy (default: Fail) values - Add comprehensive webhook validation tests using envtest - Test all 3 webhooks with valid and invalid resources - Verify webhook rejection messages are clear and actionable - Test both create and update operations - 16 test specs covering edge cases and error conditions - Add Chainsaw-based E2E tests for webhook validation in real cluster - Test webhook rejection of invalid configurations - VirtualMCPServer without required groupRef - MCPExternalAuthConfig with conflicting auth types - Includes prerequisite checks (skips gracefully if CRDs or webhooks not available) - Proper namespace lifecycle management (create/cleanup test-namespace) - E2E lifecycle workflow deploys operator with webhooks enabled - Tests run automatically with other Chainsaw tests across K8s 1.33-1.35 - Fixed API group consistency across all webhook configurations and tests (toolhive.stacklok.dev, not toolhive.stacklok.com or mcp.stacklok.com) - Fixed webhook naming conventions (virtualmcpservers.toolhive.stacklok.dev) - Fixed CA bundle injection by passing mgr.GetClient() instead of nil - Simplified webhook gating to just webhook.enabled for flexibility - Fixed test prerequisite skipping (exit 0 instead of exit 1) - Added proper namespace management in Chainsaw tests - Add operator webhook guide (cmd/thv-operator/docs/webhooks.md) - Add package documentation (cmd/thv-operator/pkg/webhook/README.md) - Add integration test README with troubleshooting guide 1. **VirtualMCPServer** - Validates groupRef, auth configs, aggregation 2. **VirtualMCPCompositeToolDefinition** - Validates workflow structure 3. **MCPExternalAuthConfig** - Validates auth type/config consistency - Unit tests: 10 tests (certificate generation and setup) - Integration tests: 16 test specs (webhook validation with envtest) - E2E tests: 4 scenarios (webhook validation in real cluster with Chainsaw) - All tests passing Webhooks can be disabled by setting: ```yaml operator: webhook: enabled: false ``` Fixes #3360 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Replace unconditional sleep with retry backoff in webhook CA injection Removes the hardcoded 2-second sleep in InjectCABundle and replaces it with exponential backoff retry logic. This improves startup time and makes the behavior more resilient. Benefits: - Fast when webhook config exists (no unnecessary delay) - Resilient when config doesn't exist yet (retries with backoff) - Context-aware with proper timeout handling - Better observability with V(1) logging for retries - Reduces test execution time Implementation: - Uses wait.ExponentialBackoffWithContext with 5 retry attempts - Starts at 100ms, doubles each retry (max ~3.1 seconds total) - Retries on NotFound and transient errors - Adds 10% jitter to prevent thundering herd All unit tests pass (10 tests) with faster execution time. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Fix webhook certificate KeyUsage to include CertSign for CA semantics The webhook certificate was marked IsCA: true but lacked the required x509.KeyUsageCertSign flag in its KeyUsage. This inconsistency can cause TLS verification failures in some TLS stacks that expect CA certificates to have proper signing capabilities. Problem: - Certificate had IsCA: true (acts as CA) - Certificate was used as CA bundle (trust root) - KeyUsage only included KeyEncipherment and DigitalSignature - Missing KeyUsageCertSign required for CA certificates Solution: Add x509.KeyUsageCertSign to the KeyUsage flags to properly align with the CA semantics. This makes the self-signed certificate valid as both a CA cert and server cert. This follows the simpler approach (option b from review) of using a single self-signed cert with proper CA key usages, rather than generating a separate CA cert + serving cert. All unit tests updated and passing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Implements self-signed certificate support for Kubernetes operator admission webhooks (issue #3360, option 2). This enables validation of VirtualMCPServer, VirtualMCPCompositeToolDefinition, and MCPExternalAuthConfig resources at admission time, preventing invalid configurations from being created.
Large PR Justification
This is an atomic and complete pr, with single functionality and complete testing. Cannot be split.
Closes: #3360