-
Notifications
You must be signed in to change notification settings - Fork 769
Fixed Android certificate enrollment failures caused by SCEP challenge expiration when devices were offline. #38753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…e expiration when devices were offline.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughThis PR addresses Android certificate enrollment failures when SCEP challenges expire during offline periods. It refactors the certificate template delivery workflow to generate fleet challenges on-demand rather than upfront, introduces transactional challenge creation, and simplifies the template transition logic. Changes
Sequence DiagramsequenceDiagram
actor Device as Android Device
participant Server as Fleet Server
participant DB as Database
participant SCEP as SCEP Proxy
Note over Device,SCEP: On-Demand Challenge Creation Flow
Device->>Server: GET /device/certificate-template
Server->>DB: Query host_certificate_templates (status=Delivered)
DB-->>Server: Return template with FleetChallenge=nil
alt FleetChallenge is nil
Server->>DB: BEGIN TRANSACTION
Server->>DB: SELECT fleet_challenge FOR UPDATE
DB-->>Server: Confirm nil
Server->>DB: INSERT INTO challenges (value, created_at)
DB-->>Server: Challenge created
Server->>DB: UPDATE host_certificate_templates SET fleet_challenge=?
DB-->>Server: Updated
Server->>DB: COMMIT
DB-->>Server: Transaction committed
end
Server-->>Device: Return certificate template with FleetChallenge
Device->>SCEP: SCEP enrollment request with FleetChallenge
SCEP-->>Device: Certificate issued
Note over Server: If challenge had expired:<br/>SCEP proxy catches error,<br/>future GET will regenerate
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 4❌ Failed checks (2 warnings, 2 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/datastore/mysql/host_certificate_templates.go`:
- Around line 667-723: GetOrCreateFleetChallengeForCertificateTemplate currently
returns any non-empty fleet_challenge from host_certificate_templates without
verifying the challenge still exists in the challenges table, causing
orphaned/stale values after CleanupExpiredChallenges removes expired rows;
update GetOrCreateFleetChallengeForCertificateTemplate to validate that an
existing fleet_challenge value corresponds to a live row in the challenges table
(e.g., query challenges for the token or call ConsumeChallenge-like verification
inside the same transaction) and if the challenge is missing/expired, clear
fleet_challenge (UPDATE host_certificate_templates SET fleet_challenge = NULL,
updated_at = NOW()) and proceed to create a new challenge via newChallenge and
store it atomically; additionally ensure any code paths that resend profiles on
ConsumeChallenge failure also clear the stale fleet_challenge in
host_certificate_templates to prevent leakage.
| // GetOrCreateFleetChallengeForCertificateTemplate ensures a fleet challenge exists for the given | ||
| // host and certificate template. If a challenge already exists in host_certificate_templates, | ||
| // it returns it. If not, it creates a new one atomically and stores it in both the challenges | ||
| // table (for validation) and host_certificate_templates (for retrieval). | ||
| // This method only works for templates in 'delivered' status. | ||
| func (ds *Datastore) GetOrCreateFleetChallengeForCertificateTemplate( | ||
| ctx context.Context, | ||
| hostUUID string, | ||
| certificateTemplateID uint, | ||
| ) (string, error) { | ||
| var challenge string | ||
| err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error { | ||
| // Check if challenge already exists using FOR UPDATE to prevent race conditions | ||
| var existingChallenge sql.NullString | ||
| err := sqlx.GetContext(ctx, tx, &existingChallenge, fmt.Sprintf(` | ||
| SELECT fleet_challenge | ||
| FROM host_certificate_templates | ||
| WHERE host_uuid = ? AND certificate_template_id = ? AND status = '%s' AND operation_type = '%s' | ||
| FOR UPDATE | ||
| `, fleet.CertificateTemplateDelivered, fleet.MDMOperationTypeInstall), hostUUID, certificateTemplateID) | ||
| if err != nil { | ||
| if errors.Is(err, sql.ErrNoRows) { | ||
| return ctxerr.Wrap(ctx, notFound("HostCertificateTemplate"), "template not found or not in delivered status") | ||
| } | ||
| return ctxerr.Wrap(ctx, err, "check existing challenge") | ||
| } | ||
|
|
||
| // If challenge exists and is non-empty, return it | ||
| if existingChallenge.Valid && existingChallenge.String != "" { | ||
| challenge = existingChallenge.String | ||
| return nil | ||
| } | ||
|
|
||
| // Create new challenge using the transaction | ||
| newChal, err := newChallenge(ctx, tx) | ||
| if err != nil { | ||
| return ctxerr.Wrap(ctx, err, "create challenge") | ||
| } | ||
|
|
||
| // Update host_certificate_templates with the challenge | ||
| if _, err := tx.ExecContext(ctx, fmt.Sprintf(` | ||
| UPDATE host_certificate_templates | ||
| SET fleet_challenge = ?, updated_at = NOW() | ||
| WHERE host_uuid = ? AND certificate_template_id = ? AND status = '%s' AND operation_type = '%s' | ||
| `, fleet.CertificateTemplateDelivered, fleet.MDMOperationTypeInstall), newChal, hostUUID, certificateTemplateID); err != nil { | ||
| return ctxerr.Wrap(ctx, err, "update fleet_challenge in host_certificate_templates") | ||
| } | ||
|
|
||
| challenge = newChal | ||
| return nil | ||
| }) | ||
|
|
||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return challenge, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for logic that clears or resets fleet_challenge
rg -n "fleet_challenge\s*=\s*(NULL|''" -g '*.go' -A 2 -B 2Repository: fleetdm/fleet
Length of output: 160
🏁 Script executed:
#!/bin/bash
# Search for ResendHostCertificateTemplate and related functions
rg -n "ResendHostCertificateTemplate|CleanupExpiredChallenges|handleFleetChallenge" -g '*.go' -A 5Repository: fleetdm/fleet
Length of output: 5499
🏁 Script executed:
#!/bin/bash
# Search for where host_certificate_templates is updated/cleared
rg -n "UPDATE host_certificate_templates" -g '*.go' -A 2 -B 2Repository: fleetdm/fleet
Length of output: 7173
🏁 Script executed:
#!/bin/bash
# Check the challenges table cleanup and expiration logic
rg -n "challenges" -g '*.go' -A 3 | grep -E "(DELETE|expir|Cleanup|clean)" -i | head -20Repository: fleetdm/fleet
Length of output: 1916
🏁 Script executed:
#!/bin/bash
# Look at the ConsumeChallenge implementation to understand how challenges are used
rg -n "ConsumeChallenge" -g '*.go' -A 10 | head -50Repository: fleetdm/fleet
Length of output: 3797
🏁 Script executed:
#!/bin/bash
# Check if fleet_challenge is ever cleared/reset in host_certificate_templates
rg -n "fleet_challenge.*NULL|fleet_challenge.*''" -g '*.go' --multiline -A 2 -B 2Repository: fleetdm/fleet
Length of output: 1955
🏁 Script executed:
#!/bin/bash
# Look for how the challenge flow works - check Resend and profile delivery
rg -n "ResendHostCertificateTemplate" -g '*.go' -A 15Repository: fleetdm/fleet
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Search for Resend in certificate template context
rg -n "Resend.*[Cc]ertificate|resend.*template" -g '*.go' -i | head -20Repository: fleetdm/fleet
Length of output: 2485
🏁 Script executed:
#!/bin/bash
# Check the implementation of ConsumeChallenge to see what it does
rg -n "ConsumeChallenge" -g '*.go' -A 15 | grep -A 15 "func.*ConsumeChallenge"Repository: fleetdm/fleet
Length of output: 2724
🏁 Script executed:
#!/bin/bash
# Look for where fleet_challenge is set and how the challenge lifecycle works
rg -n "fleet_challenge\s*=" -g '*.go' -B 3 -A 3Repository: fleetdm/fleet
Length of output: 1722
🏁 Script executed:
#!/bin/bash
# Get the full ResendHostCertificateProfile implementation
rg -n "func.*ResendHostCertificateProfile" -g '*.go' -A 30 | head -80Repository: fleetdm/fleet
Length of output: 4388
🏁 Script executed:
#!/bin/bash
# Check if there's any mechanism that clears fleet_challenge when a challenge is consumed
rg -n "fleet_challenge.*NULL" -g '*.go' -B 5 -A 5Repository: fleetdm/fleet
Length of output: 3531
🏁 Script executed:
#!/bin/bash
# Look at the full ConsumeChallenge implementation
rg -n "func \(ds \*Datastore\) ConsumeChallenge" -g '*.go' -A 50Repository: fleetdm/fleet
Length of output: 4061
🏁 Script executed:
#!/bin/bash
# Look at the handleFleetChallenge implementation to see full flow
rg -n "func.*handleFleetChallenge" -g '*.go' -A 40Repository: fleetdm/fleet
Length of output: 2936
🏁 Script executed:
#!/bin/bash
# Check if there's any code that clears fleet_challenge after ConsumeChallenge
rg -n "ConsumeChallenge" -g '*.go' -B 5 -A 15 | grep -A 15 "ConsumeChallenge"Repository: fleetdm/fleet
Length of output: 24079
🏁 Script executed:
#!/bin/bash
# Check if fleet_challenge is cleared when status transitions
rg -n "status.*=.*pending|status.*=.*NULL" -g 'host_certificate_templates.go' -B 3 -A 3Repository: fleetdm/fleet
Length of output: 1561
🏁 Script executed:
#!/bin/bash
# Check migrations to see the column definition and any reset logic
cat server/datastore/mysql/migrations/tables/20251209221850_HostCertificateTemplatesStatusDefault.goRepository: fleetdm/fleet
Length of output: 938
🏁 Script executed:
#!/bin/bash
# Look for any code that updates fleet_challenge to NULL or clears it
rg -n "fleet_challenge" -g '*.go' | grep -i "update\|clear\|null\|delete" | head -20Repository: fleetdm/fleet
Length of output: 943
🏁 Script executed:
#!/bin/bash
# Look at the GetOrCreateFleetChallengeForCertificateTemplate more carefully to see what status it requires
cat -n server/datastore/mysql/host_certificate_templates.go | sed -n '667,723p'Repository: fleetdm/fleet
Length of output: 2787
Stale challenge values remain in host_certificate_templates after expiration.
GetOrCreateFleetChallengeForCertificateTemplate returns any non-empty fleet_challenge without confirming it still exists in the challenges table. When CleanupExpiredChallenges removes expired entries from challenges, the corresponding fleet_challenge values persist in host_certificate_templates. While the SCEP handler recovers by resending the profile when ConsumeChallenge fails, stale challenges accumulate as orphaned data. Consider clearing fleet_challenge from host_certificate_templates as part of the expiration or resend process to avoid this leakage.
🤖 Prompt for AI Agents
In `@server/datastore/mysql/host_certificate_templates.go` around lines 667 - 723,
GetOrCreateFleetChallengeForCertificateTemplate currently returns any non-empty
fleet_challenge from host_certificate_templates without verifying the challenge
still exists in the challenges table, causing orphaned/stale values after
CleanupExpiredChallenges removes expired rows; update
GetOrCreateFleetChallengeForCertificateTemplate to validate that an existing
fleet_challenge value corresponds to a live row in the challenges table (e.g.,
query challenges for the token or call ConsumeChallenge-like verification inside
the same transaction) and if the challenge is missing/expired, clear
fleet_challenge (UPDATE host_certificate_templates SET fleet_challenge = NULL,
updated_at = NOW()) and proceed to create a new challenge via newChallenge and
store it atomically; additionally ensure any code paths that resend profiles on
ConsumeChallenge failure also clear the stale fleet_challenge in
host_certificate_templates to prevent leakage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai this feels a bit outside the scope of this fix. If the challenge is stale/expired, then Android will fail to get a cert. The server will then resend it manually or automatically as part of future work: #37556
I don't see how we're accumulating orphan rows here. The rows in host_certificate_templates will still be valid to show the status of the certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@getvictor, you're right—this is appropriate scope management for this PR. The functional issue (enrollment recovery when challenges expire) is addressed by the resend logic in #37556.
To clarify: I was referring to stale challenge values persisting in the fleet_challenge column (a data cleanliness concern), not orphan rows in host_certificate_templates (the rows themselves are valid and correctly show cert status). Since the resend flow will handle the functional recovery and this is tracked for future work, deferring the challenge-clearing logic makes sense.
Thanks for the clarification! 👍
🧠 Learnings used
Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: ee/server/service/hostidentity/depot/depot.go:104-119
Timestamp: 2025-07-08T16:06:54.576Z
Learning: In ee/server/service/hostidentity/depot/depot.go, the security concern where shared challenges allow certificate revocation (lines 104-119) is a known issue that will be addressed in a later feature, not an immediate concern to fix.
Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: server/datastore/mysql/schema.sql:501-517
Timestamp: 2025-07-07T22:21:15.748Z
Learning: In the host_identity_scep_certificates table schema, the VARBINARY(100) size for public_key_raw, the nullable host_id without a foreign key constraint, and the use of plain DATETIME instead of DATETIME(6) are intentional design decisions, not issues to be addressed.
Learnt from: getvictor
Repo: fleetdm/fleet PR: 36139
File: android/app/src/main/java/com/fleetdm/agent/scep/ScepClientImpl.kt:75-76
Timestamp: 2025-11-26T18:58:18.865Z
Learning: In Fleet's Android MDM agent SCEP implementation (android/app/src/main/java/com/fleetdm/agent/scep/ScepClientImpl.kt), OptimisticCertificateVerifier is intentionally used because: (1) SCEP URL is provided by authenticated MDM server, (2) challenge password authenticates enrollment, (3) enterprise SCEP servers use internal CAs not in system trust stores, (4) enrolled certificate is validated when used.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #38753 +/- ##
=======================================
Coverage 66.07% 66.07%
=======================================
Files 2414 2414
Lines 192710 192735 +25
Branches 8488 8488
=======================================
+ Hits 127329 127346 +17
- Misses 53818 53820 +2
- Partials 11563 11569 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses Android certificate enrollment failures when devices were offline long enough for pre-generated SCEP fleet challenges to expire, by switching to generating challenges on-demand.
Changes:
- Stop generating fleet challenges during Android certificate template delivery; transition templates to
deliveredwithout challenges. - Generate the fleet challenge on-demand when the Android device fetches its certificate template.
- Add datastore support + tests for atomically creating (or returning) a fleet challenge for a delivered certificate template.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| server/service/certificates.go | Creates fleet challenge on-demand when a device fetches a delivered certificate template. |
| server/mdm/android/service/service.go | Removes pre-generation of challenges during delivery; only transitions templates to delivered. |
| server/datastore/mysql/host_certificate_templates.go | Updates delivery transition API and adds GetOrCreateFleetChallengeForCertificateTemplate implementation. |
| server/datastore/mysql/challenges.go | Refactors challenge creation into a helper to support transactional creation. |
| server/fleet/datastore.go | Updates datastore interface for new delivery signature and adds GetOrCreateFleetChallengeForCertificateTemplate. |
| server/mock/datastore_mock.go | Updates mock datastore interface for signature change and new API. |
| server/mdm/android/service/profiles_test.go | Updates tests to expect nil fleet_challenge until device fetch. |
| server/datastore/mysql/host_certificate_templates_test.go | Updates state-machine test and adds tests for GetOrCreateFleetChallengeForCertificateTemplate. |
| changes/37651-Android-fleet-challenge | Adds user-visible changelog entry for the fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| query, args, err := sqlx.In(fmt.Sprintf(` | ||
| UPDATE host_certificate_templates | ||
| SET | ||
| status = '%s', | ||
| fleet_challenge = %s, | ||
| updated_at = NOW() | ||
| WHERE | ||
| host_uuid = ? AND | ||
| status = '%s' AND | ||
| certificate_template_id IN (%s) | ||
| `, fleet.CertificateTemplateDelivered, caseStmt.String(), fleet.CertificateTemplateDelivering, strings.Join(inPlaceholders, ",")) | ||
| certificate_template_id IN (?) | ||
| `, fleet.CertificateTemplateDelivered, fleet.CertificateTemplateDelivering), hostUUID, templateIDs) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TransitionCertificateTemplatesToDelivered updates status to "delivered" but leaves any existing fleet_challenge value untouched. If a template previously had a fleet_challenge (e.g., from an earlier delivery attempt), that stale value will prevent a fresh on-demand challenge from being created and can reintroduce expired/consumed-challenge failures. Consider explicitly clearing fleet_challenge (set NULL/empty) as part of this transition so each delivery starts without a cached challenge.
| // If challenge exists and is non-empty, return it | ||
| if existingChallenge.Valid && existingChallenge.String != "" { | ||
| challenge = existingChallenge.String | ||
| return nil |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetOrCreateFleetChallengeForCertificateTemplate returns an existing non-empty fleet_challenge without verifying that it still exists (and is still within TTL) in the challenges table. If the challenge was already consumed or expired/cleaned up, this will keep returning an invalid challenge and the device will never recover. To match the "ensure challenge exists" contract, validate the stored challenge against the challenges table (and TTL) before returning it; if missing/expired, generate a new challenge and update host_certificate_templates accordingly.
| // If challenge exists and is non-empty, return it | |
| if existingChallenge.Valid && existingChallenge.String != "" { | |
| challenge = existingChallenge.String | |
| return nil | |
| // If challenge exists and is non-empty, validate it against the challenges table | |
| if existingChallenge.Valid && existingChallenge.String != "" { | |
| var count int | |
| if err := sqlx.GetContext(ctx, tx, &count, ` | |
| SELECT COUNT(*) | |
| FROM challenges | |
| WHERE challenge = ? AND expires_at > NOW() | |
| `, existingChallenge.String); err != nil { | |
| return ctxerr.Wrap(ctx, err, "verify existing challenge") | |
| } | |
| // If the challenge still exists and is within TTL, reuse it | |
| if count > 0 { | |
| challenge = existingChallenge.String | |
| return nil | |
| } | |
| // Otherwise, fall through to create a new challenge and update host_certificate_templates |
| // On-demand challenge creation for delivered status. | ||
| // If FleetChallenge is nil or empty, create one now (the challenge TTL starts from this moment). | ||
| if certificate.Status == fleet.CertificateTemplateDelivered { | ||
| if certificate.FleetChallenge == nil || *certificate.FleetChallenge == "" { | ||
| challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id) | ||
| if err != nil { | ||
| return nil, ctxerr.Wrap(ctx, err, "create fleet challenge on-demand") | ||
| } | ||
| certificate.FleetChallenge = &challenge | ||
| } |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint only creates a challenge when FleetChallenge is nil/empty. If host_certificate_templates.fleet_challenge is set to a stale value (e.g., consumed/expired and no longer present in the challenges table), the handler will keep returning that value and will never call GetOrCreateFleetChallengeForCertificateTemplate to recover. Consider always calling the datastore "get-or-create" for delivered templates (and letting it validate/refresh), or otherwise ensuring stale challenges are cleared/ignored before responding.
| // On-demand challenge creation for delivered status. | |
| // If FleetChallenge is nil or empty, create one now (the challenge TTL starts from this moment). | |
| if certificate.Status == fleet.CertificateTemplateDelivered { | |
| if certificate.FleetChallenge == nil || *certificate.FleetChallenge == "" { | |
| challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id) | |
| if err != nil { | |
| return nil, ctxerr.Wrap(ctx, err, "create fleet challenge on-demand") | |
| } | |
| certificate.FleetChallenge = &challenge | |
| } | |
| // On-demand challenge creation/refresh for delivered status. | |
| // Always obtain the challenge from the datastore so that stale values are refreshed. | |
| if certificate.Status == fleet.CertificateTemplateDelivered { | |
| challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id) | |
| if err != nil { | |
| return nil, ctxerr.Wrap(ctx, err, "create or refresh fleet challenge on-demand") | |
| } | |
| certificate.FleetChallenge = &challenge |
|
|
||
| // GetOrCreateFleetChallengeForCertificateTemplate ensures a fleet challenge exists for the given | ||
| // host and certificate template. If a challenge already exists, it returns it. If not, it creates | ||
| // a new one atomically. Only works for templates in 'delivered' status. |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface comment for GetOrCreateFleetChallengeForCertificateTemplate says it "only works for templates in 'delivered' status", but the MySQL implementation also requires operation_type='install'. Update the docstring to reflect the install-only behavior (or relax the implementation if other operation types should be supported).
| // a new one atomically. Only works for templates in 'delivered' status. | |
| // a new one atomically. Only works for templates in 'delivered' status and with operation_type 'install'. |
Related issue: Resolves #37651
Switched to issue the SCEP fleet challenge on demand instead of ahead of time.
Checklist for submitter
changes/,orbit/changes/oree/fleetd-chrome/changes.Testing
Summary by CodeRabbit
Bug Fixes
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.