Skip to content

Conversation

@getvictor
Copy link
Member

@getvictor getvictor commented Jan 25, 2026

Related issue: Resolves #37651

Switched to issue the SCEP fleet challenge on demand instead of ahead of time.

Checklist for submitter

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.

Testing

  • Added/updated automated tests
  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • Bug Fixes

    • Resolved Android certificate enrollment failures caused by SCEP challenge expiration during offline periods, improving enrollment reliability when devices lack connectivity.
  • Improvements

    • Certificate challenges are now generated on-demand when requested by devices, rather than pre-generated, enhancing offline enrollment support.

✏️ Tip: You can customize this high-level summary in your review settings.

@getvictor
Copy link
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 25, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 25, 2026

Walkthrough

This PR addresses Android certificate enrollment failures when SCEP challenges expire during offline periods. It refactors the certificate template delivery workflow to generate fleet challenges on-demand rather than upfront, introduces transactional challenge creation, and simplifies the template transition logic.

Changes

Cohort / File(s) Summary
Challenge Generation Infrastructure
server/datastore/mysql/challenges.go
Introduces internal helper newChallenge(ctx, exec) enabling transactional challenge creation by accepting a custom executor instead of always using ds.writer(ctx).
Certificate Template Datastore
server/datastore/mysql/host_certificate_templates.go
Changes TransitionCertificateTemplatesToDelivered signature from accepting challenges map[uint]string to templateIDs []uint, removing challenge-setting logic. Adds new public method GetOrCreateFleetChallengeForCertificateTemplate for atomically creating challenges on-demand within a transaction using FOR UPDATE to prevent races.
Interface Definition
server/fleet/datastore.go
Updates interface signatures matching implementation changes: TransitionCertificateTemplatesToDelivered parameter and new GetOrCreateFleetChallengeForCertificateTemplate method declaration.
Service Layer
server/mdm/android/service/service.go
Removes upfront challenge generation logic and direct TransitionCertificateTemplatesToDelivered call with challenges map; now passes only DeliveringTemplateIDs.
Certificate Provisioning
server/service/certificates.go
Adds on-demand fleet challenge creation in GetDeviceCertificateTemplate: calls GetOrCreateFleetChallengeForCertificateTemplate when delivering a certificate with nil/empty FleetChallenge.
Mock Datastore
server/mock/datastore_mock.go
Updates mock function signatures: TransitionCertificateTemplatesToDeliveredFunc parameter changed to templateIDs []uint; adds new GetOrCreateFleetChallengeForCertificateTemplateFunc type and corresponding DataStore field/method.
Tests
server/datastore/mysql/host_certificate_templates_test.go
Adds comprehensive test for GetOrCreateFleetChallengeForCertificateTemplate covering error cases, idempotency, and storage verification. Refactors delivered-transition tests to verify nil FleetChallenge post-transition and on-demand creation.
Android Service Tests
server/mdm/android/service/profiles_test.go
Updates test expectations: FleetChallenge struct field type changes from string to *string, assertions expect nil challenge at insertion (created on-demand later).
Changelog
changes/37651-Android-fleet-challenge
Documents fix for Android certificate enrollment failures during SCEP challenge expiration when devices are offline.

Sequence Diagram

sequenceDiagram
    actor Device as Android Device
    participant Server as Fleet Server
    participant DB as Database
    participant SCEP as SCEP Proxy

    Note over Device,SCEP: On-Demand Challenge Creation Flow

    Device->>Server: GET /device/certificate-template
    Server->>DB: Query host_certificate_templates (status=Delivered)
    DB-->>Server: Return template with FleetChallenge=nil
    
    alt FleetChallenge is nil
        Server->>DB: BEGIN TRANSACTION
        Server->>DB: SELECT fleet_challenge FOR UPDATE
        DB-->>Server: Confirm nil
        
        Server->>DB: INSERT INTO challenges (value, created_at)
        DB-->>Server: Challenge created
        
        Server->>DB: UPDATE host_certificate_templates SET fleet_challenge=?
        DB-->>Server: Updated
        
        Server->>DB: COMMIT
        DB-->>Server: Transaction committed
    end
    
    Server-->>Device: Return certificate template with FleetChallenge
    
    Device->>SCEP: SCEP enrollment request with FleetChallenge
    SCEP-->>Device: Certificate issued

    Note over Server: If challenge had expired:<br/>SCEP proxy catches error,<br/>future GET will regenerate
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

  • PR #36978: Modifies Android certificate-template delivery workflow and TransitionCertificateTemplatesToDelivered signature, directly affected by this PR's datastore changes.
  • PR #36923: Updates Android client/data types to accept nullable fleetChallenge, complementing this PR's server-side on-demand creation model.
  • PR #36648: Modifies GetDeviceCertificateTemplate behavior in server/service/certificates.go, overlapping with this PR's certificate provisioning logic.

Suggested reviewers

  • mostlikelee
  • dantecatalfamo
  • ksykulev
🚥 Pre-merge checks | ✅ 1 | ❌ 4
❌ Failed checks (2 warnings, 2 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description is incomplete compared to the template. Required sections like input validation, testing details, database migrations, and configuration checks are mostly unchecked or missing. Complete the checklist by verifying and checking all applicable boxes: input validation/SQL injection prevention, backwards compatibility, automated tests scope, QA details, database migration impact, and any fleetd/orbit compatibility considerations.
Linked Issues check ❓ Inconclusive The PR changes implement on-demand fleet challenge creation and deferred challenge generation, but the code does not include the critical ResendHostCertificateTemplate function or handleFleetChallenge integration required by issue #37651. Verify whether ResendHostCertificateTemplate was implemented elsewhere or in a separate PR, and ensure handleFleetChallenge integration for Android profile error handling is included.
Out of Scope Changes check ❓ Inconclusive The PR includes a changes file entry but focuses on refactoring challenge generation logic rather than the specific ResendHostCertificateTemplate recovery mechanism described in issue #37651. Clarify whether the challenge deferral approach is the intended solution, or if additional changes for template reversion and challenge clearing are needed per the issue requirements.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing Android certificate enrollment failures caused by SCEP challenge expiration when devices are offline.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch victor/37651-android-fleet-challenge

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/datastore/mysql/host_certificate_templates.go`:
- Around line 667-723: GetOrCreateFleetChallengeForCertificateTemplate currently
returns any non-empty fleet_challenge from host_certificate_templates without
verifying the challenge still exists in the challenges table, causing
orphaned/stale values after CleanupExpiredChallenges removes expired rows;
update GetOrCreateFleetChallengeForCertificateTemplate to validate that an
existing fleet_challenge value corresponds to a live row in the challenges table
(e.g., query challenges for the token or call ConsumeChallenge-like verification
inside the same transaction) and if the challenge is missing/expired, clear
fleet_challenge (UPDATE host_certificate_templates SET fleet_challenge = NULL,
updated_at = NOW()) and proceed to create a new challenge via newChallenge and
store it atomically; additionally ensure any code paths that resend profiles on
ConsumeChallenge failure also clear the stale fleet_challenge in
host_certificate_templates to prevent leakage.

Comment on lines +667 to +723
// GetOrCreateFleetChallengeForCertificateTemplate ensures a fleet challenge exists for the given
// host and certificate template. If a challenge already exists in host_certificate_templates,
// it returns it. If not, it creates a new one atomically and stores it in both the challenges
// table (for validation) and host_certificate_templates (for retrieval).
// This method only works for templates in 'delivered' status.
func (ds *Datastore) GetOrCreateFleetChallengeForCertificateTemplate(
ctx context.Context,
hostUUID string,
certificateTemplateID uint,
) (string, error) {
var challenge string
err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error {
// Check if challenge already exists using FOR UPDATE to prevent race conditions
var existingChallenge sql.NullString
err := sqlx.GetContext(ctx, tx, &existingChallenge, fmt.Sprintf(`
SELECT fleet_challenge
FROM host_certificate_templates
WHERE host_uuid = ? AND certificate_template_id = ? AND status = '%s' AND operation_type = '%s'
FOR UPDATE
`, fleet.CertificateTemplateDelivered, fleet.MDMOperationTypeInstall), hostUUID, certificateTemplateID)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return ctxerr.Wrap(ctx, notFound("HostCertificateTemplate"), "template not found or not in delivered status")
}
return ctxerr.Wrap(ctx, err, "check existing challenge")
}

// If challenge exists and is non-empty, return it
if existingChallenge.Valid && existingChallenge.String != "" {
challenge = existingChallenge.String
return nil
}

// Create new challenge using the transaction
newChal, err := newChallenge(ctx, tx)
if err != nil {
return ctxerr.Wrap(ctx, err, "create challenge")
}

// Update host_certificate_templates with the challenge
if _, err := tx.ExecContext(ctx, fmt.Sprintf(`
UPDATE host_certificate_templates
SET fleet_challenge = ?, updated_at = NOW()
WHERE host_uuid = ? AND certificate_template_id = ? AND status = '%s' AND operation_type = '%s'
`, fleet.CertificateTemplateDelivered, fleet.MDMOperationTypeInstall), newChal, hostUUID, certificateTemplateID); err != nil {
return ctxerr.Wrap(ctx, err, "update fleet_challenge in host_certificate_templates")
}

challenge = newChal
return nil
})

if err != nil {
return "", err
}
return challenge, nil
}
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for logic that clears or resets fleet_challenge
rg -n "fleet_challenge\s*=\s*(NULL|''" -g '*.go' -A 2 -B 2

Repository: fleetdm/fleet

Length of output: 160


🏁 Script executed:

#!/bin/bash
# Search for ResendHostCertificateTemplate and related functions
rg -n "ResendHostCertificateTemplate|CleanupExpiredChallenges|handleFleetChallenge" -g '*.go' -A 5

Repository: fleetdm/fleet

Length of output: 5499


🏁 Script executed:

#!/bin/bash
# Search for where host_certificate_templates is updated/cleared
rg -n "UPDATE host_certificate_templates" -g '*.go' -A 2 -B 2

Repository: fleetdm/fleet

Length of output: 7173


🏁 Script executed:

#!/bin/bash
# Check the challenges table cleanup and expiration logic
rg -n "challenges" -g '*.go' -A 3 | grep -E "(DELETE|expir|Cleanup|clean)" -i | head -20

Repository: fleetdm/fleet

Length of output: 1916


🏁 Script executed:

#!/bin/bash
# Look at the ConsumeChallenge implementation to understand how challenges are used
rg -n "ConsumeChallenge" -g '*.go' -A 10 | head -50

Repository: fleetdm/fleet

Length of output: 3797


🏁 Script executed:

#!/bin/bash
# Check if fleet_challenge is ever cleared/reset in host_certificate_templates
rg -n "fleet_challenge.*NULL|fleet_challenge.*''" -g '*.go' --multiline -A 2 -B 2

Repository: fleetdm/fleet

Length of output: 1955


🏁 Script executed:

#!/bin/bash
# Look for how the challenge flow works - check Resend and profile delivery
rg -n "ResendHostCertificateTemplate" -g '*.go' -A 15

Repository: fleetdm/fleet

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for Resend in certificate template context
rg -n "Resend.*[Cc]ertificate|resend.*template" -g '*.go' -i | head -20

Repository: fleetdm/fleet

Length of output: 2485


🏁 Script executed:

#!/bin/bash
# Check the implementation of ConsumeChallenge to see what it does
rg -n "ConsumeChallenge" -g '*.go' -A 15 | grep -A 15 "func.*ConsumeChallenge"

Repository: fleetdm/fleet

Length of output: 2724


🏁 Script executed:

#!/bin/bash
# Look for where fleet_challenge is set and how the challenge lifecycle works
rg -n "fleet_challenge\s*=" -g '*.go' -B 3 -A 3

Repository: fleetdm/fleet

Length of output: 1722


🏁 Script executed:

#!/bin/bash
# Get the full ResendHostCertificateProfile implementation
rg -n "func.*ResendHostCertificateProfile" -g '*.go' -A 30 | head -80

Repository: fleetdm/fleet

Length of output: 4388


🏁 Script executed:

#!/bin/bash
# Check if there's any mechanism that clears fleet_challenge when a challenge is consumed
rg -n "fleet_challenge.*NULL" -g '*.go' -B 5 -A 5

Repository: fleetdm/fleet

Length of output: 3531


🏁 Script executed:

#!/bin/bash
# Look at the full ConsumeChallenge implementation
rg -n "func \(ds \*Datastore\) ConsumeChallenge" -g '*.go' -A 50

Repository: fleetdm/fleet

Length of output: 4061


🏁 Script executed:

#!/bin/bash
# Look at the handleFleetChallenge implementation to see full flow
rg -n "func.*handleFleetChallenge" -g '*.go' -A 40

Repository: fleetdm/fleet

Length of output: 2936


🏁 Script executed:

#!/bin/bash
# Check if there's any code that clears fleet_challenge after ConsumeChallenge
rg -n "ConsumeChallenge" -g '*.go' -B 5 -A 15 | grep -A 15 "ConsumeChallenge"

Repository: fleetdm/fleet

Length of output: 24079


🏁 Script executed:

#!/bin/bash
# Check if fleet_challenge is cleared when status transitions
rg -n "status.*=.*pending|status.*=.*NULL" -g 'host_certificate_templates.go' -B 3 -A 3

Repository: fleetdm/fleet

Length of output: 1561


🏁 Script executed:

#!/bin/bash
# Check migrations to see the column definition and any reset logic
cat server/datastore/mysql/migrations/tables/20251209221850_HostCertificateTemplatesStatusDefault.go

Repository: fleetdm/fleet

Length of output: 938


🏁 Script executed:

#!/bin/bash
# Look for any code that updates fleet_challenge to NULL or clears it
rg -n "fleet_challenge" -g '*.go' | grep -i "update\|clear\|null\|delete" | head -20

Repository: fleetdm/fleet

Length of output: 943


🏁 Script executed:

#!/bin/bash
# Look at the GetOrCreateFleetChallengeForCertificateTemplate more carefully to see what status it requires
cat -n server/datastore/mysql/host_certificate_templates.go | sed -n '667,723p'

Repository: fleetdm/fleet

Length of output: 2787


Stale challenge values remain in host_certificate_templates after expiration.

GetOrCreateFleetChallengeForCertificateTemplate returns any non-empty fleet_challenge without confirming it still exists in the challenges table. When CleanupExpiredChallenges removes expired entries from challenges, the corresponding fleet_challenge values persist in host_certificate_templates. While the SCEP handler recovers by resending the profile when ConsumeChallenge fails, stale challenges accumulate as orphaned data. Consider clearing fleet_challenge from host_certificate_templates as part of the expiration or resend process to avoid this leakage.

🤖 Prompt for AI Agents
In `@server/datastore/mysql/host_certificate_templates.go` around lines 667 - 723,
GetOrCreateFleetChallengeForCertificateTemplate currently returns any non-empty
fleet_challenge from host_certificate_templates without verifying the challenge
still exists in the challenges table, causing orphaned/stale values after
CleanupExpiredChallenges removes expired rows; update
GetOrCreateFleetChallengeForCertificateTemplate to validate that an existing
fleet_challenge value corresponds to a live row in the challenges table (e.g.,
query challenges for the token or call ConsumeChallenge-like verification inside
the same transaction) and if the challenge is missing/expired, clear
fleet_challenge (UPDATE host_certificate_templates SET fleet_challenge = NULL,
updated_at = NOW()) and proceed to create a new challenge via newChallenge and
store it atomically; additionally ensure any code paths that resend profiles on
ConsumeChallenge failure also clear the stale fleet_challenge in
host_certificate_templates to prevent leakage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai this feels a bit outside the scope of this fix. If the challenge is stale/expired, then Android will fail to get a cert. The server will then resend it manually or automatically as part of future work: #37556

I don't see how we're accumulating orphan rows here. The rows in host_certificate_templates will still be valid to show the status of the certs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@getvictor, you're right—this is appropriate scope management for this PR. The functional issue (enrollment recovery when challenges expire) is addressed by the resend logic in #37556.

To clarify: I was referring to stale challenge values persisting in the fleet_challenge column (a data cleanliness concern), not orphan rows in host_certificate_templates (the rows themselves are valid and correctly show cert status). Since the resend flow will handle the functional recovery and this is tracked for future work, deferring the challenge-clearing logic makes sense.

Thanks for the clarification! 👍


🧠 Learnings used
Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: ee/server/service/hostidentity/depot/depot.go:104-119
Timestamp: 2025-07-08T16:06:54.576Z
Learning: In ee/server/service/hostidentity/depot/depot.go, the security concern where shared challenges allow certificate revocation (lines 104-119) is a known issue that will be addressed in a later feature, not an immediate concern to fix.

Learnt from: getvictor
Repo: fleetdm/fleet PR: 30589
File: server/datastore/mysql/schema.sql:501-517
Timestamp: 2025-07-07T22:21:15.748Z
Learning: In the host_identity_scep_certificates table schema, the VARBINARY(100) size for public_key_raw, the nullable host_id without a foreign key constraint, and the use of plain DATETIME instead of DATETIME(6) are intentional design decisions, not issues to be addressed.

Learnt from: getvictor
Repo: fleetdm/fleet PR: 36139
File: android/app/src/main/java/com/fleetdm/agent/scep/ScepClientImpl.kt:75-76
Timestamp: 2025-11-26T18:58:18.865Z
Learning: In Fleet's Android MDM agent SCEP implementation (android/app/src/main/java/com/fleetdm/agent/scep/ScepClientImpl.kt), OptimisticCertificateVerifier is intentionally used because: (1) SCEP URL is provided by authenticated MDM server, (2) challenge password authenticates enrollment, (3) enterprise SCEP servers use internal CAs not in system trust stores, (4) enrolled certificate is validated when used.

@codecov
Copy link

codecov bot commented Jan 25, 2026

Codecov Report

❌ Patch coverage is 77.55102% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.07%. Comparing base (ef76a87) to head (088352f).
⚠️ Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
...rver/datastore/mysql/host_certificate_templates.go 79.48% 4 Missing and 4 partials ⚠️
server/service/certificates.go 66.66% 1 Missing and 1 partial ⚠️
server/mdm/android/service/service.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #38753   +/-   ##
=======================================
  Coverage   66.07%   66.07%           
=======================================
  Files        2414     2414           
  Lines      192710   192735   +25     
  Branches     8488     8488           
=======================================
+ Hits       127329   127346   +17     
- Misses      53818    53820    +2     
- Partials    11563    11569    +6     
Flag Coverage Δ
backend 67.91% <77.55%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@getvictor getvictor marked this pull request as ready for review January 26, 2026 17:29
@getvictor getvictor requested a review from a team as a code owner January 26, 2026 17:29
Copilot AI review requested due to automatic review settings January 26, 2026 17:29
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Android certificate enrollment failures when devices were offline long enough for pre-generated SCEP fleet challenges to expire, by switching to generating challenges on-demand.

Changes:

  • Stop generating fleet challenges during Android certificate template delivery; transition templates to delivered without challenges.
  • Generate the fleet challenge on-demand when the Android device fetches its certificate template.
  • Add datastore support + tests for atomically creating (or returning) a fleet challenge for a delivered certificate template.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/service/certificates.go Creates fleet challenge on-demand when a device fetches a delivered certificate template.
server/mdm/android/service/service.go Removes pre-generation of challenges during delivery; only transitions templates to delivered.
server/datastore/mysql/host_certificate_templates.go Updates delivery transition API and adds GetOrCreateFleetChallengeForCertificateTemplate implementation.
server/datastore/mysql/challenges.go Refactors challenge creation into a helper to support transactional creation.
server/fleet/datastore.go Updates datastore interface for new delivery signature and adds GetOrCreateFleetChallengeForCertificateTemplate.
server/mock/datastore_mock.go Updates mock datastore interface for signature change and new API.
server/mdm/android/service/profiles_test.go Updates tests to expect nil fleet_challenge until device fetch.
server/datastore/mysql/host_certificate_templates_test.go Updates state-machine test and adds tests for GetOrCreateFleetChallengeForCertificateTemplate.
changes/37651-Android-fleet-challenge Adds user-visible changelog entry for the fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +452 to +461
query, args, err := sqlx.In(fmt.Sprintf(`
UPDATE host_certificate_templates
SET
status = '%s',
fleet_challenge = %s,
updated_at = NOW()
WHERE
host_uuid = ? AND
status = '%s' AND
certificate_template_id IN (%s)
`, fleet.CertificateTemplateDelivered, caseStmt.String(), fleet.CertificateTemplateDelivering, strings.Join(inPlaceholders, ","))
certificate_template_id IN (?)
`, fleet.CertificateTemplateDelivered, fleet.CertificateTemplateDelivering), hostUUID, templateIDs)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransitionCertificateTemplatesToDelivered updates status to "delivered" but leaves any existing fleet_challenge value untouched. If a template previously had a fleet_challenge (e.g., from an earlier delivery attempt), that stale value will prevent a fresh on-demand challenge from being created and can reintroduce expired/consumed-challenge failures. Consider explicitly clearing fleet_challenge (set NULL/empty) as part of this transition so each delivery starts without a cached challenge.

Copilot uses AI. Check for mistakes.
Comment on lines +696 to +699
// If challenge exists and is non-empty, return it
if existingChallenge.Valid && existingChallenge.String != "" {
challenge = existingChallenge.String
return nil
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetOrCreateFleetChallengeForCertificateTemplate returns an existing non-empty fleet_challenge without verifying that it still exists (and is still within TTL) in the challenges table. If the challenge was already consumed or expired/cleaned up, this will keep returning an invalid challenge and the device will never recover. To match the "ensure challenge exists" contract, validate the stored challenge against the challenges table (and TTL) before returning it; if missing/expired, generate a new challenge and update host_certificate_templates accordingly.

Suggested change
// If challenge exists and is non-empty, return it
if existingChallenge.Valid && existingChallenge.String != "" {
challenge = existingChallenge.String
return nil
// If challenge exists and is non-empty, validate it against the challenges table
if existingChallenge.Valid && existingChallenge.String != "" {
var count int
if err := sqlx.GetContext(ctx, tx, &count, `
SELECT COUNT(*)
FROM challenges
WHERE challenge = ? AND expires_at > NOW()
`, existingChallenge.String); err != nil {
return ctxerr.Wrap(ctx, err, "verify existing challenge")
}
// If the challenge still exists and is within TTL, reuse it
if count > 0 {
challenge = existingChallenge.String
return nil
}
// Otherwise, fall through to create a new challenge and update host_certificate_templates

Copilot uses AI. Check for mistakes.
Comment on lines +261 to +270
// On-demand challenge creation for delivered status.
// If FleetChallenge is nil or empty, create one now (the challenge TTL starts from this moment).
if certificate.Status == fleet.CertificateTemplateDelivered {
if certificate.FleetChallenge == nil || *certificate.FleetChallenge == "" {
challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id)
if err != nil {
return nil, ctxerr.Wrap(ctx, err, "create fleet challenge on-demand")
}
certificate.FleetChallenge = &challenge
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint only creates a challenge when FleetChallenge is nil/empty. If host_certificate_templates.fleet_challenge is set to a stale value (e.g., consumed/expired and no longer present in the challenges table), the handler will keep returning that value and will never call GetOrCreateFleetChallengeForCertificateTemplate to recover. Consider always calling the datastore "get-or-create" for delivered templates (and letting it validate/refresh), or otherwise ensuring stale challenges are cleared/ignored before responding.

Suggested change
// On-demand challenge creation for delivered status.
// If FleetChallenge is nil or empty, create one now (the challenge TTL starts from this moment).
if certificate.Status == fleet.CertificateTemplateDelivered {
if certificate.FleetChallenge == nil || *certificate.FleetChallenge == "" {
challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id)
if err != nil {
return nil, ctxerr.Wrap(ctx, err, "create fleet challenge on-demand")
}
certificate.FleetChallenge = &challenge
}
// On-demand challenge creation/refresh for delivered status.
// Always obtain the challenge from the datastore so that stale values are refreshed.
if certificate.Status == fleet.CertificateTemplateDelivered {
challenge, err := svc.ds.GetOrCreateFleetChallengeForCertificateTemplate(ctx, host.UUID, id)
if err != nil {
return nil, ctxerr.Wrap(ctx, err, "create or refresh fleet challenge on-demand")
}
certificate.FleetChallenge = &challenge

Copilot uses AI. Check for mistakes.

// GetOrCreateFleetChallengeForCertificateTemplate ensures a fleet challenge exists for the given
// host and certificate template. If a challenge already exists, it returns it. If not, it creates
// a new one atomically. Only works for templates in 'delivered' status.
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface comment for GetOrCreateFleetChallengeForCertificateTemplate says it "only works for templates in 'delivered' status", but the MySQL implementation also requires operation_type='install'. Update the docstring to reflect the install-only behavior (or relax the implementation if other operation types should be supported).

Suggested change
// a new one atomically. Only works for templates in 'delivered' status.
// a new one atomically. Only works for templates in 'delivered' status and with operation_type 'install'.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Android: No recovery when fleet challenge expires during SCEP enrollment

3 participants