Avoid starting and stopping service components on policy change #11740

ycombinator · 2025-12-11T02:10:07Z

What does this PR do?

This PR identifies Service Runtime components by their input type alone; the output ID is no longer used.

Why is it important?

Service Runtime components are intended to be kept running (via a service) for as long as possible. We should only start or stop them if they are being explicitly added or removed, respectively, from the component model. If only their configuration is being updated, we should not stop and start the component.

If a component's ID changes between the last and current component models, Elastic Agent will ask the component's service to stop and then start itself. Prior to this PR, service components' IDs were determined by their input type and output ID. Therefore, changing a service component's output changed its ID and caused the service to restart. This is undesirable behavior, as services should be kept running as long as possible.

With the changes in this PR, we no longer consider the output ID when generating service components' IDs. If a service component's output is changed, its ID remains the same between the last and current component models. Elastic Agent does not stop and start the component's service but simply passes the configuration change to it (which it was doing prior to this PR anyway).
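To illustrate the reasoning above, here is a minimal Go sketch of how dropping the output ID from a service component's ID turns an output change into an in-place update rather than a stop and start. It is not the code from this PR: the `Component` struct, its `ServiceRuntime` field, the `"<input type>-<output ID>"` format for non-service components, and the `diffIDs` helper are all simplified assumptions made for illustration.

```go
package main

import "fmt"

// Component is a hypothetical, stripped-down stand-in for a component in the
// component model; the real Elastic Agent type carries much more state.
type Component struct {
	InputType      string
	OutputID       string
	ServiceRuntime bool // assumption: true when the component runs as a service
}

// ID derives the component ID. Assumption for this sketch: non-service
// components use "<input type>-<output ID>", while service components use
// the input type alone, so an output change does not alter their ID.
func (c Component) ID() string {
	if c.ServiceRuntime {
		return c.InputType
	}
	return c.InputType + "-" + c.OutputID
}

// diffIDs compares the last and current component models by ID.
// IDs only in last are stopped, IDs only in current are started, and IDs
// present in both only receive the updated configuration.
func diffIDs(last, current []Component) (stop, start, update []string) {
	lastIDs := make(map[string]bool, len(last))
	for _, c := range last {
		lastIDs[c.ID()] = true
	}
	currentIDs := make(map[string]bool, len(current))
	for _, c := range current {
		id := c.ID()
		currentIDs[id] = true
		if lastIDs[id] {
			update = append(update, id)
		} else {
			start = append(start, id)
		}
	}
	for _, c := range last {
		if id := c.ID(); !currentIDs[id] {
			stop = append(stop, id)
		}
	}
	return stop, start, update
}

func main() {
	// Same service component, output switched from Elasticsearch to Logstash.
	last := []Component{{InputType: "endpoint", OutputID: "es-output", ServiceRuntime: true}}
	current := []Component{{InputType: "endpoint", OutputID: "ls-output", ServiceRuntime: true}}

	stop, start, update := diffIDs(last, current)
	fmt.Println("stop:", stop, "start:", start, "update:", update)
}
```

With `ServiceRuntime` set to `false` in both models, the same output change would yield one ID to stop and one to start, which mirrors the pre-PR behavior described above for service components.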

Checklist

I have read and understood the pull request guidelines of this project.
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~I have made corresponding changes to the documentation~~
~~I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool
~~I have added an integration test or an E2E test~~

Disruptive User Impact

None.

How to test this PR locally

Using the Fleet UI, create three Agent policies:
- default: containing only the system integration
- tp-es: containing the Elastic Defend integration, with tamper protection enabled, and using the Elasticsearch output.
- tp-ls: containing the Elastic Defend integration, with tamper protection enabled, and using the Logstash output. Note that you will need to create the Logstash output in Fleet > Settings.
Enroll an Elastic Agent in the tp-es policy and verify the agent is healthy and shipping data.
Assign the Agent to the tp-ls policy.
Check the Agent logs and make sure the Endpoint component is not being stopped and started. Concretely, check that there is no log entry for stopping endpoint service runtime.
Check the Endpoint logs (located under /opt/Elastic/Endpoint/state/log/ on Linux) and make sure that Endpoint has connected to Logstash (or has attempted to and failed if there is no actual Logstash endpoint listening).
Assign the Agent to the default policy.
Check the Agent logs and make sure the Endpoint component is stopped and uninstalled. Concretely, check that there is a log entry for stopping endpoint service runtime, followed by uninstall endpoint service, followed by Stopped: endpoint service runtime.
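The log checks in the steps above can be scripted. The following Go sketch is one possible way to do that, assuming the Agent log has already been read into a string. The `containsInOrder` helper and the sample log lines are hypothetical; only the three marker strings come from the steps above, and real log lines will carry additional fields.

```go
package main

import (
	"fmt"
	"strings"
)

// containsInOrder reports whether each marker appears in the log on its own
// line, in the given order. Lines between markers are ignored.
func containsInOrder(log string, markers []string) bool {
	next := 0
	for _, line := range strings.Split(log, "\n") {
		if next == len(markers) {
			break
		}
		if strings.Contains(line, markers[next]) {
			next++
		}
	}
	return next == len(markers)
}

func main() {
	// Marker strings taken from the test steps in this PR description.
	removal := []string{
		"stopping endpoint service runtime",
		"uninstall endpoint service",
		"Stopped: endpoint service runtime",
	}

	// Hypothetical log excerpt, for illustration only.
	sample := strings.Join([]string{
		"... stopping endpoint service runtime ...",
		"... uninstall endpoint service ...",
		"... Stopped: endpoint service runtime ...",
	}, "\n")

	// After moving to the default policy, all three entries should appear in order.
	fmt.Println("removal sequence found:", containsInOrder(sample, removal))

	// After moving from tp-es to tp-ls, the first entry should be absent.
	fmt.Println("stop entry found:", containsInOrder("... unrelated entry ...", removal[:1]))
}
```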

Related issues

Resolves Cannot successfully change output type or name in tamper protected agent polices that contain Elastic Defend #11266

Questions to ask yourself

How are we going to support this in production?
How are we going to measure its adoption?
How are we going to debug this?
What are the metrics I should take care of?
...

mergify · 2025-12-11T02:10:47Z

This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
backport-active-all is the label that automatically backports to all active branches.
backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

elasticmachine · 2025-12-12T14:54:58Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

elasticmachine · 2025-12-12T20:57:50Z

💔 Build Failed

Buildkite Build
Commit: d7bf91b

Failed CI Steps

History

💔 Build #31949 failed 1951fec
💚 Build #31882 succeeded 8787cfc
💛 Build #31841 was flaky a71bf32
💛 Build #31839 was flaky 18d8022

cc @ycombinator

ycombinator added 4 commits December 10, 2025 18:06

Add UsesCommandRuntime and UsesServiceRuntime methods on Component

8efd9ce

Use new methods

805a697

Add test case for only output being changed on service component

56123d5

Implement logic to not remove and add same service component

18d8022

mergify bot assigned ycombinator Dec 11, 2025

ycombinator added 7 commits December 10, 2025 23:50

Adding CHANGELOG fragment

a71bf32

Improve comment

8787cfc

Fix logic location

378ce91

Update unit test

5e3335b

Update service component naming

cfd4caf

Refactor: extract logic into helper method

7f9842b

Relocate unit test and add lots of cases

ab99782

ycombinator changed the title ~~Avoid stopping and stopping service components on policy change~~ Avoid starting and stopping service components on policy change Dec 12, 2025

ycombinator added 4 commits December 12, 2025 06:51

Remove unnecessary code

b93829f

Clarify comments

d5c435d

Remove unnecessary unit test

f623b03

Undo unnecessary changes

1951fec

ycombinator force-pushed the service-component-avoid-stop-start branch from 84a4523 to 1951fec Compare December 12, 2025 14:53

ycombinator marked this pull request as ready for review December 12, 2025 14:54

ycombinator requested a review from a team as a code owner December 12, 2025 14:54

ycombinator requested review from blakerouse and michel-laterman December 12, 2025 14:54

ycombinator added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Dec 12, 2025

Update component ID in integration test

d7bf91b

2 participants
