@florianl
Member

What does this PR do?

This panic was identified in the CI run of #11671.

CI tests reveal the following panic:

```
=== FAIL: internal/pkg/otel/manager TestOTelManager_Run (300.01s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e3fa37]
goroutine 923 [running]:
testing.tRunner.func1.2({0x20ab660, 0x2ef4510})
	/opt/buildkite-agent/.asdf/installs/golang/1.24.11/go/src/testing/testing.go:1734 +0x3eb
testing.tRunner.func1()
	/opt/buildkite-agent/.asdf/installs/golang/1.24.11/go/src/testing/testing.go:1737 +0x696
panic({0x20ab660?, 0x2ef4510?})
	/opt/buildkite-agent/.asdf/installs/golang/1.24.11/go/src/runtime/panic.go:792 +0x132
github.com/elastic/elastic-agent/internal/pkg/otel/manager.countHealthCheckExtensionStatuses(0x0)
	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1765273351968763419/elastic/elastic-agent/internal/pkg/otel/manager/manager_test.go:294 +0x37
github.com/elastic/elastic-agent/internal/pkg/otel/manager.TestOTelManager_Run.func4(0xc000166000, 0xc0001c2000, 0xc0002dc7e0, 0xc0002a62d0, 0xc00057be18?, 0x1?)
	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1765273351968763419/elastic/elastic-agent/internal/pkg/otel/manager/manager_test.go:372 +0x465
github.com/elastic/elastic-agent/internal/pkg/otel/manager.TestOTelManager_Run.func17(0xc000166000)
	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1765273351968763419/elastic/elastic-agent/internal/pkg/otel/manager/manager_test.go:699 +0xc72
testing.tRunner(0xc000166000, 0xc0001740c0)
	/opt/buildkite-agent/.asdf/installs/golang/1.24.11/go/src/testing/testing.go:1792 +0x226
created by testing.(*T).Run in goroutine 142
	/opt/buildkite-agent/.asdf/installs/golang/1.24.11/go/src/testing/testing.go:1851 +0x8f3

```

https://buildkite.com/elastic/elastic-agent/builds/31698#019b027d-4c02-4a92-aa14-27107a3b7585/140-4642

Fix this panic by checking that `status` is non-nil before using it.
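The trace shows `countHealthCheckExtensionStatuses` being called with a nil argument (`0x0`). A minimal sketch of the guard, assuming the status is a pointer to a tree of component statuses. `componentStatus` and `countStatuses` are hypothetical names for illustration, not the actual types or code in `manager_test.go`:

```go
package main

// componentStatus is a hypothetical stand-in for the status tree that the
// test receives from the collector's health check extension. The real type
// and field names used in manager_test.go may differ.
type componentStatus struct {
	Healthy    bool
	Components map[string]*componentStatus
}

// countStatuses counts every node in the status tree. It checks status for
// nil before reading any field, so a missing status (the 0x0 argument in the
// trace) counts as zero instead of causing a nil pointer dereference.
func countStatuses(status *componentStatus) int {
	if status == nil {
		return 0
	}
	count := 1 // this node
	for _, child := range status.Components {
		count += countStatuses(child) // nil children are handled by the guard
	}
	return count
}
```

Returning early on nil matters because a panic inside a test aborts the whole test binary, whereas a zero count lets the surrounding test fail with a readable message.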

Signed-off-by: Florian Lehner <[email protected]>
@florianl added the bug label Dec 10, 2025
@florianl requested a review from a team as a code owner December 10, 2025 08:29
@mergify
Contributor

mergify bot commented Dec 10, 2025

This pull request does not have a backport label. Could you fix it @florianl? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@pierrehilbert added the Team:Elastic-Agent-Control-Plane label Dec 10, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

@swiatekm left a comment


This is just a test fix; it doesn't need a changelog entry.

Signed-off-by: Florian Lehner <[email protected]>
Signed-off-by: Florian Lehner <[email protected]>
@florianl
Member Author

@swiatekm I have removed the changelog entry and updated the tests to use `require` instead of `assert`.
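The practical difference between the two testify packages is when a test stops: `assert` helpers record the failure and let the test continue, while `require` helpers call `t.FailNow()`, which stops the test goroutine via `runtime.Goexit`. The sketch below imitates that behaviour with a hypothetical `fakeT` type instead of testify itself, so it needs only the standard library:

```go
package main

import "runtime"

// fakeT is a hypothetical, stripped-down stand-in for *testing.T.
type fakeT struct {
	failed bool
}

// Errorf records a failure and returns, so the test body keeps running.
// testify's assert helpers report failures this way.
func (t *fakeT) Errorf(format string, args ...any) {
	t.failed = true
}

// FailNow records a failure and stops the calling goroutine; deferred calls
// still run. testify's require helpers call FailNow after a failed check.
func (t *fakeT) FailNow() {
	t.failed = true
	runtime.Goexit()
}

// runTest runs body in its own goroutine, as the testing package does, and
// reports whether the body reached its last statement and whether it failed.
func runTest(body func(t *fakeT)) (completed, failed bool) {
	t := &fakeT{}
	done := make(chan struct{})
	go func() {
		defer close(done) // runs even after runtime.Goexit
		body(t)
		completed = true
	}()
	<-done
	return completed, t.failed
}
```

With an assert-style nil check, execution would continue into code that dereferences the nil value; a require-style check ends the test at the failed check instead.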

@swiatekm
Copy link
Contributor

@florianl could you also make the `EnsureHealthy` method itself use `require`? I'm fairly certain it's a bug that it doesn't. None of these tests should proceed if the collector isn't healthy.

@florianl enabled auto-merge (squash) December 10, 2025 16:48
@florianl
Member Author

I don't have permission to retrigger Buildkite CI nor to look further into the reason of the failed Buildkite CI - therefore, looking for help @elastic/elastic-agent-control-plane

@pchila
Member

pchila commented Dec 11, 2025

> I don't have permission to retrigger Buildkite CI nor to look further into the reason of the failed Buildkite CI - therefore, looking for help @elastic/elastic-agent-control-plane

Buildkite had an issue grabbing an agent for one k8s integration test. I have retriggered the failed step, and this time Buildkite managed to get an agent. This should allow the build to run to completion.

@elasticmachine
Contributor

elasticmachine commented Dec 11, 2025

@florianl merged commit 3fd8171 into main Dec 12, 2025
22 checks passed
@florianl deleted the countHealthCheckExtensionStatuses-panic branch December 12, 2025 09:05

Labels

  • backport-skip
  • bug: Something isn't working
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team


7 participants