Skip to content

fix(plugins/k8saudit-ovh): add WebSocket reconnect loop and improve resilience#1270

Merged
poiana merged 2 commits intofalcosecurity:mainfrom
elprofessor-de:fix/k8saudit-ovh-reconnect
Apr 7, 2026
Merged

fix(plugins/k8saudit-ovh): add WebSocket reconnect loop and improve resilience#1270
poiana merged 2 commits intofalcosecurity:mainfrom
elprofessor-de:fix/k8saudit-ovh-reconnect

Conversation

@elprofessor-de
Copy link
Copy Markdown
Contributor

Problem

When the LDP WebSocket connection drops (timeout, network blip, server-side close), the plugin logged "Will try to reconnect after 1s..." but actually exited the goroutine without reconnecting. This caused eventC to be closed permanently, and Falco stopped receiving Kubernetes audit events indefinitely — until the pod was manually restarted.

Changes

Reconnect loop: wrapped the dial + read loop in an outer for loop so the connection is re-established automatically after any failure
Dial retry: on initial connection failure, the plugin now retries instead of pushing the error to eventC and exiting
Ping failure handling: if writing a ping fails after a timeout, the inner loop now breaks to trigger a reconnection instead of continuing on a dead connection
JSON error handling: added proper error checking on both json.Unmarshal calls — malformed messages are now logged and skipped cleanly
Magic number removed: replaced the hardcoded [12:] slice (which assumed a fixed-length _appID) with a strings.SplitN on the "> " separator — prevents potential panics on short strings

Testing

Tested against an OVHcloud MKS cluster with intentional WebSocket disconnections — the plugin reconnects automatically and resumes event ingestion without restarting the Falco pod.

@poiana
Copy link
Copy Markdown
Contributor

poiana commented Mar 30, 2026

Welcome @elprofessor-de! It looks like this is your first PR to falcosecurity/plugins 🎉

@poiana poiana added the size/XL label Mar 30, 2026
@irozzo-1A
Copy link
Copy Markdown
Contributor

Hey @elprofessor-de, thanks for your contribution. I have two preliminary comments:

  1. You should sign-off your commits
  2. Your fixed formatting on the go file, but this made your change hard to review. Would you mind keeping the actual fix and the formatting fix in distinct commits to make the review easier?

Thanks!

@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from 7cb31c1 to 9fbd8d0 Compare March 31, 2026 12:26
@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from 9fbd8d0 to 4cddf1f Compare March 31, 2026 12:50
@poiana poiana added size/L and removed size/XL labels Mar 31, 2026
@elprofessor-de
Copy link
Copy Markdown
Contributor Author

Done! I've split the commits and added the sign-off.
Let me know if anything else needs to be adjusted.

Copy link
Copy Markdown
Contributor

@irozzo-1A irozzo-1A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR, I suggested a few changes to improve error handling

Comment thread plugins/k8saudit-ovh/pkg/k8sauditovh/k8sauditovh.go Outdated
Comment thread plugins/k8saudit-ovh/pkg/k8sauditovh/k8sauditovh.go Outdated
Comment thread plugins/k8saudit-ovh/pkg/k8sauditovh/k8sauditovh.go Outdated
Comment thread plugins/k8saudit-ovh/pkg/k8sauditovh/k8sauditovh.go Outdated
@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from 4cddf1f to 974130f Compare April 1, 2026 12:34
@poiana poiana added size/XL and removed size/L labels Apr 1, 2026
@ekoops
Copy link
Copy Markdown
Contributor

ekoops commented Apr 1, 2026

I guessed you squashed together the commit you spllited 😄

@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from 974130f to 182f5c7 Compare April 1, 2026 14:02
@elprofessor-de
Copy link
Copy Markdown
Contributor Author

Sorry about that!
I've re-split the commits — one for the fix and one for the CHANGELOG 🙏

@irozzo-1A
Copy link
Copy Markdown
Contributor

Thanks @elprofessor-de for addressing the review comments, sorry for being picky but there are some formatting changes again in the diff, if you could keep the changes to the bare minimum it would be good for tracking.
I would address the formatting change in a dedicated commit or even better with a dedicated PR.
After this change for me it's ready to go.

@ekoops
Copy link
Copy Markdown
Contributor

ekoops commented Apr 1, 2026

Yes... I guess it's IDE setting fault 😅

@irozzo-1A
Copy link
Copy Markdown
Contributor

Yes... I guess it's IDE setting fault 😅

definitely, not @elprofessor-de fault, it's annoying to deal with not well formatted code. We have to enforce formatting check in the CI

@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from 182f5c7 to 3d69ce9 Compare April 1, 2026 16:00
@poiana poiana added size/L and removed size/XL labels Apr 1, 2026
@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch 4 times, most recently from aeea7e5 to ac46851 Compare April 1, 2026 16:14
@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from ac46851 to e9c2bd6 Compare April 1, 2026 16:17
@elprofessor-de
Copy link
Copy Markdown
Contributor Author

Applied gofmt — the diff should now be clean. Let me know if anything else needs to be adjusted 🙏

Comment thread plugins/k8saudit-ovh/CHANGELOG.md Outdated
* fix(plugins/k8saudit-ovh): retry dial on initial connection failure instead of exiting
* fix(plugins/k8saudit-ovh): break inner read loop when ping fails after timeout
* fix(plugins/k8saudit-ovh): add error handling on json.Unmarshal for log envelope and message payload
* fix(plugins/k8saudit-ovh): replace magic [12:] slice with dynamic separator-based extraction
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating also the changelog, but this should be automatically generated by issuing make changelog/k8saudit-ovh on the root folder of the project. Also, since you are updating the changelog, could you please also update the plugin version? The new version should be 0.4.1, since this is a non-breaking fix. You should update it also in plugins/k8saudit-ovh/plugin/main.go

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know about make changelog/k8saudit-ovh! I'll regenerate it properly and bump the version to 0.4.1 in main.go. 🙏

@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from e9c2bd6 to ffd3c32 Compare April 2, 2026 08:42
@ekoops ekoops added kind/bug Something isn't working area/plugins labels Apr 2, 2026
@scraly
Copy link
Copy Markdown
Contributor

scraly commented Apr 2, 2026

Thanks for the PR ♥️💪
About the new version, you can change it here: https://github.com/falcosecurity/plugins/blob/main/plugins/k8saudit-ovh/plugin/main.go#L32C2-L32C36

…esilience

Signed-off-by: Ilyass KAOUAM <ilyassikai@gmail.com>
…ANGELOG

Signed-off-by: Ilyass KAOUAM <ilyassikai@gmail.com>
@elprofessor-de elprofessor-de force-pushed the fix/k8saudit-ovh-reconnect branch from ffd3c32 to 4bd80c5 Compare April 2, 2026 12:18
@elprofessor-de
Copy link
Copy Markdown
Contributor Author

Done! Regenerated the CHANGELOG using make changelog/k8saudit-ovh and bumped the version to v0.4.1 in main.go. 🙏

Copy link
Copy Markdown
Contributor

@scraly scraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in a MKS cluster in 3AZ Paris, and it's OK 💪
Thanks :)

@poiana
Copy link
Copy Markdown
Contributor

poiana commented Apr 2, 2026

LGTM label has been added.

DetailsGit tree hash: b1a3e1849f0965b2fe9f83f521fbe6e9225bd51f

@poiana poiana added the approved label Apr 2, 2026
@irozzo-1A irozzo-1A self-requested a review April 2, 2026 13:55
Copy link
Copy Markdown
Contributor

@irozzo-1A irozzo-1A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Copy Markdown
Contributor

@ekoops ekoops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Copy Markdown
Contributor

poiana commented Apr 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ekoops, elprofessor-de, irozzo-1A, scraly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 9d77db0 into falcosecurity:main Apr 7, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants