fix flake in test for restarted followers not causing an election by tgross · Pull Request #669 · hashicorp/raft

tgross · 2026-03-19T20:46:49Z

While working on #666, I encountered a test flake caused by a race condition in `TestRaft_FollowerRemovalNoElection`. The race also obscures the intent of the test. After we restart the follower, the `c.ConnectFully` call typically completes quickly enough that the follower never enters the candidate state, so we're not exercising the intended behavior. But if the `c.ConnectFully` call takes too long, the follower can miss heartbeats and start an election. This is expected behavior, and its request for an election will be rejected. In that case, though, the follower will not have settled on the leader by the time we check it, so the test fails when it shouldn't.

This PR adds an extra sleep before we reconnect the cluster, which forces the restarted follower to become a candidate so that we actually exercise the intent of the test. The test then waits until the follower emits an event showing that it has a leader before continuing to the remaining assertions.

Note that you need to run this test many times to hit the flake, e.g. with `go test -v -failfast -count=30 . -run TestRaft_FollowerRemovalNoElection`. The work I'm doing in #666 changes timing slightly, which caused the failure to happen maybe 10% of the time instead of 5%.


tgross marked this pull request as ready for review March 19, 2026 20:53

tgross requested review from a team as code owners March 19, 2026 20:53

tgross force-pushed the test-flake-follower-removal-no-election branch from 33f120f to e2cd0a7 Compare March 20, 2026 13:13

tgross wants to merge 1 commit into main from test-flake-follower-removal-no-election
