Refactor outbound session lifecycle with disconnect-notice latch#9

Merged
henrikbjorn merged 4 commits into master from outbound-session-lifecycle
Mar 20, 2026

Conversation

@henrikbjorn
Member

@henrikbjorn henrikbjorn commented Mar 20, 2026

Why

The previous architecture had the read loop as the main fiber and run_session as a child task. When session_initiated returned, the session task was dead but the connection was still alive — events would arrive but nobody owned the lifecycle. This made event-driven patterns fragile: who calls disconnect, when, and from which event?

FreeSWITCH sends text/disconnect-notice after all events have been delivered (when linger is active). This is the protocol-level "session is done" signal. The library should use it to own the full session lifecycle rather than leaving it to user code.

What changed

  • Server owns the connection lifecycle instead of the listener. The read loop uses a countdown latch that breaks when both CHANNEL_HANGUP_COMPLETE and text/disconnect-notice arrive, ensuring all events are delivered when linger is active.
  • connection_closed hook on Base/Outbound closes queues from the read task's ensure block. This must live on the read fiber — not in handle_session's ensure — because handle_session is blocked on run_session, which is blocked on a queue dequeue. Closing the queue from the same fiber that's waiting on it would deadlock.
  • Server/Client extract handle_session, start_read_loop, read_messages for clear separation: accept/connect builds objects, handle_session orchestrates lifecycle, start_read_loop launches the reader fiber, read_messages handles protocol.
  • ConnectionError raised on nil dequeue (closed queue), so connection drops cleanly unblock send_message and the application.
  • Client reconnect loop now rescues ConnectionError so a mid-handshake connection drop doesn't crash the inbound reconnect loop.
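The two-signal latch in the first bullet can be sketched without the async machinery. This is an illustrative class, not the library's actual API — the class name, hash keys, and method names are assumptions; the point is that neither CHANNEL_HANGUP_COMPLETE nor text/disconnect-notice alone ends the session, only both together:

```ruby
# Hypothetical sketch of the disconnect-notice latch: the read loop
# keeps running until BOTH the hangup event and the protocol-level
# disconnect notice have been observed.
class DisconnectLatch
  def initialize
    @hangup_complete = false
    @disconnect_notice = false
  end

  # Record a parsed message (represented here as a plain Hash).
  # Returns true once both signals have arrived and the loop may break.
  def record(message)
    @hangup_complete ||= message[:event_name] == "CHANNEL_HANGUP_COMPLETE"
    @disconnect_notice ||= message[:content_type] == "text/disconnect-notice"
    open?
  end

  def open?
    @hangup_complete && @disconnect_notice
  end
end
```

Because FreeSWITCH delivers the disconnect notice only after all lingered events, breaking on the latch rather than on hangup alone is what guarantees no events are dropped.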

Server now owns the connection lifecycle instead of the listener.
The read loop uses a latch that waits for both CHANNEL_HANGUP_COMPLETE
and disconnect-notice before closing, ensuring all events are delivered
when linger is active.

- Add disconnect_notice? predicate to Response
- Add ConnectionError for clean shutdown on connection drop
- Add connection_closed hook (closes queues) to Base and Outbound
- Track event hooks via Async::Barrier instead of bare Async
- Server/Client: extract handle_session, start_read_loop, read_messages
- Queue close in read_task ensure to avoid deadlock with handle_session
- Outbound run_session auto-lingers and rescues connection errors
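The "ConnectionError on nil dequeue" change can be sketched with a stdlib Thread::Queue standing in for the async queue (Thread::Queue#pop returns nil once the queue is closed and drained); the method name here is illustrative:

```ruby
# Sketch: a closed queue yields nil on dequeue, which is converted into
# a ConnectionError so callers blocked on the queue fail cleanly instead
# of receiving a silent nil.
class ConnectionError < StandardError; end

def dequeue_or_raise(queue)
  message = queue.pop
  raise ConnectionError, "connection closed" if message.nil?
  message
end
```

This is what lets a connection drop propagate through send_message as an exception the application can rescue, rather than as a nil that surfaces later as a NoMethodError.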
Client#run only rescued IOError and socket errors, not ConnectionError.
A mid-handshake connection drop would crash the reconnect loop instead
of retrying.
@henrikbjorn henrikbjorn requested a review from c960657 March 20, 2026 10:52
@henrikbjorn henrikbjorn self-assigned this Mar 20, 2026
Without barrier.wait in connection_closed, event hooks could be killed
mid-execution when the read loop latch triggers. Now connection_closed
closes queues (unblocking any hooks stuck on send_message) then waits
for all event hook fibers to finish via the Async::Barrier.
@henrikbjorn henrikbjorn merged commit ca348a4 into master Mar 20, 2026
3 checks passed
@henrikbjorn henrikbjorn deleted the outbound-session-lifecycle branch March 20, 2026 11:58
