fix: prevent orphaned BlueZ discovery sessions in continuous mode#81
fix: prevent orphaned BlueZ discovery sessions in continuous mode#81KristianP26 wants to merge 12 commits intomainfrom
Conversation
Two changes to handler-node-ble.ts: 1. Add a quiesce delay between stopDiscovery() and destroy() in the finally block. Without this, bluetoothd may not finish session teardown before the D-Bus socket closes, leaving an orphaned discovery that blocks future scans with "Operation already in progress". 2. In Tier 4 btmgmt recovery, destroy the stale D-Bus connection and create a fresh one. The kernel-level adapter reset causes BlueZ to recreate /org/bluez/hci0, invalidating the existing D-Bus proxies. Previously the code reused the stale connection, which always failed. Closes #80
There was a problem hiding this comment.
Pull request overview
Addresses BlueZ discovery sessions getting orphaned in continuous mode by ensuring discovery teardown completes before D-Bus disconnect, and by recreating the D-Bus connection after kernel-level adapter resets so subsequent discovery attempts don’t reuse stale proxies.
Changes:
- Add a mutable
DbusConnectionwrapper so Tier 4 (btmgmt) recovery can replace the D-Bus connection and proxies in-place. - Introduce a post-
stopDiscovery()quiesce delay before destroying the D-Bus connection to prevent orphaned discovery sessions. - Update README license badge URL to force-refresh.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/ble/handler-node-ble.ts | Adds D-Bus connection refresh logic during btmgmt recovery and delays D-Bus destroy after stopping discovery to avoid orphaned BlueZ sessions. |
| README.md | Updates the license badge URL query param (cache bust). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| // D-Bus socket. Without this delay, bluetoothd may not finish session | ||
| // teardown, leaving an orphaned discovery that blocks future scans. | ||
| await sleep(POST_DISCOVERY_QUIESCE_MS); | ||
| conn.destroy(); |
There was a problem hiding this comment.
await sleep(POST_DISCOVERY_QUIESCE_MS) runs even when btAdapter is null (e.g., if defaultAdapter() fails), adding an unconditional 500ms delay to error paths. Consider only sleeping when an adapter exists and/or when stopDiscovery() was actually attempted (or when discovery was started).
| const conn: DbusConnection = { bluetooth: null!, destroy: () => {} }; | ||
| try { | ||
| ({ bluetooth, destroy } = NodeBle.createBluetooth()); | ||
| const created = NodeBle.createBluetooth(); | ||
| conn.bluetooth = created.bluetooth; | ||
| conn.destroy = created.destroy; |
There was a problem hiding this comment.
Same as above: conn uses bluetooth: null! even though a valid bluetooth is always assigned immediately after createBluetooth() succeeds. Initializing conn from the created object would avoid the non-null assertion and reduce the chance of accidental runtime use before assignment.
| } | ||
| destroy(); | ||
| await sleep(POST_DISCOVERY_QUIESCE_MS); | ||
| conn.destroy(); |
There was a problem hiding this comment.
await sleep(POST_DISCOVERY_QUIESCE_MS) in scanDevices runs even when discovery was never started / btAdapter is null, which unnecessarily delays failures and short scans. Consider guarding the sleep behind if (btAdapter) (or a discoveryStarted flag) so only real discovery teardowns pay the quiesce delay.
| const conn: DbusConnection = { bluetooth: null!, destroy: () => {} }; | ||
| try { | ||
| ({ bluetooth, destroy } = NodeBle.createBluetooth()); | ||
| const created = NodeBle.createBluetooth(); | ||
| conn.bluetooth = created.bluetooth; | ||
| conn.destroy = created.destroy; |
There was a problem hiding this comment.
conn is initialized with bluetooth: null!, but NodeBle.createBluetooth() is called immediately after and the function throws on failure. Consider initializing conn from created directly (or using a let conn: DbusConnection assigned in the try) to avoid the non-null assertion and make the initialization safer/clearer.
…sessions (#80) Each scan cycle previously created and destroyed a D-Bus connection. BlueZ tracks discovery sessions per D-Bus client, and destroying the connection while bluetoothd is still cleaning up leaves an orphaned session that blocks all future scans with "Operation already in progress". The fix keeps a single D-Bus connection alive across scan cycles so the same client owns the discovery session. stopDiscovery() always works because the owning client is still alive. The connection is automatically reset if it becomes stale (e.g. after bluetoothd restart or btmgmt reset). scanDevices() (one-shot operations like diagnose/scan) keeps its own short-lived connection to avoid interfering with continuous mode.
…ers (#80) Keep both the D-Bus connection and adapter proxy alive across scan cycles, and leave discovery running between idle timeouts instead of stopping and restarting it every cycle. This minimizes the BlueZ state transitions that trigger the well-known Discovering property desync bug (bluez/bluez#807, bluez/bluer#47) where StartDiscovery returns "Operation already in progress" while StopDiscovery returns "No discovery started". Recovery tiers 5 (rfkill block/unblock) and 6 (systemctl restart bluetooth) provide deeper resets when the adapter-level tiers fail. Docker docs updated to mount /dev/rfkill for RF-level recovery access.
After GATT connect/disconnect on Broadcom adapters (RPi), BlueZ can enter a zombie state where Discovering=true but the HCI controller is no longer scanning. The persistent D-Bus connection survives but discovery never finds new devices. Reset the D-Bus connection and adapter proxy after every GATT operation (successful or failed). This guarantees a clean discovery state for the next scan cycle. For idle cycles with no connection, discovery is kept running to avoid the start/stop desync bug.
) D-Bus reset alone is not enough to recover from the zombie discovery state on Broadcom adapters. After 2-3 GATT cycles, bluetoothd keeps reporting Discovering=true while the HCI controller no longer runs LE scan, so a fresh D-Bus client + fresh startDiscovery() succeed on paper but never sees any advertisements. btmgmt power off/on hits the controller directly and clears the zombie, restoring scans for the next cycle. Only CAP_NET_ADMIN is needed so it works regardless of container USER. Also: - docker-compose.example.yml maps /dev/rfkill so Tier 5 recovery is available to existing users without re-reading the docs. - Troubleshooting page gains a dedicated BlueZ zombie section with the NOBLE_DRIVER=stoprocent escape hatch for users whose BlueZ keeps getting stuck despite the recovery tiers. Refs bluez/bluez#807, bluez/bluer#47.
…scovery # Conflicts: # CHANGELOG.md # docs/changelog.md
Summary
Fixes #80: BlueZ discovery sessions become orphaned in continuous mode, causing persistent "Operation already in progress" errors that survive container restarts.
Root cause: Each scan cycle creates and destroys a D-Bus connection. When
destroy()closes the socket immediately afterstopDiscovery(), BlueZ may not finish session teardown, leaving an orphaned discovery session owned by a dead client. No new client can stop it, and all recovery tiers fail.Two targeted fixes in
handler-node-ble.ts:Quiesce delay before destroy: add a 500ms delay between
stopDiscovery()anddestroy()in thefinallyblock, giving BlueZ time to clean up the session before the D-Bus socket closes. This prevents orphaned sessions in normal continuous mode operation.Fresh D-Bus connection in Tier 4 btmgmt recovery: after the kernel-level adapter reset, destroy the stale D-Bus connection and create a completely new one. Previously the code reused the invalidated connection, which always failed with "Operation already in progress" because the old proxies pointed to a destroyed
/org/bluez/hci0object.Test plan
npx tsc --noEmitpassesnpm run lintpassesnpm testpasses (1097/1097)typecheck-and-teston Node 20/22/24)