v8.13.0 by Cabecinha84 · Pull Request #1738 · RunOnFlux/flux

Cabecinha84 · 2026-05-25T10:29:21Z

v8.13.0

A major release focused on architectural overhaul of app state propagation, hash synchronization, node lifecycle management, and startup
orchestration. ~100 commits, +23.7k / -3k lines across 100+ files.

Highlights

App state event log — Replaces the dual-collection fluxapprunningbroadcasts + zelappslocation model with a single append-only
appstateevents log as source of truth (apprunning, sigterm, appremoved, evicted, ipchanged). zelappslocation is now a materialized cache derived
from the event log via aggregation, eliminating the orphaned-entry class of bugs where the two collections drifted out of sync. TTL switched from
mutating signed broadcastedAt to operational expireAt field.

Node confirmation service — New service with three-level status tracking (isConfirmed / canSendMessages / isDaemonStale). Outbound signed
messages, peering, and hash sync are gated on confirmation status. Daemon staleness >125min triggers app removal; >320min flips confirmation off.
Replaces 326k/day log-spam loop on expired nodes.
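
The two staleness thresholds above can be read as a small classifier. This is a minimal sketch: only the 125-minute and 320-minute figures come from the description, while the function name and the returned flag names are hypothetical.

```javascript
// Hypothetical sketch: thresholds come from the release notes,
// the function and flag names are illustrative only.
const MINUTE_MS = 60 * 1000;
const APP_REMOVAL_STALE_MS = 125 * MINUTE_MS;
const CONFIRMATION_OFF_STALE_MS = 320 * MINUTE_MS;

// staleMs: how long the daemon has been stale, in milliseconds.
function classifyDaemonStaleness(staleMs) {
  return {
    shouldRemoveApps: staleMs > APP_REMOVAL_STALE_MS,
    shouldDropConfirmation: staleMs > CONFIRMATION_OFF_STALE_MS,
  };
}
```

Both comparisons are strict, mirroring the ">125min" and ">320min" wording.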

Hash sync rewrite — Multi-peer targeted requests (3 peers/round with poll-until-settled, event-driven 4s settle window), bulk threshold lowered
to 500, exponential backoff (0/50min/4h/21h/4d/17d/35d → permanent after ~1yr), ephemeral peer connections to deterministic node list as fallback,
and a fast bootstrap path via daemon address index (getaddresstxids + batch getrawtransaction) that cuts initial explorer sync from ~9.5h to
~4min.

FluxOS-managed container startup — Single ownership model replacing the split where Docker auto-started on powercut and FluxOS managed on clean
shutdown. Container restart policy default → no; FluxOS now owns all startup decisions. Boot context (heartbeat with machineBootId, shutdown
reason on SIGTERM) drives reconciliation: FluxOS restart skips recovery, expired locations trigger immediate removal, 5-min sync timeout removes
apps.

Orchestrator state machine — Formalized states (INITIALIZING / SYNCING / RESYNCING / READY / DEGRADED) with deterministic transitions. Boot path
gates: daemonReady → confirmed → dbReady → bootContainerStateSettled. Block-driven hash retry scheduling replaces the 30h reconstruct-tied
cycle. Peer loss during SYNCING/READY transitions to DEGRADED.

Signed sync requests — Binary frame extended with requestTimestamp + pubkey + signature (0x20-0x23 opcodes). Handlers verify identity
before opening MongoDB cursors, preventing unauthenticated peers from triggering expensive server-side work.

Performance

processMessages: batch existence checks via single $in per 2000-message chunk, eliminated duplicate verify/read passes, batch insertMany +
bulkWrite. ~58k individual ops → ~29 batch ops on bulk sync.
Removed unindexed zelAppSpecifications full-collection scan (legacy Zel→Flux rebrand, 0 results) — saved ~22 min per full hash sync.
appLocationFromEvents view: optimized aggregation (~2900ms → ~26ms for targeted queries) with name filter pushed into facet sub-pipelines.
Reconstruct audit: single bulkWrite + updateMany aggregation replacing 58k+ individual updateOne calls.
Eliminated 2-min blind daemon-wait at startup via waitForDaemonRpc.

Bug fixes

5 hash-sync signature verification edge cases: v7 marketplace team support address swap, enterprise v8 usersToExtend on non-ArcaneOS, missing
prevSpec decryption in processMessages, owner-change race (height-gated <2M for legacy network behavior).
Zombie apps: updateAppSpecifications split into insert (upsert) + update (no upsert) so the cache-update path can't resurrect
cancel/expire-deleted entries. Reconstruct cycle now invalidates hash sync via hashesReconstructed event so newly-eligible hashes get retried.
prevSpecsMap uses height-aware lookup for re-registered apps (was returning newest-by-name, picking wrong owner across registration cycles).
Sigterm event TTL extended to 125min so it outlives the apprunning events it suppresses (was 7min — apps reappeared after sigterm TTL'd).
messageNotFound block threshold corrected for 30s post-PON blocks (* 12 → * 48).
Dead peer detection uses ws.terminate() instead of ws.close() (~33s → ~4s).
Daemon info poll: setInterval → self-scheduling setTimeout to prevent concurrent RPCs.
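
The setInterval → self-scheduling setTimeout change in the last item follows a common pattern, sketched below. The helper name startSelfSchedulingPoll is hypothetical and is not the actual FluxOS code; the point is that the next timer is armed only after the previous poll has settled, so two RPCs never overlap.

```javascript
// Hypothetical helper: re-arms the timer only after the previous poll has
// settled, so a slow poll can never overlap with the next one.
function startSelfSchedulingPoll(pollFn, intervalMs) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      await pollFn();
    } catch (err) {
      console.warn(`poll failed: ${err.message}`);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  timer = setTimeout(tick, 0);

  // Returns a stop function that cancels any pending timer.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

With setInterval, a poll that takes longer than the interval would be started again while the first call is still in flight.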

Architecture / refactors

Broke circular dependencies: TTL constants moved from messageStore → appConstants; serialiseAndSignFluxBroadcast extracted to
fluxBroadcastHelper; deleteLoginPhrase moved from serviceHelper → idService.
appSyncEvents event bus replaces mutable module state setters (setOnSyncComplete, EventEmitter inheritance, ad-hoc thunks).
fluxEventBus publishes confirmation:changed, daemon:unreachable/recovered, orchestrator:stateChanged,
peers:thresholdReached/belowThreshold, boot:settled.
AsyncGate utility unifies the mixed resolver-array / EventEmitter awaitable patterns (waitForDaemonReady, waitForDbReady,
waitForBootComplete, waitForConfirmationStatus).
Block processor: eliminated self-referential setTimeout recursion, split into waitForDaemonSync / pollForNewBlocks / recoverAndRestart.
stoppedAppsRecovery → appStartupManager (manageAppsOnBoot / monitorAndRecoverApps); container health monitoring extracted to
containerHealthMonitor.
Narrowed module interfaces: AppSyncOrchestrator no longer receives full peerManager; appSpawner imports appInstaller/appUninstaller
directly.
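
To illustrate the kind of awaitable that AsyncGate unifies, here is a minimal sketch of a one-shot gate. The class body, its method names (open, wait), and the one-shot behaviour are assumptions for illustration; the real utility may differ, for example by allowing a gate to close again.

```javascript
// Hypothetical AsyncGate sketch: a one-shot latch that any number of
// callers can await, before or after it opens.
class AsyncGate {
  constructor() {
    this.isOpen = false;
    this.promise = new Promise((resolve) => {
      this.resolve = resolve;
    });
  }

  // Opens the gate once; later calls are ignored.
  open(value) {
    if (this.isOpen) return;
    this.isOpen = true;
    this.resolve(value);
  }

  // Resolves when the gate opens; rejects if timeoutMs elapses first.
  wait(timeoutMs) {
    if (timeoutMs === undefined) return this.promise;
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('gate timeout')), timeoutMs);
    });
    return Promise.race([this.promise, timeout]).finally(() => clearTimeout(timer));
  }
}
```

A waiter such as waitForDbReady could then be a thin wrapper around one gate's wait(), instead of a resolver array or an EventEmitter listener.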

Testing infrastructure

New test-infra/ directory: dockerized 16-node test network with daemon stub, external HTTP stub, per-node config generation, single-node and
full-network compose files.
7 new integration test suites covering orchestrator state machine transitions, boot manager decision tree, spawner gate conditions, confirmation
service windows, compound failures, and boundary conditions (53 tests).
explorer:ready / orchestrator:started / spawner:paused/resumed/blocked SSE events for deterministic test synchronization (no more
timing-based sleeps).
WS ping/pong intervals configurable via wsPingIntervalMs / wsMaxMissedPongs (2s/2 in test config for fast dead-peer detection).

Config

~25 timing constants / thresholds / intervals extracted from production code into config.fluxapps with ?? fallback defaults. New:
maxAppsPerNode: 200 enforced by spawner and storeAppRunningMessage.
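
The choice of ?? over || for fallbacks matters when a configured value is falsy. A minimal sketch, assuming the key lives at config.fluxapps.maxAppsPerNode (the exact config shape is an assumption; the default of 200 is from the text above):

```javascript
const DEFAULT_MAX_APPS_PER_NODE = 200;

// ?? falls back only when the value is null or undefined.
function readMaxAppsPerNode(config) {
  return config.fluxapps?.maxAppsPerNode ?? DEFAULT_MAX_APPS_PER_NODE;
}

// || also falls back on 0, '' and false, silently discarding an explicit 0.
function readMaxAppsPerNodeWithOr(config) {
  return config.fluxapps?.maxAppsPerNode || DEFAULT_MAX_APPS_PER_NODE;
}
```

An operator who sets the limit to 0 keeps that value with ??, while the || variant would replace it with the default.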

Test plan

Bootstrap a fresh node from scratch — verify hash sync completes via daemon address index (~4min target) without ~9.5h block-by-block fallback
Run 16-node test-infra docker-compose network and confirm all 7 new integration suites pass
Verify FluxOS-restart boot path skips app recovery (preserves running containers across systemctl restart fluxos)
Verify unclean-shutdown / powercut boot path correctly reconciles via FluxOS rather than Docker auto-start
Confirm event log + materialized zelappslocation view stay in sync across gossip + sigterm + eviction
Validate node confirmation gate: unconfirmed nodes don't peer, don't send signed messages, but still receive passive gossip
Stress test: induce peer drop during SYNCING/RESYNCING and confirm DEGRADED transition + recovery to READY
Verify zombie-app recovery: simulate stale messageNotFound flags on upgrade and confirm cancel/expire messages are fetched

Three changes to eliminate orphaned entries between collections:
1. break → continue in storeAppRunningMessage loop: for v2 messages with multiple apps, skip apps that already have current data but keep processing the rest. Previously broke out of the entire loop.
2. storeAppRunningMessage returns { stored, rebroadcast } instead of true/false. The gossip handler only calls storeSignedAppRunningBroadcast when stored is true, ensuring both collections accept or reject together.
3. Remove redundant 5-minute gossip validity check from storeSignedAppRunningBroadcast; it is now gated on the location store's acceptance, eliminating the timing edge where one store accepts at the boundary and the other rejects milliseconds later.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The sigterm handler was mutating broadcastedAt on location records to force 7-minute TTL expiry. This broke the data contract — broadcastedAt is derived from signed data and should never change. Stale gossip could also overwrite the sigterm by passing the "is newer" check against the fake broadcastedAt value. Switch all 6 ephemeral collections to expireAt-based TTL (expireAt:0). expireAt is operational metadata we control, not part of the signed payload. Sigterm now sets expireAt = now + 7min on both locations and signed broadcasts without touching broadcastedAt. Also: split gossip validity (5min) from record expiry into named constants, add missing expireAt to error stores, fix empty-apps v2 handler to clean up signed broadcasts with broadcastedAt guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
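
A minimal sketch of the expireAt approach described above. MongoDB TTL indexes with expireAfterSeconds set to 0 remove a document once the Date in the indexed field has passed. The constant and helper names below are hypothetical; only the 7-minute figure and the rule of leaving broadcastedAt untouched come from the commit message.

```javascript
const SIGTERM_GRACE_MS = 7 * 60 * 1000;

// TTL index spec: with expireAfterSeconds 0, MongoDB removes each document
// once the Date stored in its expireAt field has passed.
const ttlIndex = { keys: { expireAt: 1 }, options: { expireAfterSeconds: 0 } };

// Hypothetical helper: builds the sigterm update. It sets only expireAt and
// never touches the signed broadcastedAt value.
function buildSigtermExpiryUpdate(nowMs) {
  return { $set: { expireAt: new Date(nowMs + SIGTERM_GRACE_MS) } };
}
```

The index spec would typically be passed as collection.createIndex(ttlIndex.keys, ttlIndex.options).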

nodeStatusMonitor and storeAppRemovedMessage deleted from zelappslocation without touching fluxapprunningbroadcasts, leaving orphaned signed broadcasts (~44 per 20-minute monitor cycle). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- storeAppRemovedMessage: $addToSet excludedApps on v2 broadcast docs so the derived view skips removed apps without mutating signed data
- storeSignedAppRunningBroadcast + batch sync: $unset excludedApps when a newer broadcast upserts (clears stale exclusions)
- appLocationFromBroadcasts: filter out excluded apps after v2 unwind
- reindexGlobalAppsLocation: also drop running broadcasts collection
- explorer rescan: also drop running + installing broadcasts
- Export handleMissingMasterSlaveContainer from stoppedAppsRecovery
- Fix all 10 CI test failures, add excludedApps tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Single `appstateevents` collection replaces `fluxapprunningbroadcasts` as the source of truth. Four event types (apprunning, sigterm, appremoved, evicted) with dedupKey-based upserts and $cond timestamp guards. `zelappslocation` stays populated as materialized cache.
- storeAppStateEvent() dispatcher with APP_STATE_EVENT_TYPES enum
- storeBatchAppRunningEvents() for sync receiver
- Gossip handler writes event unconditionally, then materializes location
- Sigterm/appremoved/evicted all append events instead of mutating
- Sync sender/receiver stream from event log
- Remove storeSignedAppRunningBroadcast, excludedApps, gossip gating
- 99 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The view now filters appremoved, sigterm, and evicted events, excludes stale v1 broadcasts superseded by newer v2, and correctly handles expired shutdown events. Verified against charlie live data (0 diff). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The event store was accepting gossip up to 125min old (RUNNING_EXPIRY_MS) instead of 5min (GOSSIP_VALIDITY_MS). Only the batch sync path should accept older messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nodeStatusMonitor deletes locations immediately on eviction, but the view was giving evicted IPs the same 7-minute grace period as sigterm. Eviction should be immediate — the monitor already verified the node is gone. Also extend eviction TTL to match apprunning (125min) so the eviction event outlives the apprunning events it suppresses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

storeSignedAppRunningBroadcast no longer exists — stub storeAppStateEvent instead. Sigterm handler now calls updateInDatabase once (location expiry only) not twice, and storeAppStateEvent needs stubbing to prevent throw. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix undefined appsRunningBroadcasts in apiServer.js sigterm handler, add storeAppStateEvent(SIGTERM) call for own shutdown
- Escape regex in appLocationFromBroadcasts to prevent injection
- Cap sync response batch size at 2500 in all 4 handlers
- Add IPCHANGED event type with view remapping so IP changes are reflected in the event log view
- Await all storeAppStateEvent calls (was fire-and-forget)
- Use ?? instead of || for config fallbacks in orchestrator
- Optimise appLocationFromBroadcasts pipeline: $arrayToObject/$getField for O(1) lookups instead of $filter scans (2900ms → 118ms), push name filter into facet sub-pipelines (2666ms → 26ms for targeted)
- Standardise $gt (not $gte) for "only if newer" guards
- Add {createdAt: 1} index for sync sender evicted event queries
- Hash sync failure recovery: retry 3x with 5-min gap, block timer fallback if retries exhausted, background 20-min recheck on blockReceived for missing hashes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tests cover: retry on failure, block timer fallback when retries exhausted, readiness via block timer when hash sync never completes, and DB rebuild failure not blocking the state machine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract streamBatchedSync helper from 3 nearly identical respondWith* functions. Rename MIN_SYNC_PEERS to MIN_SYNC_COMPLETIONS for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

$getField with dynamic field references requires MongoDB 7.2+ (SERVER-74371). CI runs 7.0. Replaced $arrayToObject/$getField O(1) maps with $filter/$first lookups against small arrays. Structural optimization preserved: shutdown/v1 filtering at IP level before unwinding. Estimated ~200-300ms at full scale vs 118ms with $getField vs 2900ms with the original post-unwind approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
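
As an illustration of the $filter/$first lookup shape mentioned above, here is a hedged sketch of one aggregation expression. The field names (shutdowns, ip) are hypothetical and do not come from the actual pipeline; only the operator shapes are the point.

```javascript
// Hypothetical field names (shutdowns, ip); only the operator shapes matter.
// $filter narrows a small array to the matching entries and $first takes the
// first one, standing in for a keyed $getField lookup.
const shutdownForIp = {
  $first: {
    $filter: {
      input: '$shutdowns',
      as: 'entry',
      cond: { $eq: ['$$entry.ip', '$ip'] },
    },
  },
};

// Example stage that attaches the looked-up entry to each document.
const addShutdownStage = { $addFields: { shutdown: shutdownForIp } };
```

The lookup is linear in the array length, which is acceptable here because the commit notes the arrays are small.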

- handleAppRunningEvent: reject empty-apps v2 when no prior events exist for that IP (matches location store behavior independently)
- handleNodeSigtermMessage: check event log for app events instead of zelappslocation, so sigterm handling works without locations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename to reflect event log architecture (broadcasts no longer exist). Change signature from positional appname to options object { appname, ip } to support IP filtering. Sigterm handler now uses the full view derivation to check for apps instead of a naive event log findOne. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stores the time each node received/processed the event, alongside the original broadcastedAt from the source node. The delta reveals gossip propagation latency and helps diagnose messages that arrive near the 5-minute validity boundary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Gossip path sets receivedAt on insert. Batch sync path preserves the sender's receivedAt so the original gossip reception time is retained across sync. Enables propagation latency diagnostics on installing and install error broadcasts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sigterm events had 7-min TTL matching the grace period, but apprunning events have 125-min TTL. After the sigterm TTL'd away, apps reappeared in the view with nothing to suppress them. Same race as the evicted TTL bug. Fix: sigterm event expireAt uses RUNNING_EXPIRY_MS (125 min) so the document outlives every apprunning it suppresses. The 7-min grace period is computed from eventAt in the view pipeline, not from expireAt. Export SIGTERM_EXPIRY_MS and use it in fluxCommunication.js and apiServer.js instead of hardcoded 420*1000. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous hash sync sent fluxapprequest to a single random peer per attempt, with a fixed 30s wait that couldn't cover the 75s response time for 500 hashes (150ms per hash on the responder). It also broke out on zero progress and reused the same peers.

New algorithm:
- Bulk threshold lowered from 1000 to 500 (matching fluxapprequest v2 cap)
- Targeted path sends to 3 peers per round with poll-until-settled
- Timeout proportional to hash count (count × 150ms + 5s buffer)
- Settle detection: exits early when no new responses for 4s
- Tracks tried peers; never repeats across rounds
- Continues through all rounds regardless of per-round progress
- Excludes deterministic peers (same-provider neighbors)
- Bulk path aggregates responses from all peers instead of picking largest
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
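
Two pieces of the new algorithm can be sketched in a few lines: the proportional timeout and the "never repeat a peer" selection. The function names are hypothetical; the constants (150ms per hash, 5s buffer, 3 peers per round) come from the commit message.

```javascript
const PER_HASH_MS = 150;
const BUFFER_MS = 5000;
const PEERS_PER_ROUND = 3;

// Timeout grows with the number of hashes requested.
function requestTimeoutMs(hashCount) {
  return hashCount * PER_HASH_MS + BUFFER_MS;
}

// Picks up to perRound peers that have not been tried yet and records them
// in the `tried` set so later rounds skip them.
function pickPeersForRound(peers, tried, perRound = PEERS_PER_ROUND) {
  const picked = peers.filter((peer) => !tried.has(peer)).slice(0, perRound);
  picked.forEach((peer) => tried.add(peer));
  return picked;
}
```

For 500 hashes the timeout is 80s, which covers the 75s responder time that the old fixed 30s wait could not.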

Moves the broadcast signing logic out of fluxCommunicationMessagesSender into utils/fluxBroadcastHelper. This breaks the circular dependency that prevented appHashSyncService from sending signed messages to peers (messageStore → messageVerifier → fluxCommunicationMessagesSender). appHashSyncService now uses fluxBroadcastHelper directly to sign and send fluxapprequest messages via peer.send(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The cycle messageVerifier → registryManager → messageStore → messageVerifier caused messageVerifier exports to be empty at load time, breaking checkAppMessageExistence for gossip message handling. Root cause: registryManager imported SIGTERM_EXPIRY_MS from messageStore, which created a circular require chain during module initialization. Fix: Move all TTL/expiry constants (GOSSIP_VALIDITY_MS, RUNNING_EXPIRY_MS, INSTALLING_EXPIRY_MS, INSTALLING_ERRORS_EXPIRY_MS, SIGTERM_EXPIRY_MS, EVICTED_EXPIRY_MS) from messageStore to appConstants. Update all consumers to import from appConstants instead. Also extracts serialiseAndSignFluxBroadcast into utils/fluxBroadcastHelper to cleanly separate broadcast signing from peer routing logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The ephemeral sync receiver stored appremoved/sigterm/evicted events to the event log but didn't apply their location side-effects. This caused syncing nodes to have stale locations that the sender had already deleted.
- appremoved: delete location entry for {ip, appName}
- sigterm: update expireAt on all locations for that IP
- evicted: delete all locations for that IP

Also gates ephemeral sync on network state readiness: the orchestrator now requires both peer threshold AND node list populated before firing sync requests. Prevents verification failures from unloaded node list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

apiServer.handleSigterm used the old { ip, broadcastedAt, envelope } format and referenced messageStore.SIGTERM_EXPIRY_MS which was moved to appConstants. Updated to pass { message, envelope } so the full signed payload is stored for sync re-verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sigterm, appremoved, and ipchanged event handlers were stripping type and version fields from the stored data. When these events were synced to another node, re-verification failed because the signature was computed over the original full message, not the stripped version. Now stores the complete message object as data so envelope + data can be reconstructed for verification during sync. Also updates all callers to pass { message, envelope } instead of individual fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add TTL constants to appConstants stub (moved from messageStore) - Update sigterm test to use { message, envelope } format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

processMessages now checks permanent message existence in chunks of 2000 using a single $in query instead of individual findOne per message. Existing hashes are batch-marked as message:true via bulkWrite. Only genuinely new messages go through the sequential storeAppTemporaryMessage + checkAndRequestApp path. For the common case (most messages already exist), this reduces ~58k individual DB reads + writes to ~29 batch operations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
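
The arithmetic behind "~58k reads → ~29 batch operations" is just chunking by 2000. A minimal sketch follows; the helper names and the `hash` field name in the query are assumptions, not the actual processMessages code.

```javascript
const CHUNK_SIZE = 2000;

// Splits an array into consecutive chunks of at most `size` items.
function chunk(items, size = CHUNK_SIZE) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical query builder: one $in query per chunk replaces one findOne
// per message. The field name `hash` is an assumption.
function buildExistenceQuery(hashes) {
  return { hash: { $in: hashes } };
}

// Separates hashes that already exist from those needing full processing.
function splitByExistence(chunkHashes, existingHashes) {
  const existing = new Set(existingHashes);
  return {
    existing: chunkHashes.filter((h) => existing.has(h)),
    missing: chunkHashes.filter((h) => !existing.has(h)),
  };
}
```

Chunking 58,000 hashes at 2000 per chunk yields 29 chunks, matching the figure in the commit message.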

The update path in checkAndRequestApp did two queries per message to find the latest permanent message for an app name:
1. find({appSpecifications.name}): loaded all docs, iterated in JS
2. find({zelAppSpecifications.name}): full collection scan (no index, 0 results; legacy field from Zel→Flux rebrand, never populated)

Combined cost: ~48ms per message on 35k-doc collection. For 58k messages during bulk hash sync, this added ~22 minutes of pure waste.

Fix:
- Remove zelAppSpecifications query entirely (dead code)
- Replace find-all + JS iterate with findOne using sort:{height:-1} which leverages the existing {appSpecifications.name:1, height:-1} compound index
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
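
The indexed lookup described in the fix can be sketched as follows. The function name is hypothetical, and `collection` stands for any object exposing the MongoDB Node driver's findOne(filter, options); FluxOS may route this through its own database helper instead.

```javascript
// Hypothetical helper: returns the newest permanent message for an app name.
// With a { 'appSpecifications.name': 1, height: -1 } index, the sort lets the
// server read a single index entry instead of loading every matching doc.
async function findLatestPermanentMessage(collection, appName) {
  return collection.findOne(
    { 'appSpecifications.name': appName },
    { sort: { height: -1 } },
  );
}
```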

Replaces the storeAppTemporaryMessage + checkAndRequestApp per-message flow with a single-pass verify-and-batch-insert approach:
- Skips temp message storage entirely (was write + immediate read-back)
- Eliminates 3 duplicate DB reads per message (existence checks done twice, getPreviousAppSpecifications done twice)
- Eliminates duplicate signature verification
- Pre-loads previous app specs for update messages per chunk (one $in query replaces N individual find-all queries)
- Batch inserts permanent messages via insertMany
- Batch marks hashes via bulkWrite
- Keeps: hash verification, signature verification, app spec validation, price validation, name conflict checks for registers

Also removes dead zelAppSpecifications query in messageVerifier checkAndRequestApp (unindexed full collection scan, 0 results). Replaces find-all + JS iterate with indexed findOne for update path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

validatePrice called appPricePerMonth (async) without await, causing price comparisons against Promise objects. Also restores specificationFormatter for consistent spec formatting before signature verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
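
The missing-await failure mode can be reproduced in isolation. In this sketch appPricePerMonth is a stand-in stub and the two comparison helpers are hypothetical; the real validatePrice logic is not reproduced here.

```javascript
// Stand-in for the async price lookup; returns a fixed price for illustration.
async function appPricePerMonth() {
  return 10;
}

// Buggy shape: without await, the right-hand side is a Promise. In a numeric
// comparison it coerces to NaN, so the check is false for every amount.
function isPaidEnoughBuggy(paid) {
  return paid >= appPricePerMonth();
}

// Fixed shape: await the price before comparing.
async function isPaidEnough(paid) {
  return paid >= (await appPricePerMonth());
}
```

No exception is thrown in the buggy version, which is why this class of bug tends to go unnoticed until results are inspected.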

Registrations verified within a chunk are added to the prevSpecsMap so that updates later in the same chunk can find their previous specs without a DB round-trip. Eliminates the 30% failure rate where updates couldn't find registrations from the same chunk. The map is pre-loaded from DB per chunk (for cross-chunk lookups) and grown as registrations are verified. Memory bounded by unique app names per chunk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
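
A minimal sketch of a map that grows as registrations are verified and supports the height-aware lookup mentioned in the bug-fix summary. The data shape (a Map of app name to an array of { height, owner } entries) and the function names are assumptions for illustration.

```javascript
// Hypothetical shape: Map<appName, Array<{ height, owner }>>.
function addSpec(prevSpecsMap, name, spec) {
  const list = prevSpecsMap.get(name) ?? [];
  list.push(spec);
  prevSpecsMap.set(name, list);
}

// Returns the newest spec strictly below the message height, or null.
// Picking newest-by-name alone would return the wrong owner when an app
// name has been re-registered after expiry.
function findPreviousSpec(prevSpecsMap, name, messageHeight) {
  const candidates = (prevSpecsMap.get(name) ?? []).filter((s) => s.height < messageHeight);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, s) => (s.height > best.height ? s : best));
}
```

An update at height 300 for an app registered at heights 100 and 500 resolves to the height-100 spec, not the later re-registration.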

Adds nodeConfigOverrides option to createTestEnv — a map of node index to config that merges on top of the global configOverrides. This allows setting different config on specific nodes, e.g. appSyncMinCompletions=3 only on the joining node without affecting source nodes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
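
The layering described above can be sketched as a per-node merge. The helper name resolveNodeConfig is hypothetical, and a shallow merge is an assumption; createTestEnv may merge differently.

```javascript
// Hypothetical helper: per-node overrides are applied on top of the global
// overrides. A shallow merge is assumed here.
function resolveNodeConfig(nodeIndex, configOverrides = {}, nodeConfigOverrides = {}) {
  return { ...configOverrides, ...(nodeConfigOverrides[nodeIndex] ?? {}) };
}
```

For example, passing { 2: { appSyncMinCompletions: 3 } } as nodeConfigOverrides changes only node 2 and leaves the other nodes on the global values.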

Only one joining node is needed. Also set appSyncPeerThreshold=3 so the peer threshold fires after 3 peers connect, matching the appSyncMinCompletions=3 requirement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Health check timeout (5s) exceeded interval (3s), causing Docker's health state machine to produce spurious "unhealthy" on container restart. Reduced timeout to 2s across all container health checks. Docker's CloseMonitorChannel sets health status to "unhealthy" during monitor teardown (moby/daemon/container/health.go:80). On restart, HealthCheckWaitStrategy sees this transient state and destroys the container. Replaced restartNode to swap in an HTTP-polling wait strategy that bypasses Docker's health state machine entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bulk permanent message fetch now partitions missing hashes across peers and streams in parallel via Promise.allSettled, instead of sequential single-peer streaming. Each stream maintains its own 500-message backpressure — peak memory is ~1500 messages vs 500 previously. Targeted fetch and ephemeral rounds now chunk hashes into groups of 500 before calling broadcastHashRequest, fixing a latent bug where >500 hashes would exceed the fluxapprequest v2 message cap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parallel bulk fetch caused ~10-25% failure rate per batch because update messages couldn't find predecessor specs processed on other streams. Reverted to sequential streaming which maintains height ordering across all messages. Kept the broadcastHashRequest chunking at 500 for targeted fetch rounds and ephemeral rounds (latent bug fix). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The first checkAndNotifyPeersOfRunningApps call was triggered by peer threshold, before appStartupManager finished reconciling containers. This caused the broadcast to report 0 apps because Docker containers hadn't been started yet. The next broadcast wouldn't fire for an hour (peerNotifyIntervalMs). Gate the first broadcast behind waitForBootContainerStateSettled() so it runs after reconciliation completes and Docker state is accurate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The first broadcast was racing with appStartupManager and reporting 0 apps. This test verifies the app:running SSE event includes the reconciled app after a simulated reboot, catching the race if the broadcast gate on boot:settled is removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If the HTTP poll times out, log a warning instead of throwing. Throwing triggers testcontainers' waitForContainer error handler which destroys the container, making the failure undiagnosable. The test's own assertions will catch the actual problem. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

appHashSyncService.js: messageStore, globalState
appStartupManager.js: decryptEnterpriseApps, appUsesGSyncthingMode
serviceManager.js: hashSyncIntervalMs, peerNotifyIntervalMs, locationTtlS, installingTtlS, installErrorTtlS, removalSpacingMs (dead; old interval logic moved to orchestrator)
nodeStatusMonitor.js: fluxEventBus
messageVerifier.js: scannedHeightCollection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Prevents a re-processed registration from overwriting a newer update spec. Mirrors the existing guard in updateAppSpecifications. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Documents the known divergence between secure and non-secure nodes for enterprise usersToExtend updates, and the planned resolution via Arcane attestations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The return value of apps.filter() was discarded, causing already-resolved apps to be re-requested via checkAndRequestMultipleApps. Idempotent but wasteful. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
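
The bug class is easy to reproduce: Array.prototype.filter returns a new array and leaves the original untouched. The helper names and the `hash` property below are hypothetical, not the actual insertAndRequestAppHashes code.

```javascript
// Buggy shape: the filtered array is discarded, so `apps` is unchanged.
function removeResolvedBuggy(apps, resolved) {
  apps.filter((app) => !resolved.has(app.hash));
  return apps;
}

// Fixed shape: use the array that filter returns.
function removeResolved(apps, resolved) {
  return apps.filter((app) => !resolved.has(app.hash));
}
```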

chunk.toString() can corrupt multi-byte UTF-8 characters split across chunk boundaries. StringDecoder buffers incomplete characters across writes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
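
The corruption can be shown with a single three-byte character split across two chunks. This is an illustrative sketch using Node's built-in string_decoder module, not the actual bulk hash fetch code.

```javascript
const { StringDecoder } = require('string_decoder');

const euro = Buffer.from('€', 'utf8'); // 3 bytes: e2 82 ac
const first = euro.subarray(0, 2); // chunk boundary falls inside the character
const second = euro.subarray(2);

// Naive per-chunk decoding: each incomplete piece becomes a replacement
// character (U+FFFD) and the original character is lost.
const naive = first.toString('utf8') + second.toString('utf8');

// StringDecoder holds the incomplete bytes until the rest arrives.
const decoder = new StringDecoder('utf8');
const decoded = decoder.write(first) + decoder.write(second) + decoder.end();
```

Here `decoded` is the original character, while `naive` contains replacement characters instead.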

Pre-existing test fixture private key, not introduced by this PR but file was modified. Added to GitGuardian ignored_paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The broadcast gate change requires the globalState stub to provide waitForBootContainerStateSettled, otherwise the broadcast promise never resolves and the test fails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace .catch(() => {}) with warnings that include the network name and component. Silent swallowing masked resource leaks that caused intermittent failures in later suites. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: event-driven app state sync with event log

An app owner can link an app to other apps by embedding a token in the app description text: networkWith:[appA,appB] (brackets required, quotes optional, key case-insensitive, comma separated). This is purely node-local behaviour: no app specification field, no validation change, no network consensus impact.

When the token is present:
- Before install/redeploy, the node verifies every named app is installed locally and owned by the same owner; otherwise the operation fails.
- Each of the app's component containers is attached to the private docker network of every linked app (fluxDockerNetwork_<linked>), so it can reach that app's components by docker DNS name flux<component>_<linkedApp>, as if both apps were a single app.
- When a linked-to app is (re)deployed, any locally installed app that is networked with it is reconnected to its network.

New module appNetworkLinker.js holds the parser, the install gate, and the forward/reverse network wiring. The gate and forward wiring run in installApplicationHard/installApplicationSoft (the only callers of appDockerCreate), so every container-creation path is covered, including direct callers that bypass registerAppLocally (container health recovery and legacy v<=3 redeploys). Reverse wiring runs in registerAppLocally and softRegisterAppLocally; a boot-time reconcile sweep re-applies all links. dockerService gains an idempotent appDockerNetworkConnect helper.

Adds tests/unit/appNetworkLinker.test.js (parser, gate, wiring, reconcile) and appDockerNetworkConnect coverage in dockerService.test.js.
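
A hedged sketch of how such a token could be parsed, following the stated rules (brackets required, quotes optional, key case-insensitive, comma separated). The function name, the regex, and the tolerance for whitespace around names are assumptions; the actual appNetworkLinker parser may differ.

```javascript
// Hypothetical parser sketch; not the actual appNetworkLinker implementation.
// The `i` flag makes the key case-insensitive; the brackets are mandatory.
const NETWORK_WITH_REGEX = /networkWith:\[([^\]]*)\]/i;

function parseNetworkWith(description) {
  const match = NETWORK_WITH_REGEX.exec(description ?? '');
  if (!match) return [];
  return match[1]
    .split(',')
    // Trim whitespace, then strip one optional leading and trailing quote.
    .map((name) => name.trim().replace(/^["']|["']$/g, ''))
    .filter((name) => name.length > 0);
}
```

A description without brackets, such as networkWith:appA, yields an empty list rather than a partial match.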

- extract APP_NAME_REGEX (v8+) and APP_NAME_REGEX_LEGACY (v<=7 / components) into appConstants; consume from appValidator and appNetworkLinker
- move getAppContainerNames / getAppContainerObjects into dockerService; anchor the multi-component match to ^(?:flux|zel)[a-zA-Z0-9]+_<app>$ and escape regex metacharacters in the app name; refactor getNextAvailableIPForApp to use the same helper
- rewrite appDockerNetworkConnect to inspect the container's NetworkSettings.Networks first and skip the connect when already attached; drop the blanket 403 catch (overloaded by docker) in favour of a narrow already-exists message match as a TOCTOU race fallback
- update affected unit tests
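
The anchored, escaped match from the second item can be sketched as below. The helper names are hypothetical; the pattern itself is the one quoted in the commit message.

```javascript
// Escapes characters that have special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds the anchored pattern ^(?:flux|zel)[a-zA-Z0-9]+_<app>$ for one app.
function componentContainerRegex(appName) {
  return new RegExp(`^(?:flux|zel)[a-zA-Z0-9]+_${escapeRegex(appName)}$`);
}
```

Anchoring at both ends prevents an app named myapp from matching containers of myapp2, and escaping prevents a metacharacter in the name from acting as a wildcard.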

When a SEND component is being installed in an app whose own compose has no LOG=COLLECT component, walk every app it is networkWith-linked to and ship to the first linked app that exposes a collector. Reachability is provided by the existing networkWith wiring (sender's container is already attached to the linked app's private docker network).

Enterprise linked apps whose compose is blanked in the local DB and cannot be decrypted on this node are skipped; the SEND container falls back to json-file logging with a warning. Same fallback applies if the collector container is not reachable at install time.
- new appNetworkLinker.findLinkedAppLogCollector(fullAppSpecs) that resolves the linked app + component name (handles the legacy enviromentParameters typo too)
- appDockerCreate calls it as a fallback after the existing in-compose collector lookup, only for SEND components

feat: app-to-app network linking via networkWith description token

gitguardian · 2026-05-25T10:30:17Z

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

GitGuardian id	GitGuardian status	Secret	Commit	Filename
32907135	Triggered	Generic Private Key	`83c22a1`	test-infra/fixtures/registry-tls/server-key.pem
10071586	Triggered	Generic High Entropy Secret	`0da0c94`	tests/unit/fluxCommunicationMessagesSender.test.js

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secrets safely, following best practices.
Revoke and rotate these secrets.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

following best practices for managing and storing secrets, including API keys and other credentials
installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

alihm

Ack

MorningLightMountain713

Ack

MorningLightMountain713 and others added 30 commits May 14, 2026 13:38

refactor: deduplicate sync responders, rename MIN_SYNC_COMPLETIONS

d2bd383

Extract streamBatchedSync helper from 3 nearly identical respondWith* functions. Rename MIN_SYNC_PEERS to MIN_SYNC_COMPLETIONS for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: update messageStore tests for new event format and constants

415ee3a

- Add TTL constants to appConstants stub (moved from messageStore)
- Update sigterm test to use { message, envelope } format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MorningLightMountain713 and others added 22 commits May 20, 2026 20:58

fix: add height-downgrade guard to insertAppSpecifications

6034bc6

Prevents a re-processed registration from overwriting a newer update spec. Mirrors the existing guard in updateAppSpecifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
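A height-downgrade guard of the kind this commit describes might look like the sketch below. It uses an in-memory `Map` as a stand-in for the specifications collection; the function name `insertSpecWithHeightGuard`, the `height` field, and the strict `>` comparison are assumptions for illustration, not the actual insertAppSpecifications code.

```javascript
// Hypothetical sketch: `store` is a Map standing in for the app
// specifications collection, keyed by app name.
function insertSpecWithHeightGuard(store, spec) {
  const existing = store.get(spec.name);
  // Skip the write when the stored spec was recorded at a later block height,
  // so a re-processed registration cannot replace a newer update.
  if (existing && existing.height > spec.height) {
    return false;
  }
  store.set(spec.name, spec);
  return true;
}

// Usage: an update at height 200 survives a replayed registration at height 100.
const specStore = new Map();
insertSpecWithHeightGuard(specStore, { name: 'demo', height: 200, version: 'update' });
const replayAccepted = insertSpecWithHeightGuard(specStore, { name: 'demo', height: 100, version: 'registration' });
```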

fix: use filter result in insertAndRequestAppHashes

ba70bd7

The return value of apps.filter() was discarded, causing already-resolved apps to be re-requested via checkAndRequestMultipleApps. Idempotent but wasteful.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
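The bug class fixed here is easy to reproduce in isolation: `Array.prototype.filter` returns a new array and leaves the original untouched, so discarding its result is a no-op. The sketch below uses a hypothetical `resolved` flag and hypothetical function names; it is not the code from insertAndRequestAppHashes.

```javascript
// Hypothetical sketch of the discarded-filter-result bug.

function selectUnresolvedBuggy(apps) {
  // filter() does not mutate `apps`; its return value is thrown away here,
  // so every app, resolved or not, is passed on.
  apps.filter((app) => !app.resolved);
  return apps;
}

function selectUnresolvedFixed(apps) {
  // Use the array that filter() returns.
  return apps.filter((app) => !app.resolved);
}
```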

fix: use StringDecoder for stream chunk decoding in bulk hash fetch

0da58d6

chunk.toString() can corrupt multi-byte UTF-8 characters split across chunk boundaries. StringDecoder buffers incomplete characters across writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
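The difference between the two decoding approaches can be shown with a three-byte character split across two chunks. This is a standalone illustration using Node's built-in `string_decoder` module; the chunk contents and the function names `decodeNaive` and `decodeBuffered` are made up for the example and do not come from the bulk hash fetch code.

```javascript
const { StringDecoder } = require('string_decoder');

// '€' is three bytes in UTF-8 (0xE2 0x82 0xAC). Split it across two chunks,
// as can happen at an arbitrary stream chunk boundary.
const splitChunks = [
  Buffer.from([0x61, 0xe2, 0x82]), // 'a' plus the first two bytes of '€'
  Buffer.from([0xac, 0x62]), // the last byte of '€' plus 'b'
];

function decodeNaive(parts) {
  // Each chunk is decoded on its own, so the partial character is replaced
  // with U+FFFD instead of being completed by the next chunk.
  return parts.map((chunk) => chunk.toString('utf8')).join('');
}

function decodeBuffered(parts) {
  // StringDecoder holds back incomplete trailing bytes until the next write.
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (const chunk of parts) {
    text += decoder.write(chunk);
  }
  return text + decoder.end();
}
```

Calling `decoder.end()` after the last chunk flushes any bytes still held back, so a truncated final character is surfaced rather than silently dropped.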

chore: whitelist test key in fluxCommunicationMessagesSender.test.js

2560a30

Pre-existing test fixture private key, not introduced by this PR but file was modified. Added to GitGuardian ignored_paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #1726 from RunOnFlux/feature/event-log-running-state

9ec8911

feat: event-driven app state sync with event log

Merge pull request #1736 from RunOnFlux/dependson

0df5a64

feat: app-to-app network linking via networkWith description token

bump version

837c220

Cabecinha84 requested review from MorningLightMountain713, TheTrunk, XK4MiLX and alihm May 25, 2026 11:00

alihm approved these changes May 25, 2026


MorningLightMountain713 approved these changes May 25, 2026


Cabecinha84 merged commit 3a48aa1 into master May 26, 2026
4 of 5 checks passed

Cabecinha84 merged 401 commits into master from development


3 participants
