feat: stable random ports#290

Merged
jason-lynch merged 1 commit into main from feat/PLAT-417/stable-random-ports on Mar 12, 2026
Conversation

@jason-lynch (Member)

Summary

Adds a service to allocate random ports for instances from a configurable range. These ports are then persisted with the instance spec. The next time the instance spec is evaluated, we copy the ports from the persisted copy of the spec.

The end result is that when we assign a random port to an instance, it will retain that port assignment until it is deleted or the user sets the port to null or a non-zero value in the database/node spec. Since this mechanism does not rely on Docker or the OS to assign ports, it will also work for the SystemD orchestrator.
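The stability rule above can be sketched in a few lines of Go. This is a hedged illustration, not the PR's code: reconcilePort, desired, persisted, and allocate are hypothetical stand-ins for the spec's port field, the persisted copy of the spec, and the ports-service call.

```go
package main

import "fmt"

// reconcilePort sketches the rule described above (names are illustrative):
// desired is the port from the user-facing spec, where 0 means "assign a
// random port"; persisted is whatever a previous reconcile stored; allocate
// draws a fresh port from the configured random range.
func reconcilePort(desired, persisted int, allocate func() int) int {
	if desired != 0 {
		return desired // the user pinned an explicit port
	}
	if persisted != 0 {
		return persisted // keep the earlier random assignment stable
	}
	return allocate() // first evaluation: draw a fresh random port
}

func main() {
	alloc := func() int { return 15279 }

	first := reconcilePort(0, 0, alloc)       // fresh allocation
	restart := reconcilePort(0, first, alloc) // later reconcile: same port
	fmt.Println(first, restart)               // 15279 15279
}
```

Because the rule only consults the persisted spec and the allocator, it works identically whether the orchestrator is Docker or SystemD.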

Testing

You will now be able to run the E2Es against the compose-dev environment.

# in one terminal
make dev-watch

# in another terminal
make test-e2e

You can also create read-replicas in the compose-dev environment by setting port to 0.

You will also notice that randomly assigned ports will now persist across restarts:

# create a database with a random exposed port
cp1-req create-database <<EOF | cp-follow-task
{
  "id": "storefront",
  "spec": {
    "database_name": "storefront",
    "database_users": [
      {
        "username": "admin",
        "password": "password",
        "db_owner": true,
        "attributes": ["SUPERUSER", "LOGIN"]
      }
    ],
    "port": 0,
    "nodes": [
      { "name": "n1", "host_ids": ["host-1"] }
    ]
  }
}
EOF

# once that completes successfully, get the database connection info
cp1-req get-database storefront | jq '.instances[].connection_info'

# Then stop the database instance
cp1-req stop-instance storefront storefront-n1-689qacsi | cp-follow-task

# Then start it again
cp1-req start-instance storefront storefront-n1-689qacsi | cp-follow-task

# Finally, get the connection info and observe that the port is the same as before
cp1-req get-database storefront | jq '.instances[].connection_info'

PLAT-417
PLAT-236

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Adds a ports subsystem with bitmap-backed allocators, persistence, and DI wiring; introduces per-instance InstanceSpec persistence and reconciliation that allocates/releases ports; adds RandomPorts config and updates workflows to reconcile specs before resource generation.

Changes

• Configuration & Docs — docs/installation/configuration.md, server/internal/config/config.go: docs add ports_service to the components list; config adds RandomPorts (Min/Max) with validation, defaulting, and integration into Config.
• Ports Core & Tests — server/internal/ports/ports.go, server/internal/ports/ports_test.go: new PortRange bitmap allocator with Allocate/AllocateNext/Release/Snapshot/Restore and comprehensive unit tests covering edge cases and restoration.
• Ports Service & Storage — server/internal/ports/service.go, server/internal/ports/store.go, server/internal/ports/provide.go: new ports Service (host-scoped allocators, OS availability checks, persistent store, concurrency/backoff) plus an etcd-backed Store and DI Provide wiring.
• Database: InstanceSpec Persistence & Port Integration — server/internal/database/instance_spec_store.go, server/internal/database/store.go, server/internal/database/spec.go, server/internal/database/service.go, server/internal/database/provide.go: new InstanceSpec etcd store and Store wiring; the database Service gains Config and ports.Service fields, adds ReconcileInstanceSpec/DeleteInstanceSpec, port allocation/release logic, and reconcilePort/CopySettingsFrom helpers; the provider is updated to inject the ports service.
• CLI / DI Registration — server/cmd/root.go: registered the ports provider in root initialization (import and Provide call).
• Workflows — server/internal/workflows/activities/get_instance_resources.go, server/internal/workflows/activities/get_restore_resources.go: activities now call DatabaseService.ReconcileInstanceSpec and use the reconciled spec (a cloned reconciled spec for restore) before generating resources.
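The allocator surface named above can be illustrated with a minimal in-memory sketch. This is not the PR's implementation — the bitmap encoding and field layout are assumptions — but it shows the AllocateNext/Release/Snapshot/Restore semantics the unit tests would exercise.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrFull = errors.New("port range exhausted")

// PortRange is a sketch of a bitmap-backed allocator over [min, max].
// A []bool stands in for the bitmap to keep the sketch readable.
type PortRange struct {
	min, max int
	used     []bool
}

func NewPortRange(min, max int) *PortRange {
	return &PortRange{min: min, max: max, used: make([]bool, max-min+1)}
}

// AllocateNext returns the lowest free port, or ErrFull when none remain.
func (r *PortRange) AllocateNext() (int, error) {
	for i, u := range r.used {
		if !u {
			r.used[i] = true
			return r.min + i, nil
		}
	}
	return 0, ErrFull
}

// Release frees a previously allocated port; out-of-range values are ignored.
func (r *PortRange) Release(port int) {
	if port >= r.min && port <= r.max {
		r.used[port-r.min] = false
	}
}

// Snapshot/Restore would persist the bitmap (e.g. to etcd) in the real
// service; here they just copy it so allocations survive a "restart".
func (r *PortRange) Snapshot() []bool  { return append([]bool(nil), r.used...) }
func (r *PortRange) Restore(s []bool)  { copy(r.used, s) }

func main() {
	r := NewPortRange(15000, 15002)
	a, _ := r.AllocateNext() // 15000
	b, _ := r.AllocateNext() // 15001
	fmt.Println(a, b)

	snap := r.Snapshot()
	fresh := NewPortRange(15000, 15002)
	fresh.Restore(snap)
	c, _ := fresh.AllocateNext() // 15002: restored state skips a and b
	fmt.Println(c)
}
```

Restoring the snapshot into a fresh allocator is what keeps randomly assigned ports from being reissued after a control-plane restart.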

Poem

🐇 I nibble on bits in a port-filled heap,
Etcd keeps secrets I quietly keep.
Reconcile whispers, allocate with care,
Release what’s unused, leave none in the air.
Hop, hop, the rabbit guards each tiny pair.

🚥 Pre-merge checks: 2 passed, 1 warning

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 10.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2)
  • Title check — ✅ Passed: the title 'feat: stable random ports' clearly and concisely summarizes the main change: adding stable port allocation for instances from a configurable range. It follows Conventional Commits conventions and directly relates to the changeset.
  • Description check — ✅ Passed: the PR description includes a clear summary explaining the feature, comprehensive testing instructions with example commands, and references to the linked issues (PLAT-417, PLAT-236). While the Checklist section uses unchecked boxes, the required content elements are present and complete.


jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from db5226b to 674a998 (March 9, 2026 22:37)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch 2 times, most recently from aaf891b to 46a8e09 (March 10, 2026 13:10)
jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from cf89c8a to 4cfbac6 (March 10, 2026 14:27)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch 3 times, most recently from cf18355 to 1c79e0e (March 10, 2026 14:57)
jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from 6ac6ac9 to 584efda (March 10, 2026 16:39)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from 1c79e0e to f0c226e (March 10, 2026 16:39)
@jason-lynch (Member, Author)

@coderabbitai review

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
server/internal/ports/service.go (1)

219-235: Consider bounded retries in allocateAvailablePort.

The loop continues until AllocateNext returns ErrFull, but if many ports are bound by external processes, this could result in many iterations before exhaustion. The current design marks externally-bound ports as allocated (line 228 doesn't release them), which is intentional per the comment, but could exhaust the range faster than expected.

This is likely acceptable given typical deployment scenarios, but consider logging a warning when a significant portion of attempts fail the OS check.
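The bounded-warning loop suggested here could look roughly like the sketch below; allocNext, osAvailable, and warnAfter are hypothetical stand-ins for PortRange.AllocateNext, the service's OS availability check, and a configurable threshold — not the actual signatures.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var ErrFull = errors.New("port range exhausted")

// allocateAvailablePort keeps drawing ports until one passes the OS check,
// emitting a single warning once warnAfter attempts have failed, and stopping
// only when the allocator reports ErrFull (as in the original design).
func allocateAvailablePort(allocNext func() (int, error), osAvailable func(int) bool, warnAfter int) (int, error) {
	attempts := 0
	for {
		port, err := allocNext()
		if err != nil {
			return 0, err // ErrFull: the range is exhausted
		}
		if osAvailable(port) {
			return port, nil
		}
		// Externally-bound ports stay marked as allocated (not released),
		// matching the intentional behavior noted in the review.
		attempts++
		if attempts == warnAfter {
			log.Printf("warn: %d ports failed the OS availability check; range may be shared with external processes", attempts)
		}
	}
}

func main() {
	next := 15000
	allocNext := func() (int, error) {
		if next > 15009 {
			return 0, ErrFull
		}
		p := next
		next++
		return p, nil
	}
	// Simulate the first five ports being bound by external processes.
	osAvailable := func(p int) bool { return p >= 15005 }

	port, err := allocateAvailablePort(allocNext, osAvailable, 3)
	fmt.Println(port, err) // 15005 <nil>, with one warning logged along the way
}
```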

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/ports/service.go` around lines 219 - 235,
allocateAvailablePort can loop many times if many ports are externally bound;
add a bounded-retry/attempt counter inside allocateAvailablePort that increments
on each failed s.portChecker(port) check and, when it crosses a configurable
threshold, emits a s.logger.Warn message (including port, attempts, and range
identifier) so operators see many OS-check failures; continue to call
r.AllocateNext() and stop only on ErrFull as before, but expose the threshold as
a Service field or constant so it’s adjustable and use the
PortRange.AllocateNext, s.portChecker, and s.logger methods to implement this
warning behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/config/config.go`:
- Around line 202-214: The RandomPorts.validate method currently rejects ranges
where Max == Min but NewPortRange (and ports_test) allow single-port ranges;
update RandomPorts.validate to accept Min==Max by changing the check from "if
r.Max <= r.Min" to only reject when r.Max < r.Min (i.e., ensure the validation
in RandomPorts.validate permits min==max), keeping the other bounds checks
intact so Min >=1 and Max <=65535 remain enforced.

In `@server/internal/database/instance_spec_store.go`:
- Around line 53-55: InstanceSpecStore.Put currently writes StoredInstanceSpec
unconditionally (uses storage.NewPutOp), ignoring
StoredInstanceSpec.Spec.Version, which allows races during reconcile; change Put
to perform conditional writes: when creating new specs use create-if-absent
semantics and when updating use a version-checked compare-and-swap using the
StoredInstanceSpec.Spec.Version (or store a separate version/ETag) via the
storage client conditional/compare API, and add retry logic on
conditional-failure to re-read the latest StoredInstanceSpec, recompute any port
allocation idempotently, and retry the conditional Put; update references to
InstanceSpecStore.Key and storage.NewPutOp accordingly to locate where to wire
the conditional create/update and retry loop.
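The conditional-write-plus-retry pattern this comment asks for can be shown against a tiny in-memory versioned store. The real code would use the storage client's etcd compare/conditional API; putIfVersion and reconcilePut here are hypothetical helpers illustrating the semantics only.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrConflict = errors.New("version mismatch")

// versionedStore mimics etcd-style compare-and-swap semantics in memory.
type versionedStore struct {
	value   string
	version int64 // 0 means "absent", so expected=0 gives create-if-absent
}

// putIfVersion succeeds only when the caller saw the latest version — the
// version-checked write the comment recommends for InstanceSpecStore.Put.
func (s *versionedStore) putIfVersion(v string, expected int64) error {
	if s.version != expected {
		return ErrConflict
	}
	s.value = v
	s.version++
	return nil
}

// reconcilePut retries on conflict: re-read the latest state, recompute the
// result idempotently, and attempt the conditional write again.
func reconcilePut(s *versionedStore, compute func(current string) string) {
	for {
		seen := s.version
		next := compute(s.value)
		if err := s.putIfVersion(next, seen); err == nil {
			return
		}
		// Conflict: another reconcile won the race; loop re-reads and retries.
	}
}

func main() {
	s := &versionedStore{}
	reconcilePut(s, func(cur string) string { return cur + "a" })
	reconcilePut(s, func(cur string) string { return cur + "b" })
	fmt.Println(s.value, s.version) // ab 2
}
```

Because compute re-derives the spec from the freshly read state on every attempt, a lost race can never silently overwrite another reconcile's port allocation.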

In `@server/internal/database/service.go`:
- Around line 536-556: The code currently allocates ports via
s.portsSvc.AllocatePort for spec.Port and spec.PatroniPort before calling
s.store.InstanceSpec.Put(...).Exec(ctx), which can leak ports if Put() fails;
modify the logic to track which ports were newly allocated (e.g., record values
returned from s.portsSvc.AllocatePort in a local slice or flags) and if
s.store.InstanceSpec.Put(...).Exec(ctx) returns an error, call the corresponding
release method (e.g., s.portsSvc.ReleasePort(ctx, port)) for each newly
allocated port before returning the error; alternatively (preferred) move
allocation until after a successful conditional write/transaction/CAS or wrap
the Put() in a transaction so ports are only allocated when persistence is
guaranteed — update the code around spec.Port, spec.PatroniPort,
s.portsSvc.AllocatePort, s.portsSvc.ReleasePort, and the StoredInstanceSpec
Put().Exec call to implement this cleanup or transactional allocation.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f726b821-6a7e-4135-bdfe-96961c2bb9d8

📥 Commits

Reviewing files that changed from the base of the PR and between 584efda and f0c226e.

📒 Files selected for processing (15)
  • docs/installation/configuration.md
  • server/cmd/root.go
  • server/internal/config/config.go
  • server/internal/database/instance_spec_store.go
  • server/internal/database/provide.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/database/store.go
  • server/internal/ports/ports.go
  • server/internal/ports/ports_test.go
  • server/internal/ports/provide.go
  • server/internal/ports/service.go
  • server/internal/ports/store.go
  • server/internal/workflows/activities/get_instance_resources.go
  • server/internal/workflows/activities/get_restore_resources.go

jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from f0c226e to 13e6ece (March 10, 2026 19:42)
jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from 584efda to a145735 (March 11, 2026 12:57)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from 13e6ece to c3bdad2 (March 11, 2026 12:58)
jason-lynch force-pushed the feat/PLAT-398/instance-record-management branch from a145735 to f3db136 (March 11, 2026 13:06)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from c3bdad2 to 40aab30 (March 11, 2026 13:25)
Base automatically changed from feat/PLAT-398/instance-record-management to main (March 11, 2026 13:42)
jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from 40aab30 to 7a71fcb (March 11, 2026 13:43)
coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/database/service.go (1)

134-175: ⚠️ Potential issue | 🟠 Major

Don't free ports before the delete is durable.

Both delete paths release ports before the corresponding InstanceSpec removal commits. If the later etcd delete/transaction fails — or DeleteDatabase reaches ErrDatabaseNotFound after only finding orphaned instance specs — those ports become reusable while the persisted specs still reference them. Move the allocator update into the same transaction as the spec delete, or release only after the delete has committed successfully.

Also applies to: 579-599

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/service.go` around lines 134 - 175, The code
currently calls releaseInstancePorts(ctx, spec.Spec) before persisting the
spec/db deletes in DeleteDatabase; move port releases so they occur only after
the delete transaction commits (or make allocator updates part of the same
storage transaction if the allocator supports transactional ops). Concretely: in
DeleteDatabase, do not call releaseInstancePorts inside the initial loop over
specs; instead collect the ports/instance IDs from each spec (from the
spec.Spec) and proceed to build ops (s.store.Spec.Delete(...),
s.store.Instance.DeleteByDatabaseID(...),
s.store.InstanceSpec.DeleteByDatabaseID(...),
s.store.InstanceStatus.DeleteByDatabaseID(...)) and call
s.store.Txn(...).Commit(ctx) first, then on successful commit iterate the
collected ports and call releaseInstancePorts for each; also ensure the
early-return ErrDatabaseNotFound path does not release ports. This preserves
durability and prevents freeing ports before deletes are persisted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/database/service.go`:
- Around line 564-574: The current flow persists the new spec before releasing
ports and blindly calls portsSvc.Release for any non-zero Port/PatroniPort, and
releasePreviousSpecPorts uses new.HostID which leaks allocations when moving
hosts; change ReconcileInstanceSpec and releasePreviousSpecPorts to persist
allocation provenance when allocating (mark ports as allocator-owned), only call
portsSvc.Release for ports that were allocated by our allocator (check the saved
provenance on the previous Instance.Spec), and call Release against
previous.HostID (not new.HostID) so the old host's allocations are freed; apply
the same allocator-owned check and previous-host release logic to the other
similar blocks mentioned (around 604-629 and 632-645).

In `@server/internal/ports/service.go`:
- Around line 96-105: The allocator is currently keyed by s.cfg.ClientAddress()
and NewPortRange(min,max) then calling restoreAllocator(ctx, r, name) which
drops the snapshot on Restore failure; this makes allocator identity change when
addresses or ranges change and can reissue in-use ports. Modify the logic to
key/memoize allocators by a stable host identifier (e.g., hostID) instead of
ClientAddress(), and change restoreAllocator (and any code paths around
NewPortRange/restoreAllocator) to migrate or seed existing persisted state into
the new range when ranges/addresses change rather than discarding the snapshot
on Restore failure; ensure functions/variables referenced include
s.cfg.ClientAddress(), s.cfg.RandomPorts.Min/Max, NewPortRange, restoreAllocator
and the hostID source so state is loaded/migrated under the stable host key and
only reset when explicitly requested.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 01852eb3-ac93-4c5c-be21-c662b3c81604

📥 Commits

Reviewing files that changed from the base of the PR and between f0c226e and 7a71fcb.

📒 Files selected for processing (15)
  • docs/installation/configuration.md
  • server/cmd/root.go
  • server/internal/config/config.go
  • server/internal/database/instance_spec_store.go
  • server/internal/database/provide.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/database/store.go
  • server/internal/ports/ports.go
  • server/internal/ports/ports_test.go
  • server/internal/ports/provide.go
  • server/internal/ports/service.go
  • server/internal/ports/store.go
  • server/internal/workflows/activities/get_instance_resources.go
  • server/internal/workflows/activities/get_restore_resources.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • docs/installation/configuration.md
  • server/internal/ports/ports_test.go
  • server/internal/ports/store.go
  • server/internal/workflows/activities/get_instance_resources.go
  • server/internal/ports/ports.go
  • server/internal/database/spec.go
  • server/internal/database/store.go
  • server/internal/ports/provide.go
  • server/internal/workflows/activities/get_restore_resources.go

Comment on lines +590 to +592
if err := s.releaseInstancePorts(ctx, spec.Spec); err != nil {
return err
}
Contributor:

Can we move this logic after DeleteByKey?

Member (Author):

The same logic that I stated above applies here as well: if the release happens after the delete, it cannot be retried. This operation is more likely to fail during the release, so I think it makes more sense to attempt it first.

Comment on lines +141 to +145
for _, spec := range specs {
if err := s.releaseInstancePorts(ctx, spec.Spec); err != nil {
return err
}
}
Contributor:

This should also be moved after completing the transaction and before returning? what do you think?

Member (Author):

I think it makes more sense at the beginning because of the retry case. We're using the simplest form of deletion (just by key with no constraints) for all database entities, and I would only expect them to fail if Etcd were unhealthy. The release will also fail if Etcd is unhealthy, but it has an additional failure mode when many instances are being created and deleted simultaneously. So, as a whole, this method is more likely to fail during the release than during database entity deletes.

With the release at the top here, the spec still exists in Etcd even if the release fails. So, on retry, we're still able to read the spec from storage and retry the release operation.

If we did this the other way around and the entity deletes happened first, then on retry, the spec would no longer exist, so we wouldn't be able to retry the port release. That would have to happen thousands of times before you would run out of random ports in the default range, but still, since we can prevent it, I think it makes more sense to try the more error-prone operation at the beginning.

return nil
}
if portShouldBeReleased(previous.Port, new.Port) {
err := s.portsSvc.ReleasePortIfDefined(ctx, new.HostID, previous.Port)
Contributor:

Is this previous.HostID or new.HostID?

Member (Author):

I can see how this looks confusing, so I'll change it.

But just for background about why this works as-is: this is the previous and new spec for one instance, so the host IDs are identical. Host ID is a component of the instance identity, meaning that it's impossible for an instance to change hosts. We made that decision because database instances are stateful - they depend on data that lives on a disk that's attached to a specific host.

@jason-lynch (Member, Author) commented Mar 12, 2026:

This is done in 30edbed.

}
}
if portShouldBeReleased(previous.PatroniPort, new.PatroniPort) {
err := s.portsSvc.ReleasePortIfDefined(ctx, new.HostID, previous.PatroniPort)
Contributor:

Here also previous.HostID or new.HostID?

@jason-lynch (Member, Author) commented Mar 12, 2026:

This is done in 30edbed.

@tsivaprasad (Contributor) left a comment:

Verification

  1. Created a cluster using the cluster initialization script.
  2. Created a database with port: 0 and verified that a random port was automatically assigned.
  3. Verified the assigned port using the get-database command.
    restish control-plane-local-1 get-database storefront
HTTP/1.1 200 OK
Content-Length: 737
Content-Type: application/json
Date: Thu, 12 Mar 2026 12:11:35 GMT

{
  created_at: "2026-03-12T12:04:28Z"
  id: "storefront"
  instances: [
    {
      connection_info: {
        addresses: ["127.0.0.1"]
        port: 15279
      }
      created_at: "2026-03-12T12:04:36Z"
      host_id: "host-1"
      id: "storefront-n1-689qacsi"
      node_name: "n1"
      postgres: {
        patroni_state: "running"
        role: "primary"
        version: "18.2"
      }
      spock: {
        read_only: "off"
        version: "5.0.5"
      }
      state: "available"
      status_updated_at: "2026-03-12T12:11:35Z"
      updated_at: "2026-03-12T12:10:45Z"
    }
  ]
  spec: {
    database_name: "storefront"
    database_users: [
      {
        attributes: ["SUPERUSER", "LOGIN"]
        db_owner: true
        username: "admin"
      }
    ]
    nodes: [
      {
        host_ids: ["host-1"]
        name: "n1"
      }
    ]
    port: 0
    postgres_version: "18.2"
    spock_version: "5"
  }
  state: "available"
  updated_at: "2026-03-12T12:04:28Z"
}
  4. Stopped the database instance using stop-instance.
  5. Started the instance again using start-instance.
  6. Verified through get-database / list-databases that the same port (15279) was retained after restart.
restish control-plane-local-1 list-databases                                  
HTTP/1.1 200 OK
Content-Length: 523
Content-Type: application/json
Date: Thu, 12 Mar 2026 12:19:19 GMT

{
  databases: [
    {
      created_at: "2026-03-12T12:04:28Z"
      id: "storefront"
      instances: [
        {
          connection_info: {
            addresses: ["127.0.0.1"]
            port: 15279
          }
          created_at: "2026-03-12T12:04:36Z"
          host_id: "host-1"
          id: "storefront-n1-689qacsi"
          node_name: "n1"
          postgres: {
            patroni_state: "running"
            role: "primary"
            version: "18.2"
          }
          spock: {
            read_only: "off"
            version: "5.0.5"
          }
          state: "available"
          status_updated_at: "2026-03-12T12:19:15Z"
          updated_at: "2026-03-12T12:19:05Z"
        }
      ]
      state: "available"
      updated_at: "2026-03-12T12:04:28Z"
    }
  ]
}

jason-lynch force-pushed the feat/PLAT-417/stable-random-ports branch from 7a71fcb to 30edbed (March 12, 2026 19:28)
coderabbitai bot left a comment

🧹 Nitpick comments (1)
server/internal/ports/service.go (1)

219-235: Consider a safety limit on OS-unavailable port retries.

allocateAvailablePort loops indefinitely until it finds an OS-available port. If many ports within [min, max] are bound by external processes, this could exhaust the entire range and return ErrFull. However, ports marked as unavailable remain allocated in the bitmap, which could slowly fill the range.

This is likely acceptable given the range size (default 10,000 ports), but worth noting for environments with heavy port usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/ports/service.go` around lines 219 - 235, The
allocateAvailablePort loop in Service.allocateAvailablePort can spin
indefinitely if many ports are OS-unavailable; add a safety retry limit and
return an error once exceeded. Specifically, track attempts inside
allocateAvailablePort while calling PortRange.AllocateNext and s.portChecker,
and stop after a bounded number (e.g. the range size or a configurable
maxRetries constant), returning a clear error (or ErrFull) instead of looping
forever; keep the existing Debug log when skipping ports and ensure the new
limit uses AllocateNext and s.portChecker as the loop’s break conditions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3da02593-82a4-4bbd-8c86-c104585ac17e

📥 Commits

Reviewing files that changed from the base of the PR and between 7a71fcb and 30edbed.

📒 Files selected for processing (15)
  • docs/installation/configuration.md
  • server/cmd/root.go
  • server/internal/config/config.go
  • server/internal/database/instance_spec_store.go
  • server/internal/database/provide.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/database/store.go
  • server/internal/ports/ports.go
  • server/internal/ports/ports_test.go
  • server/internal/ports/provide.go
  • server/internal/ports/service.go
  • server/internal/ports/store.go
  • server/internal/workflows/activities/get_instance_resources.go
  • server/internal/workflows/activities/get_restore_resources.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • server/internal/ports/store.go
  • server/internal/database/spec.go
  • server/internal/ports/provide.go
  • server/internal/database/store.go
  • docs/installation/configuration.md

@jason-lynch jason-lynch merged commit 8c2f5cf into main Mar 12, 2026
3 checks passed
@jason-lynch jason-lynch deleted the feat/PLAT-417/stable-random-ports branch March 12, 2026 21:16