Skip to content

feat: SOAP faults, SCPD capabilities, IGD v2, security hardening#10

Open
MorningLightMountain713 wants to merge 37 commits into
RunOnFlux:mainfrom
MorningLightMountain713:feature/soap-faults-and-new-actions
Open

feat: SOAP faults, SCPD capabilities, IGD v2, security hardening#10
MorningLightMountain713 wants to merge 37 commits into
RunOnFlux:mainfrom
MorningLightMountain713:feature/soap-faults-and-new-actions

Conversation

@MorningLightMountain713

@MorningLightMountain713 MorningLightMountain713 commented Apr 5, 2026

Copy link
Copy Markdown

Summary

Major rewrite of nat-upnp:

  • SOAP fault handling — parse and surface UPnP error codes instead of swallowing failures
  • SCPD capability detection — query the device's service description to know which actions are supported before calling them
  • IGD v2 support — discover and use WANIPConnection:2 when available
  • Security hardening — validate SSDP Location headers, limit XML parsing depth, timeout on local address resolution, reject pending requests on close
  • Bug fixes — unhandled exception in SSDP handler, getMappings iteration errors, socket lifecycle issues, race conditions
  • New API surfacegetMapping() for O(1) lookup, UpnpInfo.getAll() convenience method, device info/capabilities via lazy getters
  • Test suite — 400+ unit tests against 13 real router fixtures
  • CI/CD — GitHub Actions for Node 20/22/24, publish via OIDC trusted publishing (no NPM_TOKEN)
  • Housekeeping — removed committed build artifacts from git, added .gitignore, added BlueOak 1.0.0 license (retaining original MIT), rewrote README

Router compatibility

Every SOAP action is tested against real XML responses captured from each router. Fixtures cover root description parsing, SCPD capability detection, all core SOAP actions, fault handling, and edge cases.

Router Manufacturer IGD Version Actions TTL Leases
OPNsense FreeBSD v1 11 Temporary ✓
pfSense 2.7 FreeBSD v1 10 Temporary ✓
pfSense 2.8 FreeBSD v1 10 Temporary ✓
ASUS RT-AX55 ASUSTeK v1 11 Temporary ✓
NEC SH621A1 NEC v1 9 Temporary ✓
Sagemcom Livebox Sagemcom v1 (v2 actions) 20+ Temporary ✓
Sagemcom F5685 Sagemcom v1 11 Temporary ✓
Nokia IGD v2 Nokia v2 17+ Temporary ✓
MikroTik RouterOS MikroTik v1 11 Permanent only (725)
Freebox Server Freebox v1 11 Temporary ✓
Linux IGD Linux UPnP IGD v1 11 Temporary ✓
Sercomm FG824CD Sercomm v2 14+ Temporary ✓
Technicolor MediaAccess Technicolor v2 14+ Temporary ✓

Additionally, integration tests (npm run flux-test) have been run against 400+ live Flux nodes with diverse router hardware.

Pre-publish setup required

Before the first npm publish from CI after merging, npm trusted publishing must be configured for the @runonflux/nat-upnp package. This is a one-time, ~2 minute setup:

  1. Go to https://www.npmjs.com → package settings for @runonflux/nat-upnp → "Publishing access"
  2. Add a trusted publisher:
    • Repository: RunOnFlux/nat-upnp
    • Workflow: ci.yml
    • Environment: (leave blank)
  3. That's it — the CI workflow already uses --provenance and OIDC id-token: write

Without this, the publish step will fail with a 403 because there's no NPM_TOKEN secret — the workflow uses OIDC instead.

Test plan

  • npm run test — 400+ unit tests pass against all 13 router fixtures
  • npm run flux-test — integration tests pass against live routers on the Flux network
  • Verify CI passes on this PR (Node 20/22/24)
  • After merge: configure npm trusted publishing, then create a GitHub release to trigger publish

🤖 Generated with Claude Code

David White and others added 30 commits January 8, 2024 14:00
This could be better implemented using the GetListOfPortMappings
api. However, it does require a second parsing of the returned
CDATA, and the returned object is of a different format.

For miniupnpd, the current implementation will return an error
with the text `ArrayIndexInvalid` - so on the first iteration, we
can use this to determine if there are no records available, instead
of failing on the next test and returning an error - when there isn't.

If there is another upnp implementation being used, it will fall back
to the prior behavior.
This is handy for a user to be able to cache the router url. Sometimes,
multicast will work for the first 300 seconds until the join times
out, in this case - the user can cache the URL, and if it fails when
using SSDP, it can set the url and try again without multicast.
…y hardening

Library rewrite based on survey of 400+ routers across the Flux network (13 router
models with full XML fixtures captured from real devices).

SOAP fault handling:
- Detect faults in both HTTP 500 and HTTP 200 responses
- Structured UpnpError class with code, description, action
- Single extractFaultInfo/throwSoapFault utility (no duplication)

Device info and capabilities:
- Parse rootDesc.xml into GatewayDevice (manufacturer, model, daemon info)
- Parse SCPD actionList to discover supported actions
- ServiceCapabilities with convenience booleans (supportsAddAnyPortMapping, etc.)
- Lazy loading with promise caching — no redundant fetches

IGD v2 actions (capability-gated):
- AddAnyPortMapping (router assigns port if taken)
- DeletePortMappingRange (bulk delete)
- GetListOfPortMappings (single-call range query)
- Actions check SCPD before executing; throw if unsupported or unknown

New v1 actions:
- GetSpecificPortMappingEntry — O(1) mapping lookup by port
- GetStatusInfo — WAN connection status and uptime for reboot detection

Security:
- XXE protection (processEntities: false on XMLParser)
- XML escaping on SOAP argument values (prevents injection)
- Response size limits (2MB cap via axios maxContentLength)
- getMappings iteration capped at 10,000 (prevents infinite loop from malicious router)

Networking:
- HTTP Agent with keepAlive: false — prevents Node.js 19+ socket hang up errors
  against miniupnpd (which always responds with Connection: close)
- Single SSDP socket bound to 0.0.0.0 (was N sockets for N interfaces)
- Local address resolved via UDP connect (zero-packet kernel route query) —
  the standard technique used by miniupnpc, Python, Go, Docker, Kubernetes

Bug fixes:
- normalizeOptions defaults internal port to match external (fixes Sagemcom 402)
- parseMapping handles missing NewProtocol gracefully
- parseDescription filters non-object entries from service list
- Service preference order: WANIPConnection:2 > :1 > WANPPPConnection:1

Tests:
- 252 unit tests across 13 router fixtures (170 XML files)
- Fixtures from: OPNsense, pfSense 2.7/2.8, ASUS RT-AX55, NEC SH621A1,
  Sagemcom Livebox, Sagemcom F5685, Nokia IGD v2, MikroTik, Freebox,
  Linux IGD, Sercomm GPON, Technicolor
- Integration test suite updated for new async UpnpInfo API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add GitHub Actions workflow: build + test on push (Node 18/20/22),
  publish to npm on GitHub release with provenance
- Remove build/ from git tracking (built by CI and npm prepare)
- Add build/ to .gitignore
- Check in package-lock.json for reproducible installs
- Update all dependencies to latest:
  - axios ^1.6.5 (latest 1.x)
  - fast-xml-parser ^5.5.9 (major bump from 4.x, fixes CVEs)
  - typescript ^6.0.2 (major bump from 5.x)
  - @types/node ^25.5.0
- Update tsconfig for TypeScript 6 compatibility (target es2020, types: ["node"])
- Add prepare script so npm install from git triggers build
- Add types field for TypeScript consumers
- Package name set to @megachips/nat-upnp for testing before PR to @RunOnFlux
- Zero npm audit vulnerabilities

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- actions/checkout@v6, actions/setup-node@v6 (fixes Node.js 20 deprecation warnings)
- Test matrix: Node 20, 22, 24 (drop 18, add 24)
- Publish on Node 24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
npm version errors when version unchanged. Add --allow-same-version flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Complete README rewrite documenting all new APIs
- References @runonflux/nat-upnp as upstream
- BlueOak Model License 1.0.0 (modern permissive, MIT-compatible)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Add HTTP timeout (10s) and redirect limit (2) to all axios requests
- Validate SSDP Location header (reject non-http URLs)
- Validate Device constructor URL
- Guard against empty/IPv6 hostname in resolveLocalAddress
- Type-safe axios error handling (isAxiosErrorWithData guard)

Race conditions:
- SSDP socket only stored after bind succeeds (not before)
- Concurrent getGateway() calls share one pending promise
- All lazy-cached promises clear on error to allow retry

Types:
- RawResponse simplified to Record<string, unknown> (was incorrectly typed)
- Mapping, StatusInfo, GatewayDevice, ServiceCapabilities use readonly
- catch(err: unknown) instead of catch(err: any)
- extractFaultInfo typed with Record<string, unknown>

Code quality:
- Remove dead getService() method from IDevice interface
- Error messages include URLs for production debugging
- Remove listener before setting ended flag (SSDP cleanup ordering)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runtime hardening from Node.js expert review:

- resolveLocalAddress: add 5s timeout with proper settled guard to prevent
  socket leak on unreachable IPs
- Client.close(): set closed flag, getGateway() throws if called after close
- Prevents callers from hanging forever when close() is called during
  gateway discovery

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getMappings() previously swallowed ALL errors silently, returning partial
results with no indication of incompleteness. Now:
- End-of-list errors (713, 714) break cleanly as before
- First-iteration errors (empty list) break cleanly as before
- Mid-iteration errors (network failure, timeout) THROW so the caller
  knows they have incomplete data, not an empty router

This prevents a dangerous scenario where a network hiccup at iteration 5
of 25 causes the caller to act on 5 mappings thinking that's all of them.

Also:
- SSDP socket calls unref() after bind so forgetting close() doesn't
  prevent the Node process from exiting. The socket stays functional
  but won't keep the event loop alive on its own.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SSDP socket management:
- Create fresh Ssdp instance per gateway discovery instead of holding
  one for the Client lifetime. Socket is closed automatically when
  discovery completes (success or failure). No socket leak even if
  the caller never calls close().
- Removed unref() hack — proper cleanup is better than masking leaks.
- Client.close() now just sets the closed flag (no SSDP to clean up).

Convenience method:
- UpnpInfo.getAll() resolves device, capabilities, and localAddress
  in parallel and returns a ResolvedGatewayInfo object. For callers
  who don't need lazy loading:
    const info = await client.getGateway();
    const { device, capabilities, localAddress } = await info.getAll();

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MorningLightMountain713 and others added 7 commits March 29, 2026 10:25
- Document getAll() convenience method
- Explain SSDP socket auto-cleanup (no leak on forgotten close)
- Add Resource Management section
- Document getMappings() mid-iteration error behavior
- Update Security section with timeout, redirect, and validation details

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add missing error codes 713, 728, 729 to table
- Clarify getMapping() can throw UpnpError, not just return null
- Document numberOfPorts parameter on getMappingRange
- Document default TTL (1800s / 30 minutes) for port mappings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… logic

Bug 1: discoverGateway device event handler had no try-catch. If
Device constructor threw (e.g., malformed SSDP Location header), the
exception was unhandled, crashing the Node process. The promise never
settled and the SSDP socket leaked. Now wrapped in try-catch with
proper reject().

Bug 2: getMappings first-iteration error handling was wrong. If the
router returned error 501 (Action Failed) on the first iteration, the
code silently returned [] instead of throwing. This conflated
"unsupported action" with "empty mapping list." Now only 713/714
(end-of-list) cause a break; all other errors throw regardless of
iteration index.

Both found during final pre-production review round.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SSDP: require Location header to be present AND start with "http"
  (was allowing undefined/empty locations through, which would crash
  Device constructor)
- parseDescription: cap recursion at 10 levels to prevent stack
  overflow from malicious/broken router XML with deeply nested devices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original nat-upnp was MIT-licensed and its terms require the notice
be preserved in derivative works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant