Skip to content

fix: fail fast when amendment blocked instead of degraded state#707

Draft
sublimator wants to merge 5 commits into
devfrom
fail-fast-amendment-blocking
Draft

fix: fail fast when amendment blocked instead of degraded state#707
sublimator wants to merge 5 commits into
devfrom
fail-fast-amendment-blocking

Conversation

@sublimator

@sublimator sublimator commented Mar 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

Amendment-blocked nodes currently enter a degraded state: they stop validating
but keep running, serving data from ledgers produced under rules they don't
understand. This can lead to crashes
(#706) when the node encounters
unknown serialized fields in post-amendment ledgers.

This PR replaces the degraded state with a graceful shutdown. Operators get
warnings during the 2-week amendment majority window, and the server exits
cleanly rather than continuing to operate in an unsafe state.

Problem

When an unsupported amendment activates, STObject::set throws
std::runtime_error on unknown fields:

📍 src/libxrpl/protocol/STObject.cpp:249-257

 249         if (fn.isInvalid())
 250         {
 251             JLOG(debugLog().error()) << "Unknown field: field_type=" << type
 252                                      << ", field_name=" << field;
 253             std::stringstream ss;
 254             ss << "Unknown field in Object t=" << type << " f=" << field;
 255 
 256             Throw<std::runtime_error>(ss.str().c_str());
 257         }

Crash path (issue #706)

The JUMP path in switchLastClosedLedger calls processClosedLedger, which
iterates view.txs to compute fee metrics — deserializing every transaction:

📍 src/xrpld/app/misc/detail/TxQ.cpp:109-116

 109     std::vector<FeeLevel64> feeLevels;
 110     auto const txBegin = view.txs.begin();
 111     auto const txEnd = view.txs.end();
 112     auto const size = std::distance(txBegin, txEnd);
 113     feeLevels.reserve(size);
 114     std::for_each(txBegin, txEnd, [&](auto const& tx) {
 115         feeLevels.push_back(getFeeLevelPaid(view, *tx.first));
 116     });

This triggers deserializeTxPlusMetaSTTxSTObject::set → crash.
doAdvance has a try/catch that survives this, but switchLastClosedLedger
did not.

Why fail-fast is safer than the degraded state

The amendment-blocked degraded state was introduced in 2017 but has not been
exercised in practice — binaries on the network tend to support all known
fields. When it is triggered by a genuinely unknown amendment, the behavior
is unsafe.

Post-amendment ledgers are stored before the amendment check

setValidLedger stores new validated ledgers before checking amendment
blocked status. Post-amendment ledgers accumulate and become queryable:

📍 src/xrpld/app/ledger/detail/LedgerMaster.cpp:278-291

 278     mValidLedger.set(l);
 279     mValidLedgerSign = signTime.time_since_epoch().count();
 280     XRPL_ASSERT(
 281         mValidLedgerSeq || !app_.getMaxDisallowedLedger() ||
 282             l->info().seq + max_ledger_difference_ >
 283                 app_.getMaxDisallowedLedger(),
 284         "ripple::LedgerMaster::setValidLedger : valid ledger sequence");
 285     (void)max_ledger_difference_;
 286     mValidLedgerSeq = l->info().seq;
 287 
 288     app_.getOPs().updateLocalTx(*l);
 289     app_.getSHAMapStore().onLedgerClosed(getValidatedLedger());
 290     mLedgerHistory.validatedLedger(l, consensusHash);
 291     app_.getAmendmentTable().doValidatedLedger(l);

Inconsistent RPC behavior

The RPC condition check only blocks RPCs that require network/ledger state:

📍 src/xrpld/rpc/detail/Handler.h:85-89

  85     if (context.app.getOPs().isAmendmentBlocked() &&
  86         (condition_required != NO_CONDITION))
  87     {
  88         return rpcAMENDMENT_BLOCKED;
  89     }

But the majority of RPCs — including all account and ledger queries — are
registered with NO_CONDITION and bypass this check:

📍 src/xrpld/rpc/detail/Handler.cpp:84-95

  84     {"account_info", byRef(&doAccountInfo), Role::USER, NO_CONDITION},
  85     {"account_currencies",
  86      byRef(&doAccountCurrencies),
  87      Role::USER,
  88      NO_CONDITION},
  89     {"account_lines", byRef(&doAccountLines), Role::USER, NO_CONDITION},
  90     {"account_namespace", byRef(&doAccountNamespace), Role::USER, NO_CONDITION},
  91     {"account_channels", byRef(&doAccountChannels), Role::USER, NO_CONDITION},
  92     {"account_nfts", byRef(&doAccountNFTs), Role::USER, NO_CONDITION},
  93     {"account_objects", byRef(&doAccountObjects), Role::USER, NO_CONDITION},
  94     {"account_offers", byRef(&doAccountOffers), Role::USER, NO_CONDITION},
  95     {"account_tx", byRef(&doAccountTxJson), Role::USER, NO_CONDITION},

Along with ledger_data, ledger_entry, ledger_header, book_offers,
gateway_balances, transaction_entry, and many more.

This means an amendment-blocked node gives inconsistent responses — some
RPCs return rpcAMENDMENT_BLOCKED while others serve data from ledgers
produced under rules the server doesn't understand. It also means other
deserialization paths may be exposed to unknown fields:

pubLedgerAcceptedLedger constructor (with active subscribers):

📍 src/xrpld/app/misc/NetworkOPs.cpp:2922-2929

2922     std::shared_ptr<AcceptedLedger> alpAccepted =
2923         app_.getAcceptedLedgerCache().fetch(lpAccepted->info().hash);
2924     if (!alpAccepted)
2925     {
2926         alpAccepted = std::make_shared<AcceptedLedger>(lpAccepted, app_);
2927         app_.getAcceptedLedgerCache().canonicalize_replace_client(
2928             lpAccepted->info().hash, alpAccepted);
2929     }

📍 src/xrpld/app/ledger/AcceptedLedger.cpp:26-48

  26 AcceptedLedger::AcceptedLedger(
  27     std::shared_ptr<ReadView const> const& ledger,
  28     Application& app)
  29     : mLedger(ledger)
  30 {
  31     transactions_.reserve(256);
  32 
  33     auto insertAll = [&](auto const& txns) {
  34         for (auto const& item : txns)
  35             transactions_.emplace_back(std::make_unique<AcceptedLedgerTx>(
  36                 ledger, item.first, item.second));
  37     };
  38 
  39     transactions_.reserve(256);
  40     insertAll(ledger->txs);
  41 
  42     std::sort(
  43         transactions_.begin(),
  44         transactions_.end(),
  45         [](auto const& a, auto const& b) {
  46             return a->getTxnSeq() < b->getTxnSeq();
  47         });
  48 }

LedgerReplay constructor (during ledger replay sync):

📍 src/xrpld/app/ledger/detail/LedgerReplay.cpp:25-36

  25 LedgerReplay::LedgerReplay(
  26     std::shared_ptr<Ledger const> parent,
  27     std::shared_ptr<Ledger const> replay)
  28     : parent_{std::move(parent)}, replay_{std::move(replay)}
  29 {
  30     for (auto const& item : replay_->txMap())
  31     {
  32         auto txPair = replay_->txRead(item.key());  // non-const so can be moved
  33         auto const txIndex = (*txPair.second)[sfTransactionIndex];
  34         orderedTxns_.emplace(txIndex, std::move(txPair.first));
  35     }
  36 }

LedgerHistory::log_one (on ledger mismatch):

📍 src/xrpld/app/ledger/LedgerHistory.cpp:158-178

 158 static void
 159 log_one(
 160     ReadView const& ledger,
 161     uint256 const& tx,
 162     char const* msg,
 163     beast::Journal& j)
 164 {
 165     auto metaData = ledger.txRead(tx).second;
 166 
 167     if (metaData != nullptr)
 168     {
 169         JLOG(j.debug()) << "MISMATCH on TX " << tx << ": " << msg
 170                         << " is missing this transaction:\n"
 171                         << metaData->getJson(JsonOptions::none);
 172     }
 173     else
 174     {
 175         JLOG(j.debug()) << "MISMATCH on TX " << tx << ": " << msg
 176                         << " is missing this transaction.";
 177     }
 178 }

A clean shutdown avoids all of these concerns.

Changes

1. Graceful shutdown on amendment activation

setAmendmentBlocked() now calls signalStop() to initiate a clean shutdown
instead of leaving the server in a degraded CONNECTED mode. The
amendmentBlocked_ flag is still set so RPCs get rpcAMENDMENT_BLOCKED
during the shutdown window. Skipped in standalone mode for test compatibility.
Logs at fatal level (NetworkOPs:FTL) so the reason is unmissable.

📍 src/xrpld/app/misc/NetworkOPs.cpp:1632-1647

1632 void
1633 NetworkOPsImp::setAmendmentBlocked()
1634 {
1635     amendmentBlocked_ = true;
1636     setMode(OperatingMode::CONNECTED);
1637     if (!app_.config().standalone())
1638     {
1639         JLOG(m_journal.fatal())
1640             << "One or more unsupported amendments activated. "
1641                "Shutting down. Upgrade the server to remain "
1642                "compatible with the network.";
1643         app_.signalStop(
1644             "One or more unsupported amendments activated. "
1645             "Server must be upgraded to remain compatible with the network.");
1646     }
1647 }

2. Early shutdown before amendment activation

When an unsupported amendment is within 1 minute of its expected activation
time, shut down early. Since amendments require a 2-week majority period,
operators have had ample warning. The early shutdown avoids the server
encountering post-amendment ledger data it cannot deserialize.

📍 src/xrpld/app/ledger/detail/LedgerMaster.cpp:316-327

 316                 using namespace std::chrono_literals;
 317                 auto const now = app_.timeKeeper().closeTime();
 318                 if (*first > now && (*first - now) <= 1min)
 319                 {
 320                     // Shut down just before the amendment activates to
 321                     // avoid processing ledgers with unknown fields.
 322                     JLOG(m_journal.error())
 323                         << "Unsupported amendment activating imminently "
 324                            "at "
 325                         << to_string(*first) << ". Shutting down.";
 326                     app_.getOPs().setAmendmentBlocked();
 327                 }

3. Try/catch in switchLastClosedLedger

Safety net for the case where the server missed the early shutdown window
(e.g., it was restarted right as the amendment activates). Only swallows the
exception when amendment blocked; rethrows otherwise:

📍 src/xrpld/app/misc/NetworkOPs.cpp:1806-1817

1806     try
1807     {
1808         app_.getTxQ().processClosedLedger(app_, *newLCL, true);
1809     }
1810     catch (std::runtime_error const& e)
1811     {
1812         if (!amendmentBlocked_)
1813             throw;
1814         JLOG(m_journal.error())
1815             << "Failed to process closed ledger: " << e.what();
1816         return;
1817     }

4. Amendment warning visible to all RPC users

The unsupported-majority warning was previously admin-only. Now all RPC
users see it, giving operators the best chance to upgrade during the 2-week
window.

Test plan

  • Existing ripple.server.ServerStatus and ripple.app.AmendmentBlocked
    tests pass (standalone mode skips signalStop)
  • 5-node testnet: enable amendment with unknown fields, verify old node
    shuts down gracefully instead of crashing
  • Verify amendment warning appears in non-admin server_info responses
    during the majority period
  • Verify early shutdown triggers ~1 minute before expected activation
  • Verify NetworkOPs:FTL log line appears on shutdown

@sublimator sublimator changed the title fix: fail fast when amendment blocked instead of zombie state fix: fail fast when amendment blocked instead of degraded state Mar 9, 2026
- signalStop() for graceful shutdown when unsupported amendment activates
- early shutdown ~1 minute before expected activation to avoid race
- try/catch in switchLastClosedLedger to survive unknown field crashes
  during shutdown window
- show amendment warning to all RPC users, not just admin

Fixes: #706
@sublimator sublimator force-pushed the fail-fast-amendment-blocking branch from 4cf2be8 to 1e6cda4 Compare June 19, 2026 02:35
@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 8.69565% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.3%. Comparing base (639ea34) to head (1e6cda4).

Files with missing lines Patch % Lines
src/xrpld/app/ledger/detail/LedgerMaster.cpp 0.0% 11 Missing ⚠️
src/xrpld/app/misc/NetworkOPs.cpp 16.7% 10 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             dev    #707     +/-   ##
=======================================
- Coverage   77.3%   77.3%   -0.0%     
=======================================
  Files        837     837             
  Lines      78585   78602     +17     
  Branches   11559   11572     +13     
=======================================
  Hits       60738   60738             
- Misses     17837   17854     +17     
  Partials      10      10             
Files with missing lines Coverage Δ
src/xrpld/app/misc/NetworkOPs.cpp 66.1% <16.7%> (-0.3%) ⬇️
src/xrpld/app/ledger/detail/LedgerMaster.cpp 44.0% <0.0%> (-0.3%) ⬇️

... and 5 files with indirect coverage changes

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant