Skip to content

refactor: Separate adding/removing cycles from refunding/consuming cycles#9717

Open
dsarlis wants to merge 1 commit intodfinity:masterfrom
dsarlis:dimitris/add-remove-cycles
Open

refactor: Separate adding/removing cycles from refunding/consuming cycles#9717
dsarlis wants to merge 1 commit intodfinity:masterfrom
dsarlis:dimitris/add-remove-cycles

Conversation

@dsarlis
Copy link
Copy Markdown
Contributor

@dsarlis dsarlis commented Apr 2, 2026

This PR refactors how cycles changes to balance and metrics are happening via the SystemState. Specifically, it more clearly separates adding/removing cycles to the balance from refunding/consuming cycles which on top of the balance updates the respective consumed metrics as well.

The idea is to separate in two pairs of APIs that will be responsible for each type of update:

fn add_cycles(amount: Cycles);
fn remove_cycles(amount: Cycles);

to handle balance only changes while

fn refund_cycles<T>(prepayment: CompoundCycles<T>, refund: CompoundCycles<T>);
fn consume_cycles<T>(amount: CompoundCycles<T>);

will handle metrics being updated as well.

This change has a two fold benefit.

On one hand, it allows us to perform direct updates to balance (e.g. when depositing cycles or attaching cycles in calls) without having to construct CompoundCycles<NonConsumed> which was a somewhat hacky way to say "no need to update metrics". In fact, after this change, NonConsumed can be completely removed from CyclesUseCases.

On the other, it allows us to require a prepayment for refunding cycles which would be very helpful in a follow up where a metric which acts as a counter will be introduced and the amount consumed will need to be computed based on prepayment and refund. This way the call sites that don't need or should not deal with prepayments (i.e. ones that need only add_cycles), they don't have to be exposed to any of this.

The meat of the changes are in SystemState while there are some updates in production code and tests to match the refined API. In many cases, it ends up in a simpler version of the code as a bunch of CompoundCycles::new calls are eliminated or the cost schedule does not need to be piped through various calls.

@dsarlis dsarlis requested a review from a team as a code owner April 2, 2026 11:42
@cgundy cgundy added the security-review-passed IDX or InfraSec have concluded it's safe to run CI on the external PR. label Apr 2, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

@cgundy
Copy link
Copy Markdown
Contributor

cgundy commented Apr 2, 2026

Seeing this error:

Clippy violations found!

To automatically fix many of these, run:

    cargo clippy --fix --locked --all-features --workspace --all-targets --keep-going -- --deny warnings --deny clippy::all --deny clippy::mem_forget --deny clippy::unseparated_literal_suffix --allow clippy::uninlined_format_args

Comment on lines +1851 to +1852
/// Decreases 'cycles_balance' for 'requested_amount' from the canister's
/// main balance.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence reads a bit strange

/// The resource use cases first drain the `reserved_balance` and only after
/// that drain the main `cycles_balance`.
pub fn remove_cycles<T: CyclesUseCaseKind>(&mut self, requested_amount: CompoundCycles<T>) {
pub fn consume_cycles<T: CyclesUseCaseKind>(&mut self, requested_amount: CompoundCycles<T>) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the doc comment here could be updated as well analogously to add_cycles/refund_cycles.

"Non-consumed cycles should not be tracked in the canister metrics."
);

// `prepayment`` must be greater or equal to `refund`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// `prepayment`` must be greater or equal to `refund`.
// `prepayment` must be greater or equal to `refund`.

Comment on lines +1944 to +1945
prepayment: NominalCycles,
refund: NominalCycles,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we distinguish between prepayment and refund here when

  • we require refund to be zero if consuming_cycles is Prepayment;
  • and do not use prepayment if consuming_cycles is Refund?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's because of this:

On the other, it allows us to require a prepayment for refunding cycles which would be very helpful in a follow up where a metric which acts as a counter will be introduced and the amount consumed will need to be computed based on prepayment and refund.

But then I wonder why we don't update the metrics only when we know the refund amount, i.e., why we ever call consume_cycles only to later call refund_cycles instead of calling remove_cycles and later refund_cycles (updating the metrics according to the difference between prepaid and refunded cycles).

Copy link
Copy Markdown
Contributor Author

@dsarlis dsarlis Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and do not use prepayment if consuming_cycles is Refund?

prepayment will be used in a follow up when we have a metric for consumed cycles that acts as a counter, i.e. only gets updated once when you know the actual amount.

The current metrics follow the pattern of updating during the prepayment and then fixing them up based on the refund amount.

In order to update metrics only during refund_cycles you'd have to perform a transition over to another set of metrics that don't follow that pattern. The code prepares for such a metric addition, whether or not you can cleanly convert over is unclear -- basically you can't really enforce that all callbacks for messages that have been sent with the existing pattern will have returned to be able to do a clean transition.

Basically, any outstanding refund has to be accounted for correctly before you can start updating things only during the refund phase. To do that you need to set some point where you know if the metrics have or have not been updated at the time the request was sent because that determines whether you need to subtract the refund (current state) or add the consumed (prepayment - refund) directly (potential future state). It's unclear to me whether such a point can be set definitively in the presence of unbounded wait calls.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether or not you can cleanly convert over is unclear

Very good question! I think you can convert if we just subtract all prepayments once after a subnet upgrade and before we run the first round after the subnet upgrade - then we're immediately in the "new" world, no?

Copy link
Copy Markdown
Contributor Author

@dsarlis dsarlis Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is that I don't know for sure if we have any lingering responses before this point.

In any case, I would not convert the existing metrics as it carries over some risk and potentially it can make things look weird on dashboards etc if we do such a "reset". I would add the new metrics that behave like counters and make a conversion if the above is either confirmed to not be a problem or if people are ok missing requests like the above.

Note that conversion if you add the new metrics is optional in the case where you want to eliminate the old metrics entirely but that would be something to do after lots of verifications that they behave equivalently to the old (i.e. no cases are missed). You should also consider that no matter whether you remove the old metrics from the code or not, they would anyway be kept at least on some prometheus recording rules to ensure that you can show cycles consumption historically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor refactor security-review-passed IDX or InfraSec have concluded it's safe to run CI on the external PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants