Add Circuit Breaker per Subgraph/Endpoint #489

@kamilkisiela

Description

We should introduce a circuit breaker per subgraph endpoint to protect the router from cascading failures when a subgraph service starts misbehaving (timeouts, 5xx errors, etc.).

When failures or timeouts for a subgraph exceed a configured threshold, the router should trip (open) the circuit for that target and fail requests to it immediately for a short cooldown period, instead of queuing or retrying indefinitely.

After the cooldown, the circuit goes half-open: the router sends a few probe requests and either closes the circuit (if the subgraph is healthy) or opens it again.
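To make the closed / open / half-open cycle concrete, here is a rough sketch of the state machine in Python. It is only an illustration, not a proposed implementation for the router: the class name, the parameter names (`failure_threshold`, `cooldown_seconds`, `probe_requests`) and the choice to count consecutive failures rather than a failure rate are all assumptions made for the example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative breaker for a single subgraph endpoint (hypothetical API)."""

    def __init__(self, failure_threshold=5, cooldown_seconds=10.0,
                 probe_requests=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.cooldown_seconds = cooldown_seconds    # how long to fast-fail once open
        self.probe_requests = probe_requests        # probes allowed while half-open
        self.clock = clock                          # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.probes_sent = 0
        self.probe_successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Return True if a request may be sent to the subgraph right now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                return False  # still cooling down: fail fast
            # Cooldown elapsed: start probing.
            self.state = State.HALF_OPEN
            self.probes_sent = 0
            self.probe_successes = 0
        if self.state is State.HALF_OPEN:
            if self.probes_sent >= self.probe_requests:
                return False  # probe budget used up, wait for results
            self.probes_sent += 1
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.probe_requests:
                self.state = State.CLOSED  # all probes healthy
                self.failures = 0
        else:
            self.failures = 0  # any success resets the consecutive-failure count

    def record_failure(self):
        if self.state is State.OPEN:
            return  # late result from a request sent before tripping
        if self.state is State.HALF_OPEN:
            self._open()  # a failed probe re-opens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

In this sketch the caller checks `allow_request()` before contacting the subgraph and reports the outcome with `record_success()` or `record_failure()`. A real implementation would also need to be safe under concurrency and would probably want a rolling error-rate window instead of a simple consecutive-failure counter.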

This should be configurable per subgraph and integrated into the config system and the router metrics (exposed as Prometheus metrics so users can observe it).
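As a starting point for discussion, per-subgraph configuration could be modelled as global defaults plus optional overrides keyed by subgraph name. The Python sketch below is purely illustrative: the field names, the `resolve_config` helper and the transition counter are hypothetical and do not reflect the router's actual config schema or metric names.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BreakerConfig:
    """Hypothetical breaker settings; all field names are assumptions."""
    enabled: bool = True
    failure_threshold: int = 5
    cooldown_seconds: float = 10.0
    probe_requests: int = 3


def resolve_config(defaults, overrides, subgraph):
    """Apply a subgraph's overrides (if any) on top of the global defaults."""
    return replace(defaults, **overrides.get(subgraph, {}))


# Stand-in for a metrics counter labelled by subgraph and new state.
# In a real router this would be exported through the metrics system.
transitions = Counter()


def record_transition(subgraph, new_state):
    """Count a circuit state change so it can be observed per subgraph."""
    transitions[(subgraph, new_state)] += 1
```

For example, `resolve_config(BreakerConfig(), {"products": {"cooldown_seconds": 30.0}}, "products")` would keep every default except the cooldown, while a subgraph with no override simply gets the defaults.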

Benefits:

  • Prevent a single bad subgraph from tanking overall request latency
  • Protect connection pools from overload
