Add Circuit Breaker per Subgraph/Endpoint

We should introduce a circuit breaker per subgraph endpoint to protect the router from cascading failures when a subgraph service starts misbehaving (timeouts, 5xx errors, etc.).

When a subgraph's failures or timeouts exceed a configured threshold, the router should trip the circuit for that target and fast-fail requests to it for a short cooldown period instead of queuing or retrying forever.

After the cooldown, the circuit goes half-open and sends a few probe requests. It closes again if they succeed, or re-opens if they fail.
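To make the closed / open / half-open cycle concrete, here is a minimal sketch of the state machine. It is not tied to the router's codebase: the class, parameter names, and defaults are hypothetical, it counts consecutive failures rather than an error rate, and it does not cap how many probes run concurrently while half-open.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical per-subgraph breaker; names and defaults are illustrative."""

    def __init__(self, failure_threshold=5, cooldown_seconds=10.0,
                 probe_successes=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.probe_successes = probe_successes
        self.clock = clock  # injectable so the cooldown can be tested
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Return True to forward the request, False to fast-fail it."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                return False
            # Cooldown elapsed: start letting probe requests through.
            self.state = State.HALF_OPEN
            self.successes = 0
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.probe_successes:
                self.state = State.CLOSED
                self.failures = 0
        elif self.state is State.CLOSED:
            self.failures = 0  # only consecutive failures count

    def record_failure(self):
        if self.state is State.OPEN:
            return  # late result from before the trip; ignore it
        if self.state is State.HALF_OPEN:
            self._trip()  # a failed probe re-opens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

The router would call `allow_request()` before each subgraph fetch and report the outcome with `record_success()` or `record_failure()`. A real implementation would probably want a sliding-window error rate and a limit on in-flight probes.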

This should be configurable per subgraph and integrated into the router's config system and metrics (Prometheus metrics, so users can observe breaker behavior).
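As a rough illustration of what "configurable per subgraph" and "observable" could look like, the sketch below merges router-wide defaults with per-subgraph overrides and renders breaker state as a gauge in the Prometheus text exposition format. The field names, the metric name `router_subgraph_circuit_breaker_state`, and the numeric state encoding are all assumptions, not an existing router API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BreakerConfig:
    """Hypothetical breaker settings; field names are illustrative."""
    enabled: bool = True
    failure_threshold: int = 5
    cooldown_seconds: float = 10.0
    probe_requests: int = 3


def resolve_config(defaults, overrides, subgraph):
    """Merge router-wide defaults with one subgraph's overrides, if any."""
    return replace(defaults, **overrides.get(subgraph, {}))


# Assumed gauge encoding: 0 = closed, 1 = open, 2 = half-open.
STATE_VALUES = {"closed": 0, "open": 1, "half_open": 2}


def render_state_metric(states):
    """Render breaker states in the Prometheus text exposition format."""
    lines = ["# TYPE router_subgraph_circuit_breaker_state gauge"]
    for subgraph, state in sorted(states.items()):
        lines.append(
            f'router_subgraph_circuit_breaker_state{{subgraph="{subgraph}"}} '
            f"{STATE_VALUES[state]}"
        )
    return "\n".join(lines)


defaults = BreakerConfig()
overrides = {"inventory": {"failure_threshold": 2, "cooldown_seconds": 30.0}}
inventory = resolve_config(defaults, overrides, "inventory")
print(inventory)
print(render_state_metric({"users": "closed", "inventory": "open"}))
```

Subgraphs without an override fall back to the defaults, so the feature could ship with router-wide settings first and add per-subgraph tuning afterwards.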

Benefits:
- Prevent a single bad subgraph from tanking overall request latency
- Protect connection pools from overload

Add Circuit Breaker per Subgraph/Endpoint #489