Description:
We've set up local workflows that use the DGX Spark as the primary inference server and fall back to Scaleway as a backup. We want to stress-test this setup before rolling it out to users.
Acceptance criteria:
- Automated stress tests that apply different types of load to the endpoints
- Measure how often requests spill over onto Scaleway
- Telemetry results (tokens per second, end-to-end latency, time to first token, etc.) are logged to a safe place.
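The metrics named in the acceptance criteria could be derived from per-request timestamps roughly as follows. This is a minimal sketch, not the project's implementation: the `RequestSample` fields, the `"dgx"` / `"scaleway"` backend labels, and the definition of tokens per second as output tokens divided by end-to-end latency are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    """One completed request from a stress-test run (hypothetical schema)."""
    backend: str          # assumed labels: "dgx" (primary) or "scaleway" (fallback)
    t_start: float        # seconds, when the request was sent
    t_first_token: float  # seconds, when the first token arrived
    t_end: float          # seconds, when the last token arrived
    output_tokens: int    # number of generated tokens


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples, fallback_label="scaleway"):
    """Aggregate spillover rate, TTFT, e2e latency, and tokens per second."""
    if not samples:
        raise ValueError("no samples to summarize")

    ttft = [s.t_first_token - s.t_start for s in samples]
    e2e = [s.t_end - s.t_start for s in samples]
    # Assumption: throughput is output tokens over the full request duration.
    tps = [s.output_tokens / d for s, d in zip(samples, e2e) if d > 0]
    spilled = sum(1 for s in samples if s.backend == fallback_label)

    return {
        "requests": len(samples),
        "spillover_rate": spilled / len(samples),
        "ttft_mean_s": mean(ttft),
        "ttft_p95_s": percentile(ttft, 95),
        "e2e_mean_s": mean(e2e),
        "e2e_p95_s": percentile(e2e, 95),
        "tokens_per_second_mean": mean(tps) if tps else 0.0,
    }
```

The returned dictionary is flat so it can be written as one JSON line or CSV row per test run, wherever the results end up being stored.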
Technical details:
Optional technical details for context.
Design:
Optional details on design for context.