A minimal working example of a service that can be deployed on the HaaS (Hosting as a Service) platform. It implements a simple arithmetic calculator to demonstrate the required structure, interface contract, and profiling setup.
When you push a commit to a registered GitHub repository, the HaaS pipeline runs automatically:
GitHub Push
↓
Listener — validates the webhook and records the commit
↓
Builder — clones the repo, builds a Docker image and pushes to registry
↓
GPU Orchestrator — creates a serverless endpoint from the image
↓
Profiling — calls the endpoint with profile.json to measure baseline execution time
↓
Service Ready — endpoint is cached and available for inference requests
Your repository only needs to satisfy two requirements for this to work:
- A `Dockerfile` that produces an image running `runpod_handler.py`
- A `profile.json` with a representative input the profiling step can use
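As a sketch, a `Dockerfile` satisfying the first requirement might look like this. The base image, Python version, and layer ordering are illustrative choices, not platform requirements:

```dockerfile
# Illustrative example -- base image and versions are assumptions
FROM python:3.11-slim

WORKDIR /app

# Install pinned dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service code
COPY customer_main.py runpod_handler.py ./

# Start the HaaS runtime wrapper when the container launches
CMD ["python", "-u", "runpod_handler.py"]
```

Ordering the `COPY requirements.txt` and `pip install` steps before copying the source code lets Docker reuse the dependency layer when only your model logic changes.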
example_service/
├── customer_main.py # Your model logic — the only file you need to change
├── runpod_handler.py # HaaS runtime wrapper — do not modify
├── profile.json       # Sample input used by the profiling step
├── Dockerfile
└── requirements.txt
The single entry point HaaS expects is a run function:
```python
def run(input_data: dict) -> any:
    ...
```

| Argument | Type | Description |
|---|---|---|
| `input_data` | `dict` | The `"input"` field from the RunPod request body |
| return value | any JSON-serialisable value | Wrapped into `{"result": <value>}` by the handler |
This example implements four arithmetic operations:
```python
# Request
{"input": {"a": 10, "b": 4, "op": "add"}}

# Response
{"result": 14}
```

Supported operations: `add`, `sub`, `mul`, `div`.
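A minimal `customer_main.py` implementing this contract could look like the following. The dispatch-table style and the `ValueError` for unknown operations are illustrative choices, not mandated by HaaS:

```python
def run(input_data: dict) -> float:
    """Entry point called by runpod_handler.py on every request."""
    a = input_data["a"]
    b = input_data["b"]
    op = input_data["op"]

    # Map each supported operation name to its implementation
    operations = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
        "div": lambda x, y: x / y,  # ZeroDivisionError is surfaced by the handler
    }
    if op not in operations:
        raise ValueError(f"unsupported op: {op!r}")
    return operations[op](a, b)
```

Raising an exception for invalid input is safe here: as described below, the handler catches it and returns a structured error response instead of crashing the worker.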
You do not need to touch this file. It:
- Bootstraps the RunPod serverless listener
- Imports and calls `customer_main.run(input_data)` on every request
- Captures stdout from your code and forwards it to Sentry as structured log entries
- Handles exceptions: returns `{"error": ..., "error_type": ..., "logs": ...}` instead of crashing the worker
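In outline, the wrapper behaves like the simplified sketch below. The real file also wires up the RunPod listener and Sentry; the function and variable names here are illustrative, and a local stub stands in for the imported `customer_main.run`:

```python
import io
from contextlib import redirect_stdout


# Stand-in for customer_main.run -- the real handler imports it instead
def run(input_data: dict):
    if input_data.get("op") == "boom":
        raise ValueError("bad op")
    return input_data["a"] + input_data["b"]


def handle(event: dict) -> dict:
    """Sketch of the handler's success/error envelope."""
    buffer = io.StringIO()
    try:
        # Capture stdout so logs can be attached to the response/Sentry
        with redirect_stdout(buffer):
            result = run(event["input"])
        return {"result": result}
    except Exception as exc:
        # Never crash the worker: report the failure as structured data
        return {
            "error": str(exc),
            "error_type": type(exc).__name__,
            "logs": buffer.getvalue(),
        }
```

This is why your `run` function can simply raise on bad input: the envelope converts any exception into a JSON-serialisable error response.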
The GPU Orchestrator's profiling service calls the deployed endpoint once with the contents of profile.json immediately after deployment. The measured execution time is stored and used for capacity planning.
The file must contain a valid request body in RunPod format:
```json
{
  "input": { ... }
}
```

Choose an input that is representative of a typical request: not a trivial no-op, but not the slowest possible case either.
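For this calculator example, a reasonable `profile.json` could be (the specific operands are an arbitrary illustration):

```json
{
  "input": {"a": 10, "b": 4, "op": "mul"}
}
```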
- Replace `customer_main.py` with your model logic. Keep the `run(input_data)` signature.
- Update `requirements.txt` with your dependencies. Keep `runpod==1.7.12` and `sentry-sdk==2.46.0` (required for metrics tracking).
- Update `profile.json` with a representative input for your model.
- Keep `runpod_handler.py` and `Dockerfile` unchanged.