Hi!
I would like to open a discussion to gather some advice, in case anyone has had similar experiences or has better ideas than mine.
I am currently facing a use case where we need to deploy multiple computer vision models that perform different tasks for different types of products. For example, we will run a different anomaly detection model for each side of a product (front, back, left, right, top, bottom).
The idea is that we will develop an API gateway which will receive the image from the client along with some parameters, such as the side (e.g. front) and the type of product under inspection. The API gateway will then retrieve a model ID according to the parameters received: for example, if we receive front as the side and product_a as the product type, we will look up the ID of the model that has to be used for that side/product combination. The gateway will then forward the inference request to a custom MLServer instance, which will run the inference and return the result to the gateway, which in turn will return it to the client.
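For illustration, here is a minimal sketch of that routing logic, assuming MLServer's standard Open Inference Protocol REST endpoint (POST /v2/models/{model_name}/infer). The registry contents, URLs, and input tensor layout are all hypothetical and would depend on the actual custom runtime:

```python
# Minimal sketch of the gateway routing logic (names and URLs are hypothetical).
# Assumes each MLServer instance exposes POST /v2/models/{model_name}/infer.
import base64

import httpx
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Hypothetical lookup table: (product, side) -> model ID.
MODEL_REGISTRY = {
    ("product_a", "front"): "product_a_front_v1",
    ("product_a", "back"): "product_a_back_v1",
}

MLSERVER_URL = "http://mlserver:8080"  # single model server, as in the local POC


@app.post("/infer")
async def infer(product: str, side: str, image: UploadFile = File(...)):
    model_id = MODEL_REGISTRY.get((product, side))
    if model_id is None:
        raise HTTPException(status_code=404, detail="No model for this product/side")

    # Encode the image as a single BYTES tensor, per the V2 inference protocol;
    # the exact input name/shape depends on the custom runtime.
    payload = {
        "inputs": [
            {
                "name": "image",
                "shape": [1],
                "datatype": "BYTES",
                "data": [base64.b64encode(await image.read()).decode()],
            }
        ]
    }
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{MLSERVER_URL}/v2/models/{model_id}/infer", json=payload
        )
    resp.raise_for_status()
    return resp.json()
```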
I have successfully implemented a simple POC locally, where the API gateway is a FastAPI application and the model server uses MLServer. Currently, the model server simply loads all the models that it finds locally. However, the problem is that in production we are going to have many such models, which cannot all be hosted and loaded in a single service. The ideal solution would be to run multiple replicas of the MLServer-based model server and distribute the models across these replicas. Of course, this probably has to be done in Kubernetes or similar.
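To frame the question, here is a rough sketch of the kind of scheme I have in mind, assuming a fixed set of replicas and a deterministic mapping from model ID to replica (replica URLs and names are hypothetical, e.g. StatefulSet pod DNS names on Kubernetes):

```python
# Sketch of a static sharding scheme (all names hypothetical): each model ID
# maps deterministically to one of N MLServer replicas, so the gateway can
# compute the target replica without a central router.
import hashlib

MLSERVER_REPLICAS = [
    "http://mlserver-0.mlserver:8080",
    "http://mlserver-1.mlserver:8080",
    "http://mlserver-2.mlserver:8080",
]


def replica_for(model_id: str) -> str:
    """Map a model ID to a replica URL via a stable hash."""
    digest = hashlib.sha256(model_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(MLSERVER_REPLICAS)
    return MLSERVER_REPLICAS[index]


# Each replica would then load only the models assigned to it, e.g. by
# building its local model repository from the same mapping at startup.
print(replica_for("product_a_front_v1"))
```

The open question is whether something like this (or better) already exists as a supported configuration, rather than being reimplemented by hand.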
Perhaps I should ask this question on KServe's GitHub instead; if so, I apologize. However, I would like to ask if anyone has ever faced a similar issue. Do you know if there's a way to distribute different models across different instances of an MLServer model server according to some kind of configuration?
Thanks a lot in advance!