Model Life Cycle Management
Using RocketML, your organization can manage the end-to-end life cycle of a machine learning model. This life cycle typically includes four stages:
Building and registering a model
Model as a REST API on a single pod for staging/testing workloads
Model as a REST API on multiple pods with horizontal autoscaling policies for production/client workloads
Infrastructure monitoring of the model
As shown in the figure below, a) Decision Scientists (DS) are responsible for building the model and deploying it as a REST API for staging and testing purposes, and b) Infrastructure Administrators or DS managers are responsible for promoting a staging model to production and monitoring it. The remainder of this document explains these stages in more detail.
Stage 1: Building and registering a model
In this stage, DS users build pre-processing steps (e.g., filtering columns or creating new columns with custom functions) and fraud probability models using open-source Python libraries such as LightGBM and XGBoost. Once the pre-processing steps and models are built, they register the model, along with its request/response body schemas (i.e., column names and data types), in a database using mlflow Python APIs. The images below show RocketML's user interface, where DS users can track their experiments with different hyperparameters, metrics, and artifacts. You can find basic to advanced tutorials on building and registering models here.
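The exact registration script depends on your model and schemas, but a minimal sketch using mlflow's Python APIs might look like the following; the dataset, feature columns, and the "fraud-model" name are all hypothetical:

```python
import lightgbm as lgb
import mlflow
import mlflow.lightgbm
import pandas as pd
from mlflow.models import infer_signature

# Hypothetical pre-processed features and fraud labels; substitute your own.
X = pd.DataFrame({"amount": [120.0, 55.5, 980.0, 15.0], "n_txn_24h": [3, 1, 12, 2]})
y = [0, 0, 1, 0]

with mlflow.start_run():
    params = {"n_estimators": 100, "learning_rate": 0.1}
    model = lgb.LGBMClassifier(**params)
    model.fit(X, y)

    # Track hyperparameters and metrics for the experiment UI.
    mlflow.log_params(params)
    mlflow.log_metric("train_accuracy", model.score(X, y))

    # The signature records the request/response schema (column names and dtypes).
    signature = infer_signature(X, model.predict_proba(X)[:, 1])

    # Log the model and register it in the model registry in one step.
    mlflow.lightgbm.log_model(
        model,
        artifact_path="model",
        signature=signature,
        registered_model_name="fraud-model",
    )
```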
Stage 2: Model as a REST API on a single pod for staging/testing workloads
DS users can deploy a model from any experiment run that contains a model logged using mlflow APIs. Clicking "Convert to Model" deploys the model on a container cluster, and the model's state switches to Starting and then to ON. All models created from the project's experiments are listed in the Models tab. Models in the ON state have a REST API endpoint listed for integration into business applications. The following screenshots show the steps a DS user takes to deploy a model as a REST API.
Once the model is in the ON state, its REST API endpoint can be used for testing. In addition, details such as the model path, compute configuration, Docker container logs, and the model signature are available in the UI (screenshots below).
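Assuming the endpoint follows the standard mlflow scoring protocol, a staging test might look like the sketch below; the URL and column names are placeholders, and the exact payload format depends on your mlflow version:

```python
import requests

# Placeholder URL; copy the real endpoint from the Models tab.
url = "https://rocketml.example.com/models/fraud-model/invocations"

# The payload must match the registered input schema (column names and dtypes).
payload = {
    "dataframe_split": {
        "columns": ["amount", "n_txn_24h"],
        "data": [[120.0, 3], [980.0, 12]],
    }
}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. {"predictions": [0.02, 0.87]}
```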
Model Signature
In the example below, the input and output schemas were defined by the DS user while building and registering the model. These signatures are not predetermined; the team should establish a process for defining them in consultation with the relevant product or application stakeholders. To provide maximum flexibility, input and output column names and data types are fully configurable in the Python scripts used to register the model. More details on model signatures can be found here.
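If your team prefers to define schemas explicitly rather than inferring them from sample data, mlflow also lets you construct the signature by hand; the column names below are illustrative:

```python
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import ColSpec, Schema

# Input schema agreed upon with the product/application stakeholders.
input_schema = Schema([
    ColSpec("double", "amount"),
    ColSpec("long", "n_txn_24h"),
])

# Output schema: a single fraud-probability column.
output_schema = Schema([ColSpec("double", "fraud_probability")])

signature = ModelSignature(inputs=input_schema, outputs=output_schema)
# Pass signature=signature to mlflow.<flavor>.log_model when registering the model.
```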
Stage 3: Model as a REST API on multiple pods with horizontal autoscaling policies for production/client workloads
All staging models built by DS users are available to DS managers and infrastructure administrators for production deployment. The screenshot below shows the administrator view of the platform. Within the model management tab, admins can view the list of models along with their respective versions, DS owner, endpoint, status, and other fields.
By clicking the Deploy to Production button for any staging model, an admin can deploy another standalone REST API for that model. This deployment has its own life cycle, i.e., the model artifacts (pickle files, conda YAML files, requirements files) are copied to a location accessible only to admins. Admins can choose the memory, number of CPUs, min/max pods, and autoscaling policies for the REST API deployment on multiple pods.
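RocketML applies these settings through the UI; conceptually, such a policy corresponds to a Kubernetes HorizontalPodAutoscaler. As a rough sketch of equivalent settings using the official kubernetes Python client (the deployment name, namespace, and thresholds are illustrative, not RocketML internals):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

# Scale a hypothetical "fraud-model-prod" deployment between 2 and 10 pods,
# targeting 70% average CPU utilization across pods.
hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="fraud-model-prod-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="fraud-model-prod"
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="production", body=hpa
)
```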
All production models can be viewed and managed from the production tab under model management (screenshot below). Each production model has its own REST API endpoint, distinct from the staging endpoint. A single staging model can back multiple production REST APIs with different CPU, memory, and autoscaling policies, which allows maximum flexibility across customers. For example, based on your requirements, you can have a single model deployed for a single customer, multiple models deployed for a single customer, or a single model deployed for multiple customers.
Stage 4: Monitoring infrastructure used for model deployment
Once production models are deployed, it is critical to monitor the infrastructure on which they run in order to identify and mitigate potential issues. The model deployment infrastructure is continuously monitored using industry-standard tools: Prometheus, Grafana, and Jaeger. Below is a screenshot of infrastructure monitoring in Grafana.
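As an illustration of how Prometheus fits in (this is not RocketML's internal code), a serving process can expose request counts and latencies via the prometheus_client library, which Prometheus then scrapes and Grafana visualizes; all metric and model names here are hypothetical:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics for a model-serving pod.
REQUESTS = Counter("model_requests_total", "Total scoring requests", ["model", "status"])
LATENCY = Histogram("model_request_latency_seconds", "Scoring latency", ["model"])

def score(payload):
    """Stand-in for the real model's predict call."""
    with LATENCY.labels(model="fraud-model").time():
        time.sleep(random.uniform(0.01, 0.05))  # simulate inference work
        REQUESTS.labels(model="fraud-model", status="200").inc()
        return {"fraud_probability": 0.02}

if __name__ == "__main__":
    start_http_server(8001)  # metrics exposed at http://<pod>:8001/metrics
    while True:
        score({"amount": 120.0})
```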