Handling your Machine Learning Workflows is Now Easier, Thanks to Bitnami-Packaged MLflow
MLflow, an open source platform for managing end-to-end machine learning workflows, is one of the latest additions to the Bitnami Application Catalog and VMware Tanzu Application Catalog. MLflow helps users keep track of the experiments they perform and their results while providing an easy way to reproduce or compare them. It includes components for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. It is composed of two parts: a tracking server where you can observe the results of the experiments you run, and a Python module, mlflow, that interfaces with that server.
In this blog post, we will walk through a basic blueprint for integrating the mlflow module into your code, show how to obtain the Bitnami-packaged MLflow Helm chart from the Bitnami Application Catalog and Tanzu Application Catalog, deploy the Helm chart, and finally run some ML experiments to gather metrics.
Assumptions and prerequisites
This guide assumes that you have the following:
- An up-and-running Kubernetes cluster
- kubectl CLI and Helm v3.x package manager installed and configured to work with your Kubernetes cluster. If you need help with these steps, you can check our article Learn how to install kubectl and Helm v3.x.
- A Python environment, which can be your local computer, a Python container, or even a pod. The only requirement is that the service exposed in the Kubernetes cluster is reachable from it.
- Access to the MLflow GitHub repository to download the Wine Quality example source code.
- (For Tanzu Application Catalog users only) MLflow Helm chart added to your private catalog
MLflow integration
To integrate your ML experiment with MLflow, you will need to add a few lines to your existing code. To do so, import the mlflow module and indicate which parameters and metrics you want to track. A generic example is shown in the code snippet below. Note that the experiment and run names are optional; if you don't set these values, MLflow will generate them randomly.
The example used in this blog post—Wine Quality—uses scikit-learn, but the mlflow module supports integration with many other ML libraries such as PyTorch or TensorFlow.
import mlflow
# (optional) Set the experiment name
mlflow.set_experiment("MyExperiment")
# (optional) Set the run name
mlflow.start_run(run_name="First run")
# Log parameters (key-value pairs)
mlflow.log_param("num_dimensions", 8)
...
mlflow.log_param("regularization", 0.1)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("accuracy", 0.1)
...
mlflow.log_metric("mse", 0.45)
# Log artifacts (output files)
mlflow.log_artifact("roc.png")
...
mlflow.log_artifact("model.pkl")
# End the run once everything has been logged
mlflow.end_run()
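To see how these pieces fit together, below is a minimal, self-contained sketch of an instrumented training script. The dataset, model, and names here are illustrative assumptions for this sketch, not part of the Wine Quality example used later in this post:
import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data and model; any scikit-learn estimator works the same way
X, y = make_regression(n_samples=500, n_features=8, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("MyExperiment")
# Using start_run as a context manager ends the run automatically
with mlflow.start_run(run_name="ridge-baseline"):
    alpha = 0.1
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mse)
    # Store the fitted model as an artifact of the run
    mlflow.sklearn.log_model(model, "model")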
Deploying the Bitnami package for MLflow Helm chart
Once you have your experiment set up and ready to allow MLflow to gather its data, it is time to get the Bitnami-packaged MLflow and deploy it in your Kubernetes cluster.
The sections below describe the steps to configure the deployment, get and deploy the Bitnami-packaged MLflow Helm chart, and obtain its external IP address to access the MLflow dashboard.
Setting the credentials
First, we need to create the configuration for the deployment; in this case, we will set the user credentials for MLflow. We will examine other options in later examples.
$ cat > ml_config.yaml <<EOF
# MLflow user credentials
tracking:
  auth:
    username: mluser
    password: testpwd
# No need to create a job to run the experiment
run:
  enabled: false
EOF
Getting and deploying the Bitnami package for MLflow Helm chart
Deploying the community version of the chart through Bitnami Application Catalog
To deploy the chart in its own namespace, run the following commands (add the bitnami repository first if you haven't already):
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ kubectl create namespace mlflow
$ helm install mymlflow --namespace mlflow bitnami/mlflow -f ./ml_config.yaml
Deploying the enterprise version of the chart through Tanzu Application Catalog
The following steps describe how to navigate to the Tanzu Application Catalog and get the instructions to deploy MLflow in your cluster. This example shows an MLflow chart built using Ubuntu 22 as the base OS image, but feel free to customize the chart depending on your needs.
- Navigate to app-catalog.vmware.com and sign in to your catalog with your VMware account.
Tanzu Application Catalog MLflow listing
- In the My Applications section, search for the MLflow chart and click Details.
- On the next screen, you will find the instructions for deploying the chart on your cluster. Make sure that your cluster is up and running by executing kubectl cluster-info, then run the commands you will find in the Consume your Helm chart section.
MLflow details page on Tanzu Application Catalog
From this point on, the steps for deploying the chart are the same as those described above for the community version.
Obtaining the external IP address and logging in to MLflow
Wait for the deployment to complete and check that the mymlflow-mlflow-tracking service has been assigned an external IP address:
$ kubectl get pods,svc --namespace mlflow
NAME READY STATUS
pod/mymlflow-minio-c86c96564-rvm8v 1/1 Running
pod/mymlflow-minio-provisioning-h5djd 0/1 Completed
pod/mymlflow-mlflow-tracking-b46498b8c-fqz4j 1/1 Running
pod/mymlflow-postgresql-0 1/1 Running
NAME TYPE CLUSTER-IP EXTERNAL-IP
service/mymlflow-minio ClusterIP 10.110.230.43 <none>
service/mymlflow-mlflow-tracking LoadBalancer 10.97.120.239 192.168.49.102
service/mymlflow-postgresql ClusterIP 10.101.51.172 <none>
service/mymlflow-postgresql-hl ClusterIP None <none>
Browse to the external IP address of the service and log in using the credentials you have previously set. You will be able to see the MLflow UI, which will not have any experiments yet.
MLflow UI
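If you prefer to verify the deployment from code, the following minimal sketch checks that the tracking server is reachable and that the credentials work. The IP address and credentials are the ones set earlier in this guide; adjust them to your deployment:
import os
import mlflow

# The mlflow module reads basic-auth credentials from these environment variables
os.environ["MLFLOW_TRACKING_USERNAME"] = "mluser"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "testpwd"
mlflow.set_tracking_uri("http://192.168.49.102/")

# A fresh installation only contains the "Default" experiment
for experiment in mlflow.search_experiments():
    print(experiment.experiment_id, experiment.name)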
Creating an MLflow experiment
As mentioned earlier, this demonstration uses the scikit-learn Wine Quality example from the MLflow repository. This example uses the Wine Quality dataset and Elastic Net (a regression method) to predict wine quality.
$ curl -O https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_wine/train.py
$ pip install mlflow
Collecting mlflow
Downloading mlflow-2.7.1-py3-none-any.whl (18.5 MB)
...
With a Python environment
To run the experiment, it is necessary to set some environment variables in our Python environment to point to the exposed MLflow service.
$ export MLFLOW_TRACKING_URI=http://192.168.49.102/
$ export MLFLOW_TRACKING_USERNAME=mluser
$ export MLFLOW_TRACKING_PASSWORD=testpwd
You are now ready to run the code and track the results in MLflow:
# Running with the default parameters: alpha=0.5 l1_ratio=0.5
$ python train.py
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
RMSE: 0.793164022927685
MAE: 0.6271946374319586
R2: 0.10862644997792636
Without a Python environment
Alternatively, if you don't have a Python environment, or if you want to run the experiment in an isolated one, you can use Bitnami's MLflow container image to run the training with the following steps instead:
# Get the python code
$ mkdir code
$ curl -o code/train.py https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_wine/train.py
# Run the experiment
$ docker run -it --rm --name experiment \
-v $PWD/code:/app \
-e MLFLOW_TRACKING_URI=http://192.168.49.102/ \
-e MLFLOW_TRACKING_USERNAME=mluser \
-e MLFLOW_TRACKING_PASSWORD=testpwd \
bitnami/mlflow:2.7.1 \
/app/train.py
Checking the metrics and comparing results
After the first run, some data has been recorded in MLflow.
First results recorded in MLflow
It is time to run the experiment several times with different parameters so we can later compare the results of the different runs:
# Running with parameters: alpha=0.3 l1_ratio=0.2
$ python train.py 0.3 0.2
Elasticnet model (alpha=0.300000, l1_ratio=0.200000):
RMSE: 0.7397486012946922
MAE: 0.5704931175017443
R2: 0.22464242411894253
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:40:51 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 2
Created version '2' of model 'ElasticnetWineModel'.
# or alternatively using the container image:
$ docker run -it --rm --name experiment \
-v $PWD/code:/app \
-e MLFLOW_TRACKING_URI=http://192.168.49.102/ \
-e MLFLOW_TRACKING_USERNAME=mluser \
-e MLFLOW_TRACKING_PASSWORD=testpwd \
bitnami/mlflow:2.7.1 \
/app/train.py 0.3 0.2
As the model was already registered, MLflow creates a new version of it for this run and keeps it for us; this way, we can come back to this version at any moment, retrieve it, and use it.
Let's run the experiment a couple more times:
# Running with parameters: alpha=0.1 l1_ratio=0.5
$ python train.py 0.1 0.5
Elasticnet model (alpha=0.100000, l1_ratio=0.500000):
RMSE: 0.7308996187375898
MAE: 0.5615486628017713
R2: 0.2430813606733676
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:41:24 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 3
Created version '3' of model 'ElasticnetWineModel'.
# Running with parameters: alpha=0.8 l1_ratio=0.8
$ python train.py 0.8 0.8
Elasticnet model (alpha=0.800000, l1_ratio=0.800000):
RMSE: 0.8330972295348316
MAE: 0.669813814205792
R2: 0.016611535037920344
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:42:01 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 4
Created version '4' of model 'ElasticnetWineModel'.
If we refresh the MLflow view, we can see the new metrics gathered.
Results from all experiment runs stored in MLflow
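Beyond the UI, the same run data can be retrieved programmatically to build your own comparisons. Here is a minimal sketch, assuming the tracking URI and credentials from the previous steps and that train.py logged to the default experiment (it does not set an experiment name):
import mlflow

# Credentials are picked up from the MLFLOW_TRACKING_* environment variables set earlier
mlflow.set_tracking_uri("http://192.168.49.102/")

# search_runs returns a pandas DataFrame; column names mirror what train.py
# logged (params.alpha, params.l1_ratio, metrics.rmse, metrics.mae, metrics.r2)
runs = mlflow.search_runs(experiment_names=["Default"], order_by=["metrics.rmse ASC"])
print(runs[["run_id", "params.alpha", "params.l1_ratio", "metrics.rmse", "metrics.r2"]])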
Also, in the Models tab, we can check all the new versions created.
Registered Models
Now that there is some data, we can configure a chart to show a comparison between the different runs. To do so, click the Configure Chart button and add the parameters and metrics you want to explore. In this example, we have added all the available parameters and metrics.
Parallel Coordinates chart added to dashboard
Any of those experiment runs can be retrieved at any moment from MLflow, so you can tune the code or parameters, run it again, and store the new results in MLflow.
$ mlflow artifacts download -u "mlflow-artifacts:/0/f3e498bf6f3b488d9c754c864afec277/artifacts/model" -d .
$ ls model/
MLmodel conda.yaml model.pkl python_env.yaml requirements.txt
You can find the URI to download the model in the run tab.
Model detail from experiment
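Instead of downloading the artifacts, a registered version can also be loaded directly from the registry and used for predictions. The sketch below assumes the model name and version shown above and that the tracking environment variables are still set; the sample row is illustrative, not real data:
import mlflow
import pandas as pd

mlflow.set_tracking_uri("http://192.168.49.102/")

# "models:/<name>/<version>" URIs resolve through the registry shown in the Models tab
model = mlflow.pyfunc.load_model("models:/ElasticnetWineModel/2")

# The Wine Quality model expects the dataset's physicochemical columns
sample = pd.DataFrame([{
    "fixed acidity": 7.0, "volatile acidity": 0.27, "citric acid": 0.36,
    "residual sugar": 20.7, "chlorides": 0.045, "free sulfur dioxide": 45.0,
    "total sulfur dioxide": 170.0, "density": 1.001, "pH": 3.0,
    "sulphates": 0.45, "alcohol": 8.8,
}])
print(model.predict(sample))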
In a more advanced usage of the chart, you can also run the experiments in the Kubernetes cluster itself. This option can be interesting if the runs are long or if you don't have the needed resources locally. To run this training example in a Kubernetes job, use the following ml_config.yaml file when deploying the chart:
$ cat > ml_config.yaml <<EOF
tracking:
  auth:
    username: mluser
    password: testpwd
run:
  enabled: true
  useJob: true
  source:
    type: "git"
    git:
      repository: "https://github.com/mlflow/mlflow.git"
    launchCommand: "python /app/examples/sklearn_elasticnet_wine/train.py"
EOF
This is only the tip of the iceberg of what can be done with MLflow. From here, you can explore how to run and manage your MLflow Projects, add tags, move your models to staging or production, and even serve them, all from inside your Python code.
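As a small taste of that, the sketch below uses the model registry client to list the versions created above, promote one of them, and tag it. The version chosen and the tag are illustrative assumptions:
from mlflow import MlflowClient

# Credentials are read from the MLFLOW_TRACKING_* environment variables set earlier
client = MlflowClient(tracking_uri="http://192.168.49.102/")

# List every version of the model registered by the runs above
for mv in client.search_model_versions("name='ElasticnetWineModel'"):
    print(mv.version, mv.current_stage, mv.run_id)

# Promote one version (an illustrative choice) to the Production stage
client.transition_model_version_stage(name="ElasticnetWineModel", version="3", stage="Production")

# Attach a free-form tag so this version is easy to find later
client.set_model_version_tag("ElasticnetWineModel", "3", "validated", "true")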
Support and resources
The Bitnami package for MLflow is available both as the community version, through the Bitnami GitHub repository, and as the enterprise version, through Tanzu Application Catalog. Learn more about the differences between these two catalogs in this blog post.
Other ML-related images like PyTorch or TensorFlow can also be found in the Bitnami GitHub or Docker Hub repositories.
To get help with any problems you may encounter with the Bitnami community packages, including deployment support, operational support, and bug fixes, please open an issue in the Bitnami Helm charts or containers GitHub repositories. Also, if you want to contribute to the catalog, feel free to send us a pull request, and the team will check it and guide you through the process to a successful merge.
Check out the MLflow official documentation for detailed guidance on how to use these calls in your code or processes.
If you are interested in learning more about the Tanzu Application Catalog in general, check out the product webpage, Tech Zone page, technical documentation, and additional resources. If you would like to get in touch, contact us.
Lastly, be sure to read the news announced by the VMware Tanzu team at VMware Explore 2023!