November 21, 2023

Handling your Machine Learning Workflows is Now Easier, Thanks to Bitnami-Packaged MLflow

As machine learning (ML) use cases grow by the day, software developers are increasingly turning to the open source ecosystem for efficient ways to handle processes like model training, selection, and evaluation. To help our customers and users building ML-related applications, we have been adding several new assets to the open source Bitnami Application Catalog and its enterprise edition—VMware Tanzu Application Catalog.

MLflow, an open source platform for managing end-to-end machine learning workflows, is one such recent addition. MLflow helps users keep track of the experiments performed and their results while providing an easy way to reproduce or compare them. It includes components for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. It is composed of two parts: a server where you can track and observe the results of the experiments you run, and a Python module called mlflow that interfaces with that server.

In this blog post, we will walk through a basic blueprint for integrating the mlflow module, show how to obtain the Bitnami-packaged MLflow Helm chart from the Bitnami Application Catalog and Tanzu Application Catalog, how to deploy the Helm chart, and finally, how to run some ML experiments to gather metrics.

Assumptions and prerequisites

This guide assumes that you have the following:

  • An up-and-running Kubernetes cluster
  • The kubectl CLI and the Helm v3.x package manager installed and configured to work with your Kubernetes cluster. If you need help with these steps, check our article Learn how to install kubectl and Helm v3.x.
  • A Python environment, which can be your local computer, a Python container, or even a pod. The only requirement is that the exposed service in the Kubernetes cluster is reachable from it.
  • Access to the MLflow GitHub repository to download the Wine Quality example source code.
  • (For Tanzu Application Catalog users only) MLflow Helm chart added to your private catalog

MLflow integration

To integrate your ML experiment with MLflow, you will need to add a few lines to your existing code. To do so, import the mlflow module and indicate which parameters and metrics you want to track. A generic example is shown in the code snippet below. Note that the experiment and run names are optional; if you don’t set these values, MLflow will generate them randomly.

The example used in this blog post—Wine Quality—uses scikit-learn, but the mlflow module supports integration with many other ML libraries such as PyTorch or TensorFlow.

import mlflow

# (optional) Set the experiment name 
mlflow.set_experiment("MyExperiment")

# (optional) Set the run name
mlflow.start_run(run_name="First run")

# Log parameters (key-value pairs)
mlflow.log_param("num_dimensions", 8)
...
mlflow.log_param("regularization", 0.1)

# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("accuracy", 0.1)
...
mlflow.log_metric("mse", 0.45)

# Log artifacts (output files)
mlflow.log_artifact("roc.png")
...
mlflow.log_artifact("model.pkl")

Deploying the Bitnami package for MLflow Helm chart

Once you have your experiment set up and ready to allow MLflow to gather its data, it is time to get the Bitnami-packaged MLflow and deploy it in your Kubernetes cluster.

The sections below describe the steps to configure the deployment, get and deploy the Bitnami-packaged MLflow Helm chart, and obtain its external IP address to access the MLflow dashboard.

Setting the credentials

First, we need to create the configuration for the deployment; in this case, we will set the user credentials for MLflow. We will examine other options in later examples.

$ cat > ml_config.yaml <<EOF
# Set the MLflow username and password
tracking:
  auth:
    username: mluser
    password: testpwd

# No need to create a job to run the experiment
run:
  enabled: false
EOF

Getting and deploying the Bitnami package for MLflow Helm chart

Deploying the community version of the chart through Bitnami Application Catalog

To deploy the chart in its own namespace, run the following commands:

$ kubectl create namespace mlflow
$ helm install mymlflow --namespace mlflow bitnami/mlflow -f ./ml_config.yaml

Deploying the enterprise version of the chart through Tanzu Application Catalog

The following steps describe how to navigate to the Tanzu Application Catalog and get the instructions to deploy MLflow in your cluster. This example shows an MLflow chart built using Ubuntu 22 as the base OS image, but feel free to customize the chart depending on your needs.

  1. Navigate to app-catalog.vmware.com and sign in to your catalog with your VMware account.

    MLflow listing in Tanzu Application Catalog

  2. In the My Applications section, search for the MLflow chart and click Details.

    On the next screen, you will find the instructions for deploying the chart on your cluster. Make sure that your cluster is up and running.
  3. Execute kubectl cluster-info, then run the commands you will find in the Consume your Helm chart section.

MLflow details page on Tanzu Application Catalog

After this, the remaining steps, described in the following sections, are the same as for the community version of the chart.

Obtaining the external IP address and logging in to MLflow

Wait for the deployment to complete and check that the mymlflow-mlflow-tracking service has been assigned an external IP address.

$ kubectl get pods,svc --namespace mlflow
NAME                                           READY   STATUS
pod/mymlflow-minio-c86c96564-rvm8v             1/1     Running
pod/mymlflow-minio-provisioning-h5djd          0/1     Completed
pod/mymlflow-mlflow-tracking-b46498b8c-fqz4j   1/1     Running
pod/mymlflow-postgresql-0                      1/1     Running

NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP
service/mymlflow-minio             ClusterIP      10.110.230.43   <none>
service/mymlflow-mlflow-tracking   LoadBalancer   10.97.120.239   192.168.49.102
service/mymlflow-postgresql        ClusterIP      10.101.51.172   <none>
service/mymlflow-postgresql-hl     ClusterIP      None            <none>

Browse to the external IP address of the service and log in using the credentials you have previously set. You will be able to see the MLflow UI, which will not have any experiments yet.

MLflow UI

Creating an MLflow experiment

As mentioned earlier, this demonstration uses the scikit-learn Wine Quality example from the MLflow repository, which trains an Elastic Net (a regularized regression method) model on the Wine Quality dataset to predict wine quality.
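Conceptually, train.py does something along the following lines. This is a simplified sketch for orientation, not the exact contents of the file, which you can download with the curl command below:

import sys

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# alpha and l1_ratio are read from the command line, defaulting to 0.5
alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

# Wine Quality dataset: physicochemical measurements plus a quality score
csv_url = "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-red.csv"
data = pd.read_csv(csv_url, sep=";")
train, test = train_test_split(data)
train_x, test_x = train.drop(["quality"], axis=1), test.drop(["quality"], axis=1)
train_y, test_y = train[["quality"]], test[["quality"]]

with mlflow.start_run():
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(train_x, train_y)
    predicted = model.predict(test_x)

    # Track the hyperparameters and the resulting quality metrics
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", np.sqrt(mean_squared_error(test_y, predicted)))
    mlflow.log_metric("mae", mean_absolute_error(test_y, predicted))
    mlflow.log_metric("r2", r2_score(test_y, predicted))

    # Registering the model makes MLflow version it across runs
    mlflow.sklearn.log_model(model, "model", registered_model_name="ElasticnetWineModel")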

$ curl -O https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_wine/train.py

$ pip install mlflow
Collecting mlflow
 Downloading mlflow-2.7.1-py3-none-any.whl (18.5 MB)
...

With a Python environment

To run the experiment, you need to set a few environment variables in your Python environment so that it points to the exposed MLflow service.

$ export MLFLOW_TRACKING_URI=http://192.168.49.102/
$ export MLFLOW_TRACKING_USERNAME=mluser
$ export MLFLOW_TRACKING_PASSWORD=testpwd
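If you prefer to configure the connection from Python instead of the shell, you can set the tracking URI programmatically and export the credentials from within the process; listing the experiments is then a quick way to confirm that the server is reachable and the credentials are accepted. A sketch using the same placeholder values:

import os

import mlflow

# Same placeholder endpoint and credentials as the exported variables above
os.environ["MLFLOW_TRACKING_USERNAME"] = "mluser"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "testpwd"
mlflow.set_tracking_uri("http://192.168.49.102/")

# A fresh installation only contains the "Default" experiment
for experiment in mlflow.search_experiments():
    print(experiment.experiment_id, experiment.name)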

You are now ready to run the code and track the results in MLflow:

# Running with the default parameters: alpha=0.5 l1_ratio=0.5
$ python train.py
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
 RMSE: 0.793164022927685
 MAE: 0.6271946374319586
 R2: 0.10862644997792636

Without a Python environment

Alternatively, if you don’t have a Python environment, or if you want to run the experiment in an isolated Python environment, you can make use of Bitnami’s MLflow container image to run the training using the following steps instead:

# Get the python code
$ mkdir code
$ curl -o code/train.py https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_wine/train.py

# Run the experiment
$ docker run -it --rm --name experiment \
  -v $PWD/code:/app \
  -e MLFLOW_TRACKING_URI=http://192.168.49.102/  \
  -e MLFLOW_TRACKING_USERNAME=mluser \
  -e MLFLOW_TRACKING_PASSWORD=testpwd \
  bitnami/mlflow:2.7.1 \
  /app/train.py

Checking the metrics and comparing results

After the first run, some data has been recorded in MLflow.

First results recorded in MLflow

It is time to run the experiment several times with different parameters so we can later compare the results of the different runs:

# Running with parameters: alpha=0.3 l1_ratio=0.2
$ python train.py 0.3 0.2
Elasticnet model (alpha=0.300000, l1_ratio=0.200000):
 RMSE: 0.7397486012946922
 MAE: 0.5704931175017443
 R2: 0.22464242411894253
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:40:51 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 2
Created version '2' of model 'ElasticnetWineModel'.

# or alternatively using the container image:
$ docker run -it --rm --name experiment \
  -v $PWD/code:/app \
  -e MLFLOW_TRACKING_URI=http://192.168.49.102/  \
  -e MLFLOW_TRACKING_USERNAME=mluser \
  -e MLFLOW_TRACKING_PASSWORD=testpwd \
  bitnami/mlflow:2.7.1 \
  /app/train.py 0.3 0.2

As the model was previously registered, MLflow will create a new version of it for this new run and keep it for us; this way, we can come back to this version at any moment, retrieve it, and use it.
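To illustrate that last point, any registered version can be pulled back for inference through the models:/ URI scheme. A sketch, assuming the connection settings shown earlier and the model name used in this example:

import mlflow.pyfunc

# "models:/<name>/<version>" addresses one registered version;
# "models:/ElasticnetWineModel/latest" would fetch the newest one
model = mlflow.pyfunc.load_model("models:/ElasticnetWineModel/2")

# predict() accepts a pandas DataFrame with the same feature columns
# used during training, e.g. model.predict(test_x)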

Let’s run the experiment a couple more times:

$ python train.py 0.1 0.5
Elasticnet model (alpha=0.100000, l1_ratio=0.500000):
 RMSE: 0.7308996187375898
 MAE: 0.5615486628017713
 R2: 0.2430813606733676
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:41:24 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 3
Created version '3' of model 'ElasticnetWineModel'.


# Running with parameters: alpha=0.8 l1_ratio=0.8
$ python train.py 0.8 0.8
Elasticnet model (alpha=0.800000, l1_ratio=0.800000):
 RMSE: 0.8330972295348316
 MAE: 0.669813814205792
 R2: 0.016611535037920344
Registered model 'ElasticnetWineModel' already exists. Creating a new version of this model...
2023/10/26 15:42:01 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 4
Created version '4' of model 'ElasticnetWineModel'.

If we refresh the MLflow view, we can see the new metrics gathered.

Results from all experiment runs stored in MLflow

Also, in the models tab, we can check all the new versions created.

Registered Models

Now that there is some data, we can configure a chart to compare the different runs. To do so, click the Configure Chart button and add the parameters and metrics you want to explore. In this example, we have added all the available parameters and metrics.

Parallel Coordinates chart added to dashboard

Any of those experiment runs can be retrieved from MLflow at any moment, so you can tune the code or parameters, run the experiment again, and store the new results.

$ mlflow artifacts download -u "mlflow-artifacts:/0/f3e498bf6f3b488d9c754c864afec277/artifacts/model" -d .

$ ls model/
MLmodel  conda.yaml  model.pkl  python_env.yaml  requirements.txt
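The same download can be done from Python through the artifacts API; a sketch reusing the artifact URI from the command above:

import mlflow

# Assumes the tracking URI and credentials are configured as shown earlier
local_path = mlflow.artifacts.download_artifacts(
    artifact_uri="mlflow-artifacts:/0/f3e498bf6f3b488d9c754c864afec277/artifacts/model",
    dst_path=".",
)
print(local_path)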

You can find the URI to download the model in the run tab.

Model detail from experiment

As a more advanced use of the chart, you can also run the experiments inside the Kubernetes cluster. This option is interesting if the runs are long or if you don’t have the required resources locally. To run this training example in a job, use the following ml_config.yaml file when deploying the chart.

$ cat > ml_config.yaml <<EOF
tracking:
  auth:
    username: mluser
    password: testpwd
run:
  enabled: true
  useJob: true
  source:
    type: "git"
    git:
      repository: "https://github.com/mlflow/mlflow.git"
    launchCommand: "python /app/examples/sklearn_elasticnet_wine/train.py"
EOF
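Once the job finishes, you can confirm from any machine that reaches the tracking service that the run was recorded. A sketch using mlflow.search_runs, with the same placeholder connection settings as before (we assume the run lands in the Default experiment, since train.py does not set an experiment name):

import os

import mlflow

os.environ["MLFLOW_TRACKING_USERNAME"] = "mluser"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "testpwd"
mlflow.set_tracking_uri("http://192.168.49.102/")

# Returns a pandas DataFrame with one row per run, including the
# logged parameters and metrics as columns
runs = mlflow.search_runs(experiment_names=["Default"])
print(runs[["run_id", "params.alpha", "params.l1_ratio", "metrics.rmse"]])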

This is only the tip of the iceberg of what can be done with MLflow. From here, you can explore how to run and manage your MLflow projects, add tags, move your models to staging or production, and even serve them, all from inside your Python code.
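For instance, promoting a specific model version to a stage is a single client call; a sketch assuming the model registered in this example:

from mlflow import MlflowClient

# Assumes the tracking URI and credentials are configured as shown earlier
client = MlflowClient()

# Promote version 3 of the example model to the Staging stage
client.transition_model_version_stage(
    name="ElasticnetWineModel", version="3", stage="Staging"
)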

Support and resources

The Bitnami package for MLflow is available both as a community version—through the Bitnami GitHub repository—and as an enterprise version, through VMware Tanzu Application Catalog. Learn more about the differences between these two catalogs in this blog post.

Other ML-related images like PyTorch or TensorFlow can also be found in the Bitnami GitHub or Docker Hub repositories.

To get help with any problems you may have with the Bitnami community packages—including deployment support, operational support, and bug fixes—please open an issue in the Bitnami Helm charts or containers GitHub repositories. Also, if you want to contribute to the catalog, feel free to send us a pull request; the team will review it and guide you through the process to a successful merge.

Check out the official MLflow documentation for detailed guidance on how to use the module for these calls in your code or processes.

If you are interested in learning more about the Tanzu Application Catalog in general, check out the product webpage, Tech Zone page, technical documentation, and additional resources. If you would like to get in touch, contact us.

Last, be sure to read the news announced by the VMware Tanzu team at VMware Explore 2023!
