February 13, 2023

Secure Package Management on TMC Managed Clusters

Preface

Having self-service capabilities for cluster provisioning is a great thing for many organizations, but deploying a cluster is not enough. We need certain software installed in every cluster before our end users begin working with it. This could be infrastructure services like an ingress controller, External DNS or Cert Manager. It could also be monitoring tools like Prometheus or Grafana, logging tools like Fluent Bit or Loki, and in many cases security tooling like Aqua or Prisma.

Because of the complexity of this setup, many organizations build some sort of pipeline that provisions clusters, installs software imperatively, and then notifies the requesting user with the details for connecting to the cluster. In this post, we will see how this can be achieved in a declarative way, without custom automation, using TMC.

TMC Continuous Delivery – Initial Release

Back in July 2022, TMC added a new feature called "Continuous Delivery". This feature is in essence a wrapper around FluxCD, which, alongside ArgoCD, is one of the two most common GitOps controllers. In its initial release, this feature supported deploying raw YAML manifests or Kustomize configurations to a cluster from a Git repository. The configuration of what to deploy was done at the cluster level. We also needed to first enable the service on a cluster manually, as it is not deployed by default when a cluster is created or attached to TMC. While this was a great feature, it simply did not solve many of the issues we still had, one of which is the subject of this post.

TMC Continuous Delivery – January 31st 2023

On January 31st 2023, VMware released a major update to TMC which included two key features that are relevant to this post:

- Added support for installing a Helm chart from a Git repository
- Added support for continuous delivery at the cluster group level

Let's take a look at what each of these features includes.

Helm Chart Support

FluxCD mainly supports two deployment mechanisms: Kustomize and Helm. Kustomize is a very common tool, and its integration in FluxCD allows us to deploy either raw YAML definitions or Kustomize configurations to our clusters. While Kustomize is a great solution in some cases, most software we consume from vendors or OSS projects today is packaged as Helm charts. FluxCD supports deploying Helm charts sourced from Git repositories, and in this new release of TMC we can now configure Helm Releases as well, just like we would Kustomizations. This opens up additional options for what we can easily integrate into deployment and management flows using TMC. While Helm support is great, just like Kustomize before it, it requires us to enable the service on a cluster-by-cluster basis, and the configuration is also only at the cluster level.

Cluster Group Continuous Delivery

The feature from the January 31st release that, in my mind, will give customers the most value is this one. TMC has always had the concept of a Cluster Group: a grouping of clusters under a single logical entity to which we can apply policies such as security policies, RBAC and much more. In this release, we can now enable Continuous Delivery at the Cluster Group level. This means that any cluster added to a specific cluster group will automatically have the CD service enabled.
We also have the ability to define Kustomization configurations at the Cluster Group level, which will be automatically applied to all clusters in the relevant Cluster Group! This feature opens up some really awesome capabilities.

Installing Carvel Packages on all clusters

Beyond providing FluxCD, TMC also supports my favorite package management solution, Carvel. Not only does TMC install and provide the APIs and UI for installing Carvel packages, it also configures the "Tanzu Standard" package repository, which includes common tooling needed for Kubernetes clusters such as Contour, Cert Manager, Harbor, Prometheus, Grafana, Fluent Bit, External DNS and more. The great thing is that these packages are supported by VMware! The difficulty is that, just like with the CD service, the package management capabilities are cluster scoped. That said, Carvel package management is in the end just Kubernetes resources. Recently I have begun deploying Carvel package installations declaratively instead of imperatively via the Tanzu CLI or the TMC GUI. Doing so actually offers more advanced capabilities and fine-grained control, as we can utilize fields in the resource specs that we can't configure via the Tanzu CLI or the TMC GUI.

What gets created when we install a package via TMC

When we create a package installation, four or five resources are created for us:

- Service Account – a service account that is used to perform the installation.
- Cluster Role – a cluster role with cluster admin permissions, used to allow the installation of the package's resources.
- Cluster Role Binding – a Cluster Role Binding which binds our new service account to the new Cluster Role from above.
- Package Install – the actual PackageInstall resource.
- (optional) Secret – a secret containing the values file we want to use for the installation.

How do we do this declaratively

The first step is to create the relevant manifests. In this example we will configure the Contour package from the Tanzu Standard repository. While not required, in the example below I have added annotations to the resources to match what the Tanzu CLI does, and named the resources in the same manner. This just seems like a beneficial thing to do in my mind, but it is not a requirement.
First is the Service Account YAML:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-sa
  namespace: tkg-system

Next is the Cluster Role YAML:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-cluster-role
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'

Next is the Cluster Role Binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-cluster-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: contour-tkg-system-cluster-role
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system

Next is the Secret with the data values for Contour on vSphere with NSX ALB:

apiVersion: v1
kind: Secret
metadata:
  name: contour-tkg-system-values
  namespace: tkg-system
type: Opaque
stringData:
  contour-data-values.yaml: |
    infrastructure_provider: vsphere
    contour:
      configFileContents: {}
      useProxyProtocol: false
      replicas: 2
      pspNames: "vmware-system-restricted"
      logLevel: info
    envoy:
      service:
        type: LoadBalancer
        annotations: {}
        nodePorts:
          http: null
          https: null
        externalTrafficPolicy: Cluster
        disableWait: false
      hostPorts:
        enable: false
      hostNetwork: false
      terminationGracePeriodSeconds: 300
      logLevel: info
      pspNames: null
    certificates:
      duration: 8760h
      renewBefore: 360h

Finally is the Package Install itself:

apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package-ClusterRole: contour-tkg-system-cluster-role
    tkg.tanzu.vmware.com/tanzu-package-ClusterRoleBinding: contour-tkg-system-cluster-rolebinding
    tkg.tanzu.vmware.com/tanzu-package-Secret: contour-tkg-system-values
    tkg.tanzu.vmware.com/tanzu-package-ServiceAccount: contour-tkg-system-sa
  name: contour
  namespace: tkg-system
spec:
  packageRef:
    refName: contour.tanzu.vmware.com
    versionSelection:
      constraints: 1.20.2+vmware.1-tkg.1
      prereleases: {}
  serviceAccountName: contour-tkg-system-sa
  values:
  - secretRef:
      name: contour-tkg-system-values

But why Cluster Admin???????

Indeed, all Tanzu package installations performed via TMC or the Tanzu CLI use cluster admin permissions. This is because they don't know what permissions any specific package may or may not need in order to be installed. While that may be acceptable for some, I wanted to apply least-privilege concepts, and that is where I found a great tool called audit2rbac.

What is audit2rbac

audit2rbac is an amazing tool put together by Jordan Liggitt. The tool can auto-generate RBAC resource YAMLs for a specific user or service account based on a Kubernetes audit log. Luckily, TKG, which is the distribution I am using, enables us to have audit logging enabled on our clusters. Once you have an audit log file, you simply pass it to audit2rbac, tell the CLI tool what user or service account you are interested in, and it will create the YAML definitions for the relevant RBAC needs.

Using audit2rbac for the Contour Package

After installing the Contour package in one of my clusters, I copied the audit.log file from the Control Plane node in my cluster to my local machine where I have audit2rbac installed. The audit log in TKG is configured to be placed at the path /var/log/kubernetes/audit.log on the Control Plane nodes.
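For reference, the copy itself is just a standard file transfer. A minimal sketch follows; the SSH key path, the node address and the capv user are assumptions based on a typical TKG on vSphere setup, not values taken from the original flow:

# copy the audit log from a control plane node to the local machine
# (replace the key path and node address with values from your environment)
scp -i ~/.ssh/tkg-node-key capv@<control-plane-node-ip>:/var/log/kubernetes/audit.log ./audit.log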
The next step was to simply run the following command:

audit2rbac -f audit.log --serviceaccount tkg-system:contour-tkg-system-sa \
  --generate-annotations tkg.tanzu.vmware.com/tanzu-package=contour-tkg-system \
  --generate-labels="" \
  --generate-name contour-tkg-system > contour-pkgi-rbac.yaml

If we take a look at the file we just created, we can see that it contains four resources.

Cluster Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - nodes
  - pods
  - serviceaccounts
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - tanzu-system-ingress
  resources:
  - namespaces
  verbs:
  - create
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cert-manager.io
  resources:
  - certificates
  - issuers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - rolebindings
  - roles
  verbs:
  - get
  - list
  - watch

Cluster Role Binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: contour-tkg-system
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system

Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
  namespace: tanzu-system-ingress
rules:
- apiGroups:
  - ""
  resourceNames:
  - tanzu-system-ingress
  resources:
  - namespaces
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - services
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - apps
  resourceNames:
  - envoy
  resources:
  - daemonsets
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - apps
  resourceNames:
  - contour
  resources:
  - deployments
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - cert-manager.io
  resources:
  - certificates
  - issuers
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour
  resources:
  - roles
  verbs:
  - create
  - get
  - patch
  - update

Role Binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
  namespace: tanzu-system-ingress
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: contour-tkg-system
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system

The issue

While this output is much better than giving cluster admin credentials, this configuration will require you to pre-create the tanzu-system-ingress namespace (a minimal manifest for this is shown below), and it will also require some additional permissions.
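Pre-creating the namespace itself is trivial. A minimal sketch of the manifest, which could also simply be committed to the Git repository that FluxCD syncs, would look something like this:

apiVersion: v1
kind: Namespace
metadata:
  name: tanzu-system-ingress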
The reason additional permissions are needed in many cases is that, in Kubernetes, you can't create a ClusterRole or Role that grants higher permissions than you yourself have. If the package includes RBAC resources, you must add the permissions they contain to your package service account's RBAC, otherwise Kapp Controller will fail to reconcile the package installation.

There is also a bug in the audit2rbac tool where it can sometimes generate incorrect, overzealous configurations that simply can't work. The main change one must make in the manifests is illustrated by this snippet:

- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - create
  - get
  - patch
  - update

The issue is that the "create" verb does not work in Kubernetes RBAC in conjunction with the resourceNames section. The solution with the highest level of security is:

- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - rolebindings
  verbs:
  - create
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - get
  - patch
  - update
  - delete

Here we are creating two sets of permissions: one for creating the resource, which can't be scoped down to a specific resource name, and one with the remaining verbs in a separate rule, where we do scope the permissions to the specific resource names.

The reason we need to pre-create the namespace is that this least-privilege approach splits the RBAC into both cluster-wide and namespace-specific permissions, and as the installation targets the tanzu-system-ingress namespace, that is where the Role and RoleBinding are targeted.

Solving the missing permissions issue

The easiest way to solve this requires some scripting and a few CLI tools. The CLI tools I use for this process are:

- kubectl
- the kubectl split-yaml plugin
- imgpkg
- ytt
- jq

The general flow is as follows.

Set a few environment variables:

export PKG_NAME=contour.tanzu.vmware.com
export PKG_VERSION=1.22.3+vmware.1-tkg.1
export PKG_NS=tkg-system

Pull down the imgpkg bundle of this package:

export BUNDLE_URI=$(kubectl get pkg $PKG_NAME.$PKG_VERSION -n $PKG_NS -ojson | \
  jq -r .spec.template.spec.fetch[0].imgpkgBundle.image)
imgpkg pull -b $BUNDLE_URI -o ./contour-package-bundle
cd contour-package-bundle

Create a file with the values you would supply when applying the package:

cat <<EOF > my-custom-values.yaml
infrastructure_provider: vsphere
contour:
  configFileContents: {}
  useProxyProtocol: false
  replicas: 2
  pspNames: "vmware-system-restricted"
  logLevel: info
envoy:
  service:
    type: LoadBalancer
    annotations: {}
    nodePorts:
      http: null
      https: null
    externalTrafficPolicy: Cluster
    disableWait: false
  hostPorts:
    enable: false
  hostNetwork: false
  terminationGracePeriodSeconds: 300
  logLevel: info
  pspNames: null
certificates:
  duration: 8760h
  renewBefore: 360h
EOF

Template out the package and extract all RBAC-relevant resources:

ytt -f my-custom-values.yaml -f config/ | kubectl split-yaml -f -
cd split-yaml
if [ -d "rbac.authorization.k8s.io_v1--ClusterRole" ]
then
  mkdir ../additional-cluster-roles/
  mv rbac.authorization.k8s.io_v1--ClusterRole/* ../additional-cluster-roles/
fi
if [ -d "rbac.authorization.k8s.io_v1--Role" ]
then
  mkdir ../additional-roles/
  mv rbac.authorization.k8s.io_v1--Role/* ../additional-roles/
fi
cd ..
rm -rf split-yaml

Now you can either write a custom script or manually take the rules from the Roles and ClusterRoles in the new folders and merge them into the baseline received from the audit2rbac tool (a small helper sketch is shown below).
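As one possible starting point for that merge step, here is a minimal sketch that simply prints the rules of every extracted Role and ClusterRole so they can be reviewed and pasted into the audit2rbac baseline. It assumes the yq CLI is installed, which is not one of the tools listed above:

# collect the RBAC rules shipped inside the package so they can be merged into
# the rules list of the audit2rbac-generated ClusterRole (manual review is still needed)
for f in additional-cluster-roles/*.yaml additional-roles/*.yaml; do
  [ -f "$f" ] || continue   # skip a folder that was not created in the previous step
  echo "# rules taken from: $f"
  yq eval '.rules' "$f"
done > rules-to-merge.yaml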
Create an overlay to remove the resourceNames field when the verb is create:

cat <<EOF > remove-resource-name-config-when-create.yaml
#@ load("@ytt:overlay","overlay")

#@ def bad_rule():
verbs:
- create
#@ end

#@overlay/match by=overlay.all
---
rules:
#@overlay/match by=overlay.subset(bad_rule()), expects="1+"
-
  #@overlay/remove
  resourceNames: []
EOF

Run the overlay against all of the RBAC files, and then add the updated files to your Git repo.

Solving the Role vs Cluster Role issue

Solution 1 – Pre-Create The Namespace

The easiest solution is to pre-create the namespace, for example with the manifest shown earlier. While this requires a step before installation, you could also simply add the namespace configuration to the FluxCD Kustomization repo.

Solution 2 – Move everything to a Cluster Role

You could also get rid of the separation between Cluster Role and Role permissions and simply merge the two sets of permissions. While this means fewer resources to maintain, and aligns with the resources created by the Tanzu CLI and the TMC GUI, it veers away from the least-privilege model and is a suboptimal choice. Nonetheless, it is still a much better choice from a security perspective than the default cluster admin approach!

What would the manifests look like

In the end, after configuring RBAC correctly and everything else, you can see an example of a working configuration in the following Git repo.

Summary

While there is definitely a learning curve with these things, the ability to configure Flux at the cluster group level is huge. I think that over time we will see the Carvel packaging ecosystem evolve to more easily support fine-grained RBAC, as we have tried to do here.


This post was originally published at VRABBI'S BLOG.
