February 13, 2023

Trusting Private CAs in TKG 2.1 clusters

Preface

TKG 2.1 has a very different configuration mechanism for clusters in comparison to previous versions of TKG. In this post we will discuss how to add trust for a CA certificate to a cluster in TKG 2.1.

What needs to trust our CA

In a TKG cluster, there are 2 components we will want to add our CA to:

  1. Containerd – the container runtime that pulls the images from our registry
  2. Kapp Controller – the package management tool used for deploying Tanzu Packages on our clusters

Let's try and see how this can be done

Those who read my blog post on my initial impressions of TKG 2.1 will have seen that I call out that this is now possible without custom overlays like those needed in TKG 1.x.

While this is true, there are some caveats and a few limitations, which can easily be worked around without writing anything custom, as long as we do things in a very specific manner.

Making Containerd trust our CA

This is the easiest to do, but watch out for the catch.

In TKG 2.1, clusters are now provisioned using ClusterClass as the templating mechanism, which has 2 important templating concepts:

  1. variables – define the variables which can be configured in the Cluster object and are then used in patches
  2. patches – define the patches which are applied to customize the referenced templates of a ClusterClass

With this in mind, we can get a list of all the top-level variables in the default TKG ClusterClass by running a command like:

    kubectl get clusterclass tkg-vsphere-default-v1.0.0 -o json | \
      jq -r '.spec.variables[].name' | sort

When we run this command we will receive a list of the variable names. The one that should catch our eye is called "trust".
To take a deeper look at the trust variable we can run the following command:

    kubectl get clusterclass tkg-vsphere-default-v1.0.0 -o json | \
      jq -r '.spec.variables[] | select(.name=="trust")'

The result should look like:

    {
      "name": "trust",
      "required": false,
      "schema": {
        "openAPIV3Schema": {
          "properties": {
            "additionalTrustedCAs": {
              "items": {
                "properties": {
                  "data": {
                    "type": "string"
                  },
                  "name": {
                    "type": "string"
                  }
                },
                "required": [
                  "name",
                  "data"
                ],
                "type": "object"
              },
              "type": "array"
            }
          },
          "type": "object"
        }
      }
    }

As we can see, under the trust variable we have a property called additionalTrustedCAs, which accepts an array of objects, each with a name and a data field.

When I saw this, I got excited that it seemed so easy, so I created a cluster with the following in its configuration:

    spec:
      topology:
        class: tkg-vsphere-default-terasky-v1.0.0
        variables:
        - name: trust
          value:
            additionalTrustedCAs:
            - data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZUakNDQkRhZ0F3SUJBZ0lUZGdBQUFHOFJVVnBwTGhrWEJRQUFBQUFBYnpBTkJna3Foa2lHOXcwQkFR..........
              name: my-ca-cert

Unfortunately, when I tried to run an image in the new cluster from my Harbor registry, whose certificate is signed by that CA, it failed. The error complained about the registry's certificate being signed by an untrusted CA.

This confused me, as it seemed like it should be straightforward, but it was not.

To understand what was going on, I took a deeper look at the ClusterClass resource. In this case I wanted to see which patches use this variable, and what they do with it. After searching through the ClusterClass YAML, I found that the variable is used in 2 patches:

  1. registryCACert
  2. httpProxyCACert

By the looks of it, registryCACert was the one that made sense to look at first.
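A side note on the data field in the cluster spec above: the sample value starts with `LS0tLS1CRUdJTiBD`, which is the base64 form of `-----BEGIN C`, so the field carries a base64-encoded PEM. A minimal sketch of producing and sanity-checking such a value with standard tools follows. The function names are hypothetical, and it assumes a `base64` binary that decodes with `-d`, as the GNU coreutils one does.

```shell
#!/bin/sh
# Hypothetical helpers for preparing the "data" field of
# trust.additionalTrustedCAs. Not part of TKG.

# encode_ca FILE: print FILE as a single-line base64 string.
# Newlines are stripped with tr so the result does not depend on the
# line-wrapping defaults of the local base64 implementation.
encode_ca() {
  base64 < "$1" | tr -d '\n'
}

# looks_like_pem_b64 VALUE: succeed if VALUE decodes to text whose first
# line starts with a PEM certificate header.
looks_like_pem_b64() {
  printf '%s' "$1" | base64 -d 2>/dev/null | head -n 1 |
    grep -q '^-----BEGIN CERTIFICATE-----'
}
```

For example, `encode_ca ca.crt` (with `ca.crt` standing in for your own PEM file) prints the string to paste into data, and `looks_like_pem_b64` can be used to check a value copied from an existing manifest.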
The definition of this patch can be pulled out with a command like:

    kubectl get clusterclass tkg-vsphere-default-v1.0.0 -o yaml | \
      yq '.spec.patches[] | select(.name=="registryCACert")'

This will return the following YAML output:

    definitions:
      - jsonPatches:
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            valueFrom:
              template: |
                {{- $host := index (or .imageRepository.host (index .TKR_DATA .builtin.controlPlane.version).kubernetesSpec.imageRepository | splitList "/") 0 -}}
                echo '[plugins."io.containerd.grpc.v1.cri".registry.configs."
                {{- $host -}}
                ".tls]' >> /etc/containerd/config.toml
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            valueFrom:
              template: |
                {{- $host := index (or .imageRepository.host (index .TKR_DATA .builtin.controlPlane.version).kubernetesSpec.imageRepository | splitList "/") 0 }}
                {{- $val := list "ca_file = \"/etc/containerd/" $host ".crt\"" | join "" }}
                {{- with .imageRepository }}
                {{- if .tlsCertificateValidation | eq false }}
                {{- $val = "insecure_skip_verify = true" }}
                {{- end }}
                {{- end -}}
                {{- define "echo" -}}
                echo ' {{ . -}} ' >> /etc/containerd/config.toml
                {{- end }}
                {{- template "echo" $val -}}
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            value: systemctl restart containerd
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/files/-
            valueFrom:
              template: |
                path: /etc/containerd/
                {{- index (or .imageRepository.host (index .TKR_DATA .builtin.controlPlane.version).kubernetesSpec.imageRepository | splitList "/") 0 -}}
                .crt
                {{- $proxy := "" }}
                {{- $image := "" }}
                {{- range .trust.additionalTrustedCAs }}
                {{- if eq .name "proxy" }}
                {{- $proxy = .data }}
                {{- end }}
                {{- if eq .name "imageRepository" }}
                {{- $image = .data }}
                {{- end }}
                {{- end }}
                content: {{or $proxy $image}}
                encoding: base64
                permissions: "0444"
        selector:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: KubeadmControlPlaneTemplate
          matchResources:
            controlPlane: true
      - jsonPatches:
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            valueFrom:
              template: |
                {{- $host := index (or .imageRepository.host (index .TKR_DATA .builtin.machineDeployment.version).kubernetesSpec.imageRepository | splitList "/") 0 -}}
                echo '[plugins."io.containerd.grpc.v1.cri".registry.configs."
                {{- $host -}}
                ".tls]' >> /etc/containerd/config.toml
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            valueFrom:
              template: |
                {{- $host := index (or .imageRepository.host (index .TKR_DATA .builtin.machineDeployment.version).kubernetesSpec.imageRepository | splitList "/") 0 }}
                {{- $val := list "ca_file = \"/etc/containerd/" $host ".crt\"" | join "" }}
                {{- with .imageRepository }}
                {{- if .tlsCertificateValidation | eq false }}
                {{- $val = "insecure_skip_verify = true" }}
                {{- end }}
                {{- end -}}
                {{- define "echo" -}}
                echo ' {{ . -}} ' >> /etc/containerd/config.toml
                {{- end }}
                {{- template "echo" $val -}}
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: systemctl restart containerd
          - op: add
            path: /spec/template/spec/files/-
            valueFrom:
              template: |
                path: /etc/containerd/{{ index (or .imageRepository.host (index .TKR_DATA .builtin.machineDeployment.version).kubernetesSpec.imageRepository | splitList "/") 0 }}.crt
                {{- $proxy := "" }}
                {{- $image := "" }}
                {{- range .trust.additionalTrustedCAs }}
                {{- if eq .name "proxy" }}
                {{- $proxy = .data }}
                {{- end }}
                {{- if eq .name "imageRepository" }}
                {{- $image = .data }}
                {{- end }}
                {{- end }}
                content: {{or $proxy $image}}
                encoding: base64
                permissions: "0444"
        selector:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          matchResources:
            machineDeploymentClass:
              names:
                - tkg-worker
    enabledIf: '{{ not (empty .trust.additionalTrustedCAs) }}'
    name: registryCACert

The important parts of this are the following.

This line defines when the patch is applied, which is whenever the trust variable has a non-empty additionalTrustedCAs array:

    enabledIf: '{{ not (empty .trust.additionalTrustedCAs) }}'

These lines tell us that the content set in the variable is expected to be the base64-encoded value of our certificate:

    encoding: base64
    permissions: "0444"

The following is where the issue lies:

    - op: add
      path: /spec/template/spec/files/-
      valueFrom:
        template: |
          path: /etc/containerd/{{ index (or .imageRepository.host (index .TKR_DATA .builtin.machineDeployment.version).kubernetesSpec.imageRepository | splitList "/") 0 }}.crt
          {{- $proxy := "" }}
          {{- $image := "" }}
          {{- range .trust.additionalTrustedCAs }}
          {{- if eq .name "proxy" }}
          {{- $proxy = .data }}
          {{- end }}
          {{- if eq .name "imageRepository" }}
          {{- $image = .data }}
          {{- end }}
          {{- end }}
          content: {{or $proxy $image}}
          encoding: base64
          permissions: "0444"

This section is where the CA data is added as a file on the nodes.
The issue is that the templating has 2 very annoying and seemingly pointless conditionals:

    {{- if eq .name "proxy" }}
    {{- $proxy = .data }}
    {{- end }}

and

    {{- if eq .name "imageRepository" }}
    {{- $image = .data }}
    {{- end }}

Basically, these conditional statements mean that the contents are added to the node only if the name given to the certificate in the cluster YAML is "imageRepository" or "proxy". This is a very odd and limiting configuration that I see no logical reason for, but this is the situation.

With that being the case, I decided to look at what this patch does beyond creating the file. After reading the patch and working through the Go templating in it, the general idea is that it creates the file in the containerd folder, adds configuration for the cert to the containerd configuration file, and finally restarts containerd.

This is sufficient for containerd to pull images, but it is not optimal. We want the node to trust our CA in general, because otherwise debugging becomes difficult, as tools like curl won't trust the certificate.

With that being the case, I decided to take a look at the second patch mentioned above that uses the "trust" variable, called "httpProxyCACert". I pulled out its configuration via the following command:

    kubectl get clusterclass tkg-vsphere-default-v1.0.0 -o yaml | \
      yq '.spec.patches[] | select(.name=="httpProxyCACert")'

which returned the following YAML definition:

    definitions:
      - jsonPatches:
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            value: '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            value: '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/tkg-custom-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            value: '! which update-ca-trust 2>/dev/null || (update-ca-trust force-enable && mv /etc/ssl/certs/tkg-custom-ca.pem /etc/pki/ca-trust/source/anchors/tkg-custom-ca.crt && update-ca-trust extract)'
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/preKubeadmCommands/-
            value: systemctl restart containerd
          - op: add
            path: /spec/template/spec/kubeadmConfigSpec/files/-
            valueFrom:
              template: |
                path: /etc/ssl/certs/tkg-custom-ca.pem
                {{- $proxy := "" }}
                {{- range .trust.additionalTrustedCAs }}
                {{- if eq .name "proxy" }}
                {{- $proxy = .data }}
                {{- end }}
                {{- end }}
                content: {{ $proxy }}
                encoding: base64
                permissions: "0444"
        selector:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: KubeadmControlPlaneTemplate
          matchResources:
            controlPlane: true
      - jsonPatches:
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/tkg-custom-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: '! which update-ca-trust 2>/dev/null || (update-ca-trust force-enable && mv /etc/ssl/certs/tkg-custom-ca.pem /etc/pki/ca-trust/source/anchors/tkg-custom-ca.crt && update-ca-trust extract)'
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: systemctl restart containerd
          - op: add
            path: /spec/template/spec/files/-
            valueFrom:
              template: |
                path: /etc/ssl/certs/tkg-custom-ca.pem
                {{- $proxy := "" }}
                {{- range .trust.additionalTrustedCAs }}
                {{- if eq .name "proxy" }}
                {{- $proxy = .data }}
                {{- end }}
                {{- end }}
                content: {{ $proxy }}
                encoding: base64
                permissions: "0444"
        selector:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          matchResources:
            machineDeploymentClass:
              names:
                - tkg-worker
    enabledIf: '{{ $hasProxyCert := false }} {{- range .trust.additionalTrustedCAs }} {{- if .name | eq "proxy" }} {{- $hasProxyCert = true }} {{- end }} {{- end }} {{- $hasProxyCert }}'
    name: httpProxyCACert

If we look at what is happening here, we have a similar conditional for creating the file, except that it has only a single if condition, for a certificate entry named "proxy".

The condition that decides whether this patch is applied is a bit more complex though:

    enabledIf: '{{ $hasProxyCert := false }} {{- range .trust.additionalTrustedCAs }} {{- if .name | eq "proxy" }} {{- $hasProxyCert = true }} {{- end }} {{- end }} {{- $hasProxyCert }}'

My Go templating skills are not great, so I turned to Google for some help and found a really awesome online tool for playing with Go templating, where you can paste in the template, write a YAML definition, and it will return the result.

The general logic of this conditional is that if the trust.additionalTrustedCAs array is not empty and there is an item in that array with the name "proxy", it evaluates to true; otherwise it evaluates to false.
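To see how the two patches treat the name field side by side, the selection logic described above can be restated in plain shell. This is only a sketch of one reading of the templates, not the Cluster API template engine itself; the function names and the `NAME=DATA` argument format are made up for illustration.

```shell
#!/bin/sh
# Sketch of how registryCACert and httpProxyCACert pick a certificate.
# Each argument is a hypothetical NAME=DATA pair standing in for one entry
# of trust.additionalTrustedCAs.

# registry_patch_enabled: mirrors
# '{{ not (empty .trust.additionalTrustedCAs) }}'.
registry_patch_enabled() {
  [ "$#" -gt 0 ]
}

# proxy_patch_enabled: mirrors the httpProxyCACert enabledIf, which is
# true only when some entry is named "proxy".
proxy_patch_enabled() (
  for entry in "$@"; do
    [ "${entry%%=*}" = "proxy" ] && exit 0
  done
  exit 1
)

# registry_ca_content: mirrors 'content: {{or $proxy $image}}', so "proxy"
# wins over "imageRepository" and every other name is ignored.
registry_ca_content() (
  proxy=""
  image=""
  for entry in "$@"; do
    case "${entry%%=*}" in
      proxy) proxy="${entry#*=}" ;;
      imageRepository) image="${entry#*=}" ;;
    esac
  done
  printf '%s' "${proxy:-$image}"
)

# proxy_ca_content: mirrors 'content: {{ $proxy }}' in httpProxyCACert.
proxy_ca_content() (
  proxy=""
  for entry in "$@"; do
    if [ "${entry%%=*}" = "proxy" ]; then
      proxy="${entry#*=}"
    fi
  done
  printf '%s' "$proxy"
)
```

With a single entry named `my-ca-cert`, `registry_patch_enabled` succeeds but `registry_ca_content` prints nothing: the patch is applied, yet the file it writes carries no certificate. That is consistent with the failed image pull described earlier.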
The difference between this patch and the image registry patch is that this patch adds the file to a different location:

    - op: add
      path: /spec/template/spec/files/-
      valueFrom:
        template: |
          path: /etc/ssl/certs/tkg-custom-ca.pem

It then runs the relevant commands for adding the cert to the system's trust store:

    - op: add
      path: /spec/template/spec/preKubeadmCommands/-
      value: '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'

which is the relevant command for Photon OS, and then:

    - op: add
      path: /spec/template/spec/preKubeadmCommands/-
      value: '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/tkg-custom-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'

which is the right command for Ubuntu.

The final command this patch adds to the bootstrap process is a restart of containerd, so that containerd reloads the system certificate trust store.

With all this knowledge in place, the conclusion is simple: to make our nodes trust our private CA, we can add a certificate as I did initially, but the name field must be set to "proxy" or else it won't work.

Making Kapp Controller trust our CA

Making Kapp Controller trust our certificates is actually quite easy, but it has a catch as well.

Because Kapp Controller, like other resources such as CPI, CSI, CNI, and AKO, is not a CAPI object, the ClusterClass mechanism does not help us in configuring it. Instead, we have a set of CRDs and controllers that are specific to TKG and handle this for us.

These resources are created for us by default if we don't supply them, but they are then created with the default configuration, and in our case one of them needs to be customized.

The main resources are:

  1. ClusterBootstrap – a CR which pairs 1-to-1 with a cluster and must be named the same as the cluster itself. This is a meta object which defines the different components it will manage to bootstrap a cluster
  2. VSphereCPIConfig – defines the vSphere CPI configuration
  3. VSphereCSIConfig – defines the vSphere CSI configuration
  4. CalicoConfig – defines the Calico configuration if the cluster is configured to use Calico as the CNI
  5. AntreaConfig – defines the Antrea configuration if the cluster is configured to use Antrea as the CNI
  6. KubevipCPIConfig – defines the KubeVIP CPI configuration, if using KubeVIP for Services of type LoadBalancer (Tech Preview)
  7. AKODeploymentConfig – defines the AKO configuration for the cluster if using NSX ALB for Services of type LoadBalancer
  8. KappControllerConfig – defines the configuration of the Kapp Controller that will be deployed in the cluster

As we are talking about getting Kapp Controller to trust our CA certificate, the resource we must look at is KappControllerConfig.

This resource is pretty straightforward, and to understand its spec we can run:

    kubectl explain kappcontrollerconfig --recursive

This will return the following:

    KIND:     KappControllerConfig
    VERSION:  run.tanzu.vmware.com/v1alpha3

    DESCRIPTION:
         KappControllerConfig is the Schema for the kappcontrollerconfigs API

    FIELDS:
       apiVersion   <string>
       kind         <string>
       metadata     <Object>
          annotations                  <map[string]string>
          clusterName                  <string>
          creationTimestamp            <string>
          deletionGracePeriodSeconds   <integer>
          deletionTimestamp            <string>
          finalizers                   <[]string>
          generateName                 <string>
          generation                   <integer>
          labels                       <map[string]string>
          managedFields                <[]Object>
             apiVersion    <string>
             fieldsType    <string>
             fieldsV1      <map[string]>
             manager       <string>
             operation     <string>
             subresource   <string>
             time          <string>
          name                         <string>
          namespace                    <string>
          ownerReferences              <[]Object>
             apiVersion           <string>
             blockOwnerDeletion   <boolean>
             controller           <boolean>
             kind                 <string>
             name                 <string>
             uid                  <string>
          resourceVersion              <string>
          selfLink                     <string>
          uid                          <string>
       spec         <Object>
          kappController   <Object>
             config            <Object>
                caCerts                  <string>
                dangerousSkipTLSVerify   <string>
                httpProxy                <string>
                httpsProxy               <string>
                noProxy                  <string>
             createNamespace   <boolean>
             deployment        <Object>
                apiPort              <integer>
                concurrency          <integer>
                hostNetwork          <boolean>
                metricsBindAddress   <string>
                priorityClassName    <string>
                tolerations          <[]map[string]string>
             globalNamespace   <string>
          namespace        <string>
       status       <Object>
          secretRef   <string>

As we can see, under spec.kappController.config we have a field called caCerts. This field expects the CA data as a plain string (not base64 encoded), similar to the below:

    spec:
      kappController:
        config:
          caCerts: |
            -----BEGIN CERTIFICATE-----
            MIIFmjCCBIKgAwIBAgITdgAAAHt2KlxrwUL3bgAAAAAAezANBgkqhkiG9w0BAQsF
            ADBOMRUwEwYKCZImiZPyLGQBGRYFbG9jYWwxFzAVBgoJkiaJk/IsZAEZFgd0ZXJh
            c2t5MRwwGgYDVQQDExN0ZXJhc2t5LUxBQi1BRDAxLUNBMB4XDTIzMDIwMjA5MDUy
            NloXDTI1MDIwMTA5MDUyNlowbDELMAkGA1UEBhMCSUwxCzAJBgNVBAgTAlBUMRMw
            -----END CERTIFICATE-----

While it seems pretty simple, we actually need to pass in a few more parameters, because we are setting the entire values structure for kapp-controller here. I have simply based mine on the default config that is generated for a new cluster and added in the CA field.
An example of what this should look like is:

    apiVersion: run.tanzu.vmware.com/v1alpha3
    kind: KappControllerConfig
    metadata:
      name: vrabbi-wld-cls-05-kapp-controller-package
      namespace: default
    spec:
      kappController:
        config:
          caCerts: |
            -----BEGIN CERTIFICATE-----
            MIIFmjCCBIKgAwIBAgITdgAAAHt2KlxrwUL3bgAAAAAAezANBgkqhkiG9w0BAQsF
            ADBOMRUwEwYKCZImiZPyLGQBGRYFbG9jYWwxFzAVBgoJkiaJk/IsZAEZFgd0ZXJh
            c2t5MRwwGgYDVQQDExN0ZXJhc2t5LUxBQi1BRDAxLUNBMB4XDTIzMDIwMjA5MDUy
            NloXDTI1MDIwMTA5MDUyNlowbDELMAkGA1UEBhMCSUwxCzAJBgNVBAgTAlBUMRMw
            -----END CERTIFICATE-----
        createNamespace: false
        deployment:
          apiPort: 10100
          concurrency: 4
          hostNetwork: true
          metricsBindAddress: "0"
          priorityClassName: system-cluster-critical
          tolerations:
          - key: CriticalAddonsOnly
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
          - effect: NoSchedule
            key: node.kubernetes.io/not-ready
          - effect: NoSchedule
            key: node.cloudprovider.kubernetes.io/uninitialized
            value: "true"
        globalNamespace: tkg-system
      namespace: tkg-system

The only value that changes per cluster is the name of the resource, which must match the following pattern:

    ${CLUSTER_NAME}-kapp-controller-package

This name is critical: if we name the resource anything else, the ClusterBootstrap controller won't pick it up and will simply create a default one under the expected name.

Summary

With all that being said, the steps in the end are actually pretty simple. We add the CA to our cluster manifest (with the name set to "proxy") and create a KappControllerConfig resource that follows the naming convention above and also includes the CA, and we are ready to go.

As mentioned in my initial blog post on TKG 2.1, there are definitely growing pains in this release, and I believe this post shows that very clearly. However, with a bit of time and a few minor releases, I believe this UX will get better, and in the end we will have an even better platform than before.
The main issue, I believe, beyond the weird conditionals and the fact that I need to supply my CA twice (once base64 encoded and once not) just to make this scenario work, is that the documentation is currently extremely lacking. Hopefully the documentation will improve over time, because getting to this solution took some serious reverse engineering, most of which I have not even delved into in this post.

TKG 2.1 is a big milestone, and seeing the direction it is going in makes me excited about the future, but there is definitely a steep learning curve and some rough edges at the moment.
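The naming rule and the plain-text requirement for caCerts described above can be captured in a few small helpers. This is a rough sketch rather than a supported tool: the function names are hypothetical, `base64 -d` is assumed to behave like the GNU coreutils version, and the helpers only prepare values for the manifest; they do not create the KappControllerConfig resource.

```shell
#!/bin/sh
# Hypothetical helpers for the KappControllerConfig step. Not part of TKG.

# kapp_config_name CLUSTER: name the resource must carry for CLUSTER.
kapp_config_name() {
  printf '%s-kapp-controller-package\n' "$1"
}

# ca_from_b64 VALUE: turn the base64 string used in the cluster's trust
# variable back into the plain PEM text that caCerts expects.
ca_from_b64() {
  printf '%s' "$1" | base64 -d
}

# indent_block N: indent every line of stdin by N spaces, for pasting a PEM
# under a YAML block scalar such as "caCerts: |".
indent_block() (
  pad=""
  i=0
  while [ "$i" -lt "$1" ]; do
    pad="$pad "
    i=$((i + 1))
  done
  sed "s/^/$pad/"
)
```

For the cluster used in the example above, `kapp_config_name vrabbi-wld-cls-05` prints `vrabbi-wld-cls-05-kapp-controller-package`. Keeping a single PEM file as the source and deriving both forms from it avoids the two copies of the CA drifting apart.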

This post was originally published at VRABBI'S BLOG.
