Kubernetes Cluster Creation in VMware Cloud on AWS with CAPV: Part 2

In Part 1 of my series on deploying Kubernetes clusters to VMware Cloud on AWS environments with ClusterAPI Provider vSphere, I detailed the process required to stand up the CAPV management plane. After completing those steps, I am ready to provision a workload cluster to VMC using CAPV.

Creating Workload Clusters

The CAPV management cluster is the brains of the operation, but I still need to deploy workload clusters for my teams of developers to run their applications on. The management cluster automates the provisioning of all of the provider components that support my workload clusters, as well as instantiating the provisioned VMs as Kubernetes clusters. The basic use case here is that I, as the infrastructure admin, am responsible for utilizing the CAPV management cluster to provision multiple workload clusters that can support individual teams of developers, individual application deployments, etc. The CAPV management cluster allows me to easily deploy a consistent cluster in a repeatable fashion with very little manual effort. I can quickly deploy a test, dev, and prod set of clusters for a team or deploy 5 different workload clusters for 5 different groups of developers.

Another usage pattern, and probably more aligned with the DevOps mentality, is to configure authentication to the Management cluster and use Kubernetes RBAC constructs to assign teams the ability to create workload clusters in their respective namespaces. This way, developers have full control over when and what they provision as long as it fits within the limitations established for them by the infrastructure team. True self service!!
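
If you go this route, the grant itself is ordinary Kubernetes RBAC. Below is a rough sketch of what such a namespaced permission might look like; the team-a namespace and team-a-devs group are hypothetical, and the API groups listed are the Cluster API/CAPV groups that appear in the manifests later in this post:

# Illustrative only: allow a hypothetical "team-a-devs" group to manage
# Cluster API objects in the "team-a" namespace of the management cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: capi-cluster-creator
  namespace: team-a
rules:
- apiGroups:
  - cluster.x-k8s.io
  - infrastructure.cluster.x-k8s.io
  - bootstrap.cluster.x-k8s.io
  resources: ["*"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: capi-cluster-creator-binding
  namespace: team-a
subjects:
- kind: Group
  name: team-a-devs
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: capi-cluster-creator
  apiGroup: rbac.authorization.k8s.io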

In this exercise, I’m going to deploy a workload cluster named “prod-cluster”, composed of 1 master node and 4 worker nodes. I’ll start by using the same docker “manifests” image I used in Part 1 of the series, along with the same envvars.txt file, to create the .yaml file scaffolding for my workload cluster. Note, I’m using a different cluster name (prod-cluster) so all of my config files will be stored in a new directory:

# docker run --rm \
  -v "$(pwd)":/out \
  -v "$(pwd)/envvars.txt":/envvars.txt:ro \
  gcr.io/cluster-api-provider-vsphere/release/manifests:v0.5.4 \
  -c prod-cluster

Just as in the management cluster example, the “manifests” docker image creates the .yaml files I’ll need to create my workload cluster. The first thing I’ll do is create the Cluster resource using the cluster.yaml file:

# kubectl apply -f ./out/prod-cluster/cluster.yaml 

cluster.cluster.x-k8s.io/prod-cluster created
vspherecluster.infrastructure.cluster.x-k8s.io/prod-cluster created

In ClusterAPI terms, the Cluster resource defines cluster-wide configuration such as generic networking concepts like pod and service ranges or DNS domain.
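
For reference, a trimmed-down cluster.yaml looks roughly like the sketch below (values are illustrative; the generated file under ./out/prod-cluster is the source of truth, and it also contains the corresponding VSphereCluster object created above):

apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: prod-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]   # pod IP range
    services:
      cidrBlocks: ["10.96.0.0/12"]     # service IP range
    serviceDomain: cluster.local       # cluster DNS domain
  infrastructureRef:                   # link to the vSphere-specific cluster object
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: VSphereCluster
    name: prod-cluster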

Next, I am ready to create the control plane node with the controlplane.yaml file. This file defines the configuration of a vSphere virtual machine as well as a kubeadm bootstrap config that instantiates the VM as a Kubernetes master node:

# kubectl apply -f ./out/prod-cluster/controlplane.yaml 

kubeadmconfig.bootstrap.cluster.x-k8s.io/prod-cluster-controlplane-0 created
machine.cluster.x-k8s.io/prod-cluster-controlplane-0 created
vspheremachine.infrastructure.cluster.x-k8s.io/prod-cluster-controlplane-0 created

Note the output of the kubectl command, which informs me that a kubeadm bootstrap config as well as a machine (virtual machine) was created. If I navigate to the VMC console, I can observe my control plane VM being created, using the CentOS template defined in the envvars.txt file:

Now I’m ready to create my worker nodes for the cluster, which are defined by the machinedeployment.yaml file. In ClusterAPI terms, a MachineDeployment is analogous to a Deployment in the Kubernetes world. MachineDeployments manage the desired state of a group of Machines (VMs) just as Deployments manage the desired state of pods.
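
And, like a Deployment, a MachineDeployment can be scaled declaratively after the fact. As an illustrative example (the replica count here is arbitrary), the worker pool could later be resized by patching the MachineDeployment on the management cluster:

# kubectl patch machinedeployment prod-cluster-md-0 --type merge -p '{"spec":{"replicas":6}}'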

Before I deploy my worker nodes, I need to up the replica count to 4 in the .yaml file. This will ensure my MachineDeployment contains 4 worker nodes:

# vi ./out/prod-cluster/machinedeployment.yaml 

...
spec:
  replicas: 4
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: prod-cluster
...

Now I’m ready to create my worker nodes!

# kubectl apply -f ./out/prod-cluster/machinedeployment.yaml 

kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/prod-cluster-md-0 created
machinedeployment.cluster.x-k8s.io/prod-cluster-md-0 created
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/prod-cluster-md-0 created

Again, note the output of the kubectl command which confirms that a bootstrap config has been created to instantiate these VMs as worker nodes and “join” them under the control of the existing control plane node to form a cluster. We can also confirm the Machines (VMs) have been created in the VMC console:

At this point, I now have a workload cluster that consists of 1 master and 4 worker nodes deployed in my VMC environment.
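
While my kubectl context is still pointed at the management cluster, I can also sanity-check the building blocks of the new cluster. An illustrative example using the Cluster API resource kinds created above:

# kubectl get clusters,machines,machinedeployments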

Accessing the Workload Cluster

Now I’ll need to obtain the kubeconfig file that will allow me to interact with the workload cluster. The kubeconfig file used to access workload clusters is stored as a Kubernetes Secret on the management cluster. The secrets are stored as <cluster-name>-kubeconfig. I can confirm my prod-cluster kubeconfig is available with the following command:

kubectl get secrets

NAME                        TYPE       DATA   AGE
...
prod-cluster-kubeconfig     Opaque     1      20m
...

I can also use the following command to decode the secret and place the plain text at ./out/prod-cluster/kubeconfig for later use:

kubectl get secret prod-cluster-kubeconfig -o=jsonpath='{.data.value}' | \
{ base64 -d 2>/dev/null || base64 -D; } >./out/prod-cluster/kubeconfig

In order to start interacting with the prod-cluster workload cluster, I’ll reset my KUBECONFIG environment variable to the new kubeconfig I just pulled out of the management cluster:

# export KUBECONFIG="$(pwd)/out/prod-cluster/kubeconfig"

Now I’ll use kubectl to examine the nodes of my workload cluster:

# kubectl get nodes

NAME                                 STATUS     ROLES    AGE   VERSION
prod-cluster-controlplane-0          NotReady   master   25m   v1.16.3
prod-cluster-md-0-55f55ffdb9-4467d   NotReady   <none>   16m   v1.16.3
prod-cluster-md-0-55f55ffdb9-b9hv8   NotReady   <none>   16m   v1.16.3
prod-cluster-md-0-55f55ffdb9-x9v7z   NotReady   <none>   16m   v1.16.3
prod-cluster-md-0-55f55ffdb9-xljxw   NotReady   <none>   16m   v1.16.3

Notice that I am now receiving information about my workload cluster (1 master/4 workers) instead of my management cluster (1 master). Great!

But I’m not done yet… Notice that the nodes are all in the NotReady state. This is because workload clusters do not have any add-ons applied aside from those added by kubeadm. Nodes in the workload clusters will remain NotReady until I apply a Container Network Interface (CNI) add-on.
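
If you want to see the reason directly, the node's Ready condition spells it out. An illustrative check (the exact message varies by container runtime, but it will mention an uninitialized CNI config):

# kubectl get node prod-cluster-controlplane-0 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'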

The “manifests” docker image automatically creates an addons.yaml file that contains the configuration to instantiate Calico as the CNI for the workload cluster, but you can use any CNI you wish. For the sake of simplicity, I’m going to utilize the default Calico config provided:

kubectl apply -f ./out/prod-cluster/addons.yaml

configmap/calico-config created
...
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created

Note the Calico resources defined in the addons.yaml file include some RBAC config as well as a Deployment and DaemonSet for the Calico components running on the cluster, among other things.

Now, I can verify my nodes have all transitioned into a Ready state:

# kubectl get nodes

NAME                                 STATUS   ROLES    AGE   VERSION
prod-cluster-controlplane-0          Ready    master   39m   v1.16.3
prod-cluster-md-0-55f55ffdb9-4467d   Ready    <none>   29m   v1.16.3
prod-cluster-md-0-55f55ffdb9-b9hv8   Ready    <none>   29m   v1.16.3
prod-cluster-md-0-55f55ffdb9-x9v7z   Ready    <none>   29m   v1.16.3
prod-cluster-md-0-55f55ffdb9-xljxw   Ready    <none>   29m   v1.16.3

VOILA!! Now I’m ready to start deploying my workloads to the cluster!!

Conclusion

This concludes Part 2 of my post on deploying Kubernetes Clusters to a VMware Cloud on AWS environment at scale with ClusterAPI Provider vSphere! CAPV is a very powerful tool that helps infrastructure admins provide their users with a consistent, scalable workflow for deploying and managing Kubernetes clusters on top of vSphere infrastructure.

Stay tuned to the blog for a more in depth look into how Cluster API is utilized in Project Pacific to provide self service Kubernetes cluster management natively within vSphere!

Kubernetes Cluster Creation in VMware Cloud on AWS with CAPV: Part 1

One of the biggest challenges in starting a Cloud Native practice is understanding how to establish a repeatable and consistent method of deploying and managing Kubernetes clusters. That’s where ClusterAPI comes in handy!! ClusterAPI (CAPI) is a Kubernetes project to bring declarative, Kubernetes-style APIs to cluster creation, configuration, and management. It provides optional, additive functionality on top of core Kubernetes to manage the lifecycle of a Kubernetes cluster. Now you can use Kubernetes to create more Kubernetes!!!!

ClusterAPI is responsible for provisioning all of the infrastructure required to support a Kubernetes cluster. CAPI also provides the ability to perform Day 2 operations, such as scaling and upgrading clusters. Most importantly, it provides a consistent management plane to perform these actions on multiple clusters. In fact, ClusterAPI is a big part of what will allow VI admins to orchestrate and automate the provisioning of Kubernetes clusters natively as a part of vSphere with Project Pacific. Learn more about the Project Pacific architecture and how it utilizes ClusterAPI here.

ClusterAPI Provider vSphere (CAPV)

The ClusterAPI special interest group has helped foster and sponsor implementations of CAPI for specific infrastructure providers. That’s where ClusterAPI Provider vSphere (CAPV) comes in! CAPV is a specific implementation of ClusterAPI that brings in additional functionality for allowing ClusterAPI to deploy Kubernetes clusters to vSphere environments. In Part 1 of my series, I’m going to walk through the process of preparing my VMC environment to support Kubernetes cluster creation via CAPV. I’m also going to detail the steps required to provision the control plane (bootstrap and management clusters) of my CAPV environment.

Environment and Terminology

The environment I am utilizing in this post consists of a couple of different components. First, I have a CentOS jumpbox that is deployed in my on-premises VMware environment. This jumpbox will house what is called the “bootstrap cluster” in CAPI terms. A bootstrap cluster is a temporary cluster that is used to provision a management cluster. In the case of CAPV, we are going to use a KinD (Kubernetes in Docker) cluster, deployed on the jumpbox, as the bootstrap cluster. KinD is a great tool for deploying Kubernetes clusters on a single machine, like your local workstation! Learn more about KinD here.

As mentioned above, the bootstrap cluster is responsible for provisioning the “management cluster.” The management cluster is the cluster where information about one or more Infrastructure Providers (in this case, my VMC lab) is stored. The management cluster also stores information about the different components of workload clusters, such as machines, control planes, bootstrap configuration, etc. The management cluster is responsible for provisioning any number of workload clusters; it is the brains of the operation.

Workload clusters are conformant Kubernetes clusters that our developers’ applications will be deployed to. The high-level workflow is that developers use kubectl to pass .yaml files that define the spec of a workload cluster to the management cluster, and the management cluster handles the creation of all of the resources that make up the workload cluster. In this post, I will use the management cluster to provision a 5 node workload cluster to support my applications.

As you may notice in the diagram, I am going to be utilizing our team’s VMware Cloud on AWS SDDC lab environment to support my cluster deployments. In the next section, I’ll go over some prereqs required to prepare the VMC environment for Kubernetes cluster creation via CAPV.

Prerequisites

In order to use CAPV to deploy clusters to VMC, I needed to complete a couple of prereqs in the VMC SDDC. First off, I needed to create a network segment that my Kubernetes nodes would utilize to communicate with each other and reach out to the internet to pull OS packages and Docker images from public repos. For information on creating segments in VMC, please refer to the product documentation. I created a routed segment (named sddc-k8-jomann) with a DHCP range to provide IP addresses to my Kubernetes cluster nodes:

After creating the segment, I also needed to create a compute firewall rule that applies to the segment to allow ingress to my cluster nodes. Since our VMC SDDC is hosted behind a VPN, I decided to allow all traffic to the sddc-k8-jomann segment for simplicity’s sake. This will allow users behind the VPN to access their Kubernetes clusters:

Finally, CAPV will deploy clusters that utilize the vSphere CSI Driver in conjunction with the Kubernetes vSphere Cloud Provider to allow developers to dynamically provision persistent storage for their workloads running in the Kubernetes clusters. In order for the vSphere Cloud Provider to be configured during deployment, the Kubernetes nodes need to be able to communicate with vCenter so the required configuration can be applied to them. This means I’ll need to create a Management Gateway Firewall Rule to allow communication between my sddc-k8-jomann segment and vCenter on port 443:

Last but not least, I need to load an OVA template into the VMC environment that CAPV will use to build out my Kubernetes nodes. I will be utilizing a CentOS 7 image preloaded with Kubernetes 1.16.3. You can find a list of available images here.
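
If you prefer the CLI over the vSphere UI for this step, govc (introduced later in this post) can upload the image as well. A rough sketch, assuming govc is already configured and the OVA has been downloaded locally with the filename shown (both assumptions):

# govc import.ova -ds=WorkloadDatastore -name=centos-7-kube-v1.16.3-temp ./centos-7-kube-v1.16.3.ova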

Now that I’ve covered the VMC prereqs, let’s talk about the jumpbox. If you’d like to use this post as a guide (along with the Getting Started Guide put together by the CAPV team), you’ll need to ensure the following tools are installed and configured on the jumpbox:

clusterctl is a tool that CAPV utilizes to automate the creation of the bootstrap and management clusters. It is not required but makes the process of instantiating the management plane of CAPV a lot easier.

Docker is utilized by KinD to create the bootstrap cluster. I’ll also use a “manifests” CAPV Docker image to automate the creation of all the manifests I’ll need to create my clusters.

Finally, kubectl is the Kubernetes command-line tool that allows me to run commands against my Kubernetes clusters. clusterctl will also utilize kubectl during the creation of the bootstrap/management clusters.
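
Before moving on, a quick, illustrative sanity check that the tooling is in place (clusterctl here being the CAPV-specific build referenced in the Getting Started Guide):

# command -v docker kubectl clusterctl
# kubectl version --client --short
# docker info > /dev/null && echo "docker is up"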

Creating the Bootstrap and Management Clusters

The first thing I’ll need to do is create the management cluster. CAPV provides a “manifests” docker image that I can use to automatically generate the .yaml manifests that clusterctl will use to create my KinD bootstrap cluster. I’ll also provide an envvars.txt file that contains information about the VMware Cloud on AWS environment that I’ll be deploying the clusters to. See example output below:

# cat envvars.txt

# vCenter config/credentials
export VSPHERE_SERVER='vmc.demolab.com'
export VSPHERE_USERNAME='cloudadmin@vmc.local'
export VSPHERE_PASSWORD='MyPassword!'

# vSphere deployment configs
export VSPHERE_DATACENTER='SDDC-Datacenter'
export VSPHERE_DATASTORE='WorkloadDatastore'
export VSPHERE_NETWORK='sddc-k8-jomann'
export VSPHERE_RESOURCE_POOL='/SDDC-Datacenter/host/Cluster-1/Resources/Compute-ResourcePool/mannimal-k8s'
export VSPHERE_FOLDER='/SDDC-Datacenter/vm/Workloads/mannimal-k8s'
export VSPHERE_TEMPLATE='centos-7-kube-v1.16.3-temp'
export SSH_AUTHORIZED_KEY='<ssh-pub-key>'

# Kubernetes configs
export KUBERNETES_VERSION='1.16.3'

As you can see above, this is where I’ll define things like the datacenter, datastore, network, etc. that the VMs will be deployed to. I also defined a public ssh key that will be loaded onto the VMs that are created in case I need to troubleshoot deployments at the OS level. Finally, I defined the Kubernetes version (1.16.3) that I’d like to be utilized in my deployments, both for management and workload clusters. There are additional optional variables that can be defined in the envvars.txt file such as VM config (mem, cpu, storage) and additional Kubernetes cluster configs. For a full list of those optional values, refer to the CAPV Quick Start Guide.

A Shoutout to govc

govc is a vSphere CLI tool that is designed as an alternative to the vSphere Web UI. I’ve found govc very useful in confirming the “locations” of the vSphere resources I’ll need to define in the envvars.txt file. From my experience, most of the issues I’ve had with CAPV deployments stem from incorrect values in the envvars.txt file.

I recommend installing and configuring govc and using it to confirm the values utilized in the vSphere Deployment Configs section of the envvars.txt. In my example, I created and sourced the following govc-creds.sh file to ensure govc knows how to reach my VMC environment:

# cat govc-creds.sh

 # vCenter host
 export GOVC_URL=vmc.demolab.com
 # vCenter credentials
 export GOVC_USERNAME=cloudadmin@vmc.local
 export GOVC_PASSWORD=MyPassword!
 # disable cert validation
 export GOVC_INSECURE=true

# source govc-creds.sh

Now I can use govc to verify my vSphere config variables. For example, to confirm the VSPHERE_RESOURCE_POOL and VSPHERE_FOLDER variables:

# govc pool.info mannimal-k8s
Name:               mannimal-k8s
  Path:             /SDDC-Datacenter/host/Cluster-1/Resources/Compute-ResourcePool/mannimal-k8s
  ...

# govc folder.info mannimal-k8s
Name:        mannimal-k8s
  Path:      /SDDC-Datacenter/vm/Workloads/mannimal-k8s
  ...

I utilized the Path: values from the govc output in my envvars.txt variables to ensure the bootstrap cluster can locate all of the required vSphere resources when provisioning the management cluster. Ok, now back to the fun stuff…

Creating the Management Cluster Manifests

Now that I’ve verified my vSphere config variables with govc, I’m ready to use the following command, which utilizes version v0.5.4 of the “manifests” image, to create the .yaml manifests for the CAPV management cluster. I’ll also use the -c flag to set the cluster name to management-cluster:

# docker run --rm \
  -v "$(pwd)":/out \
  -v "$(pwd)/envvars.txt":/envvars.txt:ro \
  gcr.io/cluster-api-provider-vsphere/release/manifests:v0.5.4 \
  -c management-cluster

...
Generated ./out/management-cluster/cluster.yaml
Generated ./out/management-cluster/controlplane.yaml
Generated ./out/management-cluster/machinedeployment.yaml
Generated /build/examples/provider-components/provider-components-cluster-api.yaml
Generated /build/examples/provider-components/provider-components-kubeadm.yaml
Generated /build/examples/provider-components/provider-components-vsphere.yaml
Generated ./out/management-cluster/provider-components.yaml
WARNING: ./out/management-cluster/provider-components.yaml includes vSphere credentials

Notice the output of the docker run command gives me the location of various .yaml files that define the configuration of my management cluster. The clusterctl tool will utilize these .yaml files to create the KinD bootstrap cluster as well as the CAPV management cluster running on a VM in my VMC environment.

Now that I’ve got my bootstrap/management cluster scaffolding, I’m ready to use clusterctl to create my bootstrap cluster, which will in turn provision the VM in VMC that will serve as my management cluster. clusterctl will then “pivot” the CAPV management stack from the bootstrap KinD cluster to the CAPV management cluster running in VMC. I’ll use the following clusterctl command, complete with the .yaml files generated by the docker manifests package, to kick off this process:

clusterctl create cluster \
  --bootstrap-type kind \
  --bootstrap-flags name=management-cluster \
  --cluster ./out/management-cluster/cluster.yaml \
  --machines ./out/management-cluster/controlplane.yaml \
  --provider-components ./out/management-cluster/provider-components.yaml \
  --addon-components ./out/management-cluster/addons.yaml \
  --kubeconfig-out ./out/management-cluster/kubeconfig

Let’s go step by step and examine the output of the clusterctl command:

Creating the Bootstrap Cluster

26007 createbootstrapcluster.go:27] Preparing bootstrap cluster
26007 clusterdeployer.go:82] Applying Cluster API stack to bootstrap cluster
26007 applyclusterapicomponents.go:26] Applying Cluster API Provider Components
...

The first thing clusterctl does is provision a KinD Kubernetes cluster on the jumpbox server that will serve as the bootstrap cluster for CAPV. Then, clusterctl applies the CAPV components to the Kubernetes cluster and ensures the provider components, which include the VMC environment info, are available to the cluster as well.

Creating the Infrastructure for the CAPV Management Cluster

...
clusterdeployer.go:87] Provisioning target cluster via bootstrap cluster
26007 applycluster.go:42] Creating Cluster referenced object "infrastructure.cluster.x-k8s.io/v1alpha2, Kind=VSphereCluster" with name "management-cluster" in namespace "default"
26007 applycluster.go:48] Creating cluster object management-cluster in namespace "default"
26007 clusterdeployer.go:96] Creating control plane machine "management-cluster-controlplane-0" in namespace "default"
I0121 12:11:03.460051   26007 applymachines.go:40] Creating Machine referenced object "bootstrap.cluster.x-k8s.io/v1alpha2, Kind=KubeadmConfig" with name "management-cluster-controlplane-0" in namespace "default"
...
26007 applymachines.go:46] Creating machines in namespace "default"
...

At this point, the bootstrap cluster reaches out to the VMC environment and provisions a VM that will eventually serve as the CAPV management cluster. From the output above, note the bootstrap cluster is creating various objects, including the management-cluster-controlplane-0 machine, as well as instantiating that machine as a Kubernetes cluster using the KubeadmConfig created from the “manifests” docker image.

If I navigate over to my VMC console, I can observe the VM is created in the resource pool defined in the envvars.txt file referenced earlier in the post:

“Pivoting” the Management Stack

...
26007 clusterdeployer.go:123] Pivoting Cluster API stack to target cluster
26007 pivot.go:76] Applying Cluster API Provider Components to Target Cluster
26007 pivot.go:81] Pivoting Cluster API objects from bootstrap to target cluster
26007 clusterdeployer.go:128] Saving provider components to the target cluster
...

Now the fun begins! After creating the VM and instantiating it as a Kubernetes cluster, the bootstrap cluster “pivots” the CAPV management stack over to the newly created management cluster. This ensures that the management cluster has the necessary provider config to support the creation of workload clusters going forward.

Cleaning Up

...
26007 clusterdeployer.go:164] Done provisioning cluster. You can now access your cluster with kubectl --kubeconfig ./out/management-cluster/kubeconfig
26007 createbootstrapcluster.go:36] Cleaning up bootstrap cluster.

Now that the management cluster has been created in VMC, clusterctl outputs the location of the kubeconfig file that I’ll use to interact with the management cluster and then deletes the KinD bootstrap cluster. From this point forward, I will use the CAPV management cluster in VMC to create additional workload clusters. To ensure this is the case, I’m going to set the KUBECONFIG environment variable to the kubeconfig file of the management cluster I just created:

# export KUBECONFIG="$(pwd)/out/management-cluster/kubeconfig"

Now, when I use kubectl I am interacting directly with my CAPV management cluster deployed in VMC.
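
As a quick, illustrative check that the pivot succeeded, the CAPV controllers and the management cluster's own Cluster API objects should now be visible through this kubeconfig:

# kubectl get pods -n capv-system
# kubectl get clusters,machines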

Troubleshooting

If you’re lucky, clusterctl will work without a hitch and you’ll have your bootstrap and management clusters provisioned on your first try! If you’re like me, things may not go as planned on the first couple of runs… If KinD and Docker are installed and configured correctly, clusterctl should have no issue moving through the Creating the Bootstrap Cluster steps referenced above.

Generally problems occur when the bootstrap cluster is trying to provision the management cluster in the target environment. I’ve found the best way to troubleshoot the process is to view the logs of the capv-system pods on the bootstrap cluster. Normally, if there is a problem during deployment of the management cluster, you’ll see the clusterctl output hang at the following step:

26007 applymachines.go:46] Creating machines in namespace "default"

When a KinD cluster is created, the kubeconfig file is stored in the default location where kubectl looks for a config file (${HOME}/.kube/config) unless the $KUBECONFIG environment variable has been set. If no $KUBECONFIG environment variable is set, you can run the following command on the jumpbox server in another terminal to follow the capv-system pod’s logs:

kubectl logs -n capv-system $(kubectl -n capv-system get po -o jsonpath='{.items..metadata.name}') -f

For example, in an earlier deployment, there was a typo in my VSPHERE_RESOURCE_POOL variable that I was able to confirm by viewing the following error message in the capv-system logs:

E1217 21:07:41.348178       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" 
"error"="failed to reconcile VM: unable to get resource pool for \"default/management-cluster/management-cluster-controlplane-0\": resource pool 'Cluster-1/mannimal-k8s' not found"  
"controller"="vspheremachine" "request"={"Namespace":"default","Name":"management-cluster-controlplane-0"}

As you may notice from the error message, the bootstrap cluster is looking for a vSphere resource pool at Cluster-1/mannimal-k8s and is unable to find it. Using govc, I was able to confirm the full path of the resource pool and correct the VSPHERE_RESOURCE_POOL variable in my envvars.txt file. For additional troubleshooting tips, please refer to the troubleshooting guide in the CAPV documentation.

Conclusion

This concludes Part 1 of my post on automating the deployment of Kubernetes clusters to VMware Cloud on AWS with ClusterAPI Provider vSphere. In this post, I walked through the various steps required to prepare the VMC environment to support cluster creation via CAPV as well as walking through the process of deploying the bootstrap and management clusters with clusterctl.

Join me in Part 2 of my post where I’ll utilize the management cluster to create a workload cluster that I can use to provision my applications!!

Container Service Extension 2.5 Installation: Part 3

In Parts 1 and 2 of my series on installing and configuring the Container Service Extension for VMware Cloud Director, I focused on setting the CSE server up to support CSE Standard Kubernetes cluster creation.

CSE Standard clusters are deployed as vApps that utilize NSX-V networking resources, with Weave serving as the Container Network Interface (CNI) for the Kubernetes clusters. In Part 3 of my series, I want to take some time to look at configuring the CSE Server to support the creation of CSE Enterprise Kubernetes clusters. CSE Enterprise clusters are VMware Enterprise PKS Kubernetes clusters deployed on top of NSX-T networking resources, utilizing the NSX Container Plugin as the CNI. CSE Enterprise brings enterprise-grade features and functionality to CSE that include, but are not limited to:

  • HA, multi-master Kubernetes clusters
  • Dynamic persistent storage provisioning with the vSphere Cloud Provider integration
  • Automated Day 1 and Day 2 Kubernetes cluster management via BOSH Director
  • Microsegmentation capability for Kubernetes resources via integration with NSX-T
  • Automated creation of Kubernetes service type LoadBalancer and ingress resources via NSX-T L4/L7 load balancers
  • Support for Harbor, an open source cloud native registry

How Does CSE Enterprise Work?

Before we get started on the configuration steps, I want to take some time to explore how the CSE Server communicates with both Enterprise PKS and NSX-T to automate the creation of Kubernetes clusters, all triggered by a request from an end user via the vcd-cli. When a user issues a vcd cse cluster create command, they are utilizing an OrgVDC that has been enabled for CSE Standard, for CSE Enterprise, or for no Kubernetes provider at all. If their OvDC has been enabled to support CSE Enterprise cluster creation (and they have the correct permissions to create clusters), the cse extension of the vcd-cli passes the request to the CSE server, which communicates directly with the PKS and NSX-T APIs to create an Enterprise PKS Kubernetes cluster as well as all of the NSX-T resources (T1 routers, load balancers, logical switches, etc.) that support the cluster. The CSE server also communicates with the NSX-T API to create DFW rules that isolate provisioned clusters from each other to support network resource isolation.

As opposed to CSE Standard, which creates a vApp from a vApp template and then instantiates the vApp as a Kubernetes cluster, the PKS control plane receives an API call from the CSE server and utilizes the plan assigned to the OrgVDC to create the cluster. There is no vApp or vApp template required to provision CSE Enterprise clusters.

CSE PKS Config File

Before we get started, I want to note that I have already deployed both VMware Enterprise PKS and VMware NSX-T Data Center in my environment. CSE will need to hook into an existing deployment of PKS and NSX-T, so these components must be configured before configuring CSE Enterprise.

As mentioned in Part 1 of the series, CSE uses a yaml config file that contains information about the vSphere/VCD environment(s) that CSE will need to interact with to deploy CSE Standard Kubernetes clusters. Similarly, we’ll need to utilize a second yaml file that will contain all of the information that CSE will need to communicate with the PKS/NSX-T APIs to orchestrate the provisioning of Enterprise PKS Kubernetes clusters. You can view a sample PKS config file on the CSE docs site, but I will walk through the required components below.

The first step is to create a pks-config.yaml file that I’ll populate with the values below. I’m going to create the file in the same directory as my existing config.yaml file:

$ vi ~/pks-config.yaml

Now I can populate the file with the required information, detailed below:

pks_api_servers section

This section of the config file will contain information that CSE will use to communicate with the PKS API server:

pks_api_servers:
- clusters:
  - RegionA01-COMP01          <--- Cluster name as it appears in vCenter
  cpi: 6bd3f75a31e5c3f2d65e   <--- Bosh CPI value, see note below
  datacenter: RegionA01       <--- Datacenter name as it appears in vCenter
  host: pks.corp.local        <--- FQDN of the PKS API server
  name: PKS-1                 <--- Name used to identify this particular PKS API server, user defined in this file
  port: '9021'                <--- Port used for PKS API communication, default = 9021
  uaac_port: '8443'           <--- Port used to authenticate against the PKS UAAC service, default = 8443
  vc: vcsa-01                 <--- vCenter name as it appears in VCD
  verify: false               <--- Set to "true" to verify SSL certificates

Note: The cpi value above can be obtained by utilizing the Bosh CLI to run the following command on the Opsman VM in the PKS environment:

$ bosh cpi-config

Using environment '172.31.0.2' as client 'ops_manager'

cpis:
- migrated_from:
  - name: ""
  name: 6bd3f75a31e5c3f2d65e    <--- cpi value 
---output omitted---

pks_accounts section

This section defines the PKS UAAC credentials that CSE will use to authenticate against the PKS API when creating clusters:

pks_accounts:
- name: PKS1-admin                    
  pks_api_server: PKS-1
  secret: fhB-Z5hHcsXl_UnC86dkuYlTzQPoE3Yz
  username: admin
  • name is a user defined value that identifies the name of this credentials set. This is used in case there are multiple PKS API servers, which require their own set of credentials
  • pks_api_server is the user defined variable for the PKS API server defined by the name value in the pks_api_servers section (PKS-1 in my example)
  • username is the UAAC username that CSE will use to authenticate against the PKS API server (using the admin user in my example)
  • secret is the secret tied to the user defined in the username value. If using the admin user, you can obtain the secret by navigating to the Opsman web UI, selecting the PKS tile, selecting the credentials tab, and then selecting Link to Credential next to the Pks Uaa Management Admin Client entry. See screenshot below:

pvdcs section

This section defines the PvDC(s) in VCD that support the Org and OrgVDCs that are enabled to support PKS cluster creation via CSE:

pvdcs:
- cluster: RegionA01-COMP01  <--- Cluster name as it appears in vCenter
  name: prod-pvdc            <--- PvDC name as it appears in VCD
  pks_api_server: PKS-1      <--- user defined variable for the PKS API server defined by the `name` value in the `pks_api_servers` section

nsxt_servers section

This section defines information that CSE will need to communicate with the NSX-T API to create the NSX-T resources to back the Kubernetes clusters as well as create the DFW to isolate the clusters provisioned via CSE:

nsxt_servers:
- distributed_firewall_section_anchor_id: 9d6d2a5c-c32d-419c-ada8-e5208475ca88
  host: nsxmgr-01a.corp.local
  name: nsxt-server-1
  nodes_ip_block_ids:
  - eac47bea-5304-4a7b-8c10-9b16e62f1cda
  password: MySuperSecretPassword!
  pks_api_server: PKS-1
  pods_ip_block_ids:
  - 27d2d1c3-969a-46a5-84b9-db503ce2edd5
  username: admin
  verify: false
  • distributed_firewall_section_anchor_id is the UUID of the “Default Layer3 Section” DFW rule created by PKS during the installation of Enterprise PKS, see screenshot below:

  • host is the FDQN of the NSX-T Management server
  • name is a user defined name for this particular NSX-T instance
  • nodes_ip_block_ids is the UUID of the IP block created in NSX-T to be used by PKS for cluster nodes, see screenshot below:

  • password is the password for the NSX-T user defined in the config file
  • pks_api_server is the user defined variable for the PKS API server defined by the name value in the pks_api_servers section (PKS-1 in my example)
  • pods_ip_block_ids is the UUID of the IP block created in NSX-T to be used by PKS to assign IP addresses to the pods running in a cluster.

  • username is the NSX-T username CSE will use to authenticate against the API
  • verify can be set to “true” to verify SSL certificates

There we have it! This is the bare minimum information required for CSE to deploy Enterprise PKS clusters via vcd-cli. Do note, in my example deployment above, I have a single PKS deployment, cluster, PvDC, and NSX-T instance that CSE is communicating with. You can add additional instances of each resource as required for your deployment.

Enabling CSE Enterprise on the CSE Server

Now that I’ve built my PKS config file for CSE, I’m ready to enable CSE Enterprise for my tenants’ org(s). I’ve already deployed the CSE server only utilizing the CSE Standard methodology so the first thing I’ll need to do is update the config.yaml file to point to the new pks-config.yaml file:

$ vi ~/config.yaml

---output omitted---

pks_config: pks-config.yaml

---output omitted---

After updating the config.yaml file with the filename of the PKS config file, I’ll stop the CSE service on the CSE server and re-install CSE:

$ sudo systemctl stop cse

$ cse install -c config.yaml --skip-template-creation

Required Python version: >= 3.7.3
Installed Python version: 3.7.3 (default, Nov 13 2019, 16:41:06)
[GCC 7.4.0]
Validating config file 'config.yaml'
Connected to AMQP server (vcd.corp.local:5672)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director (vcd.corp.local:443)
Connected to vCenter Server 'vcsa-01' as 'administrator@corp.local' (vcsa-01a.corp.local:443)
Config file 'config.yaml' is valid
Validating PKS config file 'pks-config.yaml'
Connected to PKS server (PKS-1 : pks.corp.local)
Connected to NSX-T server (nsxmgr-01a.corp.local)
PKS Config file 'pks-config.yaml' is valid
Installing CSE on vCloud Director using config file 'config.yaml'
Connected to vCD as system administrator: vcd.corp.local:443
Checking for AMQP exchange 'cse-ext'
AMQP exchange 'cse-ext' is ready
Registered cse as an API extension in vCD
Registering Right: CSE NATIVE DEPLOY RIGHT in vCD
Registering Right: PKS DEPLOY RIGHT in vCD
Creating catalog 'cse'
Created catalog 'cse'
Skipping creation of templates.
Configuring NSX-T server (nsxt-server-1) for CSE. Please check install logs for details.

As you can see from the output of the cse install command, CSE was able to communicate with the PKS and NSX-T API. The CSE server also added a new rights bundle (PKS DEPLOY RIGHT) that I will use to grant users the ability to provision CSE Enterprise Kubernetes clusters.

After successfully installing CSE with the PKS config, I’ll restart the CSE service:

$ sudo systemctl start cse

and verify the service is running as expected (look for the MessageConsumer threads):

$ sudo systemctl status cse
[sudo] password for cse:
● cse.service
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-11-18 12:08:18 EST; 1 day 3h ago
 Main PID: 911 (sh)
    Tasks: 7
   Memory: 3.6M
   CGroup: /system.slice/cse.service
           ├─911 /bin/sh /home/cse/cse.sh
           └─914 /home/cse/cse-env/bin/python3.7 /home/cse/cse-env/bin/cse run -c /home/cse/config.yaml

Nov 18 12:08:22 cse sh[911]: CSE installation is valid
Nov 18 12:08:23 cse sh[911]: Started thread 'MessageConsumer-0 (139897843828480)'
Nov 18 12:08:23 cse sh[911]: Started thread 'MessageConsumer-1 (139897746745088)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-2 (139897755137792)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-3 (139897835435776)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-4 (139897738352384)'
Nov 18 12:08:24 cse sh[911]: Container Service Extension for vCloud Director
Nov 18 12:08:24 cse sh[911]: Server running using config file: /home/cse/config.yaml
Nov 18 12:08:24 cse sh[911]: Log files: cse-logs/cse-server-info.log, cse-logs/cse-server-debug.log
Nov 18 12:08:24 cse sh[911]: waiting for requests (ctrl+c to close)

Voila!! I’ve successfully configured CSE to support Enterprise PKS cluster creation!! Now, let’s have a look at enabling a tenant to deploy CSE Enterprise clusters via the vcd-cli.

Onboarding Tenants and Provisioning CSE Enterprise Kubernetes Clusters

The first thing I’ll need to do is log in to the VCD deployment via the vcd-cli as the system admin user:

$ vcd login vcd.corp.local system admin -iw

Next, I can use the cse extension to view the existing OvDCs and what Kubernetes provider is enabled for each OvDC:

$ vcd cse ovdc list

name       org       k8s provider
---------  --------  --------------
base-ovdc  base-org  native
acme-ovdc  AcmeCorp  none

From the results above, note that the base-ovdc is enabled for the Kubernetes provider of native, which means users in that org with the correct permissions can create CSE Standard clusters. The acme-ovdc does not have any Kubernetes provider assigned yet, but I want to enable it to support CSE Enterprise cluster creation.

First, I’ll need to instruct the vcd-cli to “use” the AcmeCorp org:

$ vcd org use AcmeCorp
now using org: 'AcmeCorp', vdc: 'acme-ovdc', vApp: ''.

Then, I’ll add the new PKS rights bundle to the AcmeCorp org:

$ vcd right add "{cse}:PKS DEPLOY RIGHT" -o AcmeCorp
Rights added to the Org 'AcmeCorp'

Before moving on to the next step, I created a new role in the VCD tenant portal by the name of “orgadmin-k8” which mimics the “Org Admin” permissions. I also created a new user named “pks-k8-admin” and assigned that user the “orgadmin-k8” role.

After creating the new role and user, I need to use the vcd-cli to add the PKS DEPLOY RIGHT rights bundle to the custom role:

$ vcd role add-right "orgadmin-k8" "{cse}:PKS DEPLOY RIGHT"
Rights added successfully to the role 'orgadmin-k8'

The last thing I need to do as the system admin is enable the OvDC to support CSE Enterprise clusters with the following command:

$ vcd cse ovdc enable "acme-ovdc" -o AcmeCorp -k ent-pks -p "xsmall" -d "corp.local"

metadataUpdate: Updating metadata for Virtual Datacenter acme-ovdc(07972d80-faf1-48c7-8f7e-cea92ce7cc6e)
task: 9fced925-4ec8-47c3-a45e-7903b14b0c8d, Updated metadata for Virtual Datacenter acme-ovdc(07972d80-faf1-48c7-8f7e-cea92ce7cc6e), result: success

where -k is the Kubernetes provider (PKS in this case), -p is the PKS plan we want clusters to use when provisioned from CSE for this OvDC, and -d is the domain name that PKS will use for the hostname of each cluster.

Note, CSE uses compute profiles to create Availability Zones for each OvDC that is enabled. When the vcd cse ovdc enable command above is issued, CSE talks to the PKS API to create a compute profile that defines the Availability Zone for all clusters created in this OvDC as the Resource Pool that is assigned to that OvDC. This ensures that users who provision clusters to this OvDC will have their compute isolated (via the Resource Pool) from clusters provisioned in other OvDCs in the VCD environment.

Now I can verify that the acme-ovdc has a Kubernetes provider assigned after enabling it:

$ vcd cse ovdc list

name       org       k8s provider
---------  --------  --------------
base-ovdc  base-org  native
acme-ovdc  AcmeCorp  ent-pks

There we have it! Now I can tell the pks-k8-admin user that they are ready to provision some CSE Enterprise clusters!!

Provisioning CSE Enterprise Clusters

Now that I, as the system admin, have done the pre-work to enable the Org to support CSE Enterprise clusters, I’m ready to turn the tenants loose and allow them to provision CSE Enterprise Kubernetes clusters.

First, the orgadmin-k8 user will log in to VCD via the vcd-cli:

$ vcd login vcd.corp.local AcmeCorp pks-k8-admin -iw

All they have to do now is run the vcd cse cluster create command and CSE will handle the rest:

$ vcd cse cluster create prod-1 --nodes 2

property                     value
---------------------------  -----------------
kubernetes_master_host       prod-1.corp.local
kubernetes_master_ips        In Progress
kubernetes_master_port       8443
kubernetes_worker_instances  2
last_action                  CREATE
last_action_description      Creating cluster
last_action_state            in progress
name                         prod-1
worker_haproxy_ip_addresses

where prod-1 is the name of our cluster and --nodes is the number of worker nodes assigned to the cluster. As you can see, the FQDN of the master host will be “cluster-name”.”domain” where “domain” was defined when we enabled the OvDC.

Once the cluster has finished provisioning, we can use the cse extension to gather information about the cluster:

$ vcd cse cluster info prod-1

property                     value
---------------------------  -------------------------------------------------
k8s_provider                 ent-pks
kubernetes_master_host       prod-1.corp.local
kubernetes_master_ips        10.40.14.37
kubernetes_master_port       8443
kubernetes_worker_instances  2
last_action                  CREATE
last_action_description      Instance provisioning completed
last_action_state            succeeded
name                         prod-1
network_profile_name
nsxt_network_profile
pks_cluster_name             prod-1---be6cc6cb-b4a3-4bab-8d6f-e6d1499485bd
plan_name                    small
uuid                         7a1283da-d8b4-418c-a5ea-720810195d72
worker_haproxy_ip_addresses

Note that 10.40.14.37 is the IP address of the Kubernetes master node. If I navigate to the NSX-T Manager web UI, I can verify that a virtual server was automatically created within an L4 load balancer to allow external access to the Kubernetes cluster via kubectl.

Now, the tenant can use the cse extension to pull down the Kubernetes config file from the cluster and store it in the default config file location on their local workstation (~/.kube/config):

Note: The config file will use prod-1.corp.local as the Kubernetes master server name so I have added a DNS entry that maps prod-1.corp.local to the IP of my NSX-T virtual server that fronts the Kubernetes master(s).
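
If DNS isn't an option in your lab, a quick /etc/hosts entry on the workstation works too. This is just a sketch; replace the placeholder with the IP of the NSX-T virtual server that fronts your master node(s):

$ echo "<nsxt-virtual-server-ip>  prod-1.corp.local" | sudo tee -a /etc/hosts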

$ vcd cse cluster config prod-1 > ~/.kube/config

$ kubectl get nodes

NAME                                   STATUS   ROLES    AGE     VERSION
82d3022e-9fbc-4a31-9be2-fecc80e2ab27   Ready    <none>   2d17h   v1.13.5
d35f7324-0f09-440e-81b0-af9ad26481a6   Ready    <none>   2d17h   v1.13.5

Now the pks-k8-admin user has full admin access to their Enterprise PKS Kubernetes cluster and can instantly begin deploying their workloads to the newly created cluster!!

Conclusion

This wraps up my 3 part series on installing and configuring the Container Service Extension to support both CSE Standard and CSE Enterprise cluster creation. Feel free to reach out to me in the comment section or on Twitter if you have any additional questions or comments. Thanks for the read!

Exploring the Nirmata Kubernetes Extension for VMware Cloud Director

If you’ve been following my blog, you know that a lot of the content I publish focuses on VMware’s Container Service Extension and its integration with VMware Cloud Director, which allows service providers to create a Kubernetes-as-a-Service experience for their tenants utilizing their existing VCD-managed infrastructure.

Recently, my colleague at VMware, Daniel Paluszek, and I partnered with Nirmata to perform some testing on their new Kubernetes Extension for VMware Cloud Director. The Nirmata Kubernetes Extension for VCD builds on the rich UI experience already present in the VCD tenant portal by providing a workflow for provisioning Kubernetes clusters via CSE using the native UI.

The Native CSE Experience

As I’ve written about in my previous posts on CSE, once a service provider enables a tenant to provision Kubernetes clusters via CSE, tenants will use the vcd-cli with a CSE extension enabled to provision and manage Kubernetes clusters. For example, a tenant would log in to their VCD Org through the vcd-cli and issue the following command to create a Kubernetes cluster via CSE:

$ vcd cse cluster create k8-cluster-1 --network outside --nodes 1

where k8-cluster-1 is the name of the cluster, --network is the OvDC network the cluster nodes will utilize, and --nodes 1 defines the number of worker nodes the cluster will contain.

While many users are familiar enough with a CLI to adapt to this method of resource provisioning, one piece of feedback we get from our partner community is that they’d like to offer a native UI experience in the tenant portal to allow their end customers to more intuitively provision Kubernetes clusters via VCD. That’s where the Nirmata Kubernetes Extension for VCD comes in…

Utilizing the Nirmata Kubernetes Extension

The Nirmata Kubernetes Extension for VMware Cloud Director is a custom extension created by Nirmata in partnership with the VMware Cloud Director team. The extension is comprised of a VCD UI extension as well as a Nirmata Server, deployed as a docker container, that passes communication between the UI elements, the CSE server, and the Nirmata SaaS platform. Daniel and I put together a detailed write up over at the Nirmata blog so I won’t go too deep in this blog post but wanted to walk through the experience of utilizing the service in the tenant portal.

After a Cloud Admin has onboarded a tenant in CSE and enabled the Nirmata Kubernetes Extension for their org, a tenant will see the Kubernetes option in their tenant portal menu:

After navigating to the Kubernetes page in the tenant portal, they can observe various information about the number of clusters, nodes, and pods deployed in the org. By selecting the Clusters option in the left-hand menu, they are taken to a page that contains information about existing clusters as well as options to provision new clusters or register existing clusters with the extension.

As we can see from the screenshot above, our cse-standard-admin VCD user already has a handful of clusters deployed in the environment. But what about a cluster that was provisioned outside of the UI? Can we still “see” that within the extension without redeploying? We sure can! We can click the Register button and register the existing cluster. This action communicates with the Nirmata server to deploy the Nirmata controller pod to the cluster, which feeds information about the cluster back to the UI for visibility:

After the cluster has been registered, we can select the cluster and observe a wealth of information about the cluster itself natively in the UI:

Nirmata also surfaces the idea of “add-ons,” or curated applications, that tenants can deploy directly to their clusters from the UI:

Service Providers can utilize applications curated by the Nirmata team as well as adding their own custom deployments. To take it a step further, Service Providers can create profiles that contain a set of add-ons that will be deployed to a cluster automatically on provisioning.

As far as interacting with existing clusters goes, tenants can also scale clusters in the tenant portal as well, via the extension:

So tenants can manage existing clusters deployed by CSE, but what about provisioning net-new clusters? Tenants can visit the Clusters page of the UI extension, select the Create button, and provision a Kubernetes cluster with a couple of clicks!!

The tenant defines information such as OvDC, OvDC network, storage policy, and worker node count, and Nirmata and CSE handle the rest! In my humble opinion, this is a game changer for the service provider community already invested in VCD. By installing and configuring CSE and the Nirmata Kubernetes Extension, they have the foundation in place to build an advanced Kubernetes-as-a-Service offering for their tenants to consume.

Conclusion

Nirmata has done some great work in conjunction with the VMware Cloud Director team to bring Kubernetes cluster provisioning and management directly into the tenant portal of VCD. As I said earlier, Daniel and I collaborated on a more detailed write-up on the Nirmata Kubernetes Extension for VCD that is hosted on the Nirmata blog. We also put together a video walkthrough of the extension, which you can view below:

Feel free to reach out to myself, Daniel or the Nirmata team for any additional feedback or questions around the Nirmata Kubernetes Extension for VCD. Thanks for the read!

Container Service Extension 2.5 Installation: Part 2

Building on Part 1 of my series on installing VMware’s Container Service Extension 2.5.0, in this post, I’ll walk through the process of configuring a client server to interact with CSE via the vcd-cli tool. I’ll also walk through the process of onboarding a tenant as well as the workflow, from the tenant’s perspective, of provisioning and managing a Kubernetes cluster.

Configuring a CSE Client

Now that I’ve deployed my CSE server, I’ll need to utilize the vcd-cli tool with the CSE client extension enabled in order to interact with the CSE service. For the client server, I am, again, utilizing a CentOS 7.6 server and a Python 3.7.3 virtual environment to install and run the vcd-cli tool in this walkthrough.

The first thing I’ll need to do is create and activate my virtual environment, which I will install in the ~/cse-client directory:

$ python3.7 -m virtualenv ~/cse-client
$ source ~/cse-client/bin/activate

Now I’m ready to install the vcd-cli tool. vcd-cli is a command line interface for VMware vCloud Director that allows system administrators and tenants to perform operations from the command line for convenience and automation. Use pip within the virtual environment to install vcd-cli and the Container Service Extension bits:

$ pip install vcd-cli
$ pip install container-service-extension

Now that I’ve installed vcd-cli, I’m going to attempt a login to my vCloud Director environment to create a profile at ~/.vcd-cli/profiles.yaml that we will eventually use to activate the CSE client extension:

$ vcd login director.vcd.zpod.io system administrator -iw
Password: 
administrator logged in, org: 'system', vdc: ''

Note: If you see a python traceback when attempting to log in to the vCloud Director environment that references ModuleNotFoundError: No module named '_sqlite3', you can disable the browsercookie feature by editing the following file within your virtual environment directory:

$ vi <virtual-env-directory>/lib/python3.7/site-packages/vcd_cli/browsercookie/__init__.py

and commenting out the following lines:

#try:
    # should use pysqlite2 to read the cookies.sqlite on Windows
    # otherwise will raise the "sqlite3.DatabaseError: file is encrypted or  is
    # not a database" exception
    #from pysqlite2 import dbapi2 as sqlite3
#except ImportError:
    #import sqlite3

After making the above changes, you should be able to successfully login via the vcd-cli tool.

Now that I have successfully logged in to the vCloud Director environment, I can enable the CSE client in my vcd-cli profile. I’ll use vi to edit my profile:

$ vi ~/.vcd-cli/profiles.yaml 

and add the following lines to the file between the active: and profiles: sections to enable the CSE client. Your file should look like the example below:

active: default
extensions:
- container_service_extension.client.cse
profiles:

---output omitted---

Now, I’ll run a cse command to test my connection to the CSE server from the client:

$ vcd cse system info
property              value
--------------------  ------------------------------------------------------
all_threads           6
config_file           /home/cse/config.yaml
consumer_threads      5
description           Container Service Extension for VMware vCloud Director
product               CSE
python                3.7.3
requests_in_progress  0
status                Running
version               2.5.0

Great!! So now I’ve configured a client to communicate with the CSE server via the CSE client extension for vcd-cli. Now, as the vCD system admin, I’m ready to onboard a new tenant for Kubernetes cluster provisioning via CSE.

Onboarding a Tenant

I’m ready to onboard my first tenant that is interested in deploying Kubernetes clusters in their vCD-managed environments.

The first thing I’ll do is examine the Organizations and Organization Virtual Datacenters (OrgVDCs) available in my environment and what Kubernetes providers are assigned to those OrgVDCs, using the cse client:

$ vcd cse ovdc list
name                org                 k8s provider
------------------  ------------------  --------------
base-ovdc           base-org            none

As you can see, in my environment, I have a single org (base-org) and a single OrgVDC (base-ovdc). Currently, the k8s provider value for the OrgVDC is none, so tenants in the base-org cannot use CSE to provision clusters.

In order to allow those users to provision clusters, I need to enable the OrgVDC for cluster provisioning. The two options for k8s provider are native or enterprise. native is for CSE Standard Kubernetes cluster provisioning, while enterprise is used for CSE Enterprise (Enterprise PKS) Kubernetes cluster creation.

Note: These commands must be run as a vCD system administrator

First, I’ll need to instruct vcd-cli to “use” the base-org organization:

$ vcd org use base-org
now using org: 'base-org', vdc: 'base-ovdc', vApp: ''.

Then, as the system administrator, I can enable the base-ovdc to support CSE Standard Kubernetes cluster provisioning:

$ vcd cse ovdc enable base-ovdc --k8s-provider native
metadataUpdate: Updating metadata for Virtual Datacenter base-ovdc(dd7d117e-6034-467b-b696-de1b943e8664)
task: 05706a5a-0469-404f-82b6-559c078f855a, Updated metadata for Virtual Datacenter base-ovdc(dd7d117e-6034-467b-b696-de1b943e8664), result: success

I can now verify the OrgVDC metadata has been updated with the cse command below:

$ vcd cse ovdc list
name                org                 k8s provider
------------------  ------------------  --------------
base-ovdc           base-org            native

Awesome! Now my base-org tenant users have been granted the ability to deploy Kubernetes clusters in their OrgVDC.

A Note Regarding RBAC

If you remember back to Part 1 of my series, I enabled RBAC functionality on the CSE server to allow my tenant admins the ability to control who is able to create Kubernetes clusters in their organizations. Now that I, as the vCD system admin, have enabled the base-org tenant to support Kubernetes cluster creation, it is up to the base-org tenant admin to allow specific users within their org to create clusters.

I have written a detailed blog post on configuring RBAC functionality so I won’t rehash that here, but from a high level, I have performed the following actions in my environment to onboard users in the base-org as the base-org tenant admin (a command sketch follows the list):

  • Logged into vcd-cli as a base-org user with the Organizational Admin role
  • Assigned the "{cse}:CSE NATIVE DEPLOY RIGHT" right to a role in the org
  • Assigned the above role to any user I’d like to be able to deploy Kubernetes clusters via CSE
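
For reference, here is a minimal sketch of those steps from vcd-cli. The role name ("k8s Cluster Author") and the org admin username are hypothetical, and the add-right subcommand is part of vcd-cli’s role commands, so check vcd role --help in your environment:

$ vcd login director.vcd.zpod.io base-org org-admin -iw
$ vcd role add-right "k8s Cluster Author" "{cse}:CSE NATIVE DEPLOY RIGHT"

Any user assigned that role (via the vCD tenant portal or vcd-cli) can then deploy clusters via CSE.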

Now the users within the base-org tenant that have the proper permissions can provision Kubernetes clusters via CSE. So let’s see it in action!!

Provisioning (And Managing) Kubernetes Clusters via CSE

For the last section of the post, I’m going to switch personas to a tenant user (cse-native-user) of the base-org within the vCD environment. I have been assigned the "{cse}:CSE NATIVE DEPLOY RIGHT" right by my organization admin and I’m ready to provision clusters.

First, I’ll use vcd-cli to log in to my organization within the vCD environment:

$ vcd login director.vcd.zpod.io base-org cse-native-user -iw
Password: 
cse-native-user logged in, org: 'base-org', vdc: 'base-ovdc'

Once logged in, I’ll use the cse client to examine which Kubernetes templates are available to me:

$ vcd cse template list
name                                    revision  is_default    catalog  
------------------------------------  ----------  ------------  --------- 
ubuntu-16.04_k8-1.15_weave-2.5.2               1  False         cse-25  

And now I’m ready to provision a cluster with the following command:

$ vcd cse cluster create test-cluster -t ubuntu-16.04_k8-1.15_weave-2.5.2 -r 1 \
--network outside --ssh-key ~/.ssh/id_rsa.pub --nodes 1

cluster operation: Creating cluster vApp 'test-cluster' (2ad4df27-a7fd-4a11-bf29-f9e18eea490b) from template 'ubuntu-16.04_k8-1.15_weave-2.5.2' (revision 1), 
cluster operation: Creating master node for test-cluster (2ad4df27-a7fd-4a11-bf29-f9e18eea490b)
cluster operation: Initializing cluster test-cluster (2ad4df27-a7fd-4a11-bf29-f9e18eea490b)
cluster operation: Creating 1 node(s) for test-cluster(2ad4df27-a7fd-4a11-bf29-f9e18eea490b)
cluster operation: Adding 1 node(s) to test-cluster(2ad4df27-a7fd-4a11-bf29-f9e18eea490b)
task: 8d302115-35ef-4566-a95c-f4f0000010e8, Created cluster test-cluster (2ad4df27-a7fd-4a11-bf29-f9e18eea490b), result: success

where:

  • -t is the template name
  • -r is the template revision number
  • --network is the OrgVDC network the Kubernetes nodes will be deployed on
  • --ssh-key is the public ssh key CSE will embed in the Kubernetes nodes to allow root access to the nodes' OS via ssh
  • --nodes is the number of worker nodes to be deployed in the cluster

As you can see from the output of the command, the CSE server is essentially performing the following actions:

  • Creating a vApp in vCD with the cluster name specified in the cluster create command
  • Creating a Kubernetes master node utilizing the vApp template I installed during the CSE server deployment
  • Running post provisioning scripts on the master node to instantiate the VM as a master node
  • Creating a Kubernetes worker node utilizing the vApp template I installed during the CSE server deployment
  • Running post provisioning scripts on the worker node to add it into the cluster, under control of the master node
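
While that task runs (or at any point afterward), I can get a quick summary of the clusters that exist in the org with the cse client:

$ vcd cse cluster list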

Once I have received the final result: success message, I am ready to access my cluster! First, I’ll get some info about the cluster I just provisioned:

$ vcd cse cluster info test-cluster

property           value
-----------------  -------------------------------------------------------------------------------
cluster_id         2ad4df27-a7fd-4a11-bf29-f9e18eea490b
cse_version        2.5.0
k8s_provider       native
k8s_version        1.15
leader_endpoint    10.96.66.39
master_nodes       {'name': 'mstr-spxa', 'ipAddress': '10.96.66.39'}
name               test-cluster
nfs_nodes
nodes              {'name': 'node-a5i0', 'ipAddress': '10.96.66.43'}
number_of_vms      2
status             POWERED_ON
template_name      ubuntu-16.04_k8-1.15_weave-2.5.2
template_revision  1
vapp_href          https://director.vcd.zpod.io/api/vApp/vapp-17e81bd9-8995-4c4b-8965-1df9ae23e9f9
vapp_id            17e81bd9-8995-4c4b-8965-1df9ae23e9f9
vdc_href           https://director.vcd.zpod.io/api/vdc/d72b0350-9614-4692-a3b9-730c362036c6
vdc_id             d72b0350-9614-4692-a3b9-730c362036c6
vdc_name           base-ovdc

The cluster info command gives me information about the cluster, including the IP addresses of the nodes, the current state of the cluster, and the template used to create it, among other things.

Now, I’ve provisioned a cluster and I’m ready to deploy some applications!! First, I need to use CSE to obtain the cluster config file that will allow me to access the cluster via native Kubernetes tooling like kubectl:

$ vcd cse cluster config test-cluster > ~/.kube/config

The above command will grab the cluster config file from the master node of the test-cluster and pipe it into a file at the default location used by kubectl (~/.kube/config) for cluster config files.
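
Note: Piping the output into ~/.kube/config overwrites any existing kubeconfig at that location. If you manage multiple clusters, a safer pattern (a small sketch using a hypothetical file name) is to write the config to its own file and point kubectl at it via the KUBECONFIG environment variable:

$ vcd cse cluster config test-cluster > ~/test-cluster.kubeconfig
$ export KUBECONFIG=~/test-cluster.kubeconfig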

Now, I’ll verify connectivity to the cluster via kubectl:

$ kubectl get nodes
NAME        STATUS   ROLES    AGE     VERSION
mstr-spxa   Ready    master   11m     v1.15.3
node-a5i0   Ready    <none>   8m15s   v1.15.3

Great, my cluster is up and running!! But I only deployed with 1 worker node… What if I want to add more? Do I have to redeploy? Nope!! CSE can add (and remove) worker nodes to existing clusters with the following command:

$ vcd cse cluster resize test-cluster --nodes 2 --network outside

where --nodes is the total number of worker nodes in the cluster. So in the example above, I added 1 additional worker node to my cluster because my original worker node count was 1.

Note: You will need to use the -t and -r flags in the above command to specify the template and revision if you are not using the default template defined in the CSE server configuration file.
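
CSE can scale clusters down as well. A hedged sketch, assuming the node subcommands shipped with the CSE 2.5 client (check vcd cse node --help in your environment), would be to list the worker nodes in the cluster and then delete a specific node by name:

$ vcd cse node list test-cluster
$ vcd cse node delete test-cluster node-a5i0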

After performing all of my testing, I decided I’m going to delete my cluster with the following command:

$ vcd cse cluster delete test-cluster
Are you sure you want to delete the cluster? [y/N]: y

This command will delete the vApp that was created to house the cluster, which includes all components of the Kubernetes cluster. For additional information on managing Kubernetes clusters with CSE, refer to the product documentation.

Conclusion

Well, if you’ve made it this far, congratulations!! I hope this walkthrough of the installation and configuration of Container Service Extension 2.5.0 was informative. Keep an eye on the blog for more articles on Day 2 operations coming down the pipe!!

Container Service Extension 2.5 Installation: Part 1

With the recent release of the Container Service Extension 2.5.0, I wanted to take some time to walk through the installation and configuration of the Container Service Extension (CSE) server in conjunction with VMware vCloud Director 10.

This will be a series of 3 blog posts that cover the following topics:

Container Service Extension Overview

Before we get started, I wanted to talk a bit about CSE and what purpose it serves in a Service Provider’s environment. The Container Service Extension is a VMware vCloud Director extension that helps tenants create, lifecycle manage, and interact with Kubernetes clusters in vCloud Director-managed environments.

There are currently two versions of CSE: Standard and Enterprise. CSE Standard brings Kubernetes-as-a-Service to vCD by creating customized vApp templates and enabling tenant/organization administrators to deploy fully functional Kubernetes clusters in self-contained vApps. CSE Standard cluster creation can be enabled on existing NSX-V backed OrgVDCs in a tenant’s environment. With CSE Enterprise, introduced in the CSE 2.0 release, VMware has also added the ability for tenants to provision VMware Enterprise PKS Kubernetes clusters backed by NSX-T resources in vCloud Director-managed environments. In this blog post, I am going to focus on the enablement of CSE Standard Kubernetes cluster creation in an existing vCloud Director OrgVDC.

For more information on CSE, have a look at the Kubernetes-as-a-Service in vCloud Director reference architecture (authored by yours truly 😄) as well as the CSE Installation Documentation.

Prerequisites

In order to install CSE 2.5.0, please review the CSE Server Installation Prerequisites section of the CSE documentation to ensure you have fulfilled all of the vCD-specific requirements to support CSE Standard Kubernetes cluster deployment. As mentioned in the aforementioned documentation, VMware recommends utilizing a user with the System Administrator role in the vCD environment for CSE server management.

Along with the prereqs mentioned in the documentation above, please ensure you have a RabbitMQ server available as the CSE server utilizes AMQP as a messaging queue to communicate with the vCD cell, as referenced in the diagram below:

For vCloud Director 10, you will need to deploy RabbitMQ 3.7.x (see vCloud Director Release notes for RabbitMQ compatibility information). For more information on deploying RabbitMQ, please refer to the RabbitMQ installation documentation.

Finally, CSE requires Python 3.7.3 or later at the time of this writing. In this walkthrough, I have chosen to install the CSE Server on a CentOS 7.6 install within a Python 3.7.3 virtual environment but any variant of Linux that supports Python 3.7.3 installations will suffice. For more information on configuring a virtual environment to support a CSE Server installation, see my earlier blog post which walks through the process.
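
If you have not built that virtual environment yet, creating it is a one-liner (a minimal sketch, assuming Python 3.7.3 is already installed and available as python3.7 on the server):

$ python3.7 -m venv ~/cse-env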

Installing CSE Server 2.5.0

Now that I’ve established the prereqs, I am ready to install the bits that will support the CSE server installation.

Note: The following commands will need to be run on the Linux server hosting the CSE server installation.

First things first, I’ll create a cse user that I’ll use to manage the CSE server:

# useradd cse
# passwd cse
# su - cse

Now, after creating our Python 3.7.3 virtual environment, I’ll need to activate it. I created my virtual environment in the ~/cse-env directory:

$ source ~/cse-env/bin/activate

Note: After activating the virtual environment, you should see the (virtual-environment-name) prepended to your bash prompt to confirm you are operating in the virtual environment.

Now I’m ready to install the CSE server bits within the virtual environment! Utilize pip to pull down the CSE packages:

$ pip install container-service-extension

Verify CSE is installed and the version is 2.5.0:

$ cse version
CSE, Container Service Extension for VMware vCloud Director, version 2.5.0

Now I’m ready to build the configuration file and deploy the CSE server!!

Container Service Extension Configuration File

The CSE server utilizes a yaml config file that contains information about the vCloud Director/vCenter infrastructure that will be supporting the Kubernetes cluster deployments. The config file also contains information regarding the RabbitMQ broker that I configured in Part 1 of the series. This config file will be used to install and run the CSE service on the CSE server.

Before we get started, I wanted to take some time to talk about how CSE deploys Kubernetes clusters. CSE uses customized VM templates (Kubernetes templates) as building blocks for deployment of Kubernetes clusters. These templates are crucial for CSE to function properly. New in version 2.5.0, CSE utilizes “pre-configured” template definitions hosted on a remote repository.

Templates vary by guest OS (e.g. PhotonOS, Ubuntu), as well as software versions, like Kubernetes, Docker, and Weave. Each template name is uniquely constructed based on the flavor of guest OS, Kubernetes, and Weave versions. The definitions of different templates reside in an official location hosted at a remote repository URL. The CSE sample config file, out of the box, points to the official location of those templates definitions. The remote repository is officially managed by maintainers of the CSE project. For more information on template management in CSE, refer to the CSE documentation.

Now that we’ve discussed some of the changes for template management in CSE 2.5.0, I’m ready to start our CSE server installation.

If you’ll remember back to Part 1 of the series, I installed the CSE bits within a Python 3.7.3 virtual environment, so the first thing I’ll do is activate that virtual environment and verify our CSE version:

Note: All commands below should be run from the CSE server CLI.

$ source cse-env/bin/activate


$ cse version
CSE, Container Service Extension for VMware vCloud Director, version 2.5.0

I’ll use the cse command to generate a sample file (I’m calling mine config.yaml) that I can use to build out my config file for my CSE installation:

$ cse sample -o config.yaml

Great! Now I have a skeleton configuration file to use to build out my CSE server config file. Let’s have a look at each section of the config file.

amqp section

The amqp section of the config file contains information about the RabbitMQ AMQP broker that the CSE server will use to communicate with the vCloud Director instance. Let’s have a look at my completed amqp section below. All of the values used below are from my lab and some will differ for your deployment:

amqp:
  exchange: cse-exchange      <--- RabbitMQ exchange name
  host: rabbitmq.vcd.zpod.io  <--- RabbitMQ hostname
  password: <password>        <--- RabbitMQ user's password
  port: 5672                  <--- RabbitMQ port (default is 5672)
  prefix: vcd                 <--- default value, can be left as is
  routing_key: cse            <--- default value, can be left as is
  ssl: false                  <--- Set to "true" if using SSL for RabbitMQ connections
  ssl_accept_all: false       <--- Set to "true" if using SSL and utilizing self-signed certs
  username: cse-amqp          <--- RabbitMQ username (with access to the vhost)
  vhost: /                    <--- RabbitMQ virtual host that contains the exchange

The exchange defined in the file above will be created by the CSE server on install (if it doesn’t already exist). This exchange should NOT be the same one configured in the Extensibility section of the vCD Admin Portal. However, the Extensibility section of the vCD Admin Portal must be configured with the same virtual host (/ in my example above). See the screenshot below for an example of my vCD Extensibility config:

No manual config is required on the RabbitMQ server side aside from ensuring the RabbitMQ user (cse-amqp in the example above) has full access to the virtual host. See my previous post on Deploying vCloud Director for information on creating RabbitMQ users.
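
For reference, a minimal sketch of creating that user and granting it full permissions on the virtual host, run on the RabbitMQ server with the standard rabbitmqctl tooling (substitute your own password and vhost):

# rabbitmqctl add_user cse-amqp <password>
# rabbitmqctl set_permissions -p / cse-amqp ".*" ".*" ".*"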

vcd section

As you might guess, this section of the config file contains information regarding the vCloud Director instance that CSE will communicate with via the API. Let’s have a look at the vcd config:

vcd:
  api_version: '33.0'            <--- vCD API version
  host: director.vcd.zpod.io     <--- vCD Hostname
  log: true                      <--- Set to "true" to generate log files for CSE/vCD interactions
  password: my_secret_password   <--- vCD system admin's password
  port: 443                      <--- default value, can be left as is unless otherwise needed 
  username: administrator        <--- vCD system admin username
  verify: false                  <--- Set to "true" to verify SSL certificates

vcs section

In this section, we define the vCenter instances that are being managed by vCD. CSE needs access to the vCenter appliances in order to perform guest operation modifications, queries, and program execution. In my lab, my vCD deployment is managing 2 vCSA instances. You can add additional entries if required:

vcs:
- name: vc-pks                           <--- vCenter name as it appears in vCD
  password: <password>                   <--- administrator@vsphere.local's password
  username: administrator@vsphere.local  <--- vCenter admin's username
  verify: false                          <--- Set to "true" to verify SSL certificates
- name: vc-standard
  password: <password>
  username: administrator@vsphere.local
  verify: false

service section

The service section is small and really only has one config decision to make. If the enforce_authorization flag is set to false, ANY user that has permissions to create vApps in any Org in the vCD environment can provision Kubernetes clusters via CSE. If set to true, you can utilize RBAC functionality to grant specific Orgs and specific users within those Orgs rights to create clusters. When set to true, the enforce_authorization flag defaults to refusing any request to create Kubernetes clusters via CSE unless a user (and its org) has the proper rights assigned to allow the operation. For more information on configuring RBAC, see my previous blog post that walks through RBAC enablement scenarios (although the blog post was authored utilizing CSE 2.0, the constructs have not changed in 2.5.0).

service:
  enforce_authorization: true
  listeners: 5                  <--- number of threads CSE server can utilize
  log_wire: false               <--- if set to "true", will log all REST calls initiated by CSE to vCD

broker section

Here’s where all the magic happens!! The broker section is where we define where and how the CSE server will deploy the first Kubernetes cluster, which serves as the basis for the vApp template used for tenants’ Kubernetes cluster deployments.

  • The catalog value is the name CSE will use when creating a publicly shared catalog within my org for storing the vApp template(s). The CSE server will create this catalog in vCD when I install the CSE server.

  • The default_template_name value is the template name that CSE will use by default when users deploy Kubernetes clusters via CSE without defining a specific template. Refer to the following link from the CSE documentation for available template names and revision numbers.

  • The default_template_revision value is a numerical value associated with the version of the template released by VMware. At the time of writing, all available templates are at revision 1.

  • The ip_allocation_mode value is the mode to be used during the install process to build the template. Possible values are dhcp or pool. During creation of clusters for tenants, pool IP allocation mode is always used.

  • The network value is an OrgVDC Network within the OrgVDC that will be used during the install process to build the template. It should have outbound access to the public internet in order to reach the template repository. The CSE server does not need to be connected to this network.

  • The org value is the organization that contains the shared catalog where the Kubernetes vApp templates will be stored.

  • The remote_template_cookbook_url value is the URL of the template repository where all template definitions and associated script files are hosted. This is new in CSE 2.5.0.

  • The storage_profile is the name of the storage profile to use when creating the temporary vApp used to build the Kubernetes cluster vApp template.

  • The vdc value is the virtual datacenter within the org (defined above) that will be used during the install process to build the vApp template.

Here is an example of the completed broker section:

broker:
  catalog: cse-25
  default_template_name: ubuntu-16.04_k8-1.15_weave-2.5.2
  default_template_revision: 1
  ip_allocation_mode: pool
  network: outside
  org: cse_25_test
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template.yaml
  storage_profile: '*'
  vdc: cse_vdc_1

template_rules section

This section is new in CSE 2.5.0 and is entirely optional. The template_rules section allows system admins to utilize vCD compute policies to limit which users have access to which Kubernetes templates. By default, any user that has access to create Kubernetes clusters via CSE has access to all available templates; use this section, along with compute policies, to restrict that access.

pks_config section

This section points to a separate .yaml config file that contains information about a VMware Enterprise PKS deployment if you intend to utilize CSE Enterprise as well. Refer to Part 3 of my series (https://mannimal.blog/2019/11/22/container-service-extension-2-5-installation-part-3/) for information on building the pks_config.yaml file.

Note: System admins can add CSE Enterprise capabilities via the pks_config flag at any point after CSE server installation; it does not have to be set at initial install.

pks_config: null  <--- Set to name of .yaml config file for CSE Enterprise cluster deployment

Now that I’ve gone over the config file, I am ready to proceed with my installation of the CSE server!!

CSE Server Installation and Validation

Before starting the install, we need to set the correct permissions on the config file:

$ chmod 600 config.yaml

After building out the config file, I’ll simply need to run the following command to install CSE in the environment. I’ll use the --skip-template-creation flag to ensure the configuration is sound and install the desired template in a subsequent command:

$ cse install -c config.yaml --skip-template-creation

Required Python version: >= 3.7.3
Installed Python version: 3.7.3 (default, Sep 16 2019, 12:54:43) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
Validating config file 'config.yaml'
Connected to AMQP server (rabbitmq.vcd.zpod.io:5672)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director (director.vcd.zpod.io:443)
Connected to vCenter Server 'vc-standard' as 'administrator@vcd.zpod.io' (vcsa.vcd.zpod.io:443)
Connected to vCenter Server 'vc-pks' as 'administrator@pks.zpod.io' (vcsa.pks.zpod.io:443)
Config file 'config.yaml' is valid
Installing CSE on vCloud Director using config file 'config.yaml'
Connected to vCD as system administrator: director.vcd.zpod.io:443
Checking for AMQP exchange 'cse-exchange'
AMQP exchange 'cse-exchange' is ready
Updated cse API Extension in vCD
Right: CSE NATIVE DEPLOY RIGHT added to vCD
Right: CSE NATIVE DEPLOY RIGHT assigned to System organization.
Right: PKS DEPLOY RIGHT added to vCD
Right: PKS DEPLOY RIGHT assigned to System organization.
Created catalog 'cse-25'
Skipping creation of templates

Great!! I’ve installed the CSE Server. Now I’m ready to deploy a Kubernetes cluster vApp template into my cse-25 catalog. I can obtain a template name from the Template Announcement section of the CSE documentation. I can also use the following cse command from the CLI of the CSE server to query available templates:

$ cse template list -d remote

I can also define an ssh key that will be injected into the VMs that are provisioned to act as the Kubernetes nodes with the --ssh-key flag. The system admin could then use the corresponding private key to access the Kubernetes nodes’ operating system via SSH. I’ll use the following cse command to install the Ubuntu Kubernetes template:

$ cse template install ubuntu-16.04_k8-1.15_weave-2.5.2 --ssh-key id_rsa.pub

This command pulls down an Ubuntu OVA to the CSE server and then pushes it to the vCD environment, creates a set of VMs, and performs all required post-provisioning customization to create a functioning Kubernetes cluster.

After the Kubernetes cluster is created, CSE creates a vApp template based on the cluster and then deletes the running cluster from the environment. This vApp template will then be used by CSE to create Kubernetes clusters when tenants use the vcd-cli to create clusters.

Now I’m finally ready to test our install with the cse run command, which will run the CSE service in the current bash shell:

$ cse run

---output omitted---

AMQP exchange 'vcd' exists
CSE on vCD is currently enabled
Found catalog 'cse-25'
CSE installation is valid
Started thread 'MessageConsumer-0 (140180650903296)'
Started thread 'MessageConsumer-1 (140180417672960)'
Started thread 'MessageConsumer-2 (140180634117888)'
Started thread 'MessageConsumer-3 (140180642510592)'
Started thread 'MessageConsumer-4 (140180409280256)'
Container Service Extension for vCloud Director
Server running using config file: config.yaml
Log files: cse-logs/cse-server-info.log, cse-logs/cse-server-debug.log
waiting for requests (ctrl+c to close)

Awesome!! We can see the AMQP threads are created in the output and the server is running using my config file. Use ctrl+c to stop the service and return to the command prompt.

Controlling the CSE Service with systemctl

As you can see above, I can manually run the CSE Server with the cse run command, but it makes more sense to be able to automate the starting and stopping of the CSE service. To do that, I’ll create a systemd unit file and manage the CSE service via systemctl.

First, I’ll need to create a script that the systemd unit file will refer to in order to start the service. My virtual environment is located at /home/cse/cse-env and my CSE config file is located at /home/cse/config.yaml.

I’ll use vi to create the cse.sh file:

$ vi ~/cse.sh

And add the following text to the new file and save:

#!/usr/bin/env bash

source /home/cse/cse-env/bin/activate
cse run -c /home/cse/config.yaml

Now that I’ve created the start script, I need to create a unit file for systemd. I’ll access the root user on the CSE server:

$ su -

Now I’m ready to create the unit file. I’ll use vi to create the /etc/systemd/system/cse.service file:

# vi /etc/systemd/system/cse.service

And add the following text to the file:

[Service]
ExecStart=/bin/sh /home/cse/cse.sh
Type=simple
User=cse
WorkingDirectory=/home/cse
Restart=always
[Install]
WantedBy=multi-user.target

After adding the unit file, I’ll need to reload the systemctl daemon:

# systemctl daemon-reload

Now I’ll start the CSE service and enable it to ensure it starts automatically on boot:

# systemctl start cse
# systemctl enable cse

Finally, I’ll check the status of the service to ensure it is active and verify we see the messaging threads:

# service cse status
Redirecting to /bin/systemctl status cse.service
● cse.service
   Loaded: loaded (/etc/systemd/system/cse.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-10 17:00:50 EDT; 13s ago
 Main PID: 9621 (sh)
   CGroup: /system.slice/cse.service
           ├─9621 /bin/sh /home/cse/cse.sh
           └─9624 /home/cse/cse-ga/bin/python3.7 /home/cse/cse-ga/bin/cse run -c /home/cse/config.yaml

Oct 10 17:00:59 cse-25.vcd.zpod.io sh[9621]: CSE installation is valid
Oct 10 17:01:00 cse-25.vcd.zpod.io sh[9621]: Started thread 'MessageConsumer-0 (139712918025984)'
Oct 10 17:01:00 cse-25.vcd.zpod.io sh[9621]: Started thread 'MessageConsumer-1 (139712892847872)'
Oct 10 17:01:00 cse-25.vcd.zpod.io sh[9621]: Started thread 'MessageConsumer-2 (139712901240576)'
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: Started thread 'MessageConsumer-3 (139712909633280)'
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: Started thread 'MessageConsumer-4 (139712882005760)'
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: Container Service Extension for vCloud Director
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: Server running using config file: /home/cse/config.yaml
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: Log files: cse-logs/cse-server-info.log, cse-logs/cse-server-debug.log
Oct 10 17:01:01 cse-25.vcd.zpod.io sh[9621]: waiting for requests (ctrl+c to close)

Success!! Now I’m ready to start interacting with the CSE server with the CSE client via the vcd-cli tool.

Conclusion

In Part 1 of my series on CSE Installation, I detailed the steps required to install the CSE 2.5.0 bits within a Python 3.7.3 virtual environment. I also took a detailed look at the configuration file used to power the CSE Server before installing and running the server itself.

Join me in Part 2 of this series on the Container Service Extension where I’ll walk through configuring a tenant to allow provisioning of Kubernetes cluster via the CSE extension in vcd-cli!!

Backing Up Your Kubernetes Applications with Velero v1.1

In this post, I’m going to walk through the process of installing and using Velero v1.1 to back up a Kubernetes application that includes persistent data stored in persistentvolumes. I will then simulate a DR scenario by completely deleting the application and using Velero to restore the application to the cluster, including the persistent data.

Meet Velero!! ⛵

Velero is a backup and recovery solution built specifically to assist in the backup (and migration) of Kubernetes applications, including their persistent storage volumes. You can even use Velero to back up an entire Kubernetes cluster for restore and/or migration! Velero addresses various use cases, including but not limited to:

  • Taking backups of your cluster to allow for restore in case of infrastructure loss/corruption
  • Migration of cluster resources to other clusters
  • Replication of production cluster/applications to dev and test clusters

Velero essentially consists of two components:

  • A server that runs as a set of resources within your Kubernetes cluster
  • A command-line client that runs locally

Velero also supports the backup and restore of Kubernetes volumes using restic, an open source backup tool. Velero needs an S3 API-compatible storage server to store these volumes. To satisfy this requirement, I will also deploy a Minio server in my Kubernetes cluster so Velero is able to store my Kubernetes volume backups. Minio is a lightweight, easy-to-deploy S3 object store that you can run on premises. In a production environment, you’d want to deploy your S3-compatible storage solution in another cluster or environment to protect against total data loss in case of infrastructure failure.

Environment Overview

As a level set, I’d like to provide a little information about the infrastructure I am using in my lab environment. See below for infrastructure details:

  • VMware vCenter Server Appliance 6.7u2
  • VMware ESXi 6.7u2
  • VMware NSX-T Datacenter 2.5.0
  • VMware Enterprise PKS 1.5.0

Enterprise PKS handles the Day 1 and Day 2 operational requirements for deploying and managing my Kubernetes clusters. Click here for additional information on VMware Enterprise PKS.

However, I do want to mention that Velero can be installed and configured to interact with ANY Kubernetes cluster of version 1.7 or later (1.10 or later for restic support).

Installing Minio

First, I’ll deploy all of the components required to support the Velero service, starting with Minio.

First things first, I’ll create the velero namespace to house the Velero installation in the cluster:

$ kubectl create namespace velero

I also decided to create a dedicated storageclass for the Minio service to use for its persistent storage. In Enterprise PKS Kubernetes clusters, you can configure the vSphere Cloud Provider plugin to dynamically create VMDKs in your vSphere environment to support persistentvolumes whenever a persistentvolumeclaim is created in the Kubernetes cluster. Click here for more information on the vSphere Cloud Provider plugin:

$ kubectl create -f minio-storage-class.yaml 


kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: minio-disk
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin

Now that we have a storage class, I’m ready to create a persistentvolumeclaim that the Minio service will use to store the volume backups via restic. As you can see from the example .yaml file below, the previously created storageclass is referenced to ensure the persistentvolume is provisioned dynamically:

$ cat minio-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: velero-claim
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-disk
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi


$ kubectl create -f minio-pvc.yaml

Verify the persistentvolumeclaim was created and its status is Bound:

$ kubectl get pvc -n velero

NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
minio-claim   Bound    pvc-cc7ac855-e5f0-11e9-b7eb-00505697e7e7   6Gi        RWO            minio-disk     8s

Now that I’ve created the storage to support the Minio deployment, I am ready to create the Minio deployment. Click here for access to the full .yaml file for the Minio deployment:

$ kubectl create -f minio-deploy.yaml 

deployment.apps/minio created
service/minio created
secret/cloud-credentials created
job.batch/minio-setup created
ingress.extensions/velero-minio created

Use kubectl to wait for the minio-xxxx pod to enter the Running status:

$ kubectl get pods -n velero -w

NAME                    READY   STATUS              RESTARTS   AGE
minio-754667444-zc2t2   0/1     ContainerCreating   0          4s
minio-setup-skbs6       1/1     Running             0          4s
NAME                    READY   STATUS              RESTARTS   AGE
minio-754667444-zc2t2   1/1     Running             0          9s
minio-setup-skbs6       0/1     Completed           0          11s

Now that our Minio application is deployed, we need to expose the Minio service to requests outside of the cluster via a LoadBalancer service type with the following command:

$ kubectl expose deployment minio --name=velero-minio-lb --port=9000 --target-port=9000 --type=LoadBalancer --namespace=velero

Note: because of the integration between VMware Enterprise PKS and VMware NSX-T Datacenter, when I create a “LoadBalancer” service type in the cluster, the NSX Container Plugin, which we are using as our Container Network Interface, reaches out to the NSX-T API to automatically provision a virtual server in an NSX-T L4 load balancer.

I’ll use kubectl to retrieve the IP of the virtual server created within the NSX-T load balancer and access the Minio UI in my browser at EXTERNAL-IP:9000. I am looking for the IP address in the EXTERNAL-IP column for the velero-minio-lb service, 10.96.59.116 in this case:

$ kubectl get services -n velero

NAME              TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
minio             ClusterIP      10.100.200.160   <none>         9000/TCP         7m14s
velero-minio-lb   LoadBalancer   10.100.200.77    10.96.59.116   9000:30711/TCP   12s

Now that Minio has been successfully deployed in my Kubernetes cluster, I’m ready to move on to the next section to install and configure Velero and restic.

Installing Velero and Restic

Now that I have an s3-compatible storage solution deployed in my environment, I am ready to complete the installation of Velero (and restic).

However, before I move forward with the installation of Velero, I need to install the Velero CLI client on my workstation. The instructions detailed below will allow you to install the client on a Linux server (I’m using a CentOS 7 instance).

First, I navigated to the Velero GitHub releases page and copied the link for the v1.1 tarball for my OS distribution:

Then, I used wget to pull the file down to my Linux server, extracted the contents, and moved the velero binary into my path:

$ cd ~/tmp

$ wget https://github.com/vmware-tanzu/velero/releases/download/v1.1.0/velero-v1.1.0-linux-amd64.tar.gz

$ tar -xvf velero-v1.1.0-linux-amd64.tar.gz

$ sudo mv velero-v1.1.0-linux-amd64/velero /usr/bin/velero

Now that I have the Velero client installed on my server, I am ready to continue with the installation.

I’ll create a credentials-velero file that we will use during install to authenticate against the Minio service. Velero will use these credentials to access Minio to store volume backups:

$ cat credentials-velero

[default]
aws_access_key_id = minio
aws_secret_access_key = minio123

Now I’m ready to install Velero! The following command will complete the installation of Velero (and restic) where:

  • --provider aws instructs Velero to utilize S3 storage which is running on-prem, in my case
  • --secret-file is our Minio credentials
  • --use-restic flag ensures Velero knows to deploy restic for persistentvolume backups
  • --s3Url value is the address of the Minio service that is only resolvable from within the Kubernetes cluster
  • --publicUrl value is the IP address for the LoadBalancer service that allows access to the Minio UI from outside of the cluster:

$ velero install --provider aws --bucket velero --secret-file credentials-velero \
  --use-volume-snapshots=false --use-restic --backup-location-config \
  region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000,publicUrl=http://10.96.59.116:9000

Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

Note: The velero install command creates a set of CRDs that power the Velero service. You can run velero install --dry-run -o yaml to output all of the .yaml files used to create the Velero deployment.
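
I can also confirm that the backup storage location created by the install points at my Minio bucket with a quick check from the velero client:

$ velero backup-location get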

After the installation is complete, I’ll verify that I have 3 restic-xxx pods and 1 velero-xxx pod deployed in the velero namespace. As the restic service is deployed as a daemonset, I will expect to see a restic pod per node in my cluster. I have 3 worker nodes so I should see 3 restic pods:

Note: Notice the status of the restic-xxx pods…

$ kubectl get pod -n velero
NAME                      READY   STATUS             RESTARTS   AGE
minio-5559c4749-7xssq     1/1     Running            0          7m21s
minio-setup-dhnrr         0/1     Completed          0          7m21s
restic-mwgsd              0/1     CrashLoopBackOff   4          2m17s
restic-xmbzz              0/1     CrashLoopBackOff   4          2m17s
restic-235cz              0/1     CrashLoopBackOff   4          2m17s
velero-7d876dbdc7-z4tjm   1/1     Running            0          2m17s

As you may notice, the restic pods are not able to start. That is because in Enterprise PKS Kubernetes clusters, the path to the pods on the nodes is a little different (/var/vcap/data/kubelet/pods) than in “vanilla” Kubernetes clusters (/var/lib/kubelet/pods). In order to allow the restic pods to run as expected, I’ll need to edit the restic daemon set and change the hostPath variable as referenced below:

$ kubectl edit daemonset restic -n velero


volumes:
      - hostPath:
          path: /var/vcap/data/kubelet/pods
          type: ""
        name: host-pods
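
If you would rather script this change than edit the daemonset interactively, a kubectl patch along these lines should work; a hedged one-liner that assumes host-pods is the first entry in the pod spec’s volumes list (as it is in the default manifest):

$ kubectl -n velero patch daemonset restic --type json \
  -p '[{"op":"replace","path":"/spec/template/spec/volumes/0/hostPath/path","value":"/var/vcap/data/kubelet/pods"}]'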

Now I’ll verify all of the restic pods are in Running status:

$ kubectl get pod -n velero

NAME                      READY   STATUS      RESTARTS   AGE
minio-5559c4749-7xssq     1/1     Running     0          12m
minio-setup-dhnrr         0/1     Completed   0          12m
restic-p4d2c              1/1     Running     0          6s
restic-xvxkh              1/1     Running     0          6s
restic-e31da              1/1     Running     0          6s
velero-7d876dbdc7-z4tjm   1/1     Running     0          7m36s

Woohoo!! Velero is successfully deployed in my Kubernetes clusters. Now I’m ready to take some backups!!

Backup/Restore the WordPress Application using Velero

Now that I’ve deployed Velero and all of its supporting components in my cluster, I’m ready to perform some backups. But in order to test my backup/recovery solution, I’ll need an app that preferably utilizes persistent data.

In one of my previous blog posts, I walked through the process of deploying Kubeapps in my cluster to allow me to easily deploy application stacks to my Kubernetes cluster.

For this exercise, I’ve used Kubeapps to deploy a WordPress blog that utilizes persistentvolumes to store post data for my blog. I’ve also populated the blog with a test post to test backup and recovery.

First, I’ll verify that the WordPress pods are in a Running state:

$ kubectl get pods -n wordpress

NAME                                  READY   STATUS    RESTARTS   AGE
cut-birds-mariadb-0                   1/1     Running   0          23h
cut-birds-wordpress-fbb7f5b76-lm5bh   1/1     Running   0          23h

I’ll also look up the URL of my blog and access it via my web browser to check its current state:

$ kubectl get svc -n wordpress

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
cut-birds-mariadb     ClusterIP      10.100.200.39   <none>         3306/TCP                     19h
cut-birds-wordpress   LoadBalancer   10.100.200.32   10.96.59.116   80:32393/TCP,443:31585/TCP   19h

Everything looks good, especially the cat!!

In order for Velero to understand where to look for persistent data to back up, in addition to other Kubernetes resources in the cluster, we need to annotate each pod that is utilizing a volume so Velero backs up the pods AND the volumes.

I’ll review both of the pods in the wordpress namespace to view the name of each volume being used by each pod:

$ kubectl describe pod/cut-birds-mariadb-0 -n wordpress

---output omitted---

Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-cut-birds-mariadb-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cut-birds-mariadb
    Optional:  false
  default-token-6q5xt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6q5xt
    Optional:    false


$ kubectl describe pods/cut-birds-wordpress-fbb7f5b76-lm5bh -n wordpress

---output omitted---

Volumes:
  wordpress-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cut-birds-wordpress
    ReadOnly:   false
  default-token-6q5xt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6q5xt
    Optional:    false

As you can see, the mariadb pod is using 2 volumes: data and config, while the wordpress pod is utilizing a single volume: wordpress-data.

I’ll run the following commands to annotate each pod with the backup.velero.io/backup-volumes annotation, listing each pod’s corresponding volume(s):

$ kubectl -n wordpress annotate pod/cut-birds-mariadb-0 backup.velero.io/backup-volumes=data,config
$ kubectl -n wordpress annotate pod/cut-birds-wordpress-fbb7f5b76-lm5bh backup.velero.io/backup-volumes=wordpress-data

Now I’m ready to use the velero client to create a backup. I’ll name the backup wordpress-backup and ensure the backup only includes the resources in the wordpress namespace:

$ velero backup create wordpress-backup --include-namespaces wordpress

Backup request "wordpress-backup" submitted successfully.
Run `velero backup describe wordpress-backup` or `velero backup logs wordpress-backup` for more details.

I can also use the velero client to ensure the backup is completed by waiting for Phase: Completed:

$ velero backup describe wordpress-backup

Name:         wordpress-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

--output omitted--

I’ll navigate back to the web browser and refresh (or log back into) the Minio UI. Notice the restic folder, which houses our backups’ persistent data, as well as a backups folder:

I’ll select the backups folder and note the wordpress-backup folder in the subsequent directory. I’ll also explore the contents of the wordpress-backup folder, which contains all of the Kubernetes resources from my wordpress namespace:

Now that I’ve confirmed my backup was successful and have verified the data has been stored in Minio via the web UI, I am ready to completely delete my WordPress application. I will accomplish this by deleting the wordpress namespace, which will delete all resources created in the namespace to support the WordPress application, even the persistentvolumeclaims:

$ kubectl delete namespace wordpress


$ kubectl get pods -n wordpress
$ kubectl get pvc -n wordpress

After I’ve confirmed all of the resources in the wordpress namespace have been deleted, I’ll refresh the browser to verify the blog is no longer available.

Now we’re ready to restore!! I’ll use the velero client to verify the existence/name of the backup that was previously created and restore the backup to the cluster:

$ velero backup get

NAME               STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
wordpress-backup   Completed   2019-10-03 15:47:07 -0400 EDT   29d       default            <none>


$ velero restore create --from-backup wordpress-backup

I can monitor the pods in the wordpress namespace and wait for both pods to show 1/1 in the READY column and Running in the STATUS column:

$ kubectl get pods -n wordpress -w

NAME                                  READY   STATUS     RESTARTS   AGE
cut-birds-mariadb-0                   0/1     Init:0/1   0          12s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     Init:0/1   0          13s
cut-birds-mariadb-0                   0/1     PodInitializing   0          18s
cut-birds-mariadb-0                   0/1     Running           0          19s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     PodInitializing   0          19s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     Running           0          20s
cut-birds-mariadb-0                   1/1     Running           0          54s
cut-birds-wordpress-fbb7f5b76-qtcpp   1/1     Running           0          112s

Then, I can verify the URL of the WordPress blog:

$ kubectl get services -n wordpress

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
cut-birds-mariadb     ClusterIP      10.100.200.39   <none>         3306/TCP                     2m56s
cut-birds-wordpress   LoadBalancer   10.100.200.32   10.96.59.120   80:32393/TCP,443:31585/TCP   2m56s

And finally, I can access the URL of the blog in the web browser and confirm the test post that was visible initially is still present:

There you have it!! Our application and its persistent data have been completely restored!!

In this example, we manually created a backup, but we can also use the Velero client to schedule backups on a certain interval. See examples below:

velero schedule create planes-daily --schedule="0 1 * * *" --include-namespaces wordpress
velero schedule create planes-daily --schedule="@daily" --include-namespaces wordpress

Conclusion

In this blog post, I walked through the process of installing Velero in a Kubernetes cluster, including all of its required components, to support taking backups of Kubernetes resources. I also walked through the process of taking a backup, simulating a data loss scenario, and restoring that backup to the cluster.

Using Harbor and Kubeapps to Serve Custom Helm Charts

In my last post, I walked through the process of deploying Kubeapps in an Enterprise PKS Kubernetes cluster. In this post, I wanted to examine the workflow required for utilizing Harbor, an open source cloud native registry, as an option to serve out a curated set of Helm charts to developers in an organization. We’ll walk through a couple of scenarios, including configuring a “private” project in Harbor that houses Helm charts and container images for a specific group of developers. Building on my last post, we’ll also add this new Helm chart repository into our Kubeapps deployment to allow our developers to deploy our curated applications directly from the Kubeapps dashboard.

Harbor is an open source trusted cloud native registry project that stores, signs, and scans content. Harbor extends the open source Docker Distribution by adding the functionalities usually required by users, such as security, identity, and management. Having a registry closer to the build and run environment can improve image transfer efficiency. Harbor supports replication of images between registries, and also offers advanced security features such as user management, access control, and activity auditing. Enterprise support for Harbor Container Registry is included with VMware Enterprise PKS.

Along with the ability to host container images, Harbor also recently added functionality to act as a Helm chart repository. Harbor admins create “projects” that are normally dedicated to certain teams or environments. These projects, public or private, house container images as well as Helm charts to allow our developers to easily deploy curated applications in their Kubernetes cluster(s).

We already have Harbor deployed in our environment as an OpsMan tile. For more information on installing Harbor in conjunction with Enterprise PKS, see documentation here. For instructions detailing the Harbor installation procedure outside of an Enterprise PKS deployment, see the community documentation here.

Let’s get started!!

Creating a Private Project in Harbor

The first thing we’ll need to do is create a new private project that we’ll use to store our container images and Helm charts for our group of developers.

Navigate to the Harbor web UI and login with the admin credentials defined on install. Once logged in, select the + New Project button above the list of existing projects:

Name the project (developers-private-project in our case) and leave the Public option unchecked, as we only want our specific developer group to have access to this project:

Select the newly created project from the list and note the different menus we have available to us regarding the project, including Repositories, which will house our container images, as well as Helm Charts, which will house our Helm charts. We can also add individual members to the project to allow them to authenticate to the project with a username/password combination when pulling/pushing images or Helm charts to the project. For now, let’s select the Configuration tab and select the Automatically scan images on push option. This will instruct Harbor to scan container images for possible CVEs when they are uploaded to the project. Select Save:

Now that we’ve configured our private project, we need to upload our container image that will serve as the basis for our app.

Upload Image to Private Harbor Project

Now that we’ve created our project, we need to populate the project with the container image we are going to use to power this application.

In this example, we are using a simple “To Do List” application. Additional details on the application can be found here.

You’ll need access to a server with docker installed to perform this workflow. I am using the same Linux server where my Helm client is installed.

First, pull the docker image from the public repository:

$ docker pull prydonius/todo

Verify the image has been pulled:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
prydonius/todo      latest              4089c4ba4620        24 months ago       107MB

Since the project is private, we need to use docker login to authenticate against Harbor. Use the Harbor admin user credentials to authenticate:

$ docker login harbor.pks.zpod.io

Login Succeeded

Now we can tag the todo image with the Harbor url as well as the repo name and tag (which we define, v1 in this case) and push it to our private registry:

$ docker tag prydonius/todo:latest harbor.pks.zpod.io/developers-private-project/todo:v1


$ docker push harbor.pks.zpod.io/developers-private-project/todo:v1

Let’s head over to the Harbor web UI and ensure our image has been successfully uploaded. Navigate to the Projects tab in the left hand menu, select the developers-private-project project, and ensure the todo image is present:

While we’re here, let’s click on the link for the image and examine the vulnerabilities:

As we selected the option to scan all images on push, our todo container was automatically scanned when it was uploaded. There are a couple of vulnerabilities of “High” severity that we’d want to examine before pushing this app to production. Harbor also provides the ability to set rules in the configuration for each project to ensure containers with known vulnerabilities are not deployed in clusters. Our development environment is not exposed outside of our datacenter so we can let this slide…for now.

Now that we’ve uploaded our container image, we are ready to build our custom Helm chart that will utilize this image in our Harbor repository to build the application in our Kubernetes cluster.

Creating our Custom Helm Chart

As discussed in the last post, Helm uses charts, a collection of files that describe a related set of Kubernetes resources, to simplify the deployment of applications in a Kubernetes cluster. Today, we are going to build a simple Helm chart that deploys our todo app and exposes the app via a load balancer.

We’ll navigate to the server running the Helm client and issue the following command which will build out the scaffolding required for a Helm chart. We’ll call this chart dev-to-do-chart:

$ helm create dev-to-do-chart

The following directory structure will be created:

dev-to-do-chart
|-- Chart.yaml
|-- charts
|-- templates
|   |-- NOTES.txt
|   |-- _helpers.tpl
|   |-- deployment.yaml
|   |-- ingress.yaml
|   `-- service.yaml
`-- values.yaml

The templates/ directory is where Helm finds the YAML definitions for your Services, Deployments and other Kubernetes objects. We will define variables for our deployment in the values.yaml file. Values here can be dynamically set at deployment time to define things such as using an Ingress resource to expose the application or assigning persistent storage to the app.

Let’s edit the values.yaml file to add a couple of additional bits of information. We want to define the image that we will use to back our application deployment. We’ll use the todo container image that we just uploaded to our private project.

Also, since this project/repository is private, we need to create a Kubernetes secret that contains access information for the repository so Kubernetes (and docker) is allowed to pull the image. For additional information on this process, see the Kubernetes documentation here.
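
For reference, a minimal sketch of creating that secret in the namespace where the chart will be deployed (the credentials shown are placeholders; use a Harbor user that is a member of the project):

$ kubectl create secret docker-registry private-repo-sec \
  --docker-server=harbor.pks.zpod.io \
  --docker-username=admin \
  --docker-password=<password>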

In our example, I have created the private-repo-sec secret that we will add to the values.yaml, along with the image name:

$ vi dev-to-do-chart/values.yaml

---
image:
  repository: harbor.pks.zpod.io/developers-private-project/todo
  tag: v1
  pullPolicy: IfNotPresent
imagePullSecrets:
- name: private-repo-sec
---

This will instruct Helm to build a Kubernetes deployment that contains a pod comprised of our todo container from our developers-private-project repo and utilize the private-repo-sec secret to authenticate to the private project.
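
Depending on the version of the helm create scaffolding the chart was generated from, templates/deployment.yaml may or may not already reference the imagePullSecrets value. A minimal sketch of the relevant lines in the pod spec (standard Go-template syntax; adjust the indentation to match your generated template):

      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}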

Let’s also create a README.md (in the dev-to-do-chart directory) file that will display information about the Helm chart that will be visible in our Kubeapps dashboard:

$ vi dev-to-do-chart/README.md 

___
This chart will deploy the "To Do" application. 

Set "Service" to type "LoadBalancer" in the values file to expose the application via an L4 NSX-T load balancer.
___

Now that we’ve configured our chart, we need to package it up so we can upload it to our Harbor chart repo to share with our developers. Navigate back to the parent directory and run the following command to package the chart:

$ helm package ./dev-to-do-chart
Successfully packaged chart and saved it to: /home/user/dev-to-do-chart-0.1.0.tgz

We’ve created and packaged our custom Helm chart for our developers, now we’re ready to upload the chart to Harbor so they can deploy the todo application!!

Uploading Custom Helm Chart to Harbor

There are two ways to upload a Helm chart to Harbor:

  • Via the Harbor web UI
  • Via the Helm CLI tool

We are going to use the Helm CLI tool to push the chart to our private project. The first thing we’ll need to do is grab the ca.crt for our project which will allow us to add the chart repo from our Harbor project to our local Helm client.

Navigate back to the homepage for the developers-private-project and select the Registry Certificate link:

This will download the ca.crt that we can use in the following commands to add the repo and push our Helm chart to our project. Since the project is private, we will need to authenticate with the admin user’s credentials as well as provide the ca.crt when we add the repo to our Helm repo list:

Note: These commands should be run from the Linux server where the Helm client is installed.

helm repo add developers-private-project --ca-file=ca.crt --username=admin --password=<password> https://harbor.pks.zpod.io/chartrepo/developers-private-project

Let’s verify the repo was added to our Helm repo list:

$ helm repo list
NAME                        URL                                                                                               
developers-private-project  https://harbor.pks.zpod.io/chartrepo/developers-private-project

It should be noted that the native Helm CLI does not support pushing charts, so we need to install the helm-push plugin:

$ helm plugin install https://github.com/chartmuseum/helm-push

Now we’re ready to push our chart to our Harbor project:

$ helm push --ca-file=ca.crt --username=admin --password=<password> dev-to-do-chart-0.1.0.tgz developers-private-project
Pushing dev-to-do-chart-0.1.0.tgz to developers-private-project...
Done.

Let’s update our helm repos and search for our chart via the Helm CLI to confirm it is available in our project’s chart repo:

$ helm repo update


$ helm search dev-to-do
NAME                                        CHART VERSION   APP VERSION DESCRIPTION                
developers-private-project/dev-to-do-chart  0.1.0           1.0         A Helm chart for Kubernetes
local/dev-to-do-chart                       0.1.0           1.0         A Helm chart for Kubernetes

Now let’s confirm we can see it in the Harbor web UI as well. Navigate back to the developers-private-project homepage and select the Helm Charts tab:

Awesome!! Now we’re finally ready to add our private chart repo into our Kubeapps deployment so our developers can deploy our to-do app via the Kubeapps dashboard.

Adding a Private Project Helm Chart Repo to Kubeapps

Now that we’ve created our private project, populated with our custom container image and helm chart, we are ready to add the Helm chart repo into our Kubeapps deployment so our developers can deploy the to-do application via the Kubeapps dashboard.

First, we need to access our Kubeapps dashboard. Once we’ve authenticated with our token, hover over the Configuration button in the top right-hand corner and select the App Repositories option from the drop-down:

Select the Add App Repository button and fill in the required details. We are using basic authentication with the Harbor admin user’s credentials, and we will also need to add our ca.crt file. When finished, select the Install Repo button:

If all the credentials have been populated correctly, we can click on the developers-private-project link and see our dev-to-do-chart Helm chart:

Now, our developers can log in to the Kubeapps dashboard, select the Catalog option, search for our dev-to-do-chart, click on the entry, and select the Deploy button on the subsequent browser page:

In order for our developers to expose this app for access from outside the Kubernetes cluster, we need to change the Service type from ClusterIP to LoadBalancer:

Once they’ve made this change, they can select the Submit button to deploy the application in their Kubernetes cluster. The subsequent webpage will show us information about our deployment, including the URL (IP of the NSX-T load balancer that was automatically created, highlighted with a red box in the screenshot) as well as the current state of the deployment:

Note: The automatic creation of the LoadBalancer service is made possible by the integration between NSX-T and Enterprise PKS. These instructions will need to be adapted to provide the same functionality when running on a different set of infrastructure.
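For reference, developers comfortable with the CLI could achieve the same result without the dashboard. A rough Helm v2 equivalent, assuming the developers-private-project repo has been added to their client as shown earlier and using a hypothetical release name of dev-to-do, would be:

$ helm install --name dev-to-do developers-private-project/dev-to-do-chart \
  --set service.type=LoadBalancer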

Navigate to the IP address of the load balancer to test application access:

Boom!! There we have it, our application being served out via our NSX-T L4 load balancer resource.

Conclusion

In this post, we walked through the steps required to create a private Harbor project for our developers to house custom container images and Helm charts, built a custom Helm chart, and uploaded our container image and custom Helm chart to that private project.

We also walked through the process of adding a private Helm chart repo, hosted by our Harbor deployment, into our Kubeapps dashboard so our developers can deploy this custom application for testing in their Kubernetes clusters.

Deploying Kubeapps and Exposing the Dashboard via Ingress Controller in Enterprise PKS

In this post, I’d like to take some time to walk through the process of deploying Kubeapps in an Enterprise PKS kubernetes cluster. I’ll also walk through the process of utilizing the built-in ingress controller provided by NSX-T to expose the Kubeapps dashboard via a fully qualified domain name.

What is Kubeapps?

There’s been a lot of excitement in the Cloud Native space at VMware since the acquisition of Bitnami last year. The Bitnami team has done a lot of amazing work over the years to simplify the process of application deployment across all types of infrastructure, both in public and private clouds. Today we are going to take a look at Kubeapps. Kubeapps, an open source project developed by the folks at Bitnami, is a web-based UI for deploying and managing applications in Kubernetes clusters. Kubeapps allows users to:

  • Browse and deploy Helm charts from chart repositories
  • Inspect, upgrade and delete Helm-based applications installed in the cluster
  • Add custom and private chart repositories (supports ChartMuseum and JFrog Artifactory)
  • Browse and provision external services from the Service Catalog and available Service Brokers
  • Connect Helm-based applications to external services with Service Catalog Bindings
  • Secure authentication and authorization based on Kubernetes Role-Based Access Control

Assumptions/Pre-reqs

Before we get started, I wanted to lay out some assumptions and pre-reqs regarding the environment I’m using to support this Kubeapps deployment. First, some info about the infrastructure I’m using to support my kubernetes cluster:

  • vSphere 6.7u2
  • NSX-T 2.4
  • Enterprise PKS 1.4.1
  • vSphere Cloud Provider configured for persistent storage
  • A wildcard DNS entry to support your app ingress strategy

I’m also making the assumption that you have Helm installed on your kubernetes cluster. Helm is a package manager for kubernetes. Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on. Kubeapps uses Helm charts to deploy application stacks to kubernetes clusters, so Helm must be deployed in the cluster prior to deploying Kubeapps. In this tutorial, we’re actually going to deploy Kubeapps via its Helm chart as well!

Finally, in order for Kubeapps to be able to deploy applications into the cluster, we will need to create a couple of Kubernetes RBAC resources. First, we’ll create a serviceaccount (called kubeapps-operator) and attach a clusterrole to the serviceaccount via a clusterrolebinding to allow the service account to deploy apps in the cluster. For the sake of simplicity, we are going to assign this service account cluster-admin privileges. This means the kubeapps-operator service account has the highest level of access to the kubernetes cluster. This is NOT recommended in production environments. I’ll be publishing a follow-up post on best practices for deploying Helm and Kubeapps in a production environment soon. Stay tuned!

Preparing the Cluster for a Kubeapps Deployment

The first thing we’ll want to do is add the Bitnami repo to our Helm configuration, as the Bitnami repo houses the Kubeapps Helm chart:

$ helm repo add bitnami https://charts.bitnami.com/bitnami

Now that we’ve added the repo, let’s create a namespace for our Kubeapps deployment to live in:

$ kubectl create ns kubeapps

Now we’re ready to create our serviceaccount and attach our clusterrole to it:

$ kubectl create serviceaccount kubeapps-operator 
$ kubectl create clusterrolebinding kubeapps-operator \
--clusterrole=cluster-admin \
--serviceaccount=default:kubeapps-operator

Let’s use Helm to deploy our Kubeapps application!!

helm install --name kubeapps --namespace kubeapps bitnami/kubeapps \
--set mongodb.securityContext.enabled=false \
--set mongodb.mongodbEnableIPv6=false

Note, we could opt to set frontend.service.type=LoadBalancer if we wanted to utilize the Enterprise PKS/NSX-T integration to expose the dashboard via a dedicated IP, but since we’re going to use an Ingress controller (also provided by NSX-T), we’ll leave that option out.
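If we did want that dedicated IP instead, a sketch of the alternative would simply append the flag to the same install command:

helm install --name kubeapps --namespace kubeapps bitnami/kubeapps \
--set mongodb.securityContext.enabled=false \
--set mongodb.mongodbEnableIPv6=false \
--set frontend.service.type=LoadBalancer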

After a minute or two, we can check what was deployed via the Kubeapps Helm chart and ensure all the pods are available:

$ kubectl get all -n kubeapps

Exposing the Kubeapps Dashboard via FQDN

Our pods and services are now available, but we haven’t exposed the dashboard for access from outside of the cluster yet. For that, we need to create an ingress resource. If you review the output from the screenshot above, the kubeapps service, of type ClusterIP, is serving out our dashboard on port 80. The kubernetes service type of ClusterIP only exposes our service internally within the cluster so we’ll need to create an ingress resource that targets this service on port 80 so we can expose the dashboard to external users.

Part of the Enterprise PKS and VMware NSX-T integration provides an ingress controller per kubernetes cluster provisioned. This ingress controller is actually an L7 load balancer in NSX-T primitives. Any time we create an Ingress resource in our Enterprise PKS kubernetes cluster, NSX-T automatically creates an entry in the L7 load balancer to redirect traffic, based on hostname, to the correct services/pods in the cluster.

As mentioned in the Pre-reqs section, I’ve got a wildcard DNS entry that redirects *.prod.example.com to the IP address of the NSX-T L7 Load Balancer. This allows my developers to use the native kubernetes ingress services to define the hostnames of their applications without having to work with me or my infrastructure team to manually update DNS records every time they want to expose an application to the public.
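As a concrete illustration, the wildcard record in a BIND-style zone file would look something like the following (10.96.59.106 is the ingress load balancer IP we’ll confirm later in this post):

*.prod.example.com.    IN    A    10.96.59.106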

Enough talk, let’s create our Ingress resource! I’ve used the .yaml file below to expose my Kubeapps dashboard at kubeapps.prod.example.com:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubeapps-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: kubeapps.prod.example.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: kubeapps 
          servicePort: 80

As we can see, we are telling the Ingress resource to target the kubeapps service on port 80 to “proxy” the dashboard to the public. Now let’s create that ingress resource:

$ kubectl create -f kubeapps-ingress.yaml -n kubeapps

And review the ingress resource to get our hostname and confirm the IP address of the NSX-T L7 Load Balancer:

$ kubectl get ing -n kubeapps
NAME               HOSTS                       ADDRESS                     PORTS   AGE
kubeapps-ingress   kubeapps.prod.example.com   10.96.59.106,100.64.32.27   80      96m

Note, the 10.96.59.106 address is the IP of the NSX-T Load Balancer, which is where my DNS wildcard is directing requests to, and the HOSTS entry is the hostname our Kubeapps dashboard should be accessible on. So let’s check it out!
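A quick way to sanity-check the ingress from the command line before opening a browser is a simple curl against the hostname (assuming your workstation resolves the wildcard record); an HTTP response from the dashboard confirms the L7 load balancer is routing requests to the kubeapps service:

$ curl -I http://kubeapps.prod.example.com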

Now we’re ready to deploy applications in our kubernetes cluster with the click of a button!!

Behind the Scenes with NSX-T

So let’s have a look at what’s actually happening in NSX-T and how we can cross reference this with what’s going on with our Kubernetes resources. As I mentioned earlier, any time an Enterprise PKS cluster is provisioned, two NSX-T Load Balancers are created automatically:

  • An L4 load balancer that fronts the kubernetes master(s) to expose the kubernetes API to external users
  • An L7 load balancer that acts as the ingress controller for the cluster

So, we’ve created an ingress resource for our Kubeapps dashboard; let’s look at what’s happening in the NSX-T manager.

Navigate to the NSX-T manager, log in with our admin credentials, and select the Advanced Networking and Security tab. From there, go to Load Balancing and choose the Server Pools tab on the right side of the UI. I’ve queried the PKS API to get the UUID for my cluster (1cd1818c...), which corresponds with the LB we want to inspect (Note: you’ll see two LB entries for the UUID mentioned, one for the kubernetes API, the other for the ingress controller):

Select the Load Balancer in question and then select the Pool Members option on the right side of the UI:

This will show us two kubernetes pods and their internal IP addresses. Let’s go back to the CLI and compare this with what we see in the cluster:

$ kubectl get pods -l app=kubeapps -o wide -n kubeapps
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE                                   
kubeapps-7cd9986dfd-7ghff   1/1     Running   0          124m   172.16.17.6   0faf789a-18db-4b3f-a91a-a9e0b213f310
kubeapps-7cd9986dfd-mwk6j   1/1     Running   0          124m   172.16.17.7   8aa79ec7-b484-4451-aea8-cb5cf2020ab0

So this confirms that our 2 pods serving out our Kubeapps dashboard are being fronted by our L7 Load Balancer in NSX-T.

Conclusion

I know that was a lot to take in, but I wanted to make sure to review the actions we performed in this post:

  • Created a serviceaccount and clusterrolebinding to allow Kubeapps to deploy apps
  • Deployed our Kubeapps application via a Helm Chart
  • Exposed the Kubeapps dashboard for external access via our NSX-T “ingress controller”
  • Verified that Enterprise PKS and NSX-T worked together to automate the creation of all of these network resources to support our applications

As I mentioned above, stay tuned for a follow up post that will detail security implications for deploying Helm and Kubeapps in Production environments. Thanks for reading!!!

Creating a virtualenv with Python 3.7.3

As I’ve mentioned in recent posts, VMware’s Container Service Extension 2.0 (CSE) has recently been released. The big news around the 2.0 release is the ability to provision Enterprise PKS clusters via CSE.

It’s important to note that CSE 2.0 has a dependency on Python 3.7.3 or later. I had some trouble managing different versions of Python 3 on the CentOS host I used to support the CSE server component, so I wanted to document my steps for creating a virtual environment via virtualenv utilizing Python 3.7.3 and installing CSE Server 2.0 within that virtual environment.

virtualenv is a tool to create isolated Python environments. virtualenv creates a folder which contains all the necessary executables to use the packages that a Python project would need. This is useful in my situation as I had various versions of Python 3 installed on my CentOS server and I wanted to ensure Python 3.7.3 was being utilized exclusively for the CSE installation without affecting other services running on the server that utilize Python 3.

Installing Python 3.7.3 on CentOS

The first thing we need to do is install (and compile) Python 3.7.3 on our CentOS server.

We’ll need some development packages and the GCC compiler installed on the server:

# yum install -y zlib-devel gcc openssl-devel bzip2-devel libffi-devel

Next, we’ll pull down the Python 3.7.3 bits from the official Python site and unpack the archive:

# cd /usr/src
# wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz
# tar xzf Python-3.7.3.tgz
# cd Python-3.7.3

At this point we need to compile the Python source code on our system. We’ll use altinstall so as not to replace the system’s default python binary located at /usr/bin/python:

# ./configure --enable-optimizations
# make altinstall

Now that we’ve compiled our new version of Python, we can clean up the archive file and check our python3.7 version to ensure we compiled our source code correctly:

# rm /usr/src/Python-3.7.3.tgz
# python3.7 -V
Python 3.7.3

Finally, we need to use pip to install the virtualenv tool on our server:

# pip3.7 install virtualenv

Creating our virtualenv

Now we’re ready to create our virtual environment within which to install CSE 2.0 server. First, let’s create a user that we’ll utilize to deploy the CSE server within the virtual environment. We can create the user and then switch to that user’s profile:

# useradd cse
# su - cse

Now we need to create a directory that will contain our virtual environment. In this example, I used the cse-env directory to house my virtual environment:

$ mkdir ~/cse-env

Now we need to create our virtual environment for our Python 3.7.3 project:

$ python3.7 -m virtualenv cse-env
Using base prefix '/usr/local'
New python executable in /home/cse/cse-env/bin/python3.7
Also creating executable in /home/cse/cse-env/bin/python
Installing setuptools, pip, wheel...
done.

Before we can start installing or using packages in the virtual environment, we’ll need to activate it. Activating a virtual environment puts the virtual environment-specific python and pip executables into your shell’s PATH. Run the following command to activate the virtual environment:

$ source ~/cse-env/bin/activate

Now check the default python version within the environment to verify we are using 3.7.3:

$ python -V
Python 3.7.3
$ pip -V
pip 19.1.1 from /home/cse/cse-env/lib/python3.7/site-packages/pip (python 3.7)

Now we’re ready to install the CSE server and we won’t have to worry about Python version conflicts as we are installing the CSE packages within our virtual environment, which will only utilize Python 3.7.3.
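As a quick preview of that next step (a sketch, assuming the package name on PyPI is container-service-extension, which is what CSE ships as), the install happens inside the activated virtual environment, and cse version should then report the installed release:

$ pip install container-service-extension
$ cse version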

Stay tuned for my next post which will walk through an installation of Container Service Extension server!!