Container Service Extension 2.5 Installation: Part 3

In Parts 1 and 2 of my series on installing and configuring the Container Service Extension for VMware Cloud Director, I focused on setting the CSE server up to support CSE Standard Kubernetes cluster creation.

CSE Standard clusters are deployed as vApps that consume NSX-V networking resources and use Weave as the Container Network Interface (CNI) for the Kubernetes clusters. In Part 3 of my series, I want to take some time to look at configuring the CSE server to support the creation of CSE Enterprise Kubernetes clusters. CSE Enterprise clusters are VMware Enterprise PKS Kubernetes clusters deployed on top of NSX-T networking resources, using the NSX Container Plugin as the CNI. CSE Enterprise brings enterprise-grade features and functionality to CSE that include, but are not limited to:

  • HA, multi-master Kubernetes clusters
  • Dynamic persistent storage provisioning with the vSphere Cloud Provider integration
  • Automated Day 1 and Day 2 Kubernetes cluster management via BOSH Director
  • Microsegmentation capability for Kubernetes resources via integration with NSX-T
  • Automated creation of Kubernetes service type LoadBalancer and ingress resources via NSX-T L4/L7 load balancers
  • Support for Harbor, an open source cloud native registry

How Does CSE Enterprise Work?

Before we get started on the configuration steps, I want to take some time to explore how the CSE server communicates with both Enterprise PKS and NSX-T to automate the creation of Kubernetes clusters, all triggered by a request from an end user via the vcd-cli. When a user issues a vcd cse cluster create command, the OrgVDC they are targeting has been enabled for CSE Standard, CSE Enterprise, or no Kubernetes provider at all. If their OvDC has been enabled for CSE Enterprise cluster creation (and they have the correct permission to create clusters), the cse extension of the vcd-cli passes the request on to the CSE server, which communicates directly with the PKS and NSX-T APIs to create the Enterprise PKS Kubernetes cluster as well as all of the NSX-T resources (T1 routers, load balancers, logical switches, etc.) that support the cluster. The CSE server also communicates with the NSX-T API to create DFW rules that isolate provisioned clusters from each other to provide network resource isolation.

As opposed to CSE Standard, which creates a vApp from a vApp template and then instantiates the vApp as a Kubernetes cluster, CSE Enterprise relies on the PKS control plane: the control plane receives an API call from the CSE server and uses the plan assigned to the OrgVDC to create the cluster. There is no vApp or vApp template required to provision CSE Enterprise clusters.

CSE PKS Config File

Before we get started, I want to note that I have already deployed both VMware Enterprise PKS and VMware NSX-T Datacenter in my environment. CSE hooks into an existing deployment of PKS and NSX-T, so these components must be in place before configuring CSE Enterprise.

As mentioned in Part 1 of the series, CSE uses a YAML config file that contains information about the vSphere/VCD environment(s) that CSE interacts with to deploy CSE Standard Kubernetes clusters. Similarly, we'll need a second YAML file that contains all of the information CSE needs to communicate with the PKS/NSX-T APIs to orchestrate the provisioning of Enterprise PKS Kubernetes clusters. You can view a sample PKS config file on the CSE docs site, but I will walk through the required components below.

The first step is to create a pks-config.yaml file that I’ll populate with the values below. I’m going to create the file in the same directory as my existing config.yaml file:

$ vi ~/pks-config.yaml

Now I can populate the file with the required information, detailed below:

pks_api_servers section

This section of the config file will contain information that CSE will use to communicate with the PKS API server:

pks_api_servers:
- clusters:
  - RegionA01-COMP01          <--- Cluster name as it appears in vCenter
  cpi: 6bd3f75a31e5c3f2d65e   <--- Bosh CPI value, see note below
  datacenter: RegionA01       <--- Datacenter name as it appears in vCenter
  host: pks.corp.local        <--- FQDN of the PKS API server
  name: PKS-1                 <--- Name used to identify this particular PKS API server, user defined in this file
  port: '9021'                <--- Port used for PKS API communication, default = 9021
  uaac_port: '8443'           <--- Port used to authenticate against the PKS UAAC service, default = 8443
  vc: vcsa-01                 <--- vCenter name as it appears in VCD
  verify: false               <--- Set to "true" to verify SSL certificates

Note: The cpi value above can be obtained by utilizing the Bosh CLI to run the following command on the Opsman VM in the PKS environment:

$ bosh cpi-config

Using environment '172.31.0.2' as client 'ops_manager'

cpis:
- migrated_from:
  - name: ""
  name: 6bd3f75a31e5c3f2d65e    <--- cpi value 
---output omitted---

pks_accounts section

This section defines the PKS UAAC credentials that CSE will use to authenticate against the PKS API when creating clusters:

pks_accounts:
- name: PKS1-admin                    
  pks_api_server: PKS-1
  secret: fhB-Z5hHcsXl_UnC86dkuYlTzQPoE3Yz
  username: admin
  • name is a user defined value that identifies this credential set. This is useful when there are multiple PKS API servers, each of which requires its own set of credentials
  • pks_api_server is the user defined variable for the PKS API server defined by the name value in the pks_api_servers section (PKS-1 in my example)
  • username is the UAAC username that CSE will use to authenticate against the PKS API server (using the admin user in my example)
  • secret is the secret tied to the user defined in the username value. If using the admin user, you can obtain the secret by navigating to the Opsman web UI, selecting the PKS tile, selecting the credentials tab, and then selecting Link to Credential next to the Pks Uaa Management Admin Client entry. See screenshot below:
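
Note: If you'd rather verify the client secret from the command line, the UAA CLI (the cf-uaac gem) can confirm the credentials against the PKS UAA endpoint. This is a quick sketch, assuming uaac is installed on the machine you run it from; if the secret is valid, uaac fetches a token via the client credentials grant:

$ uaac target https://pks.corp.local:8443 --skip-ssl-validation
$ uaac token client get admin -s fhB-Z5hHcsXl_UnC86dkuYlTzQPoE3Yz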

pvdcs section

This section defines the PvDC(s) in VCD that support the Org and OrgVDCs that are enabled to support PKS cluster creation via CSE:

pvdcs:
- cluster: RegionA01-COMP01  <--- Cluster name as it appears in vCenter
  name: prod-pvdc            <--- PvDC name as it appears in VCD
  pks_api_server: PKS-1      <--- user defined variable for the PKS API server defined by the `name` value in the `pks_api_servers` section

nsxt_servers section

This section defines the information CSE needs to communicate with the NSX-T API to create the NSX-T resources that back the Kubernetes clusters, as well as the DFW rules that isolate the clusters provisioned via CSE:

nsxt_servers:
- distributed_firewall_section_anchor_id: 9d6d2a5c-c32d-419c-ada8-e5208475ca88
  host: nsxmgr-01a.corp.local
  name: nsxt-server-1
  nodes_ip_block_ids:
  - eac47bea-5304-4a7b-8c10-9b16e62f1cda
  password: MySuperSecretPassword!
  pks_api_server: PKS-1
  pods_ip_block_ids:
  - 27d2d1c3-969a-46a5-84b9-db503ce2edd5
  username: admin
  verify: false
  • distributed_firewall_section_anchor_id is the UUID of the “Default Layer3 Section” DFW section created by PKS during the installation of Enterprise PKS, see screenshot below (one way to look up these UUIDs via the NSX-T API is sketched after this list):

  • host is the FQDN of the NSX-T Management server
  • name is a user defined name for this particular NSX-T instance
  • nodes_ip_block_ids is the UUID of the IP block created in NSX-T to be used by PKS for cluster nodes, see screenshot below:

  • password is the password for the NSX-T user defined in the config file
  • pks_api_server is the user defined variable for the PKS API server defined by the name value in the pks_api_servers section (PKS-1 in my example)
  • pods_ip_block_ids is the UUID of the IP block created in NSX-T to be used by PKS to assign IP addresses to the pods running in a cluster.

  • username is the NSX-T username CSE will use to authenticate against the API
  • verify can be set to “true” to verify SSL certificates
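
If you'd rather not hunt for these UUIDs in the UI, the NSX-T Manager API exposes the same information. This is a rough sketch, assuming basic authentication is enabled on the NSX-T Manager (pipe the output through python -m json.tool or jq to make the JSON readable):

$ curl -k -u admin https://nsxmgr-01a.corp.local/api/v1/pools/ip-blocks      # node/pod IP block IDs
$ curl -k -u admin https://nsxmgr-01a.corp.local/api/v1/firewall/sections    # DFW section IDs, including "Default Layer3 Section"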

There we have it! This is the bare minimum information required for CSE to deploy Enterprise PKS clusters via the vcd-cli. Do note, in my example deployment above, I have a single PKS deployment, cluster, PvDC, and NSX-T instance that CSE is communicating with. You can add additional instances of each resource as required for your deployment.

Enabling CSE Enterprise on the CSE Server

Now that I’ve built my PKS config file for CSE, I’m ready to enable CSE Enterprise for my tenants’ org(s). I’ve already deployed the CSE server utilizing only the CSE Standard methodology, so the first thing I’ll need to do is update the config.yaml file to point to the new pks-config.yaml file:

$ vi ~/config.yaml

---output omitted---

pks_config: pks-config.yaml

---output omitted---

After updating the config.yaml file with the filename of the PKS config file, I’ll stop the CSE service on the CSE server and re-install CSE:

$ sudo systemctl stop cse

$ cse install -c config.yaml --skip-template-creation

Required Python version: >= 3.7.3
Installed Python version: 3.7.3 (default, Nov 13 2019, 16:41:06)
[GCC 7.4.0]
Validating config file 'config.yaml'
Connected to AMQP server (vcd.corp.local:5672)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director (vcd.corp.local:443)
Connected to vCenter Server 'vcsa-01' as 'administrator@corp.local' (vcsa-01a.corp.local:443)
Config file 'config.yaml' is valid
Validating PKS config file 'pks-config.yaml'
Connected to PKS server (PKS-1 : pks.corp.local)
Connected to NSX-T server (nsxmgr-01a.corp.local)
PKS Config file 'pks-config.yaml' is valid
Installing CSE on vCloud Director using config file 'config.yaml'
Connected to vCD as system administrator: vcd.corp.local:443
Checking for AMQP exchange 'cse-ext'
AMQP exchange 'cse-ext' is ready
Registered cse as an API extension in vCD
Registering Right: CSE NATIVE DEPLOY RIGHT in vCD
Registering Right: PKS DEPLOY RIGHT in vCD
Creating catalog 'cse'
Created catalog 'cse'
Skipping creation of templates.
Configuring NSX-T server (nsxt-server-1) for CSE. Please check install logs for details.

As you can see from the output of the cse install command, CSE was able to communicate with both the PKS and NSX-T APIs. The CSE server also registered a new right (PKS DEPLOY RIGHT) that I will use to grant users the ability to provision CSE Enterprise Kubernetes clusters.

After successfully installing CSE with the PKS config, I’ll restart the CSE service:

$ sudo systemctl start cse

and verify the service is running as expected (look for the MessageConsumer threads):

$ sudo systemctl status cse
[sudo] password for cse:
● cse.service
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-11-18 12:08:18 EST; 1 day 3h ago
 Main PID: 911 (sh)
    Tasks: 7
   Memory: 3.6M
   CGroup: /system.slice/cse.service
           ├─911 /bin/sh /home/cse/cse.sh
           └─914 /home/cse/cse-env/bin/python3.7 /home/cse/cse-env/bin/cse run -c /home/cse/config.yaml

Nov 18 12:08:22 cse sh[911]: CSE installation is valid
Nov 18 12:08:23 cse sh[911]: Started thread 'MessageConsumer-0 (139897843828480)'
Nov 18 12:08:23 cse sh[911]: Started thread 'MessageConsumer-1 (139897746745088)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-2 (139897755137792)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-3 (139897835435776)'
Nov 18 12:08:24 cse sh[911]: Started thread 'MessageConsumer-4 (139897738352384)'
Nov 18 12:08:24 cse sh[911]: Container Service Extension for vCloud Director
Nov 18 12:08:24 cse sh[911]: Server running using config file: /home/cse/config.yaml
Nov 18 12:08:24 cse sh[911]: Log files: cse-logs/cse-server-info.log, cse-logs/cse-server-debug.log
Nov 18 12:08:24 cse sh[911]: waiting for requests (ctrl+c to close)

Voila!! I’ve successfully configured CSE to support Enterprise PKS cluster creation!! Now, let’s have a look at enabling a tenant to deploy CSE Enterprise clusters via the vcd-cli.

Onboarding Tenants and Provisioning CSE Enterprise Kubernetes Clusters

The first thing I’ll need to do is login to the VCD deployment via the vcd-cli as the System Admin user:

$ vcd login vcd.corp.local system admin -iw

Next, I can use the cse extension to view the existing OvDCs and what Kubernetes provider is enabled for each OvDC:

$ vcd cse ovdc list

name       org       k8s provider
---------  --------  --------------
base-ovdc  base-org  native
acme-ovdc  AcmeCorp  none

From the results above, note that the base-ovdc is enabled for the Kubernetes provider of native, which means users in that org with the correct permissions can create CSE Standard clusters. The acme-ovdc does not have any Kubernetes provider assigned yet, but I want to enable it to support CSE Enterprise cluster creation.

First, I’ll need to instruct the vcd-cli to “use” the AcmeCorp org:

$ vcd org use AcmeCorp
now using org: 'AcmeCorp', vdc: 'acme-ovdc', vApp: ''.

Then, I’ll add the new PKS right to the AcmeCorp org:

$ vcd right add "{cse}:PKS DEPLOY RIGHT" -o AcmeCorp
Rights added to the Org 'AcmeCorp'

Before moving on to the next step, I created a new role in the VCD tenant portal by the name of “orgadmin-k8” which mimics the “Org Admin” permissions. I also created a new user named “pks-k8-admin” and assigned that user the “orgadmin-k8” role.
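
As an aside, the role and user can likely be created from the vcd-cli as well, rather than the tenant portal. The commands below are a hypothetical sketch (argument names vary between vcd-cli releases, so check vcd role -h and vcd user -h before relying on them):

$ vcd role clone "Organization Administrator" "orgadmin-k8"       # hypothetical: clone the built-in role as a starting point
$ vcd user create pks-k8-admin 'SomePassword!' orgadmin-k8        # hypothetical: create the user and assign the custom role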

After creating the new role and user, I need to use the vcd-cli to add the PKS DEPLOY RIGHT to the custom role:

$ vcd role add-right "orgadmin-k8" "{cse}:PKS DEPLOY RIGHT"
Rights added successfully to the role 'orgadmin-k8'

The last thing I need to do as the System admin is enable the OvDC to support CSE Enterprise cluster creation with the following command:

$ vcd cse ovdc enable "acme-ovdc" -o AcmeCorp -k ent-pks -p "xsmall" -d "corp.local"

metadataUpdate: Updating metadata for Virtual Datacenter acme-ovdc(07972d80-faf1-48c7-8f7e-cea92ce7cc6e)
task: 9fced925-4ec8-47c3-a45e-7903b14b0c8d, Updated metadata for Virtual Datacenter acme-ovdc(07972d80-faf1-48c7-8f7e-cea92ce7cc6e), result: success

where -k is the Kubernetes provider (PKS in this case), -p is the PKS plan we want clusters to use when provisioned from CSE for this OvDC, and -d is the domain name that PKS will use for the hostname of each cluster.
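
As a sanity check, the plan name passed with -p must match a plan that actually exists in the Enterprise PKS deployment. One way to confirm the available plans is via the PKS CLI (a quick sketch, assuming the pks CLI is installed and you have UAA credentials for the API):

$ pks login -a pks.corp.local -u admin -p <password> -k
$ pks plans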

Note, CSE uses compute profiles to create Availability Zones for each OvDC that is enabled. When the vcd cse ovdc enable command above is issued, CSE talks to the PKS API to create a compute profile that defines the Availability Zone for all clusters created in this OvDC as the Resource Pool assigned to that OvDC. This ensures that users who provision clusters to this OvDC have their compute isolated (via the Resource Pool) from clusters provisioned in other OvDCs in the VCD environment.

I'll verify that I now see a Kubernetes provider for the acme-ovdc after enabling it:

$ vcd cse ovdc list

name       org       k8s provider
---------  --------  --------------
base-ovdc  base-org  native
acme-ovdc  AcmeCorp  ent-pks

There we have it! Now I can tell the pks-k8-admin user that they are ready to provision some CSE Enterprise clusters!!

Provisioning CSE Enterprise Clusters

Now that I, as the system admin, have done the pre-work to enable the Org to support CSE Enterprise clusters, I’m ready to turn the tenants loose and allow them to provision CSE Enterprise Kubernetes clusters.

First, the pks-k8-admin user will log in to VCD via the vcd-cli:

$ vcd login vcd.corp.local AcmeCorp pks-k8-admin -iw

All they have to do now is run the vcd cse cluster create command and CSE will handle the rest:

$ vcd cse cluster create prod-1 --nodes 2

property                     value
---------------------------  -----------------
kubernetes_master_host       prod-1.corp.local
kubernetes_master_ips        In Progress
kubernetes_master_port       8443
kubernetes_worker_instances  2
last_action                  CREATE
last_action_description      Creating cluster
last_action_state            in progress
name                         prod-1
worker_haproxy_ip_addresses

where prod-1 is the name of our cluster and --nodes is the number of worker nodes assigned to the cluster. As you can see, the FQDN of the master host will be “cluster-name”.”domain” where “domain” was defined when we enabled the OvDC.

Once the cluster has finished provisioning, we can use the cse extension to gather information about the cluster:

$ vcd cse cluster info prod-1

property                     value
---------------------------  -------------------------------------------------
k8s_provider                 ent-pks
kubernetes_master_host       prod-1.corp.local
kubernetes_master_ips        10.40.14.37
kubernetes_master_port       8443
kubernetes_worker_instances  2
last_action                  CREATE
last_action_description      Instance provisioning completed
last_action_state            succeeded
name                         prod-1
network_profile_name
nsxt_network_profile
pks_cluster_name             prod-1---be6cc6cb-b4a3-4bab-8d6f-e6d1499485bd
plan_name                    small
uuid                         7a1283da-d8b4-418c-a5ea-720810195d72
worker_haproxy_ip_addresses

Note that 10.40.14.37 is the IP address of the Kubernetes master node. If I navigate to the NSX-T Manager web UI, I can verify that a virtual server was automatically created within an L4 load balancer to allow external access to the Kubernetes cluster via kubectl.

Now, the tenant can use the cse extension to pull down the Kubernetes config file from the cluster and store it in the default config file location on their local workstation (~/.kube/config):

Note: The config file will use prod-1.corp.local as the Kubernetes master server name so I have added a DNS entry that maps prod-1.corp.local to the IP of my NSX-T virtual server that fronts the Kubernetes master(s).
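
In a lab where you don't control DNS, a quick /etc/hosts entry on the workstation is a reasonable stopgap (a sketch; substitute the IP of your NSX-T virtual server that fronts the master nodes):

$ echo "<nsx-t-virtual-server-ip> prod-1.corp.local" | sudo tee -a /etc/hosts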

$ vcd cse cluster config prod-1 > ~/.kube/config

$ kubectl get nodes

NAME                                   STATUS   ROLES    AGE     VERSION
82d3022e-9fbc-4a31-9be2-fecc80e2ab27   Ready    <none>   2d17h   v1.13.5
d35f7324-0f09-440e-81b0-af9ad26481a6   Ready    <none>   2d17h   v1.13.5

Now the pks-k8-admin user has full admin access to their Enterprise PKS Kubernetes cluster and can instantly begin deploying their workloads to the newly created cluster!!

Conclusion

This wraps up my 3 part series on installing and configuring the Container Service Extension to support both CSE Standard and CSE Enterprise cluster creation. Feel free to reach out to me in the comment section or on Twitter if you have any additional questions or comments. Thanks for the read!

Backing Up Your Kubernetes Applications with Velero v1.1

In this post, I’m going to walk through the process of installing and using Velero v1.1 to back up a Kubernetes application that includes persistent data stored in persistentvolumes. I will then simulate a DR scenario by completely deleting the application and using Velero to restore the application to the cluster, including the persistent data.

Meet Velero!! ⛵

Velero is a backup and recovery solution built specifically to assist in the backup (and migration) of Kubernetes applications, including their persistent storage volumes. You can even use Velero to back up an entire Kubernetes cluster for restore and/or migration! Velero addresses various use cases, including but not limited to:

  • Taking backups of your cluster to allow for restore in case of infrastructure loss/corruption
  • Migration of cluster resources to other clusters
  • Replication of production cluster/applications to dev and test clusters

Velero is essentially composed of two components:

  • A server that runs as a set of resources within your Kubernetes cluster
  • A command-line client that runs locally

Velero also supports the backup and restore of Kubernetes volumes using restic, an open source backup tool. Velero needs an S3 API-compatible storage server to store these volumes. To satisfy this requirement, I will also deploy a Minio server in my Kubernetes cluster so Velero is able to store my Kubernetes volume backups. Minio is a lightweight, easy-to-deploy S3 object store that you can run on premises. In a production environment, you’d want to deploy your S3-compatible storage solution in another cluster or environment to protect against total data loss in case of infrastructure failure.

Environment Overview

As a level set, I’d like to provide a little information about the infrastructure I am using in my lab environment. See below for infrastructure details:

  • VMware vCenter Server Appliance 6.7u2
  • VMware ESXi 6.7u2
  • VMware NSX-T Datacenter 2.5.0
  • VMware Enterprise PKS 1.5.0

Enterprise PKS handles the Day 1 and Day 2 operational requirements for deploying and managing my Kubernetes clusters. Click here for additional information on VMware Enterprise PKS.

However, I do want to mention that Velero can be installed and configured to interact with ANY Kubernetes cluster of version 1.7 or later (1.10 or later for restic support).
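
A quick way to confirm the cluster you're targeting meets that minimum before going any further:

$ kubectl version --short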

Installing Minio

First, I’ll deploy all of the components required to support the Velero service, starting with Minio.

First things first, I’ll create the velero namespace to house the Velero installation in the cluster:

$ kubectl create namespace velero

I also decided to create a dedicated storageclass for the Minio service to use for its persistent storage. In Enterprise PKS Kubernetes clusters, you can configure the vSphere Cloud Provider plugin to dynamically create VMDKs in your vSphere environment to support persistentvolumes whenever a persistentvolumeclaim is created in the Kubernetes cluster. Click here for more information on the vSphere Cloud Provider plugin:

$ kubectl create -f minio-storage-class.yaml 


kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: minio-disk
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin

Now that we have a storage class, I’m ready to create a persistentvolumeclaim that the Minio service will use to store the volume backups via restic. As you can see from the example .yaml file below, the previously created storageclass is referenced to ensure the persistentvolume is provisioned dynamically:

$ cat minio-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: velero-claim
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-disk
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi


$ kubectl create -f minio-pvc.yaml

Verify the persistentvolumeclaim was created and its status is Bound:

$ kubectl get pvc -n velero

NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
minio-claim   Bound    pvc-cc7ac855-e5f0-11e9-b7eb-00505697e7e7   6Gi        RWO            minio-disk     8s

Now that I’ve created the storage to support the Minio deployment, I am ready to create the Minio deployment. Click here for access to the full .yaml file for the Minio deployment:

$ kubectl create -f minio-deploy.yaml 

deployment.apps/minio created
service/minio created
secret/cloud-credentials created
job.batch/minio-setup created
ingress.extensions/velero-minio created

Use kubectl to wait for the minio-xxxx pod to enter the Running status:

$ kubectl get pods -n velero -w

NAME                    READY   STATUS              RESTARTS   AGE
minio-754667444-zc2t2   0/1     ContainerCreating   0          4s
minio-setup-skbs6       1/1     Running             0          4s
NAME                    READY   STATUS              RESTARTS   AGE
minio-754667444-zc2t2   1/1     Running             0          9s
minio-setup-skbs6       0/1     Completed           0          11s

Now that our Minio application is deployed, we need to expose the Minio service to requests outside of the cluster via a LoadBalancer service type with the following command:

$ kubectl expose deployment minio --name=velero-minio-lb --port=9000 --target-port=9000 --type=LoadBalancer --namespace=velero

Note, because of the integration between VMware Enterprise PKS and VMware NSX-T Datacenter, when I create a “LoadBalancer” service type in the cluster, the NSX Container Plugin, which we are using as our Container Network Interface, reaches out to the NSX-T API to automatically provision a virtual server in a NSX-T L4 load balancer.

I’ll use kubectl to retrieve the IP of the virtual server created within the NSX-T load balancer and access the Minio UI in my browser at EXTERNAL-IP:9000. I am looking for the IP address under the EXTERNAL-IP column for the velero-minio-lb service, 10.96.59.116 in this case:

$ kubectl get services -n velero

NAME              TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
minio             ClusterIP      10.100.200.160   <none>         9000/TCP         7m14s
velero-minio-lb   LoadBalancer   10.100.200.77    10.96.59.116   9000:30711/TCP   12s
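
If you'd rather grab just the external IP programmatically (handy for scripting the Velero install later), a jsonpath query against the service works. A small sketch, assuming the service name created above:

$ kubectl get svc velero-minio-lb -n velero -o jsonpath='{.status.loadBalancer.ingress[0].ip}'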

Now that Minio has been successfully deployed in my Kubernetes cluster, I’m ready to move on to the next section to install and configure Velero and restic.

Installing Velero and Restic

Now that I have an s3-compatible storage solution deployed in my environment, I am ready to complete the installation of Velero (and restic).

However, before I move forward with the installation of Velero, I need to install the Velero CLI client on my workstation. The instructions detailed below will allow you to install the client on a Linux server (I’m using a CentOS 7 instance).

First, I navigated to the Velero GitHub releases page and copied the link for the v1.1 tarball (.tar.gz) for my OS distribution:

Then, I used wget to pull the image down to my linux server, extracted the contents of the file, and moved the velero binary into my path:

$ cd ~/tmp

$ wget https://github.com/vmware-tanzu/velero/releases/download/v1.1.0/velero-v1.1.0-linux-amd64.tar.gz

$ tar -xvf velero-v1.1.0-linux-amd64.tar.gz

$ sudo mv velero-v1.1.0-linux-amd64/velero /usr/bin/velero

Now that I have the Velero client installed on my server, I am ready to continue with the installation.

I’ll create a credentials-velero file that we will use during install to authenticate against the Minio service. Velero will use these credentials to access Minio to store volume backups:

$ cat credentials-velero

[default]
aws_access_key_id = minio
aws_secret_access_key = minio123

Now I’m ready to install Velero! The following command will complete the installation of Velero (and restic) where:

  • --provider aws instructs Velero to utilize S3 storage which is running on-prem, in my case
  • --secret-file is our Minio credentials
  • --use-restic flag ensures Velero knows to deploy restic for persistentvolume backups
  • --s3Url value is the address of the Minio service that is only resolvable from within the Kubernetes cluster
  • --publicUrl value is the IP address for the LoadBalancer service that allows access to the Minio UI from outside of the cluster:
$ velero install --provider aws --bucket velero --secret-file credentials-velero \ 
--use-volume-snapshots=false --use-restic --backup-location-config \ 
region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000,publicUrl=http://10.96.59.116:9000

Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

Note: The velero install command creates a set of CRDs that power the Velero service. You can run velero install --dry-run -o yaml to output all of the .yaml files used to create the Velero deployment.

After the installation is complete, I’ll verify that I have 3 restic-xxx pods and 1 velero-xxx pod deployed in the velero namespace. As the restic service is deployed as a daemonset, I will expect to see a restic pod per node in my cluster. I have 3 worker nodes so I should see 3 restic pods:

Note: Notice the status of the restic-xxx pods…

$ kubectl get pod -n velero
NAME                      READY   STATUS             RESTARTS   AGE
minio-5559c4749-7xssq     1/1     Running            0          7m21s
minio-setup-dhnrr         0/1     Completed          0          7m21s
restic-mwgsd              0/1     CrashLoopBackOff   4          2m17s
restic-xmbzz              0/1     CrashLoopBackOff   4          2m17s
restic-235cz              0/1     CrashLoopBackOff   4          2m17s
velero-7d876dbdc7-z4tjm   1/1     Running            0          2m17s

As you may notice, the restic pods are not able to start. That is because in Enterprise PKS Kubernetes clusters, the path to the pods on the nodes is a little different (/var/vcap/data/kubelet/pods) than in “vanilla” Kubernetes clusters (/var/lib/kubelet/pods). In order to allow the restic pods to run as expected, I’ll need to edit the restic daemon set and change the hostPath variable as referenced below:

$ kubectl edit daemonset restic -n velero


volumes:
      - hostPath:
          path: /var/vcap/data/kubelet/pods
          type: ""
        name: host-pods
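
If you prefer a non-interactive change (for scripts or repeatable installs), a strategic merge patch accomplishes the same thing. This is a sketch that assumes the volume is named host-pods, as it is in the default restic daemonset manifest:

$ kubectl -n velero patch daemonset restic --type strategic \
  -p '{"spec":{"template":{"spec":{"volumes":[{"name":"host-pods","hostPath":{"path":"/var/vcap/data/kubelet/pods"}}]}}}}'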

Now I’ll verify all of the restic pods are in Running status:

$ kubectl get pod -n velero

NAME                      READY   STATUS      RESTARTS   AGE
minio-5559c4749-7xssq     1/1     Running     0          12m
minio-setup-dhnrr         0/1     Completed   0          12m
restic-p4d2c              1/1     Running     0          6s
restic-xvxkh              1/1     Running     0          6s
restic-e31da              1/1     Running     0          6s
velero-7d876dbdc7-z4tjm   1/1     Running     0          7m36s

Woohoo!! Velero is successfully deployed in my Kubernetes clusters. Now I’m ready to take some backups!!

Backup/Restore the WordPress Application using Velero

Now that I’ve deployed Velero and all of its supporting components in my cluster, I’m ready to perform some backups. But in order to test my backup/recovery solution, I’ll need an app that preferably utilizes persistent data.

In one of my previous blog posts, I walked through the process of deploying Kubeapps in my cluster to allow me to easily deploy application stacks to my Kubernetes cluster.

For this exercise, I’ve used Kubeapps to deploy a WordPress blog that utilizes persistentvolumes to store post data for my blog. I’ve also populated the blog with a test post to test backup and recovery.

First, I’ll verify that the WordPress pods are in a Running state:

$ kubectl get pods -n wordpress

NAME                                  READY   STATUS    RESTARTS   AGE
cut-birds-mariadb-0                   1/1     Running   0          23h
cut-birds-wordpress-fbb7f5b76-lm5bh   1/1     Running   0          23h

I’ll also verify the URL of my blog and access it via my web browser to verify current state:

$ kubectl get svc -n wordpress

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
cut-birds-mariadb     ClusterIP      10.100.200.39   <none>         3306/TCP                     19h
cut-birds-wordpress   LoadBalancer   10.100.200.32   10.96.59.116   80:32393/TCP,443:31585/TCP   19h

Everything looks good, especially the cat!!

In order for Velero to understand where to look for persistent data to back up, in addition to the other Kubernetes resources in the cluster, we need to annotate each pod that is utilizing a volume so Velero backs up the pods AND the volumes.

I’ll review both of the pods in the wordpress namespace to view the name of each volume being used by each pod:

$ kubectl describe pod/cut-birds-mariadb-0 -n wordpress

---output omitted---

Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-cut-birds-mariadb-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cut-birds-mariadb
    Optional:  false
  default-token-6q5xt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6q5xt
    Optional:    false


$ kubectl describe pods/cut-birds-wordpress-fbb7f5b76-lm5bh -n wordpress

---output omitted---

Volumes:
  wordpress-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cut-birds-wordpress
    ReadOnly:   false
  default-token-6q5xt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6q5xt
    Optional:    false

As you can see, the mariadb pod is using 2 volumes: data and config, while the wordpress pod is utilizing a single volume: wordpress-data.

I’ll run the following commands to annotate each pod with the backup.velero.io/backup-volumes annotation listing each pod's corresponding volume(s):

$ kubectl -n wordpress annotate pod/cut-birds-mariadb-0 backup.velero.io/backup-volumes=data,config
$ kubectl -n wordpress annotate pod/cut-birds-wordpress-fbb7f5b76-lm5bh backup.velero.io/backup-volumes=wordpress-data

Now I’m ready to use the velero client to create a backup. I’ll name the backup wordpress-backup and ensure the backup only includes the resources in the wordpress namespace:

$ velero backup create wordpress-backup --include-namespaces wordpress

Backup request "wordpress-backup" submitted successfully.
Run `velero backup describe wordpress-backup` or `velero backup logs wordpress-backup` for more details.

I can also use the velero client to ensure the backup is completed by waiting for Phase: Completed:

$ velero backup describe wordpress-backup

Name:         wordpress-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

--output omitted--

I’ll navigate back to the web browser and refresh (or log back into) the Minio UI. Notice the restic folder, which houses our backups’ persistent volume data, as well as a backups folder:

I’ll select the backups folder and note the wordpress-backup folder in the subsequent directory. I’ll also explore the contents of the wordpress-backup folder, which contains all of the Kubernetes resources from my wordpress namespace:

Now that I’ve confirmed my backup was successful and have verified the data has been stored in Minio via the web UI, I am ready to completely delete my WordPress application. I will accomplish this by deleting the wordpress namespace, which will delete all resources created in the namespace to support the WordPress application, even the persistentvolumeclaims:

$ kubectl delete namespace wordpress


$ kubectl get pods -n wordpress
$ kubectl get pvc -n wordpress

After I’ve confirmed all of the resources in the wordpress namespace have been deleted, I’ll refresh the browser to verify the blog is no longer available.

Now we’re ready to restore!! I’ll use the velero client to verify the existence/name of the backup that was previously created and restore the backup to the cluster:

$ velero backup get

NAME               STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
wordpress-backup   Completed   2019-10-03 15:47:07 -0400 EDT   29d       default            <none>


$ velero restore create --from-backup wordpress-backup
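
While the restore runs, the Velero client can also report on the restore object itself:

$ velero restore get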

I can monitor the pods in the wordpress namespace and wait for both pods to show 1/1 in the READY column and Running in the STATUS column:

$ kubectl get pods -n wordpress -w

NAME                                  READY   STATUS     RESTARTS   AGE
cut-birds-mariadb-0                   0/1     Init:0/1   0          12s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     Init:0/1   0          13s
cut-birds-mariadb-0                   0/1     PodInitializing   0          18s
cut-birds-mariadb-0                   0/1     Running           0          19s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     PodInitializing   0          19s
cut-birds-wordpress-fbb7f5b76-qtcpp   0/1     Running           0          20s
cut-birds-mariadb-0                   1/1     Running           0          54s
cut-birds-wordpress-fbb7f5b76-qtcpp   1/1     Running           0          112s

Then, I can verify the URL of the WordPress blog:

$ kubectl get services -n wordpress

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
cut-birds-mariadb     ClusterIP      10.100.200.39   <none>         3306/TCP                     2m56s
cut-birds-wordpress   LoadBalancer   10.100.200.32   10.96.59.120   80:32393/TCP,443:31585/TCP   2m56s

And finally, I can access the URL of the blog in the web browser and confirm the test post that was visible initially is still present:

There you have it!! Our application and its persistent data have been completely restored!!

In this example, we manually created a backup, but we can also use the Velero client to schedule backups on a certain interval. See the examples below (two ways to express a daily schedule; note that each schedule needs a unique name):

velero schedule create planes-daily --schedule="0 1 * * *" --include-namespaces wordpress
velero schedule create planes-daily --schedule="@daily" --include-namespaces wordpress

Conclusion

In this blog post, I walked through the process of installing Velero in a Kubernetes cluster, including all its required components, to support taking backups of Kubernetes resources. I also walked through the process of taking a backup, simulating a data loss scenario, and restoring that backup to the cluster.

Deploying Kubeapps and Exposing the Dashboard via Ingress Controller in Enterprise PKS

In this post, I’d like to take some time to walk through the process of deploying Kubeapps in an Enterprise PKS Kubernetes cluster. I’ll also walk through the process of utilizing the built-in ingress controller provided by NSX-T to expose the Kubeapps dashboard via a fully qualified domain name.

What is Kubeapps?

There’s been a lot of excitement in the Cloud Native space at VMware since the acquisition of Bitnami last year. The Bitnami team has done a lot of amazing work over the years to simplify the process of application deployment across all types of infrastructure, both in public and private clouds. Today we are going to take a look at Kubeapps. Kubeapps, an open source project developed by the folks at Bitnami, is a web-based UI for deploying and managing applications in Kubernetes clusters. Kubeapps allows users to:

  • Browse and deploy Helm charts from chart repositories
  • Inspect, upgrade and delete Helm-based applications installed in the cluster
  • Add custom and private chart repositories (supports ChartMuseum and JFrog Artifactory)
  • Browse and provision external services from the Service Catalog and available Service Brokers
  • Connect Helm-based applications to external services with Service Catalog Bindings
  • Secure authentication and authorization based on Kubernetes Role-Based Access Control

Assumptions/Pre-reqs

Before we get started, I wanted to lay out some assumptions and pre-reqs regarding the environment I’m using to support this Kubeapps deployment. First, some info about the infrastructure I’m using to support my kubernetes cluster:

  • vSphere 6.7u2
  • NSX-T 2.4
  • Enterprise PKS 1.4.1
  • vSphere Cloud Provider configured for persistent storage
  • A wildcard DNS entry to support your app ingress strategy

I’m also making the assumption that you have Helm installed on your kubernetes cluster as well. Helm is a package manager for kubernetes. Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on. Kubeapps uses Helm charts to deploy application stacks to kubernetes clusters so Helm must be deployed in the cluster prior to deploying Kubeapps. In this tutorial, we’re actually going to deploy kubeapps via the helm chart as well!
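
Since this walkthrough is written against Helm 2 (note the --name flag used later in this post), it's worth confirming that both the Helm client and Tiller, the in-cluster server component, respond before moving on:

$ helm version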

Finally, in order for Kubeapps to be able to deploy applications into the cluster, we will need to create a couple of Kubernetes RBAC resources. First, we’ll create a serviceaccount (called kubeapps-operator) and attach a clusterrole to the serviceaccount via a clusterrolebinding to allow the service account to deploy apps in the cluster. For the sake of simplicity, we are going to assign this service account cluster-admin privileges. This means the kubeapps-operator service account has the highest level of access to the kubernetes cluster. This is NOT recommended in production environments. I’ll be publishing a follow-up post on best practices for deploying Helm and Kubeapps in a production environment soon. Stay tuned!

Preparing the Cluster for a Kubeapps Deployment

The first thing we’ll want to do is add the Bitnami repo to our Helm configuration, as the Bitnami repo houses the Kubeapps Helm chart:

$ helm repo add bitnami https://charts.bitnami.com/bitnami

Now that we’ve added the repo, let’s create a namespace for our Kubeapps deployment to live in:

$ kubectl create ns kubeapps

Now we’re ready to create our serviceaccount and attach our clusterrole to it:

$ kubectl create serviceaccount kubeapps-operator 
$ kubectl create clusterrolebinding kubeapps-operator \
--clusterrole=cluster-admin \
--serviceaccount=default:kubeapps-operator

Let’s use Helm to deploy our Kubeapps application!!

helm install --name kubeapps --namespace kubeapps bitnami/kubeapps \
--set mongodb.securityContext.enabled=false \
--set mongodb.mongodbEnableIPv6=false

Note, we could opt to set frontend.service.type=LoadBalancer if we wanted to utilize the Enterprise PKS/NSX-T integration to expose the dashboard via a dedicated IP but since we’re going to use an Ingress controller (also provided by NSX-T), we’ll leave that option out.

After a minute or two, we can check what was deployed via the Kubeapps Helm chart and ensure all the pods are available:

$ kubectl get all -n kubeapps

Exposing the Kubeapps Dashboard via FQDN

Our pods and services are now available, but we haven’t exposed the dashboard for access from outside of the cluster yet. For that, we need to create an ingress resource. If you review the output from the screenshot above, the kubeapps service, of type ClusterIP, is serving out our dashboard on port 80. The kubernetes service type of ClusterIP only exposes our service internally within the cluster so we’ll need to create an ingress resource that targets this service on port 80 so we can expose the dashboard to external users.

Part of the Enterprise PKS and VMware NSX-T integration provides an ingress controller per kubernetes cluster provisioned. This ingress controller is actually an L7 load balancer in NSX-T primitives. Any time we create an ingress resource in our Enterprise PKS kubernetes cluster, NSX-T automatically creates an entry in the L7 load balancer to redirect traffic, based on hostname, to the correct services/pods in the cluster.

As mentioned in the Assumptions/Pre-reqs section, I’ve got a wildcard DNS entry that redirects *.prod.example.com to the IP address of the NSX-T L7 Load Balancer. This allows my developers to use the native kubernetes ingress resources to define the hostname of their applications without having to work with me or my infrastructure team to manually update DNS records every time they want to expose an application to the public.

Enough talk, let’s create our ingress resource! I’ve used the .yaml file below to expose my Kubeapps dashboard at kubeapps.prod.example.com:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubeapps-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: kubeapps.prod.example.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: kubeapps 
          servicePort: 80

As we can see, we are telling the Ingress service to target the kubeapps service on port 80 to “proxy” the dashboard to the public. Now let’s create that ingress resource:

$ kubectl create -f kubeapps-ingress.yaml -n kubeapps

And review the ingress resource to get our hostname and confirm the IP address of the NSX-T L7 Load Balancer:

$ kubectl get ing -n kubeapps
NAME               HOSTS                       ADDRESS                     PORTS   AGE
kubeapps-ingress   kubeapps.prod.example.com   10.96.59.106,100.64.32.27   80      96m

Note, the 10.96.59.106 address is the IP of the NSX-T Load Balancer, which is where my DNS wildcard is directing requests to, and the HOSTS entry is the hostname our Kubeapps dashboard should be accessible on. So let’s check it out!
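
If you want to sanity-check the ingress before DNS is in place (or from a host that doesn't resolve the wildcard record), curl can hit the load balancer IP directly with the expected Host header; a quick sketch:

$ curl -I -H "Host: kubeapps.prod.example.com" http://10.96.59.106/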

Now we’re ready to deploy applications in our kubernetes cluster with the click of a button!!

Behind the Scenes with NSX-T

So let’s have a look at what’s actually happening in NSX-T and how we can cross reference this with what’s going on with our Kubernetes resources. As I mentioned earlier, any time an Enterprise PKS cluster is provisioned, two NSX-T Load Balancers are created automatically:

  • An L4 load balancer that fronts the kubernetes master(s) to expose the kubernetes API to external users
  • An L7 load balancer that acts as the ingress controller for the cluster

So, we’ve created an ingress resource for our Kubeapps dashboard; let’s look at what’s happening in the NSX-T manager.

Navigate to the NSX-T manager, log in with admin credentials, and go to the Advanced Networking and Security tab. Navigate to Load Balancing and choose the Server Pools tab on the right side of the UI. I’ve queried the PKS API to get the UUID for my cluster (1cd1818c...), which corresponds with the LB we want to inspect (Note: you’ll see two LB entries for the UUID mentioned, one for the kubernetes API, the other for the ingress controller):

Select the Load Balancer in question and then select the Pool Members option on the right side of the UI:

This will show us two kubernetes pods and their internal IP addresses. Let’s go back to the CLI and compare this with what we see in the cluster:

$ kubectl get pods -l app=kubeapps -o wide -n kubeapps
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE                                   
kubeapps-7cd9986dfd-7ghff   1/1     Running   0          124m   172.16.17.6   0faf789a-18db-4b3f-a91a-a9e0b213f310
kubeapps-7cd9986dfd-mwk6j   1/1     Running   0          124m   172.16.17.7   8aa79ec7-b484-4451-aea8-cb5cf2020ab0

So this confirms that our 2 pods serving out our Kubeapps dashboard are being fronted by our L7 Load Balancer in NSX-T.

Conclusion

I know that was a lot to take in, but I wanted to make sure to review the actions we performed in this post:

  • Created a serviceaccount and clusterrolebinding to allow Kubeapps to deploy apps
  • Deployed our Kubeapps application via a Helm Chart
  • Exposed the Kubeapps dashboard for external access via our NSX-T “ingress controller”
  • Verified that Enterprise PKS and NSX-T worked together to automate the creation of all of these network resources to support our applications

As I mentioned above, stay tuned for a follow up post that will detail security implications for deploying Helm and Kubeapps in Production environments. Thanks for reading!!!