Ever “accidentally” delete your app or namespace from your Kubernetes cluster? Or, even worse, destroy your entire cluster?! Well… have no fear: the Tanzu Mission Control team recently announced the release of the Data Protection feature for Tanzu Mission Control. This new feature uses the open source project Velero to provide backup, migration, and recovery functionality for any Kubernetes cluster under the control of Tanzu Mission Control. As mentioned in the previously linked blog post, Tanzu Mission Control handles the installation and ongoing lifecycle management of the Velero components running on the cluster, so no knowledge of Velero is required to take advantage of this new feature!
In this blog post, I will walk through the process of using the Data Protection feature to back up a WordPress application deployed on a Tanzu Kubernetes Grid (TKG) cluster in AWS. The WordPress application uses persistent volume claims (PVCs) to store the persistent data that supports the blog. After taking the backup, I will simulate a data loss scenario by deleting the namespace containing the application and then use the Tanzu Mission Control console to restore the application and its persistent data!
Tanzu Mission Control Overview
Before diving into the walkthrough, I want to give a brief overview of Tanzu Mission Control. Tanzu Mission Control provides a single control point for teams to more easily manage Kubernetes and operate modern, containerized applications across multiple clouds and clusters. Tanzu Mission Control codifies the know-how of operating Kubernetes, including deploying and upgrading clusters, setting policies and configurations, understanding the health of clusters and the root cause of underlying issues, and creating a map from the “people structure” to the infrastructure.
With VMware Tanzu Mission Control, VMware is providing customers with a powerful, API-driven platform that allows operators to apply policy to individual clusters or groups of clusters, establishing guardrails and freeing developers to work within those boundaries. A SaaS-based control plane securely integrates with a Kubernetes cluster through an agent and supports a wide array of operations on the cluster. That includes lifecycle management (deploy, upgrade, scale, delete) of cloud-based clusters via Cluster API.
A core principle of the VMware Tanzu portfolio is to make the best use of open source software. Tanzu Mission Control leverages Cluster API for lifecycle management, Velero for backup and recovery, and Sonobuoy for configuration control and conformance testing.
Velero Overview
Velero is an open source project that provides backup, restore, and migration capabilities for Kubernetes cluster resources and persistent volumes. Velero allows users to take backups of an entire cluster and restore it in case of loss, migrate cluster resources to other clusters, and replicate a production cluster to development and testing clusters for testing and change management purposes. To learn more about Velero, visit the project’s homepage.
Overview of the WordPress Application
Before we jump into the tutorial, I want to provide a quick overview of the WordPress application deployed in my TKG cluster. I am using a pretty straightforward WordPress deployment (deployment instructions and details can be found here) that uses PVCs as persistent storage for the blog. See the diagram above for the full overview of all the various components of the application. In this example, all of these components will be deployed into the wordpress namespace on my TKG cluster. For the purposes of this blog post, the application has already been deployed and populated with a handful of sample blog posts. See a screenshot of the landing page of the blog in its current state below:
Enabling the Data Protection Functionality in Tanzu Mission Control
I won’t walk through this procedure in detail but I wanted to provide a high level overview of the process for turning the data protection functionality on in a Tanzu Mission Control environment. The basic steps are:
- Create a Cloud Provider Account credential in Tanzu Mission Control
- Download the CloudFormation stack from Tanzu Mission Control
- Access your AWS account and deploy the CloudFormation stack
- Retrieve role ARN from stack installation
- Provide role ARN to Tanzu Mission Control to create credential
Note: the data protection feature currently supports only AWS S3 buckets as a backup endpoint. Support for bringing your own S3-compatible API endpoint is on the roadmap for future inclusion.
At this point, you are ready to enable the data protection functionality for clusters under the control of Tanzu Mission Control. For more information on creating a Cloud Provider Account credential, refer to the VMware documentation.
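For readers who prefer the AWS CLI to the console, the "retrieve role ARN" step can be sketched as below. This is a minimal sketch under assumptions, not the documented procedure: it assumes the AWS CLI is installed and authenticated, that the role ARN is exposed among the stack's outputs, and that the stack name is whatever you chose at deploy time. The helper is written for a POSIX shell; the `aws` invocation inside it is the same from PowerShell.

```shell
# Print the outputs of a CloudFormation stack so the role ARN can be copied.
# Assumption: the role ARN appears among the stack outputs; the exact output
# key is not known here, so the whole Outputs list is shown.
show_stack_outputs() {
  stack_name="$1"   # hypothetical: the name you gave the stack when deploying it
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Outputs' \
    --output table
}
```

Calling `show_stack_outputs <your-stack-name>` prints the output keys and values, from which the role ARN can be pasted into Tanzu Mission Control.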
Enabling Data Protection for a Specific Cluster
The first thing I’ll do is sign in to my VMware Cloud Services account and access the Tanzu Mission Control console via the CSP portal. At that point, I will click the link to my tkg-cluster to head to the cluster overview page:
Once I’ve landed on the cluster overview page, I can click the ENABLE DATA PROTECTION link at the bottom of the page to initiate the deployment of the Velero services on my TKG cluster:
At this point, I’ll need to select the Cloud Provider Account credential that I created previously when deploying the Data Protection CloudFormation stack in my AWS account and click ENABLE to initiate the deployment of the Velero components on my TKG cluster:
Now, if I hop over to the active PowerShell session that I’ve used to interact with my TKG cluster via kubectl, I can actually observe the Velero namespace and pod being deployed on my cluster:
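The check described above can be sketched with two kubectl commands. This assumes the Velero components land in a namespace named velero (the conventional name, but verify it on your own cluster) and that kubectl is already pointed at the TKG cluster. The helper is written for a POSIX shell; the kubectl invocations themselves are identical in PowerShell.

```shell
# Confirm that the Velero namespace exists and list the pods running in it.
# Assumption: the components are deployed into a namespace called "velero".
check_velero() {
  kubectl get namespace velero &&
  kubectl get pods -n velero
}
```

Running `check_velero` shortly after clicking ENABLE may show the pod still starting; repeating it a moment later should show it in the Running state.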
That was easy!! As mentioned previously, Tanzu Mission Control handles the management of the Velero components automatically. Now I’m ready to back up my WordPress application via the Tanzu Mission Control console.
Backing Up WordPress Application
Now that I’ve enabled data protection on my cluster and verified the Velero components were deployed locally on the cluster, I’m going to head back to the Tanzu Mission Control console and initiate my first backup by clicking the CREATE BACKUP link in the Data Protection section of the cluster overview page:
I need to define the parameters of my backup. As you’ll note in the screenshot, I can back up a particular namespace, an entire cluster, or use a label selector to back up only those resources that are marked with a specific key:value label. In this exercise, my application and its supporting components are all deployed in the wordpress namespace, so I’m going to choose to back up only that particular namespace:
Tanzu Mission Control allows users to automatically set retention dates for backups. For this example, I’m only going to set the retention time period to 1 day. This means the Velero controller will automatically delete this backup in 24 hours:
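To make the scope and retention settings concrete, here is a sketch of what a Velero Backup resource with the same choices (one namespace, 24-hour retention) looks like. It is an illustration only and not necessarily the exact object Tanzu Mission Control creates; the backup name is a placeholder and the velero namespace is an assumption.

```shell
# Print a Velero Backup manifest limited to one namespace with a 24-hour TTL.
# Assumptions: Velero runs in the "velero" namespace; the names are placeholders.
render_backup_manifest() {
  backup_name="$1"        # hypothetical backup name
  target_namespace="$2"   # namespace to include, e.g. wordpress
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: ${backup_name}
  namespace: velero
spec:
  includedNamespaces:
    - ${target_namespace}
  ttl: 24h0m0s
EOF
}
```

For example, `render_backup_manifest wordpress-backup wordpress` prints a manifest whose ttl field corresponds to the one-day retention period chosen in the console.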
Finally, I’m going to name the backup and initiate the creation of said backup:
Since this is a fairly “small” blog, the backup won’t take long, but I’ll wait until the status of the backup transitions to the Ready state before proceeding:
Once the backup is in a ready state, I can click on the name of the backup to view and verify characteristics of the backup. Note the general information on the backup, as well as the number of namespaces and persistent volumes (2) included in the backup:
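The same details can likely be inspected from the cluster side as well. The sketch below assumes that backups created through the console appear as Velero Backup custom resources in the velero namespace; that is how Velero itself stores backups, but confirm it on your own cluster. The backup name is whatever was entered in the console.

```shell
# List Velero backups, then show the details of one of them.
# Assumption: console-created backups are visible as Backup resources
# in the "velero" namespace.
inspect_backup() {
  backup_name="$1"   # the name given to the backup in the console
  kubectl get backups.velero.io -n velero
  kubectl describe backups.velero.io "$backup_name" -n velero
}
```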
Simulate Data Loss Scenario
In order to simulate a data loss scenario, I’m going to completely delete the wordpress namespace on my TKG cluster. This will delete ALL of the resources (pods, services, PVCs, etc.) that comprise my WordPress application:
After deleting the namespace, I’m going to verify that the wordpress namespace is no longer present on the cluster. I’ll also confirm that no persistent volume claims are present in ANY namespace (by using the -A flag) in the cluster:
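The delete-and-verify sequence above can be sketched as follows. Be careful: the first command is destructive and removes everything in the namespace, so only run it against a cluster where that is intended. The helper is written for a POSIX shell; the kubectl invocations are the same in PowerShell.

```shell
# Simulate the data loss, then confirm nothing is left behind.
simulate_data_loss() {
  ns="$1"   # namespace to delete, e.g. wordpress
  # Destructive: removes the namespace and every resource inside it.
  kubectl delete namespace "$ns"
  # Once deletion has finished, this lookup should fail with a NotFound error.
  kubectl get namespace "$ns"
  # -A lists persistent volume claims across all namespaces.
  kubectl get pvc -A
}
```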
At this point, if I refresh the webpage, I notice that the blog is no longer available:
Uh oh… Let’s hope Tanzu Mission Control can help me restore my VERY important blog…
Restoring Backup
So now that I’ve lost my production app and all of its supporting data, I need to attempt to restore my backup via Tanzu Mission Control. First, I’ll head back to the console, access the Data Protection tab, and initiate the creation of a restore from my previously created backup:
Since my backup only contained the wordpress namespace, I can choose to restore the entire backup:
After providing a name for the restore, I can initiate the restoration process:
Once my restore transitions to the Ready status, I’ll be ready to check that the application resources are restored on the cluster:
I’ll head back over to my PowerShell session to verify that the wordpress namespace is now present, and that the namespace contains pods, PVCs, and services to support the application. I’ll also note the new LoadBalancer external address for my blog to test external access:
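A sketch of that verification with kubectl is below. It assumes kubectl is still pointed at the TKG cluster; the namespace is passed in as an argument. The service listing includes an EXTERNAL-IP column, which is where the new LoadBalancer address appears.

```shell
# Confirm the namespace is back and list the restored pods, PVCs, and services.
verify_restore() {
  ns="$1"   # restored namespace, e.g. wordpress
  kubectl get namespace "$ns" &&
  kubectl get pods,pvc,svc -n "$ns"
}
```

If the EXTERNAL-IP column still shows a pending value right after the restore, rerunning `verify_restore wordpress` a little later should show the assigned address.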
Now, if I visit the LoadBalancer address in a browser, my website should be available again. I’ll also confirm it still contains the blog posts I created before the data loss:
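As a quick alternative to the browser check, the blog's availability can be probed from the command line. This sketch assumes curl is installed and that the blog is served over plain HTTP on the default port; the address is the external address noted from the service listing. It only reports the HTTP status code, so confirming that the posts themselves survived still means looking at the page.

```shell
# Print the HTTP status code returned by the blog's landing page.
# Assumption: the site answers on http://<address>/ (port 80).
blog_status() {
  address="$1"   # the LoadBalancer external address
  curl -s -o /dev/null -w '%{http_code}\n' "http://${address}/"
}
```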
Phew, that was a close one!! Thank the SRE gods that Tanzu Mission Control and Velero had my back!!
Conclusion
Well, that about wraps it up! I hope this walkthrough of enabling the data protection feature and creating and restoring backups via Tanzu Mission Control was helpful.
One of the main motivations for this post was the work I have been doing on the VMware Tanzu Mission Control Hands-on Lab, which will soon include a Data Protection module. The Tanzu Mission Control HOL gives users access to a live Tanzu Mission Control environment paired with an AWS account that can support the creation of TKG clusters in real time, for the low, low price of free.99! Give it a spin to get hands-on experience with Tanzu Mission Control! Enjoy!