In this post, we will walk through enabling and configuring vSphere's Workload Management for Tanzu Kubernetes Grid (TKG).
In part 1 we covered the prerequisites and deployed the HAProxy appliance needed to implement TKG on a distributed vSwitch.
Now we will go through the Workload Management configuration and implementation.
In vCenter, go to Menu, then Workload Management.
As we do not have this deployed yet, we are presented with some introductory information. Click Get Started to begin the wizard.
As we do not have NSX-T, the only networking stack option is vCenter Server Network. There is also a disclaimer at the top stating we must use HAProxy because we are not using NSX-T. If you have an Enhanced Linked Mode (ELM) vCenter deployment, select the appropriate vCenter, then Next.
If you have multiple compute clusters, select the appropriate cluster then Next.
Select the control plane size. These are the resources allocated to the supervisor cluster. There will always be three VMs in the supervisor cluster, and the sizing is per VM.
I have been unable to find any sizing guidance, so as this is just my homelab, I selected Tiny.
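Since the wizard sizes per VM but always deploys three supervisor VMs, it is worth multiplying out the total footprint before committing a small cluster to it. A minimal sketch; the per-VM numbers below are my assumptions for illustration, not official VMware sizing, so verify them in your own wizard.

```python
# Per-VM sizes are assumptions for illustration only; check the actual values
# shown in your Workload Management wizard.
SIZES = {
    "Tiny":  {"vcpu": 2, "ram_gb": 8},
    "Small": {"vcpu": 4, "ram_gb": 16},
}

def supervisor_footprint(size: str, vm_count: int = 3) -> dict:
    """The supervisor cluster always runs three control plane VMs,
    so the total footprint is the per-VM size multiplied out."""
    return {k: v * vm_count for k, v in SIZES[size].items()}

print(supervisor_footprint("Tiny"))  # {'vcpu': 6, 'ram_gb': 24}
```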
Now select a storage policy that includes the storage you want these VMs deployed on. As I created a Tanzu Storage Policy for this purpose, that is the policy I selected.
Now we need to provide information about the HAProxy appliance we deployed. Provide the name of the appliance; the type is HA Proxy; the data plane API address(es) would be the management IP(s) and port.
Provide the local admin account credentials.
IP address ranges for virtual servers is the frontend IP range we configured in part 1 (the "Load Balancer IP Ranges"), provided in CIDR notation. In my case it was 10.0.12.128/25, which covers 10.0.12.129-10.0.12.254.
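If you want to double-check what a CIDR actually covers before feeding it to the wizard, Python's standard `ipaddress` module does the math; `hosts()` drops the network and broadcast addresses:

```python
import ipaddress

# Expand the load balancer CIDR from part 1 into its usable address range.
lb_range = ipaddress.ip_network("10.0.12.128/25")
hosts = list(lb_range.hosts())

print(f"{hosts[0]} - {hosts[-1]}")  # 10.0.12.129 - 10.0.12.254
print(len(hosts))                   # 126 usable addresses
```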
We also need to provide the certificate from the HAProxy. SSH to your HAProxy, then:
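On the VMware-provided appliance the CA certificate reportedly lives at /etc/haproxy/ca.crt (verify the path on your build); `cat` it over your SSH session and paste the PEM block into the wizard. If your copy-paste picks up prompt noise around the certificate, a small sketch like this isolates just the PEM block; the pasted sample string here is hypothetical.

```python
import re

def extract_pem(text: str) -> str:
    """Pull the first PEM certificate block out of pasted terminal text."""
    match = re.search(
        r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
        text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PEM certificate found")
    return match.group(0)

# Hypothetical paste with stray shell prompt lines around the certificate.
pasted = (
    "root@haproxy [ ~ ]# cat /etc/haproxy/ca.crt\n"
    "-----BEGIN CERTIFICATE-----\nMIIB...base64...\n-----END CERTIFICATE-----\n"
    "root@haproxy [ ~ ]#"
)
print(extract_pem(pasted).splitlines()[0])  # -----BEGIN CERTIFICATE-----
```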
Now provide the info for the management network: the starting IP address to assign to the three control plane VMs, along with the other relevant IP settings.
Note: I've heard that some folks have had issues getting a successful deployment when DNS Search Domain is left blank. I have not attempted a deployment without it, so I have not experienced issues related to that.
Now to configure the workload network. The IP address range for Services should be left at the default unless it conflicts with your existing environment. It does not need to be a routed network, as it is internal to TKG, but it does require a DNS server.
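Even though the Services range is internal to TKG, it still should not collide with networks you actually route. A quick overlap check with `ipaddress`; the Services CIDR shown (10.96.0.0/23) and the environment networks are assumptions for illustration, so substitute the values from your own wizard and environment.

```python
import ipaddress

# Services CIDR and environment networks below are illustrative assumptions.
services = ipaddress.ip_network("10.96.0.0/23")
environment = {
    "frontend": ipaddress.ip_network("10.0.12.0/24"),
    "workload": ipaddress.ip_network("10.0.13.0/24"),
}

conflicts = [name for name, net in environment.items() if services.overlaps(net)]
print(conflicts or "no overlap with the Services range")
```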
Click “Add” just under Workload Network.
Provide a name, select your workload portgroup, and enter the gateway, subnet mask, and the address range to be allocated. Click Save.
We should now see this. Next.
Now we need to add the previously created content library with the cluster VM templates.
Select the subscribed content library, then OK.
Done with the wizard; click Finish.
We should see some activity: a resource pool and folder being created, agent VMs being provisioned, and the hosts downloading VM templates from the content library.
You can review the configuration status in the top pane. If there is a number in parentheses under Config Status, you can click on it to bring up a bit more detail.
We can see it is just informational, stating the Master VM is still being provisioned and configured. The deployment and configuration can take anywhere from 30 minutes to a couple of hours.
Just an interesting side note: this was the umpteenth time I deployed this, as I initially made a few mistakes and ran into some strange behavior.
This time through, despite my domain account being a vCenter administrator, a number of tasks were not displayed.
The tasks below were visible when logged in with the SSO administrator account; they show task names such as Deploy OVF template and Reconfigure virtual machine, along with the VM names. They are not displayed when logged in with my domain account.
This is from my domain account.
At the same time, I took this screenshot from the session logged in with the SSO administrator account. Notice the additional tasks listed that are not displayed in the domain account session.
Looking at Config Status again, we see an error configuring the cluster NIC on a master VM. Whatever caused the error eventually resolved on its own, and the deployment did complete. Again, this can take upwards of a couple of hours, so be patient.
It has finished. We can see the control plane node IP address on the frontend network, which should be accessible over HTTPS in a browser. Config Status is Running, so Workload Management should now be enabled.
Validate that you can access the page at that IP. We should see links to download the CLI tools.
To help validate this, we can use the CLI tools to connect to our supervisor cluster and check the status of its nodes.
C:\kubectl>kubectl-vsphere.exe login --vsphere-username [email protected] --server=https://10.0.12.129 --insecure-skip-tls-verify
Password:
Logged in successfully.

You have access to the following contexts:
   10.0.12.129
   10.0.13.129
   tkg-k8s-prod

If the context you wish to use is not in this list, you may need to
try logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

C:\kubectl>kubectl get nodes
NAME                               STATUS   ROLES    AGE   VERSION
42069924bae46dc843987c4258871a08   Ready    master   33h   v1.18.2-6+38ac483e736488
4206a9ee43a052d8038c4ef1bc1d61aa   Ready    master   32h   v1.18.2-6+38ac483e736488
4206b21ac3cd049645f2af3aaee7f4b4   Ready    master   32h   v1.18.2-6+38ac483e736488

C:\kubectl>kubectl get ns
NAME                                        STATUS   AGE
default                                     Active   33h
kube-node-lease                             Active   33h
kube-public                                 Active   33h
kube-system                                 Active   33h
svc-tmc-c7                                  Active   32h
tkg-k8s-prod                                Active   25h
vmware-system-appplatform-operator-system   Active   33h
vmware-system-capw                          Active   33h
vmware-system-cert-manager                  Active   33h
vmware-system-csi                           Active   32h
vmware-system-kubeimage                     Active   33h
vmware-system-lbapi                         Active   33h
vmware-system-license-operator              Active   32h
vmware-system-logging                       Active   33h
vmware-system-netop                         Active   33h
vmware-system-registry                      Active   33h
vmware-system-tkg                           Active   33h
vmware-system-ucs                           Active   33h
vmware-system-vmop                          Active   33h

C:\kubectl>
We use get nodes to get the status of the individual supervisor cluster nodes, and get ns to list all the namespaces.
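If you want to script this kind of post-deployment check, a small sketch that parses the `kubectl get nodes` table and confirms every supervisor node reports Ready; the sample text is the output captured above.

```python
# Sample table captured from `kubectl get nodes` on the supervisor cluster.
sample = """\
NAME                               STATUS   ROLES    AGE   VERSION
42069924bae46dc843987c4258871a08   Ready    master   33h   v1.18.2-6+38ac483e736488
4206a9ee43a052d8038c4ef1bc1d61aa   Ready    master   32h   v1.18.2-6+38ac483e736488
4206b21ac3cd049645f2af3aaee7f4b4   Ready    master   32h   v1.18.2-6+38ac483e736488
"""

def all_ready(table: str) -> bool:
    """Return True when every row's STATUS column reads Ready."""
    rows = table.strip().splitlines()[1:]  # skip the header row
    return bool(rows) and all(row.split()[1] == "Ready" for row in rows)

print(all_ready(sample))  # True
```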
We need to use the --insecure-skip-tls-verify flag because the endpoint is not (yet) using a trusted certificate; it is still using the default self-signed certificate.
Now that we have Workload Management deployed, the next step will be deploying the TKG cluster and an actual workload. We will get to that in part 3.