Nutanix Autopathing – The search for the MAGIC!
I’m currently attending the Nutanix NPP (Nutanix Platform Professional) training in Munich, and during one module a question about CVM autopathing came up.
CVM (Controller VM)
Each node runs an industry-standard hypervisor (currently ESXi, KVM or Hyper-V) and the Nutanix Controller VM (CVM). The Nutanix CVM runs the Nutanix software and serves all of the I/O operations for the hypervisor and all VMs running on that host. For Nutanix units running VMware vSphere, the SCSI controller, which manages the SSD and HDD devices, is passed directly to the CVM using VMDirectPath (Intel VT-d). In the case of Hyper-V, the storage devices are passed through to the CVM.
Reliability and resiliency are a key, if not the most important, piece of NDFS (Nutanix Distributed File System). Being a distributed system, NDFS is built to handle component, service and CVM failures. A CVM “failure” could include a user powering down the CVM, a CVM rolling upgrade, or any event which might bring down the CVM.
NDFS has a feature called autopathing: when a local CVM becomes unavailable, its I/Os are transparently handled by other CVMs in the cluster. The hypervisor and CVM communicate over a private 192.168.5.0 network on a dedicated vSwitch. This means that all storage I/O goes to the internal IP address of the CVM (192.168.5.2). The external IP address of the CVM is used for remote replication and for CVM communication.
In the event of a local CVM failure, the local 192.168.5.2 address previously hosted by the local CVM is unavailable. NDFS automatically detects this outage and redirects these I/Os to another CVM in the cluster over 10GbE. The re-routing is transparent to the hypervisor and to the VMs running on the host. This means that even if a CVM is powered down, the VMs can still perform I/Os to NDFS. NDFS is also self-healing, meaning it will detect that the CVM has been powered off and will automatically reboot or power on the local CVM. Once the local CVM is back up and available, traffic is seamlessly transferred back and served by the local CVM.
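The failover and failback behaviour described above can be illustrated with a small Python sketch. This is only a toy model and not Nutanix code: the `Cvm` class, the `pick_io_target` function, the host names and the 10.0.32.x addresses are hypothetical, and choosing the first healthy remote CVM is an assumption, since the actual selection policy is not described here.

```python
from dataclasses import dataclass


@dataclass
class Cvm:
    host: str          # hypothetical ESXi host name
    external_ip: str   # hypothetical public IP of the CVM
    healthy: bool = True


def pick_io_target(local, cluster):
    """Return (cvm, path) for the CVM that serves I/O for the host owning `local`."""
    if local.healthy:
        # Normal case: I/O stays on the host-only network.
        return local, "local via 192.168.5.2"
    for cvm in cluster:
        # Assumption: take the first healthy remote CVM.
        if cvm is not local and cvm.healthy:
            return cvm, "remote via 10GbE to " + cvm.external_ip
    raise RuntimeError("no healthy CVM available in the cluster")


cluster = [
    Cvm("esx1", "10.0.32.111"),
    Cvm("esx2", "10.0.32.112"),
    Cvm("esx3", "10.0.32.113"),
]
local = cluster[0]

print(pick_io_target(local, cluster)[1])  # served locally
local.healthy = False                     # CVM powered down or failed
print(pick_io_target(local, cluster)[1])  # redirected to a remote CVM
local.healthy = True                      # CVM is back
print(pick_io_target(local, cluster)[1])  # served locally again
```

The VMs never see this decision; from their point of view the datastore stays reachable the whole time.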
Below is a graphical representation of how this looks for a failed CVM.
Training environment Overview
Here is a short overview of the training environment used, which is composed of NX-1000 blocks.
|ESXi Management IP vmk0||ESXi NFS VMkernel IP vmk1||CVM Public IP||CVM Host-only IP|
Here is the vSwitch configuration of one ESXi host.
As you can see, the CVM has two network adapters: one for public connectivity and one for host-only connectivity, which is used to connect to the NFS datastore(s).
When you create a container, which is an NFS export, it is automatically connected to the ESXi hosts.
After seeing all of this, we were not sure how an ESXi host can reach 192.168.5.2 if the local CVM is not available. In response to a tweet, Michael Webster gave me a hint.
That means that NFS traffic can also run through the vmk0 port, which is connected to the 10 Gbit uplink ports. But how can traffic flow when the NFS export is mounted through 192.168.5.2 and I’m using 10.0.32.101 for the first ESXi host? This question puzzled me for a long time. I ran some tests in which I created a VM with 100 GB of eager-zeroed disks and shut down the CVM during the zeroing process. What I saw in esxtop was that the traffic from the host with no running CVM went out through vmk0 to another host, and there through the 10 Gbit port into the public vNIC of that host's local CVM.
After some more research I finally found where the MAGIC happens.
Here is a screenshot of the routing table when a CVM is running on the ESXi host.
When a CVM is shut down or fails, it took approx. 30 seconds until another CVM took over the traffic. The CVM that takes over the traffic connects to the ESXi host via SSH and configures a route so that all traffic is routed to the public IP address of that CVM.
If the failed CVM comes online again, then this route is deleted so that the traffic is served locally again.
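To see why adding a single route is enough, here is a minimal Python sketch of a routing table that uses longest-prefix matching, the standard rule by which the most specific route wins. It is a simplified model and not the ESXi implementation: the `RoutingTable` class is hypothetical, and the /24 netmasks, the /32 host route for 192.168.5.2 and the remote CVM address 10.0.32.112 are assumptions chosen for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy routing table: the most specific matching route wins."""

    def __init__(self):
        self.routes = {}  # ip_network -> next hop description

    def add(self, network, next_hop):
        self.routes[ipaddress.ip_network(network)] = next_hop

    def delete(self, network):
        self.routes.pop(ipaddress.ip_network(network), None)

    def lookup(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            raise LookupError("no route to " + destination)
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
# Assumed netmasks for the two VMkernel networks.
table.add("192.168.5.0/24", "vmk1 (host-only, local CVM)")
table.add("10.0.32.0/24", "vmk0 (10 Gbit uplinks)")

print(table.lookup("192.168.5.2"))  # local CVM is up: traffic stays on vmk1

# Local CVM fails: a remote CVM adds a host route (assumed /32, assumed IP).
table.add("192.168.5.2/32", "gateway 10.0.32.112 via vmk0")
print(table.lookup("192.168.5.2"))  # traffic now leaves through vmk0

# Local CVM is back: the route is deleted again.
table.delete("192.168.5.2/32")
print(table.lookup("192.168.5.2"))  # traffic is local again
```

The NFS mount itself never changes. The datastore is still addressed as 192.168.5.2, and only the path the packets take is different.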