Problem Statement:
Two virtual machines are in the same VXLAN (5002) but run on two different ESXi hosts in the same cluster. When you ping from one virtual machine to the other, it is not reachable. This environment is running NSX-V.
Cause:
You need to troubleshoot L2 connectivity for these virtual machines. Follow the steps below to fix it.
Solution:
1) Understand the Network Topology
2) Verify Network Topology for these two Virtual Machines
- Verify VXLAN port group tagging for both Virtual Machines
- Verify that NIC is connected in both Virtual machines
- Verify that Port Group exists in dvSwitch
3) Verify the problem from Physical Network
- Verify the Physical Network
- Check VTEP IP for both ESXi Host
- Ping VTEP IP from one host to another host
4) Checks for VXLAN Routing
- Check default gateway of VTEP kernel port (VXLAN traffic)
- Ping VTEP IP with packet size of 1570
- Check ARP and make sure it is discovering other devices
5) Verify NSX Control Plane
- Check connectivity between ESXi host and NSX controller
- Check status of Controller Daemon NETCPAD
- Check Controller Status for VXLAN 5002
6) Check Distributed Firewall Policy
- Check Packet Flow
- Check DFW Rule for these two VMs
Step by Step Troubleshooting Guide:
Review the Virtual Machine Information
- Review the Port Groups of the Virtual Machines. To do this, Right Click on the VM > Edit Settings > Check Network Adapter
- It is a VXLAN network, as it shows segment ID 5002.
- A VLAN ID can only range from 1 to 4094, but the Segment ID for VXLAN in VMware ranges from 5000 to 16777215.
- The Virtual Machine here shows ID 5002, therefore it is on a VXLAN network.
- Verify other virtual machines running in same port group.
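The ID ranges above can be turned into a quick sanity check. The following is a minimal sketch in Python; the function name `classify_network_id` is made up for illustration, and the ranges are exactly the ones quoted in this article.

```python
def classify_network_id(network_id: int) -> str:
    """Classify an ID using the ranges quoted in this article.

    1-4094          -> VLAN ID
    5000-16777215   -> NSX-V VXLAN segment ID
    anything else   -> not valid for either
    """
    if 1 <= network_id <= 4094:
        return "vlan"
    if 5000 <= network_id <= 16777215:
        return "vxlan"
    return "unknown"


# The port group in this article shows segment ID 5002.
print(classify_network_id(5002))
```

If the ID shown on the port group falls in the VLAN range instead, the VM is on a traditional VLAN-backed port group and the VXLAN checks in this article do not apply.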
Understand the Network Topology of your impacted virtual machine
- There are two ESXi Hosts. Impacted VM1 is running on ESXi-01, and VM2 is running on ESXi-02.
- VXLAN ID for both Virtual Machines is 5002.
- These two VMs are running on same dvSwitch.
Verify that NIC is connected in both impacted Virtual machines. To do so;
- Right Click on Virtual Machine and Click on Edit Settings.
- Check the Network Adapter and ensure that it is connected to the correct Port Group.
- Check that the Network Adapter is showing Connected.
- Do the same validation for second VM.
Verify that Port Group exists in dvSwitch. To check this;
- Go to the Networking tab in vCenter. Under this, you will find the dvSwitch name, in this example "RegionA01-vDS-COMP".
- Click on dvSwitch, Go to Port Groups tab. Verify the Port group name.
Verify that Port group is VXLAN network. To check this;
- Go to Networking Tab, Click on dvSwitch.
- Under Networks, you will find this port group name. Scroll across and you will see the VXLAN tag there. This confirms that the port group is backed by VXLAN.
Verify the problem from Physical Network:
Understand the Physical Topology of the connectivity between these two virtual machines.
- Here we have two virtual machines which are located on two different ESXi Hosts. When one virtual machine pings the other, the traffic goes through the VTEP kernel port, which is the VXLAN endpoint hosted on each ESXi host. The VTEP has an IP address and uses a network adapter of the ESXi Host.
- Logical switch is connected to Transport Zone.
- In order to ping from a virtual machine on one ESXi host to a VM on another ESXi host, there must be connectivity between the two VTEPs, else the traffic won't go over the VXLAN network.
- In our design, VTEP IP for ESXi hosts are as below.
- VTEP IP for ESXi-01: 192.168.130.52
- VTEP IP for ESXi-02: 192.168.130.51
- These two IPs must be reachable from one host to the other. You need to log in to the ESXi Host over SSH (for example with PuTTY) to ping the IP address.
To check the IP address of VTEP of your ESXi Host, you need to;
- Log in to vCenter Server. Click on ESXi Host > Configure Tab
- Click on VMkernel Adapters. Here you can see that the TCP/IP Stack for vmk3 is showing as VXLAN. That is the VTEP kernel port. It will always be on the VXLAN TCP/IP stack.
- Here you can also see the IP address of vmk3. Note it down. This is the VTEP IP address for this host, ESXi-02.
- Perform the same for ESXi-01 to check its VTEP IP.
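- Now I am pinging the VTEP IP of ESXi-02 (192.168.130.51) from ESXi-01. I have logged into ESXi-01 using putty. Run the command below. The ++netstack=vxlan option sends the ping through the VXLAN TCP/IP stack, so that it leaves from the VTEP kernel port rather than from the default stack.
vmkping ++netstack=vxlan 192.168.130.51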
You may also check the transport zone on which these Logical switches are running. To check this;
- You need to go to Network & Security under vCenter Server > Click on Installation and Upgrade
- Click on Logical Network Settings > Click on Transport Zone.
- Click on Transport Zone "RegionA0-Global-TZ" to see how many ESXi Hosts are participating.
Verify VXLAN Routing:
Check Default Gateway of VTEP kernel port (VXLAN traffic). To do so;
- Click on ESXi Host > Go to Configure Tab
- Click on VMkernel Adapters > Select vmk3 (vmk3 is configured with VTEP IP for VXLAN traffic).
- Go to IPv4 Settings
Here you can check the configured default gateway. You can also try to ping this IP address from the ESXi Host using putty.
Please note that NSX uses a different IP stack for VXLAN traffic, so we need to verify that the default gateway is configured correctly for VXLAN traffic.
- You may also check the default gateway through ESXCLI. Use the command below. Notice that the gateway belongs to a separate TCP/IP stack (vxlan). It is not configured with IPv4 (192.168.130.0).
esxcli network ip route ipv4 list -N vxlan
- Check the same for second ESXi host as well.
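How much the gateway matters depends on whether the two VTEPs sit in the same subnet. If they do, VTEP-to-VTEP traffic is delivered directly at L2 and never touches the gateway. The sketch below checks this for the two VTEP IPs used in this article. The /24 prefix length is an assumption (the article does not state the mask), so substitute the mask shown under IPv4 Settings for vmk3.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same IPv4 network."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# VTEP IPs from this article; the /24 mask is assumed, not taken from the article.
vtep_esxi_01 = "192.168.130.52"
vtep_esxi_02 = "192.168.130.51"

if same_subnet(vtep_esxi_01, vtep_esxi_02, 24):
    print("VTEPs share a subnet: gateway is not in the VTEP-to-VTEP path")
else:
    print("VTEPs are in different subnets: the VXLAN stack gateway must route between them")
```

When the VTEPs are in different subnets, a wrong or missing gateway on the vxlan stack is enough to break VM-to-VM traffic even though the management network looks healthy.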
Ping VTEP IP with packet size of 1570:
Test VTEP interface connectivity with a packet size of 1570 to verify that the MTU supports VXLAN encapsulation. Ping the vmknic interface IP address using the command below. Use the VTEP IP of ESXi-02 (destination) when you run this command on ESXi-01 (source).
ping ++netstack=vxlan IP_Address_Of_VTEP -s 1570 -d
The -d flag sets the don't-fragment (DF) bit on IPv4 packets. The -s flag sets the packet size.
Run the same command from the ESXi-02 host, using the VTEP IP of ESXi-01.
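To see why a 1570-byte ping is a meaningful test, it helps to work through the header arithmetic. The sketch below assumes IPv4 on the transport network, no inner VLAN tag, and a guest MTU of 1500; the constant and function names are made up for illustration.

```python
ICMP_HEADER = 8    # bytes added by ICMP echo
IPV4_HEADER = 20   # bytes, IPv4 header without options

# VXLAN adds, on top of the guest's IP packet:
# inner Ethernet (14) + VXLAN (8) + UDP (8) + outer IPv4 (20)
VXLAN_OVERHEAD = 14 + 8 + 8 + 20


def ping_ip_packet_size(payload: int) -> int:
    """Size of the IP packet produced by a ping with the given -s payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def min_transport_mtu(guest_mtu: int = 1500) -> int:
    """Smallest transport MTU that carries a full-size guest packet in VXLAN."""
    return guest_mtu + VXLAN_OVERHEAD


print(ping_ip_packet_size(1570))  # 1598
print(min_transport_mtu())        # 1550
```

A 1570-byte payload with the don't-fragment bit set produces a 1598-byte IP packet, which only passes if every hop between the VTEPs allows at least that size. That is comfortably above the 1550 bytes a full-size encapsulated guest packet needs, so a successful test means the MTU is not what is breaking the VM-to-VM ping.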
Check ARP and make sure it is discovering other devices.
ARP and Neighbor Discovery Protocol are used in IPv4 and IPv6 respectively at the network layer for discovering other devices on the same link. A cache of neighboring devices' IP and MAC addresses is maintained by the ESX/ESXi host's VMkernel networking stack. The cache is used for mapping logical IP addresses to link-layer MAC addresses for outbound traffic on VMkernel network interfaces.
To check this;
- Login to ESXi host using putty, and run below command.
esxcli network ip neighbor list -N vxlan
Verify NSX Control Plane Connectivity with ESXi Host:
Check port connectivity between ESXi host and NSX Controller. Make sure that the port is open.
NSX Controller uses port number 1234 for its connection with the ESXi host, so you need to check whether a connection on port 1234 is established. To do so;
- Login to ESXi host using putty, and run below command.
esxcli network ip connection list | grep 1234
- Run the same command on ESXi-02.
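If you collect this output from several hosts, a small parser can flag hosts without an established controller session. This is only a sketch: it assumes the whitespace-separated column order protocol, receive queue, send queue, local address, foreign address, state, and the sample lines and their IP addresses are invented for illustration, not captured from this environment.

```python
def controller_sessions(connection_list_output: str, port: int = 1234):
    """Return (foreign_address, state) for TCP rows whose foreign port matches."""
    sessions = []
    for line in connection_list_output.splitlines():
        fields = line.split()
        # Skip headers, separators, and non-TCP rows (UDP rows carry no state).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign.rsplit(":", 1)[-1] == str(port):
            sessions.append((foreign, state))
    return sessions


# Invented sample rows, assumed column layout (see lead-in).
SAMPLE = """\
tcp  0  0  192.168.110.52:43925  192.168.110.31:1234   ESTABLISHED  35185  newreno  netcpa-worker
tcp  0  0  192.168.110.52:22     192.168.110.10:50111  ESTABLISHED  34000  newreno  sshd
"""

sessions = controller_sessions(SAMPLE)
print(sessions)
print(any(state == "ESTABLISHED" for _, state in sessions))
```

An empty result, or sessions stuck in a state other than ESTABLISHED, points at the control plane rather than at the data path checked in the earlier steps.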
Check the status of NETCPAD Daemon. To check this;
- Login to ESXi Host using Putty, and Run below command.
/etc/init.d/netcpad status
NETCPAD is a User World Agent (UWA) that runs as a service daemon called netcpa. It mediates communication between the NSX Controller and the hypervisor kernel modules, except for DFW. It writes logs to /var/log/netcpa.log on the ESXi host.
- Check the status of netcpad on ESXi-02 as well.
Check Controller Status for VXLAN 5002 and make sure that it is UP. To check this;
- Login to ESXi Host using putty, and run below command.
esxcli network vswitch dvs vmware vxlan network list --vds-name vDS_Name
Check Flow Monitoring from Source (VM-01) to Destination (VM-02):
Here you can watch the TCP and UDP connections to a specific vNIC. To check the packet flow between two virtual machines, follow the below steps;
- Login to vCenter Server and go to "Networking and Security" to open NSX Management Console.
- Go to Flow Monitoring section. Under Live Flow, Click on "Select vNIC".
- Select the Virtual Machine (VM-01). Click on > icon.
- Select the Virtual Network Interface Card (vNIC) of Virtual Machine. Click on OK.
- Click on Apply.
- Enter the IP address of VM-02. Click on Apply.
- Click on Start to check the Flow.
- No packets are being transferred for this traffic, as shown in the figure below. In the next step, check the Distributed Firewall (DFW) rules to see whether source-to-destination communication is blocked.
Check Distributed Firewall Rules for VM-01 and VM-02. To check this;
- Login to vCenter Server and go to "Networking and Security" to open NSX Management console.
- Go to Firewall section.
For illustration, I have created a DFW rule for these particular virtual machines (VM-01 and VM-02) from source to destination. In your environment, check whether the relevant IP subnet is allowed as source or destination, and make sure that it is not denied. A deny rule could be the reason that traffic is not flowing between these two virtual machines and that the virtual machine is not pingable.
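When reading a long rule table, remember that firewall rules are evaluated top-down and the first matching rule decides the outcome. The sketch below is a deliberately simplified model of that behaviour, matching on source and destination CIDR only. Real DFW rules also match on services, security groups and the Applied To field. The rule names and VM IP addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    source: str        # CIDR, e.g. "172.16.10.11/32"
    destination: str   # CIDR
    action: str        # "allow" or "block"


def first_match(rules, src_ip: str, dst_ip: str):
    """Return the first rule (top-down) whose source and destination match."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.source)
                and dst in ipaddress.ip_network(rule.destination)):
            return rule
    return None


# Hypothetical rule table and VM addresses, for illustration only.
rules = [
    Rule("Block VM-01 to VM-02", "172.16.10.11/32", "172.16.10.12/32", "block"),
    Rule("Default Rule", "0.0.0.0/0", "0.0.0.0/0", "allow"),
]

hit = first_match(rules, "172.16.10.11", "172.16.10.12")
print(hit.name, hit.action)
```

In this model the ping is dropped even though the default rule allows everything, because the more specific block rule sits above it. The same thing happens in practice when a forgotten deny rule is placed higher in the table than the rule you expect to match.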
That's all I had to share on this topic. In the next blog post, I will work on the points below.
VM is not pingable:
- in same logical network, on different host (Done, in this article)
- in same logical network, on same host
- in different logical network, on different host
- in different logical network, on same host
- VM in VXLAN, Physical server in VLAN, Unable to ping.
NSX Experts - Please share your inputs if I missed something to add either here in comment box or directly in my article. I will add those points to the blog.