Very first VM launched won't respond to ARP requests

Bug #1422785 reported by Danny Choi
This bug affects 1 person
Affects           Status  Importance  Assigned to  Milestone
networking-cisco  New     Undecided   Unassigned

Bug Description

I’m seeing this issue more consistently with Nexus VXLAN offload, and I think I have found the culprit:
it appears to be a timing issue.

In order to pass traffic, the 2 N9K VXLAN gateways have to form NVE peers first.

When the very first VM is launched, the cisco_nexus driver configures the N9Ks with the VLAN,
VNI and multicast address info.

After the 9Ks are configured, it takes time to form the NVE peers.

In the meantime, Neutron spawns the DHCP server, which in turn hands out a fixed
IP address to the VM.

However, since the data path is not yet established between the 9Ks, the VM never receives
the IP address from the DHCP server. This can be confirmed from the VM console log:

Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing
WARN: /etc/rc3.d/S40-network failed
.
.
=== pinging gateway failed, debugging connection ===
############ debug start ##############
### /etc/init.d/sshd start
Starting dropbear sshd: OK
route: fscanf
### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:2E:C7:1C
          inet6 addr: fe80::f816:3eff:fe2e:c71c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:23 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1768 (1.7 KiB) TX bytes:1114 (1.0 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1020 (1020.0 B) TX bytes:1020 (1020.0 B)

### route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
route: fscanf
### cat /etc/resolv.conf
cat: can't open '/etc/resolv.conf': No such file or directory
### gateway not found
/sbin/cirros-status: line 1: can't open /etc/resolv.conf: no such file
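
The failure signature in this console log can be checked mechanically. The following is a minimal sketch, assuming the console log has already been fetched as a string; the function names `dhcp_result` and `discover_count` are made up for illustration, and the markers are taken only from the udhcpc output quoted in this report.

```python
import re

# Markers taken from the udhcpc console output quoted in this report.
FAILURE_MARKER = "No lease, failing"
LEASE_RE = re.compile(r"Lease of (\d+\.\d+\.\d+\.\d+) obtained")


def dhcp_result(console_log):
    """Classify a console log as ('ok', ip), ('failed', None) or ('unknown', None)."""
    match = LEASE_RE.search(console_log)
    if match:
        return ("ok", match.group(1))
    if FAILURE_MARKER in console_log:
        return ("failed", None)
    return ("unknown", None)


def discover_count(console_log):
    """Count how many DHCP discover messages udhcpc reported sending."""
    return sum(
        1
        for line in console_log.splitlines()
        if line.strip() == "Sending discover..."
    )
```

A log with three discovers and no lease, like the one above, would be classified as failed; a log containing a "Lease of ... obtained" line would be classified as ok.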

Now, if I reboot the VM, the data path is already established between the 9Ks
(i.e. the NVE peers are formed), so the VM receives the IP address from the DHCP server:

Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending select for 10.0.0.2...
Lease of 10.0.0.2 obtained, lease time 86400

This also explains why the subsequent VMs do not have the same problem.
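
To make the race concrete: the first VM's DHCP discovers are sent before the NVE peers are up, and nothing waits for that condition. Below is a minimal sketch of a generic poll-with-timeout helper that could gate on such a readiness check. It is an illustration only, not how the cisco_nexus driver or Neutron behaves; `wait_until` and the `check` callable are hypothetical, and how peer state would actually be queried from the switches is left out.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds elapse.

    `check` is a hypothetical callable, e.g. one that reports whether
    the NVE peers between the two switches have formed.
    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might use it as `wait_until(nve_peers_up, timeout=120)`, where `nve_peers_up` is an assumed function supplied by the caller.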

Tags: nexus