After migration, instance doesn't have an IP assigned

Bug #1831130 reported by Elio Martinez
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: High
Assigned to: ChenjieXu
Milestone: (none)

Bug Description

Brief Description
-----------------
Performing a migration of an existing instance causes that instance to lose its IP address.

Severity
--------

Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Boot and perform provisioning over a 2+2 configuration, standard storage
Create flavor with property = hw:mem_page_size='large'
Create 1 or 2 instances with this flavor and cirros image
Verify instance health with "openstack server list"
| 354f6dc6-a569-4639-90cd-5015bba8118e | elio2 | ACTIVE | private-net0=192.168.201.137 | cirros | m2.tiny |
| 72cd8b21-5378-40ad-8078-cf7cf024c1b0 | elio1 | ACTIVE | private-net0=192.168.201.45 | cirros | m2.tiny |
Verify that your instances show the same IPs as listed on the active controller

ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:95:A4:E2
          inet addr:192.168.201.45 Bcast:192.168.201.255 Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe95:a4e2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:49 errors:0 dropped:0 overruns:0 frame:0
          TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6139 (5.9 KiB) TX bytes:9094 (8.8 KiB)

Perform a migration of the instance to another compute with "openstack server migrate <instance-id>"
Confirm the resizing/migration
Check instance health
Open the instance console to verify the IP
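For reference, a minimal CLI sketch of the steps above (the flavor sizing is illustrative, and the confirm syntax varies across python-openstackclient versions):

   # Flavor with large pages (only the property comes from the report)
   openstack flavor create m2.tiny --vcpus 1 --ram 512 --disk 1
   openstack flavor set m2.tiny --property hw:mem_page_size=large
   # Boot an instance with it
   openstack server create --flavor m2.tiny --image cirros --network private-net0 elio1
   # Cold-migrate, then confirm once the status reaches VERIFY_RESIZE
   openstack server migrate elio1
   openstack server show elio1 -c status
   openstack server resize --confirm elio1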

Expected Behavior
------------------
The IPv4 address should remain the same after migration

Actual Behavior
----------------
No IP is present on instance console:

ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:95:A4:E2
          inet addr:192.168.201.45 Bcast:192.168.201.255 Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe95:a4e2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:49 errors:0 dropped:0 overruns:0 frame:0
          TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6139 (5.9 KiB) TX bytes:9094 (8.8 KiB)

Reproducibility
---------------
I created 4 instances at the beginning; those placed on compute-0 lost their IP from the start, forcing the IP manually. When migrating from one compute to the other, we face this problem every time we go from compute-1 to compute-0, but not vice versa. All the IPs are kept on compute-1 but lost when migrating to compute-0.

System Configuration
--------------------
2+2 system , standard storage. ISO: 20190524. IPv4

Last Pass
---------
Not sure about the last pass for this

Timestamp/Logs
--------------

Looking at the instance console, I observed the following errors on the instance that cannot recover the IP:

Initializing random number generator... [ 1.018463] random: dd urandom read with 5 bits of entropy available
done.
Starting acpid: OK
Starting network...
udhcpc (v1.23.2) started
Sending discover...
Sending discover...
Sending discover...
Usage: /sbin/cirros-dhcpc <up|down>
No lease, failing
WARN: /etc/rc3.d/S40-network failed
checking http://169.254.169.254/2009-04-04/instance-id
Starting dropbear sshd: failed to get instance-id of datasource

And this is an example of the same instance on compute-1, where it does recover the IP address:

Initializing random number generator... [ 1.023211] random: dd urandom read with 5 bits of entropy available
done.
Starting acpid: OK
Starting network...
udhcpc (v1.23.2) started
Sending discover...
Sending select for 192.168.201.45...
Lease of 192.168.201.45 obtained, lease time 86400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "192.168.201.1"

I just pasted the full log at paste.openstack.org, Paste #752313

Test Activity
-------------
Feature testing

Revision history for this message
Elio Martinez (elio1979) wrote :

Following up on the logs ("Usage: /sbin/cirros-dhcpc <up|down>"), I tried running that instruction after the instance boots, but it is not working. Investigating nova and neutron logs.

Revision history for this message
Juan Carlos Alonso (juancarlosa) wrote :

I tried to launch 2 VMs, one attached to SRIOV port and the second to PCI-passthrough port.
Found the same behaviour:

| dde29d25-8522-4ebe-8eb7-0a96818a769e | vm-pthru-1 | ACTIVE | private-net0=192.168.201.63 | cirros | m1.small |
| 8a926af8-b726-48eb-960a-0dd52ec15276 | vm-sriov-1 | ACTIVE | public-net0=192.168.101.252 | cirros | m1.small |

compute-0:~$ sudo virsh list
 Id    Name                State
-----------------------------------
 7     instance-0000002b   running
 8     instance-00000028   running

compute-0:~$ sudo virsh console instance-0000002b

$ ifconfig
eth0 Link encap:Ethernet HWaddr FA:16:3E:1B:84:CC
          inet6 addr: fe80::f816:3eff:fe1b:84cc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:1332 (1.3 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Forrest's team to investigate

tags: added: stx.2.0 stx.distro.openstack stx.networking
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Forrest Zhao (forrest.zhao)
Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → ChenjieXu (midone)
Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

The IP address appears in the following logs; however, based on your description "No IP is present on instance console", no IP address should appear. Could you please confirm the logs are correct?

Actual Behavior
----------------
No IP is present on instance console:

ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:95:A4:E2
          inet addr:192.168.201.45 Bcast:192.168.201.255 Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe95:a4e2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:49 errors:0 dropped:0 overruns:0 frame:0
          TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6139 (5.9 KiB) TX bytes:9094 (8.8 KiB)

Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

"I create 4 instances at the beginning , for those that were located on compute-0 , the ip was lost since the beginning. forcing ip. And then performing the migration from one compute to other, facing this problem every time that we go from compute-1 to compute-0, not vice versa. All the ips are saved on compute-1 but lost performing migration to compute-0."

Based on your description above, it seems the VMs on compute-0 have a problem getting an IP address, while the VMs on compute-1 can get one. Could you please check which compute node the DHCP namespace is located on by executing the following command on compute-0 and compute-1:
   sudo ip netns

Revision history for this message
Elio Martinez (elio1979) wrote :

compute-0:~$ sudo ip netns
qdhcp-19cd096a-c635-4103-a386-46efb56361e8 (id: 0)

compute-1$ sudo ip netns
Password:
qrouter-43dcc7f4-f7b6-4b25-91d3-6becc581362e (id: 7)
qrouter-1655157d-0864-4fe3-9fe3-f039b70833dd (id: 6)
qdhcp-cdf51c28-bde3-4c29-93ca-9e1e0622c0ae (id: 5)
qdhcp-bf6068fc-df0f-40cb-99d8-696f548b5bce (id: 4)
qdhcp-85caffe5-4f62-4b32-b541-fbee71a8dde7 (id: 3)
qdhcp-8333a738-9240-42e6-a660-a17efc29b55a (id: 1)

Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

There is more than one dhcp namespace. Could you please check the subnet of each dhcp namespace with the following commands?
   On compute-0:
   sudo ip netns exec qdhcp-19cd096a-c635-4103-a386-46efb56361e8 ifconfig
   On compute-1:
   sudo ip netns exec qdhcp-cdf51c28-bde3-4c29-93ca-9e1e0622c0ae ifconfig
   sudo ip netns exec qdhcp-bf6068fc-df0f-40cb-99d8-696f548b5bce ifconfig
   sudo ip netns exec qdhcp-85caffe5-4f62-4b32-b541-fbee71a8dde7 ifconfig
   sudo ip netns exec qdhcp-8333a738-9240-42e6-a660-a17efc29b55a ifconfig

There is only one dhcp per subnet. If the dhcp for 192.168.201.0/24 exists on compute-1, could you please check the connectivity between compute-1 and compute-0 with the following methods (see also the namespace-mapping note after this list):
   1. Check the IP addresses of compute-0 and compute-1 by:
      ifconfig
   2. On compute-0, ping the IP address of compute-1:
      ping $COMPUTE1_IP
   3. Log in to the VM on compute-0:
      sudo virsh list
      sudo virsh console $VM
   4. Ping the IP address of the DHCP in the VM located on compute-0:
      ping 192.168.201.2
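As a side note (an added cross-check, assuming the standard OpenStack CLI, not part of the original comment): each qdhcp-<uuid> namespace is named after its Neutron network ID, so the namespaces can also be mapped to subnets from the controller:

   openstack network list
   openstack subnet list --network 19cd096a-c635-4103-a386-46efb56361e8
   # or show the DHCP port address directly inside the namespace:
   sudo ip netns exec qdhcp-19cd096a-c635-4103-a386-46efb56361e8 ip addr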

Revision history for this message
Elio Martinez (elio1979) wrote :

Hi Chenjie, we just moved to a recent ISO, "20190604T144018Z", but we see the same problem. Here is the relevant info:
compute-0:~$ sudo ip netns
Password:
qrouter-da72fdce-d5ec-4507-b0d3-9488f59974b2 (id: 7)
qrouter-543c6d12-ea46-4306-9824-057a63e8838e (id: 6)
qdhcp-9f2da2b3-5e87-4fe8-a99e-b05e173b4baf (id: 5)
qdhcp-31e9c0c9-55ee-43ae-919a-fa03f80f71ab (id: 4)
qdhcp-29050f23-f8ba-475e-b462-34b73db305ec (id: 3)

compute-1:~$ sudo ip netns
Password:
qdhcp-26a49682-1aac-4799-9370-40d3431b942e (id: 1)
qdhcp-13c1d984-1bd1-43bb-b868-103c368ad1cb (id: 0)

COMPUTE-0
 sudo ip netns exec qdhcp-9f2da2b3-5e87-4fe8-a99e-b05e173b4baf ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

tap35491304-72: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 192.168.0.2 netmask 255.255.255.0 broadcast 192.168.0.255
        inet6 fe80::f816:3eff:fe35:dc62 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:35:dc:62 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 5 bytes 446 (446.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

sudo ip netns exec qdhcp-31e9c0c9-55ee-43ae-919a-fa03f80f71ab ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 1 bytes 576 (576.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 576 (576.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

tapeac7f54a-4b: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 169.254.169.254 netmask 255.255.0.0 broadcast 169.254.255.255
        inet6 fe80::f816:3eff:fe4a:4268 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:4a:42:68 txqueuelen 1000 (Ethernet)
        RX packets 121 bytes 12487 (12.1 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 79 bytes 10485 (10.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

 sudo ip netns exec qdhcp-29050f23-f8ba-475e-b462-34b73db305ec ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 2 bytes 1152 (1.1 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 2 bytes 1152 (1.1 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

tapf1a60929-96: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 192.168.101.2 netmask 255.255.255.0 broadcast 192.168.101.255
        inet6 fe80::f816:3eff:fe55:ce69 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:55:ce:69 tx...


Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

It seems the interfaces used for the data network are not connected between compute-0 and compute-1. Your network topology is as follows:

+---------------------+            +---------------------+
|      compute-0      |            |      compute-1      |
|                     |            |                     |
| +-----------------+ |            | +-----------------+ |
| | data enp134s0f0 | |            | | data enp134s0f0 | |
| +-----------------+ |            | +-----------------+ |
|                     |            |                     |
| +-----------------+ |            | +-----------------+ |
| | mgmt enp134s0f1 +-+------------+-+ mgmt enp134s0f1 | |
| +-----------------+ |            | +-----------------+ |
+---------------------+            +---------------------+
                         Figure 1

The enp134s0f0 interfaces on compute-0 and compute-1 should be connected physically. You can connect them by one of the following methods:
   1. Connect them directly by cable.
   2. Connect them to the same switch.
After you connect them physically, the network topology should look like the following:

+---------------------+            +---------------------+
|      compute-0      |            |      compute-1      |
|                     |            |                     |
| +-----------------+ |            | +-----------------+ |
| | data enp134s0f0 +-+------------+-+ data enp134s0f0 | |
| +-----------------+ |            | +-----------------+ |
|                     |            |                     |
| +-----------------+ |            | +-----------------+ |
| | mgmt enp134s0f1 +-+------------+-+ mgmt enp134s0f1 | |
| +-----------------+ |            | +-----------------+ |
+---------------------+            +---------------------+
                         Figure 2

You can check the connectivity between them with the following commands (see the cleanup note after this list):
   on compute-0
      sudo ifconfig enp134s0f0 192.168.50.5/24 up
      ifconfig enp134s0f0
   on compute-1
      sudo ifconfig enp134s0f0 192.168.50.6/24 up
      ifconfig enp134s0f0
      ping 192.168.50.5
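One caveat (an added note, not from the original comment): addresses assigned with ifconfig like these are temporary test addresses, and should be cleared once the test is done so they do not interfere with the data network configuration:

   on compute-0 and compute-1
      sudo ip addr flush dev enp134s0f0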

Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

Sorry, there is a problem with the format of the drawing. Please refer to the picture in this attachment.

Revision history for this message
Elio Martinez (elio1979) wrote :

Thanks, will check this with our infra department

Revision history for this message
Elio Martinez (elio1979) wrote :

Hi Chenjie, according to our configuration, we are not sure whether the NIC interface should show up renamed as a bridge, which is what we see so far:

br-phy0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet6 fe80::3efd:feff:fed2:b27c prefixlen 64 scopeid 0x20<link>
        ether 3c:fd:fe:d2:b2:7c txqueuelen 1000 (Ethernet)
        RX packets 4745 bytes 304718 (297.5 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 11 bytes 962 (962.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

After setting up IPs on those NICs on compute-0 and compute-1, the ping works, but the instances living on those computes cannot even reach the corresponding IP. Is something else missing?

Revision history for this message
ChenjieXu (midone) wrote :

Hi Elio,

Could you please attach the results: which IP has been assigned, the ping result, the interfaces in the instances, and the result of pinging from the instance to the DHCP namespace?

Could you please also attach the topology of your StarlingX system?
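A possible way to capture those results (a sketch; br-phy0 is the bridge name from the earlier comment, and 192.168.201.2 is the DHCP address used above):

   From the instance console:
      ip addr show eth0
      ping -c 3 192.168.201.2
   On the compute host, while the instance retries DHCP:
      sudo tcpdump -ni br-phy0 port 67 or port 68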

Revision history for this message
ChenjieXu (midone) wrote :

I can't reproduce this bug. I'm using a virtual environment with StarlingX AIO duplex. The ISO image is 20190607.

Revision history for this message
Elio Martinez (elio1979) wrote :

The problem is not present in a virtual environment; it occurs on a standard 2+2 bare-metal setup. After making the connection, the ping test between the computes works. Will test this on the latest green ISO.

Revision history for this message
Elio Martinez (elio1979) wrote :

According to our results, the problem is no longer present on the standard configuration (2+2), current ISO:
20190705T013000Z

Revision history for this message
ChenjieXu (midone) wrote :

Hi all,

As confirmed by Elio, this bug is no longer present. It was a physical-environment issue and can be closed.

Changed in starlingx:
status: Triaged → Invalid