after upgrade, vms nic is detached from br-int

Bug #1649290 reported by Satya Sanjibani Routray on 2016-12-12
This bug affects 1 person
Affects | Importance | Assigned to
kolla   | High       | Jeffrey Zhang
Mitaka  | Undecided  | Unassigned
Newton  | Undecided  | Unassigned

Bug Description

I was running code that was about two weeks old and tried upgrading the neutron containers to the latest trunk code:

kolla-ansible upgrade -t neutron

The neutron containers upgraded successfully to the latest code.

Issue:
A VM that was created before the upgrade lost connectivity and is no longer reachable.

After restarting the VM, it is still unable to get an IP address.

From the console log of the VM:
<snip>
WARN: /etc/rc3.d/S10-load-modules failed
Initializing random number generator... done.
Starting acpid: OK
cirros-ds 'local' up at 1.19
no results found for mode=local. up 1.22. searched: nocloud configdrive ec2
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending discover...
Sending discover...
Usage: /sbin/cirros-dhcpc <up|down>
No lease, failing
WARN: /etc/rc3.d/S40-network failed
cirros-ds 'net' up at 181.43
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.44. request failed
failed 2/20: up 183.52. request failed
failed 3/20: up 185.52. request failed
failed 4/20: up 187.53. request failed
failed 5/20: up 189.53. request failed
failed 6/20: up 191.54. request failed
failed 7/20: up 193.54. request failed
failed 8/20: up 195.55. request failed
failed 9/20: up 197.55. request failed
failed 10/20: up 199.55. request failed
failed 11/20: up 201.56. request failed
failed 12/20: up 203.57. request failed
failed 13/20: up 205.57. request failed
failed 14/20: up 207.58. request failed
failed 15/20: up 209.58. request failed
failed 16/20: up 211.59. request failed
failed 17/20: up 213.59. request failed
failed 18/20: up 215.60. request failed
failed 19/20: up 217.60. request failed
failed 20/20: up 219.61. request failed
failed to read iid from metadata. tried 20
no results found for mode=net. up 221.61. searched: nocloud configdrive ec2
failed to get instance-id of datasource
Starting dropbear sshd: OK
=== system information ===
Platform: OpenStack Foundation OpenStack Nova
Container: none
Arch: x86_64
CPU(s): 1 @ 2693.508 MHz
Cores/Sockets/Threads: 1/1/1
Virt-type:
RAM Size: 2003MB
Disks:
NAME MAJ:MIN SIZE LABEL MOUNTPOINT
vda 253:0 21474836480
vda1 253:1 21459755520 cirros-rootfs /
=== sshd host keys ===
-----BEGIN SSH HOST KEY KEYS-----
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAglDexaY3PtPW5ZZNNw3kwOCS+8LApSRrLdaBVvmoVOkgIDSLRTQyJFBAw47IlhCQXpy1pCFNeudhIpsj7U7x7KUCYrPP3ISsLDOJSiuHR6Fu6nz2yuCeZ0n/BQTGzkic6MbwwMvIsXCGfW0JZpeR1XsxaS9/P/0IMvo16+yPjIW2dzM= root@vm
ssh-dss AAAAB3NzaC1kc3MAAACBALGrRoRn5Txe9Yw34Ffql/dmyazXSCATY3TBszujQGC5KZoZzKMCU1sxxiSAfBxk26PmV7Qr4de5BuPc0aMNqP08oE9koSCX7Uc7EVcaAgYhqoQwOJpC1qSLZcWMQpdvmTeKiq+eFiMHY5OlKbDjLA2EZVKO5U6MLLrC+KkDI2zTAAAAFQCmV3MycV2fkuTt87sUeFa/obVC6QAAAIEAgcSPUArdGTD7uuheCRwHRBpwo/kOJA3jO2TOOTlkQ+OP6/EVBH+cix06PbnG8clPL8U6Kep1AQlGoabXXLuB/X45WSDO2H1KZ5jbM7f2f+h44Liz+TELSeoUI8oNtOXIRtGlh/YmxReYME52yRtdzng07pxVlbGP5wudyFf1iO8AAACAP/phFf4ue5RfImHXi6ztWvWNRkkh4tTbrFgvIY31IMWkRvtf8RT7gNv/2gx5tIccD/zh/QGAEjwNM/SBKojtMFfflHL25+FGHsMeGOxowGn8L9TSI3lqQtAdFZG87r3npI7xtq54F0Xf45XCD/Oj50iTaoRDevPt0kIBeF+rTIc= root@vm
-----END SSH HOST KEY KEYS-----
=== network info ===
if-info: lo,up,127.0.0.1,8,::1
if-info: eth0,up,,8,fe80::f816:3eff:fe26:b7c4
=== datasource: None None ===
=== cirros: current=0.3.4 uptime=221.71 ===
route: fscanf
=== pinging gateway failed, debugging connection ===
############ debug start ##############
### /etc/init.d/sshd start
Starting dropbear sshd: OK
route: fscanf
### ifconfig -a
eth0 Link encap:Ethernet HWaddr FA:16:3E:26:B7:C4
          inet6 addr: fe80::f816:3eff:fe26:b7c4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:648 (648.0 B) TX bytes:1124 (1.0 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

</snip>

However, VMs created after the upgrade are reachable and able to get an IP address.

summary: - after kolla-ansible upgrade of neutron containers OLD vms fails to gett
- IP
+ after kolla-ansible upgrade of neutron containers OLD vms fails to get
+ IPAddress
Changed in kolla:
status: New → Triaged
Changed in kolla:
status: Triaged → Confirmed

I reproduced this. After upgrading neutron, the existing VMs lose connectivity, but a newly created VM is pingable. Rebooting the physical host brings all the VMs back.

After some debugging, I found that the VMs' NICs are detached from br-int during the upgrade. If the NICs are added to br-int again manually, the VMs come back.

I still have no idea why or how this happens. I will continue debugging.
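
The comment above does not say which commands were used for the manual re-attach. As a rough sketch only, the helper below compares a list of expected VM port names against the text printed by `ovs-vsctl list-ports br-int` and builds the `ovs-vsctl add-port` command lines for the ports that are missing. The function names and the example port names are hypothetical, and the sketch only rebuilds the bare add-port command; it does not restore any other settings that were stored in the lost database.

```python
def missing_from_bridge(expected_ports, list_ports_output):
    """Return the expected ports that are not attached to the bridge.

    list_ports_output is the text printed by `ovs-vsctl list-ports br-int`,
    which lists one port name per line.
    """
    attached = set(list_ports_output.split())
    return [port for port in expected_ports if port not in attached]


def reattach_commands(ports, bridge="br-int"):
    """Build (but do not run) one `ovs-vsctl add-port` command per port."""
    return [["ovs-vsctl", "add-port", bridge, port] for port in ports]


# Hypothetical example data: the port names below are made up for illustration.
sample_output = "int-br-ex\npatch-tun\nqvoaaaa\n"
expected = ["qvoaaaa", "qvobbbb"]

for cmd in reattach_commands(missing_from_bridge(expected, sample_output)):
    print(" ".join(cmd))
```

In a kolla deployment these commands would presumably be run inside one of the openvswitch containers (for example via `docker exec`); that is an assumption, not something stated in this report.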

summary: - after kolla-ansible upgrade of neutron containers OLD vms fails to get
- IPAddress
+ after upgrade, vms nic is detached from br-int
Jeffrey Zhang (jeffrey4l) wrote :

The root cause is that /etc/openvswitch/conf.db is not located in a named volume. When the openvswitch_db container is removed, the current configuration is lost, so the VMs' NICs are "detached" from br-int once a new openvswitch_db container starts up.
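
One way to check for this class of problem is to look at a container's mount list and see whether a given file path falls under any mount at all. The sketch below is an illustration under stated assumptions, not the check used in this bug: `persisted_by` and `container_mounts` are hypothetical helper names, and `SAMPLE_MOUNTS` is made-up data shaped like the `Mounts` list that `docker inspect` reports (the volume name and bind source are assumptions).

```python
import json
import posixpath
import subprocess


def persisted_by(path, mounts):
    """Return the most specific mount covering `path`, or None.

    None means the path lives in the container's own writable layer and
    is lost when the container is removed and recreated.
    """
    path = posixpath.normpath(path)
    best = None
    for mount in mounts:
        dest = posixpath.normpath(mount["Destination"])
        if path == dest or path.startswith(dest.rstrip("/") + "/"):
            if best is None or len(dest) > len(posixpath.normpath(best["Destination"])):
                best = mount
    return best


def container_mounts(name):
    """Fetch the mount list of a container via `docker inspect` (not called here)."""
    out = subprocess.check_output(
        ["docker", "inspect", "--format", "{{json .Mounts}}", name]
    )
    return json.loads(out)


# Hypothetical mount list for illustration only.
SAMPLE_MOUNTS = [
    {"Type": "bind", "Source": "/etc/kolla/openvswitch-db-server",
     "Destination": "/var/lib/kolla/config_files"},
    {"Type": "volume", "Name": "openvswitch_db",
     "Destination": "/var/lib/openvswitch"},
]

for candidate in ("/etc/openvswitch/conf.db", "/var/lib/openvswitch/conf.db"):
    mount = persisted_by(candidate, SAMPLE_MOUNTS)
    print(candidate, "->", mount["Type"] if mount else "not persisted")
```

With the sample data, a path under /etc/openvswitch is covered by no mount, while a path under /var/lib/openvswitch falls inside the volume.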

Changed in kolla:
importance: Undecided → High
assignee: nobody → Jeffrey Zhang (jeffrey4l)
milestone: none → ocata-2

Fix proposed to branch: master
Review: https://review.openstack.org/410733

Changed in kolla:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/410733
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=f62b16f8a447b7c90da3844720a200fb4d3ef5d5
Submitter: Jenkins
Branch: master

commit f62b16f8a447b7c90da3844720a200fb4d3ef5d5
Author: Jeffrey Zhang <email address hidden>
Date: Wed Dec 14 20:44:51 2016 +0800

    Move openvswitch db file into docker volume

    openvswitch db file is created in /etc/openvswitch/conf.db. It will be
    lost during upgrade openvswitch_db container.

    This patch moves the db file into /var/lib/openvswitch folder, which
    located in docker volume.

    Change-Id: I73604fddacd21655590b9e66ee2805014795b9f1
    Closes-Bug: #1649290

Changed in kolla:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/411575
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=37b9008683ce14a596d0d29ca5bf00c2ae824f12
Submitter: Jenkins
Branch: stable/newton

commit 37b9008683ce14a596d0d29ca5bf00c2ae824f12
Author: Jeffrey Zhang <email address hidden>
Date: Wed Dec 14 20:44:51 2016 +0800

    Move openvswitch db file into docker volume

    openvswitch db file is created in /etc/openvswitch/conf.db. It will be
    lost during upgrade openvswitch_db container.

    This patch moves the db file into /var/lib/openvswitch folder, which
    located in docker volume.

    Change-Id: I73604fddacd21655590b9e66ee2805014795b9f1
    Closes-Bug: #1649290
    (cherry picked from commit f62b16f8a447b7c90da3844720a200fb4d3ef5d5)

This issue was fixed in the openstack/kolla 3.0.2 release.

Reviewed: https://review.openstack.org/412873
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=8e07a3c44d10333a3a234a8ac9c47a65eb850ec3
Submitter: Jenkins
Branch: stable/mitaka

commit 8e07a3c44d10333a3a234a8ac9c47a65eb850ec3
Author: Jeffrey Zhang <email address hidden>
Date: Wed Dec 14 20:44:51 2016 +0800

    Move openvswitch db file into docker volume

    openvswitch db file is created in /etc/openvswitch/conf.db. It will be
    lost during upgrade openvswitch_db container.

    This patch moves the db file into /var/lib/openvswitch folder, which
    located in docker volume.

    Change-Id: I73604fddacd21655590b9e66ee2805014795b9f1
    Closes-Bug: #1649290
    (cherry picked from commit f62b16f8a447b7c90da3844720a200fb4d3ef5d5)

This issue was fixed in the openstack/kolla 4.0.0.0b3 development milestone.

This issue was fixed in the openstack/kolla 2.0.3 release.
