Instances fail to hard reboot when using OpenDaylight

Bug #1755890 reported by Mohammed Naser on 2018-03-14
This bug affects 4 people
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Mohammed Naser

Bug Description

When using OpenDaylight with Open vSwitch, the Neutron Open vSwitch agent is no longer present in the environment.

When an instance is started up for the first time, OpenDaylight will successfully bind the port and send the vif plugged notification. However, since the introduction of the following patch:

https://review.openstack.org/#/q/Ib08afad3822f2ca95cfeea18d7f4fc4cb407b4d6

Nova now expects the vif plugged event to arrive on hard reboots as well, but in certain environments (such as ODL with OVS) that event never comes in. As a result, every instance start after the first one fails.
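The failure mode described above can be illustrated with a small, self-contained sketch. This is not nova code: `start_instance`, `VifPlugTimeout`, and both flags are hypothetical names chosen for illustration, and the timeout is shortened so the example runs quickly. It only models the idea that a caller blocks on a notification which the networking backend may or may not send.

```python
import threading


class VifPlugTimeout(Exception):
    """Raised when the expected vif plugged event does not arrive in time."""


def start_instance(wait_for_event: bool,
                   backend_sends_event: bool,
                   timeout: float = 0.2) -> str:
    """Model an instance start that may block on a vif plugged event.

    All names here are hypothetical; this is an assumption-laden model,
    not the actual libvirt driver logic.
    """
    plugged = threading.Event()

    if backend_sends_event:
        # Simulate the backend delivering the notification shortly after
        # the start is requested (e.g. on first boot, when the port is bound).
        threading.Timer(0.01, plugged.set).start()

    if wait_for_event and not plugged.wait(timeout):
        # The event never arrived, so the start fails.
        raise VifPlugTimeout("vif plugged event never arrived")

    return "ACTIVE"


if __name__ == "__main__":
    # First boot: the port is bound and the event is sent.
    print(start_instance(wait_for_event=True, backend_sends_event=True))

    # Hard reboot with a backend that sends no event: the wait times out.
    try:
        start_instance(wait_for_event=True, backend_sends_event=False)
    except VifPlugTimeout as exc:
        print("failed:", exc)
```

In this model, not waiting at all (`wait_for_event=False`) succeeds regardless of what the backend does.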

Discussion:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-14.log.html#t2018-03-14T18:12:48

ODL issue:
https://jira.opendaylight.org/projects/NETVIRT/issues/NETVIRT-512

Matt Riedemann (mriedem) wrote :
tags: added: libvirt neutron opendaylight

Fix proposed to branch: master
Review: https://review.openstack.org/553035

Changed in nova:
assignee: nobody → Mohammed Naser (mnaser)
status: New → In Progress

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/553037
Reason: Abandoning to be replaced by https://review.openstack.org/#/c/553817/

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/553038
Reason: abandoning to be replaced by https://review.openstack.org/#/c/553818/

Reviewed: https://review.openstack.org/553035
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ed3e27afb4ca7b36baab3b8008a727caa43e1e3b
Submitter: Zuul
Branch: master

commit ed3e27afb4ca7b36baab3b8008a727caa43e1e3b
Author: Mohammed Naser <email address hidden>
Date: Wed Mar 14 20:03:47 2018 +0000

    Revert "Refine waiting for vif plug events during _hard_reboot"

    This reverts commit aaf37a26d6caa124f0cc6c3fe21bfdf58ccb8517.

    This gets us back to Ib0cf5d55750f13d0499a570f14024dca551ed4d4
    which was meant to address an issue introduced
    by Id188d48609f3d22d14e16c7f6114291d547a8986.

    So we essentially had three changes:

    1. Hard reboot would blow away volumes and vifs and then wait for the
       vifs to be plugged; this caused a problem for some vif types (
       linuxbridge was reported) because the event never came and we
       timed out.

    2. To work around that, a second change was made to simply not wait for
       vif plugging events.

    3. Since #2 was a bit heavy-handed for a problem that didn't impact
       openvswitch, another change was made to only wait for non-bridge vif
       types, so we'd wait for OVS.

    But it turns out that opendaylight is an OVS vif type and doesn't send
    events for plugging the vif, only for binding the port (and we don't
    re-bind the port during reboot). There is also a report of this being a
    problem for other types of ports, see
    If209f77cff2de00f694b01b2507c633ec3882c82.

    So rather than try to special-case every possible vif type that could
    be impacted by this, we are simply reverting the change so we no longer
    wait for vif plugged events during hard reboot.

    Note that if we went back to Id188d48609f3d22d14e16c7f6114291d547a8986
    and tweaked that to not unplug/plug the vifs we wouldn't have this
    problem either, and that change was really meant to deal with an
    encrypted volume issue on reboot. But changing that logic is out of the
    scope of this change. Alternatively, we could re-bind the port during
    reboot but that could have other implications, or neutron could put
    something into the port details telling us which vifs will send events
    and which won't, but again that's all outside of the scope of this
    patch.

    Change-Id: Ib3f10706a7191c58909ec1938042ce338df4d499
    Closes-Bug: #1755890
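The three changes listed in the commit message above can be compared with a rough sketch. It assumes a simplified model in which each vif has a type and a flag saying whether its backend sends a plug event on hard reboot. The policy names, the dictionary keys, and the sample port ids are all hypothetical and do not come from nova.

```python
from typing import Dict, List

Vif = Dict[str, object]


def events_to_wait_for(vifs: List[Vif], policy: str) -> List[str]:
    """Return the ids of vifs whose plug event a hard reboot would wait on.

    The policy names are hypothetical labels for the three changes:
      "wait_all"        - change 1: wait for every vif
      "wait_none"       - change 2 (and the state after the revert)
      "wait_non_bridge" - change 3: wait for everything except bridge vifs
    """
    if policy == "wait_all":
        return [str(v["id"]) for v in vifs]
    if policy == "wait_none":
        return []
    if policy == "wait_non_bridge":
        return [str(v["id"]) for v in vifs if v["type"] != "bridge"]
    raise ValueError(f"unknown policy: {policy}")


def stuck_vifs(vifs: List[Vif], policy: str) -> List[str]:
    """Ids we would wait on even though no event will ever be sent."""
    waited = set(events_to_wait_for(vifs, policy))
    return [str(v["id"]) for v in vifs
            if v["id"] in waited and not v["sends_plug_event"]]


# Sample data under the assumptions stated in the commit message:
# linuxbridge sends no event, the OVS agent does, and OpenDaylight is an
# OVS vif type that sends no plug event on reboot.
SAMPLE_VIFS: List[Vif] = [
    {"id": "port-linuxbridge", "type": "bridge", "sends_plug_event": False},
    {"id": "port-ovs-agent", "type": "ovs", "sends_plug_event": True},
    {"id": "port-odl", "type": "ovs", "sends_plug_event": False},
]

if __name__ == "__main__":
    for policy in ("wait_all", "wait_non_bridge", "wait_none"):
        print(policy, "->", stuck_vifs(SAMPLE_VIFS, policy))
```

Only the policy that waits for nothing leaves no vif stuck in this model, which matches the reasoning for reverting instead of special-casing each vif type.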

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/553817
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2e03eae67d7977be2318dfb89be9890fa4ceb440
Submitter: Zuul
Branch: stable/queens

commit 2e03eae67d7977be2318dfb89be9890fa4ceb440
Author: Mohammed Naser <email address hidden>
Date: Wed Mar 14 20:04:29 2018 +0000

    Revert "Refine waiting for vif plug events during _hard_reboot"

    This reverts commit 5a10047f9dd4383f6ec9046d392f26c93f8420d4.

    This gets us back to Ib0cf5d55750f13d0499a570f14024dca551ed4d4
    which was meant to address an issue introduced
    by Id188d48609f3d22d14e16c7f6114291d547a8986.

    So we essentially had three changes:

    1. Hard reboot would blow away volumes and vifs and then wait for the
       vifs to be plugged; this caused a problem for some vif types (
       linuxbridge was reported) because the event never came and we
       timed out.

    2. To work around that, a second change was made to simply not wait for
       vif plugging events.

    3. Since #2 was a bit heavy-handed for a problem that didn't impact
       openvswitch, another change was made to only wait for non-bridge vif
       types, so we'd wait for OVS.

    But it turns out that opendaylight is an OVS vif type and doesn't send
    events for plugging the vif, only for binding the port (and we don't
    re-bind the port during reboot). There is also a report of this being a
    problem for other types of ports, see
    If209f77cff2de00f694b01b2507c633ec3882c82.

    So rather than try to special-case every possible vif type that could
    be impacted by this, we are simply reverting the change so we no longer
    wait for vif plugged events during hard reboot.

    Note that if we went back to Id188d48609f3d22d14e16c7f6114291d547a8986
    and tweaked that to not unplug/plug the vifs we wouldn't have this
    problem either, and that change was really meant to deal with an
    encrypted volume issue on reboot. But changing that logic is out of the
    scope of this change. Alternatively, we could re-bind the port during
    reboot but that could have other implications, or neutron could put
    something into the port details telling us which vifs will send events
    and which won't, but again that's all outside of the scope of this
    patch.

    Change-Id: Ib3f10706a7191c58909ec1938042ce338df4d499
    Closes-Bug: #1755890

tags: added: in-stable-queens

Reviewed: https://review.openstack.org/553818
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7b5cdd7ac006a6d5f575a93aae76e3dc35bfabbe
Submitter: Zuul
Branch: stable/pike

commit 7b5cdd7ac006a6d5f575a93aae76e3dc35bfabbe
Author: Mohammed Naser <email address hidden>
Date: Wed Mar 14 20:04:56 2018 +0000

    Revert "Refine waiting for vif plug events during _hard_reboot"

    This reverts commit e06ad602f3ef98c505e49078dd225dde2a79e2a1.

    This gets us back to Ib0cf5d55750f13d0499a570f14024dca551ed4d4
    which was meant to address an issue introduced
    by Id188d48609f3d22d14e16c7f6114291d547a8986.

    So we essentially had three changes:

    1. Hard reboot would blow away volumes and vifs and then wait for the
       vifs to be plugged; this caused a problem for some vif types (
       linuxbridge was reported) because the event never came and we
       timed out.

    2. To work around that, a second change was made to simply not wait for
       vif plugging events.

    3. Since #2 was a bit heavy-handed for a problem that didn't impact
       openvswitch, another change was made to only wait for non-bridge vif
       types, so we'd wait for OVS.

    But it turns out that opendaylight is an OVS vif type and doesn't send
    events for plugging the vif, only for binding the port (and we don't
    re-bind the port during reboot). There is also a report of this being a
    problem for other types of ports, see
    If209f77cff2de00f694b01b2507c633ec3882c82.

    So rather than try to special-case every possible vif type that could
    be impacted by this, we are simply reverting the change so we no longer
    wait for vif plugged events during hard reboot.

    Note that if we went back to Id188d48609f3d22d14e16c7f6114291d547a8986
    and tweaked that to not unplug/plug the vifs we wouldn't have this
    problem either, and that change was really meant to deal with an
    encrypted volume issue on reboot. But changing that logic is out of the
    scope of this change. Alternatively, we could re-bind the port during
    reboot but that could have other implications, or neutron could put
    something into the port details telling us which vifs will send events
    and which won't, but again that's all outside of the scope of this
    patch.

    Change-Id: Ib3f10706a7191c58909ec1938042ce338df4d499
    Closes-Bug: #1755890

tags: added: in-stable-pike

This issue was fixed in the openstack/nova 17.0.2 release.

This issue was fixed in the openstack/nova 16.1.1 release.

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
