[OVS agent] Physical bridges can't be initialized if there is no connectivity to rabbitmq

Bug #1840443 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Slawek Kaplonski
Milestone: (none)

Bug Description

In some deployments the same external bridge (br-ex, for example) provides data plane connectivity for VMs and also control plane connectivity, e.g. the neutron openvswitch agent uses it to connect to rabbitmq.
That may lead to a deadlock after e.g. a host reboot. This happens because the neutron agent sets br-ex to fail_mode=secure, which means that if no OpenFlow rules have been added to the bridge, it will not process any packets. And as there is no connection to rabbitmq, neutron-ovs-agent fails in the setup_rpc method (here: https://github.com/openstack/neutron/blob/30a60d04f098581340f83b38b7a79104308c66bc/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L198), so it never configures the initial rules for physical bridges, which is done here: https://github.com/openstack/neutron/blob/30a60d04f098581340f83b38b7a79104308c66bc/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L212
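
To make the failure mode concrete, here is a minimal toy model of a bridge in fail_mode=secure. It is only a sketch: ToyBridge, its fields and the packet string are hypothetical names invented for illustration, not OVS or Neutron APIs, and the "standalone" branch is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyBridge:
    """Toy stand-in for an OVS bridge (hypothetical, not an OVS/Neutron API)."""

    name: str
    fail_mode: str = "secure"
    # Each "flow" is just a predicate saying whether it matches a packet.
    flows: List[Callable[[str], bool]] = field(default_factory=list)

    def forwards(self, packet: str) -> bool:
        if self.fail_mode == "standalone":
            # Simplification: treat standalone as plain L2 switching.
            return True
        # secure: only packets matching an installed flow are processed.
        return any(flow(packet) for flow in self.flows)


br_ex = ToyBridge("br-ex")            # state right after a host reboot
amqp = "agent -> rabbitmq:5672"
print(br_ex.forwards(amqp))           # no flows yet, so the packet is dropped

br_ex.flows.append(lambda pkt: True)  # stand-in for the agent's initial rules
print(br_ex.forwards(amqp))           # now the packet is forwarded
```

In this model the agent's own traffic to rabbitmq is just another packet on br-ex, which is why it cannot get through until some rule has been installed.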

To fix this problem, we should initialize the physical bridges before setting up RPC.
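
The effect of the proposed reordering can be sketched as follows. This is an illustrative model only, not the Neutron code: ToyAgent, try_start and the bridges_first flag are hypothetical, setup_physical_bridges is an assumed name for the step that installs the initial rules, and raising ConnectionError stands in for the agent being unable to reach rabbitmq.

```python
class ToyAgent:
    """Hypothetical model of the agent's start-up order, not Neutron code."""

    def __init__(self, bridges_first: bool):
        self.flows_installed = False
        self.steps = []
        if bridges_first:
            # Proposed order: install initial rules, then connect.
            self.setup_physical_bridges()
            self.setup_rpc()
        else:
            # Original order: connect first, install rules later.
            self.setup_rpc()
            self.setup_physical_bridges()

    def setup_physical_bridges(self):
        # Stand-in for adding the initial OpenFlow rules to br-ex.
        self.flows_installed = True
        self.steps.append("setup_physical_bridges")

    def setup_rpc(self):
        # With no rules on a fail_mode=secure bridge, rabbitmq is unreachable.
        if not self.flows_installed:
            raise ConnectionError("rabbitmq unreachable through br-ex")
        self.steps.append("setup_rpc")


def try_start(bridges_first: bool) -> str:
    """Report whether the toy agent gets through __init__."""
    try:
        return "started: " + ", ".join(ToyAgent(bridges_first).steps)
    except ConnectionError as exc:
        return "stuck: " + str(exc)


print(try_start(bridges_first=False))
print(try_start(bridges_first=True))
```

With the original order the toy agent never reaches the bridge setup step; with the bridges configured first, both steps complete.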

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/676949

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/676949
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d41bd58f31e259fe408c8c059b31299fdfe81127
Submitter: Zuul
Branch: master

commit d41bd58f31e259fe408c8c059b31299fdfe81127
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 16 13:44:09 2019 +0000

    Initialize phys bridges before setup_rpc

    Neutron-ovs-agent configures physical bridges to work in
    fail_mode=secure. This means that only packets which match some
    OpenFlow rule in the bridge can be processed.
    This may cause a problem on hosts with only one physical NIC,
    where the same bridge provides both control plane connectivity
    (such as the connection to rabbitmq) and data plane connectivity
    for VMs. After e.g. a host reboot, the bridge will still be in
    fail_mode=secure but there will be no OpenFlow rules on it, so
    there will be no communication with rabbitmq.

    With the current order of actions in the __init__ method of the
    OVSNeutronAgent class, the agent first tries to establish a
    connection to rabbitmq and only later configures physical bridges
    with some initial OpenFlow rules. In the case described above this
    fails, as there is no connectivity to rabbitmq through the
    physical bridge.

    So this patch changes the order of actions in the __init__ method
    so that it first sets up the physical bridges and then configures
    the rpc connection.

    Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
    Closes-Bug: #1840443

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/677054

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/677055

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/677056

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/677054
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3a2842bdd8d8d59e445393c7c7e7a9793357df08
Submitter: Zuul
Branch: stable/stein

commit 3a2842bdd8d8d59e445393c7c7e7a9793357df08
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 16 13:44:09 2019 +0000

    Initialize phys bridges before setup_rpc

    Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
    Closes-Bug: #1840443
    (cherry picked from commit d41bd58f31e259fe408c8c059b31299fdfe81127)

tags: added: in-stable-stein
tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/677055
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f9473566d559bf25c1baac2fcef505b0d2609fa1
Submitter: Zuul
Branch: stable/rocky

commit f9473566d559bf25c1baac2fcef505b0d2609fa1
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 16 13:44:09 2019 +0000

    Initialize phys bridges before setup_rpc

    Conflicts:
        neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
    Closes-Bug: #1840443
    (cherry picked from commit d41bd58f31e259fe408c8c059b31299fdfe81127)
    (cherry picked from commit 3a2842bdd8d8d59e445393c7c7e7a9793357df08)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/677056
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6618a917d86add7ee5c1005a21046d1bc57ac95c
Submitter: Zuul
Branch: stable/queens

commit 6618a917d86add7ee5c1005a21046d1bc57ac95c
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 16 13:44:09 2019 +0000

    Initialize phys bridges before setup_rpc

    Conflicts:
        neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
    Closes-Bug: #1840443
    (cherry picked from commit d41bd58f31e259fe408c8c059b31299fdfe81127)
    (cherry picked from commit 3a2842bdd8d8d59e445393c7c7e7a9793357df08)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/687095

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.opendev.org/687095
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fb515a75d6c055a5a4a1de42f48bb7eaf393c5d4
Submitter: Zuul
Branch: stable/pike

commit fb515a75d6c055a5a4a1de42f48bb7eaf393c5d4
Author: Slawek Kaplonski <email address hidden>
Date: Fri Aug 16 13:44:09 2019 +0000

    Initialize phys bridges before setup_rpc

    Conflicts:
        neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
    Closes-Bug: #1840443
    (cherry picked from commit d41bd58f31e259fe408c8c059b31299fdfe81127)
    (cherry picked from commit 3a2842bdd8d8d59e445393c7c7e7a9793357df08)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.3

This issue was fixed in the openstack/neutron 14.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.5

This issue was fixed in the openstack/neutron 13.0.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.1.1

This issue was fixed in the openstack/neutron 12.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/741444

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Rodolfo Alonso Hernandez (<email address hidden>) on branch: master
Review: https://review.opendev.org/741444
Reason: Superseded by https://review.opendev.org/#/c/740724/. Nice to see a better option.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron pike-eol

This issue was fixed in the openstack/neutron pike-eol release.
