veth pair connecting between physical and integration bridge down after ovs agent restart

Bug #1218556 reported by Ralf Haferkamp
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Ralf Haferkamp

Bug Description

Sometimes, after restarting the openvswitch-agent, the veth pair that connects the physical bridge with the integration bridge doesn't come up correctly (which of course disconnects any running VM instance from the network).

# /etc/init.d/openstack-neutron-openvswitch-agent restart
# ip addr show
[..]
83: phy-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 3a:6c:d6:a4:1c:89 brd ff:ff:ff:ff:ff:ff
84: int-br-eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether a2:12:2a:e5:b8:e4 brd ff:ff:ff:ff:ff:ff
[..]

I was able to reproduce this problem on openSUSE 12.3 and SLES 11. Ubuntu seems to be unaffected by this.

Doing a manual "ip link set up dev <device>" on both ends of the veth pair fixes the problem (until another restart brings it back).

I think I was able to track this down to a race condition between udev (and its network rules) and the ip commands that the openvswitch-agent runs during startup. Among other things, the agent does this during startup:

ip link delete int-br-fixed
ip link add int-br-fixed type veth peer name phy-br-fixed
ip link set int-br-fixed up
ip link set phy-br-fixed up

The ip link delete and ip link add commands cause several udev events to be fired. However, on my system the processing of the udev rules takes so long that the "remove" events are not completely processed before the ip link add command is started, which leaves the interfaces down after the above commands have completed.

A possible fix for this is to call "udevadm settle" after the ip link delete call.
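
As a rough illustration of that idea (not the actual agent code or the patch itself), the startup sequence with the extra wait could look like the shell function below. The function name recreate_veth and the RUN variable are hypothetical helpers introduced for this sketch; setting RUN=echo prints the commands instead of executing them. Ignoring a failed delete is also an assumption, made so the function works when the pair does not exist yet.

```shell
#!/bin/sh
# Sketch only: recreate a veth pair and wait for udev between delete and add.
# RUN is a hypothetical hook: RUN=echo prints the commands instead of running them.
RUN="${RUN:-}"

recreate_veth() {
    int_if="$1"   # integration-bridge end, e.g. int-br-eth1
    phy_if="$2"   # physical-bridge end, e.g. phy-br-eth1

    # Remove the old pair; ignore the error if it does not exist (assumption).
    $RUN ip link delete "$int_if" || true

    # Block until udev has finished processing queued events,
    # including the "remove" events triggered by the delete above.
    $RUN udevadm settle

    # Recreate the pair and bring both ends up.
    $RUN ip link add "$int_if" type veth peer name "$phy_if"
    $RUN ip link set "$int_if" up
    $RUN ip link set "$phy_if" up
}
```

Running it for real requires root, e.g. "recreate_veth int-br-eth1 phy-br-eth1"; with RUN=echo set beforehand it only lists the five commands in order.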

I will upload a draft patch for review shortly.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/44345

Changed in neutron:
assignee: nobody → Ralf Haferkamp (rhafer)
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
ZhiQiang Fan (aji-zqfan)
tags: added: ovs
tags: added: havana-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/44345
Committed: http://github.com/openstack/neutron/commit/8d88ee7411d43f148b45d0a145fe32a75765a3ac
Submitter: Jenkins
Branch: master

commit 8d88ee7411d43f148b45d0a145fe32a75765a3ac
Author: Ralf Haferkamp <email address hidden>
Date: Thu Aug 29 20:50:55 2013 +0200

    Avoid race with udev during ovs agent startup

    After taking down the veth link between the physical bridge and the integration
    bridge call udevadm settle to wait for any udev events to be completely
    processed by the operating system before recreating the veth pair.

    Some distributions (e.g. openSUSE) have udev rules installed by default that
    call e.g. ifdown <interface> during the remove event. If that is processed
    after the ovs agent already brought up the veth pair again the veth pair's
    link will be down after the agent completed startup and networking will be
    broken for all VM instances.

    Change-Id: I95520ea96a9804c5261a0c994bbca137535cc37c
    Closes-Bug: #1218556

Changed in neutron:
status: In Progress → Fix Committed
Changed in neutron:
milestone: none → havana-rc2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/50388

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (milestone-proposed)

Reviewed: https://review.openstack.org/50388
Committed: http://github.com/openstack/neutron/commit/99440a63af5a2c4c2e139036c42db5c64e9495b2
Submitter: Jenkins
Branch: milestone-proposed

commit 99440a63af5a2c4c2e139036c42db5c64e9495b2
Author: Ralf Haferkamp <email address hidden>
Date: Thu Aug 29 20:50:55 2013 +0200

    Avoid race with udev during ovs agent startup

    After taking down the veth link between the physical bridge and the integration
    bridge call udevadm settle to wait for any udev events to be completely
    processed by the operating system before recreating the veth pair.

    Some distributions (e.g. openSUSE) have udev rules installed by default that
    call e.g. ifdown <interface> during the remove event. If that is processed
    after the ovs agent already brought up the veth pair again the veth pair's
    link will be down after the agent completed startup and networking will be
    broken for all VM instances.

    Change-Id: I95520ea96a9804c5261a0c994bbca137535cc37c
    Closes-Bug: #1218556
    (cherry picked from commit 8d88ee7411d43f148b45d0a145fe32a75765a3ac)

Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: havana-rc2 → 2013.2