ovs-plugin-agent died when I delete vm

Bug #1050504 reported by yong sheng gong
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: dan wendlandt

Bug Description

It is not easy to reproduce, but it happens when I delete a VM, which causes its interface to be deleted. If the agent runs this command at that moment:
sudo ovs-vsctl --timeout=2 get Interface tap216a1c49-a6 external_ids
the agent dies.

Changed in quantum:
importance: Undecided → Critical
Salvatore Orlando (salvatore-orlando) wrote :

Do you reckon it is a race condition as the interface is removed from the switch? We fetch the iface in the daemon_loop, but in the meantime it is deleted.

I don't think a lock on ovs while the iteration is in progress will be feasible, but perhaps we can just double-check for the existence of the interface. We found a similar bug in nova ages ago.
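
The "double-check for existence" idea above could look roughly like the following. This is only a sketch, not the agent's actual code: `collect_vif_ids` and the `get_external_ids` callable are hypothetical names, and the lookup is assumed to return `None` or an empty dict when the interface has vanished between listing the ports and querying it.

```python
def collect_vif_ids(port_names, get_external_ids):
    """Map port name -> iface-id, skipping ports that disappeared mid-loop.

    get_external_ids is a hypothetical callable standing in for the
    'ovs-vsctl get Interface <name> external_ids' lookup. It is assumed
    to return None or {} when the interface no longer exists.
    """
    vif_ids = {}
    for name in port_names:
        external_ids = get_external_ids(name)
        if not external_ids:
            # Interface was deleted after list-ports ran, or it carries
            # no external_ids (e.g. int-br-eth1-1); skip it either way.
            continue
        iface_id = external_ids.get("iface-id")
        if iface_id:
            vif_ids[name] = iface_id
    return vif_ids
```

With this shape, a port that is deleted during the iteration is simply left out of the result instead of raising while the loop is running.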

Gary Kotton (garyk) wrote :

a new patch is ready for devstack where the ovs vif driver is updated. maybe we should try to reproduce with the hybrid driver: https://review.openstack.org/#/c/11650/

dan wendlandt (danwent) wrote :

do we have the traceback for this? I agree that we should just be able to make OVS agent more tolerant of these failures.

I don't think switching to hybrid vif driver is likely to affect this bug, as both drivers look the same from the agent's perspective.

tags: added: folsom-rc-potential
dan wendlandt (danwent) wrote :

note: i added the folsom-rc-potential tag... let's try to add it to all of those items that may need to be addressed with an RC2

yong sheng gong (gongysh) wrote :

It is easy to reproduce: I created four VMs and then deleted some of them, and the agent crashed:

Stdout: '{attached-mac="fa:16:3e:0e:02:0a", iface-id="d916a3d8-4140-4742-8fde-45759ae07aac", iface-status=active, vm-uuid="e75ce4d9-ebf6-4b49-bf0f-5ffefd07c479"}\n'
Stderr: ''
DEBUG:quantum.agent.linux.utils:Running command: sudo ovs-vsctl --timeout=2 list-ports br-int
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils] Running command: sudo ovs-vsctl --timeout=2 list-ports br-int
DEBUG:quantum.agent.linux.utils:
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'list-ports', 'br-int']
Exit code: 0
Stdout: 'int-br-eth1-1\nint-br-eth1-2\nqr-9b44fdbe-9c\ntap216a1c49-a6\ntap3d8ab559-59\ntap6bc96b0b-ca\ntap8359faa8-e1\ntapc041f805-a2\ntapd916a3d8-41\n'
Stderr: ''
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'list-ports', 'br-int']
Exit code: 0
Stdout: 'int-br-eth1-1\nint-br-eth1-2\nqr-9b44fdbe-9c\ntap216a1c49-a6\ntap3d8ab559-59\ntap6bc96b0b-ca\ntap8359faa8-e1\ntapc041f805-a2\ntapd916a3d8-41\n'
Stderr: ''
DEBUG:quantum.agent.linux.utils:Running command: sudo ovs-vsctl --timeout=2 get Interface int-br-eth1-1 external_ids
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils] Running command: sudo ovs-vsctl --timeout=2 get Interface int-br-eth1-1 external_ids
DEBUG:quantum.agent.linux.utils:
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'int-br-eth1-1', 'external_ids']
Exit code: 0
Stdout: '{}\n'
Stderr: ''
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'int-br-eth1-1', 'external_ids']
Exit code: 0
Stdout: '{}\n'
Stderr: ''
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils] Running command: sudo ovs-vsctl --timeout=2 get Interface int-br-eth1-2 external_ids
DEBUG:quantum.agent.linux.utils:Running command: sudo ovs-vsctl --timeout=2 get Interface int-br-eth1-2 external_ids
DEBUG:quantum.agent.linux.utils:
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'int-br-eth1-2', 'external_ids']
Exit code: 0
Stdout: '{}\n'
Stderr: ''
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'int-br-eth1-2', 'external_ids']
Exit code: 0
Stdout: '{}\n'
Stderr: ''
DEBUG:quantum.agent.linux.utils:Running command: sudo ovs-vsctl --timeout=2 get Interface qr-9b44fdbe-9c external_ids
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils] Running command: sudo ovs-vsctl --timeout=2 get Interface qr-9b44fdbe-9c external_ids
DEBUG:quantum.agent.linux.utils:
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'qr-9b44fdbe-9c', 'external_ids']
Exit code: 0
Stdout: '{attached-mac="fa:16:3e:5a:5f:2a", iface-id="9b44fdbe-9c97-4a2f-b08c-17a7a36b4cfe", iface-status=active}\n'
Stderr: ''
2012-09-17 14:22:30 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ovs-vsctl', '--timeout=2', 'get', 'Interface', 'qr-9b44fdbe-9c', 'external_ids']
Exit code: 0
Stdout: '{attached-mac="fa:16:3e:5a:5f:2a", iface-id="9b44fdbe-9c97-4a2f-b08c-17a7a36b4cfe", iface-status=active}\n'
Stderr: ''
DEBUG:quantum.agent.linux.utils:Running c...

Changed in quantum:
assignee: nobody → yong sheng gong (gongysh)
dan wendlandt (danwent) wrote :

ok, a one-liner should be able to fix this. i'll push.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/13094

Changed in quantum:
assignee: yong sheng gong (gongysh) → dan wendlandt (danwent)
status: New → In Progress
dan wendlandt (danwent)
Changed in quantum:
status: In Progress → Confirmed
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (master)

Reviewed: https://review.openstack.org/13094
Committed: http://github.com/openstack/quantum/commit/8ba098a65acafe4ffde1b51f97f7b6a1b45e6d99
Submitter: Jenkins
Branch: master

commit 8ba098a65acafe4ffde1b51f97f7b6a1b45e6d99
Author: Dan Wendlandt <email address hidden>
Date: Sun Sep 16 23:48:19 2012 -0700

    ovs-lib: make db_get_map return empty dict on error

    bug 1050504

    this fixes a crash caused when we try to iterate over the return value
    of db_get_map

    Change-Id: I56640035c3e166ddcc3d23e76be9118604dbeadc
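
For readers without the diff at hand, the behaviour described in the commit message can be illustrated as follows. This is a hedged sketch and not the merged code: only the name `db_get_map` comes from the commit message. Its signature here, the `parse_ovs_map` helper, and the injected `run` callable are assumptions made so the example is self-contained and does not need a real `ovs-vsctl`.

```python
import subprocess


def parse_ovs_map(text):
    """Parse ovs-vsctl map output such as '{a="b", c=d}\\n' into a dict.

    Hypothetical helper; assumes keys and values contain no ', ' sequence.
    """
    body = text.strip().strip("{}")
    result = {}
    if not body:
        return result
    for item in body.split(", "):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # ignore fragments without a key=value shape
        result[key.strip()] = value.strip().strip('"')
    return result


def db_get_map(run, table, record, column):
    """Return the map stored in a column, or {} if the lookup fails.

    run is an assumed callable that executes a command list and returns
    its stdout, raising CalledProcessError or OSError on failure.
    """
    cmd = ["ovs-vsctl", "--timeout=2", "get", table, record, column]
    try:
        output = run(cmd)
    except (subprocess.CalledProcessError, OSError):
        # The record may have been deleted between list-ports and get.
        # An empty dict keeps callers that iterate over the result safe.
        return {}
    return parse_ovs_map(output)
```

Returning an empty dict rather than `None` means a caller that loops over the result, or calls `.get("iface-id")` on it, sees "no external ids" for a vanished interface instead of crashing.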

Changed in quantum:
status: In Progress → Fix Committed
tags: removed: folsom-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/13263

Thierry Carrez (ttx)
no longer affects: quantum/folsom
OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (milestone-proposed)

Reviewed: https://review.openstack.org/13263
Committed: http://github.com/openstack/quantum/commit/d8f15756be0a0b6292031c0c20f1ec8aa1c20a24
Submitter: Jenkins
Branch: milestone-proposed

commit d8f15756be0a0b6292031c0c20f1ec8aa1c20a24
Author: Dan Wendlandt <email address hidden>
Date: Sun Sep 16 23:48:19 2012 -0700

    ovs-lib: make db_get_map return empty dict on error

    bug 1050504

    this fixes a crash caused when we try to iterate over the return value
    of db_get_map

    Change-Id: I56640035c3e166ddcc3d23e76be9118604dbeadc

Changed in quantum:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in quantum:
milestone: folsom-rc2 → 2012.2