detach interface fails as instance info cache is corrupted
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | wangpan |
Icehouse | Fix Released | High | Matt Riedemann |
Bug Description
Performing attach/detach interface on a VM sometimes results in an interface that can't be detached from the VM.
I traced it to a corrupted instance info cache, caused by non-atomic updates of that information.
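Every cache update is a read-modify-write: read the current network info, change it, write it back. Such an update is only atomic if no other writer can run between the read and the write. As a minimal sketch of that idea (not nova's code and not the fix for this bug), the script below uses flock(1) from util-linux to serialize updates of a temporary file that stands in for the info cache. The functions `update_cache` and `writer` and the entry names are hypothetical.

```bash
#!/bin/bash
# Sketch: serialize read-modify-write updates of a shared cache file with
# flock(1), so concurrent writers cannot overwrite each other's changes.
# The cache file, functions, and entry names are hypothetical stand-ins.
set -eu

cache=$(mktemp)
lockfile=$(mktemp)
trap 'rm -f "$cache" "$lockfile"' EXIT

# Append one entry to the cache; the read, modify, and write steps all
# happen while holding an exclusive lock on file descriptor 9.
update_cache() {
    (
        flock 9
        current=$(cat "$cache")
        # Drop the blank line produced when the cache starts out empty.
        printf '%s\n%s\n' "$current" "$1" | sed '/^$/d' > "$cache"
    ) 9>"$lockfile"
}

# Each writer adds 50 distinct entries, like repeated cache refreshes.
writer() {
    local name=$1 i
    for i in $(seq 1 50); do
        update_cache "$name-$i"
    done
}

# Run an "API" writer and a "periodic task" writer concurrently.
writer api &
writer heal &
wait

echo "entries: $(wc -l < "$cache")"
```

Both writers run at the same time, yet all 100 entries survive because the lock covers the whole read-modify-write. Without the `flock 9` line the update becomes non-atomic again, and entries may be lost nondeterministically.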
Details on how to reproduce the bug are as follows. Since this is due to a race condition, the test can take quite a bit of time before it hits the bug.
Steps to reproduce:
1) Set up DevStack from trunk with the following local.conf:
disable_service n-net
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-l3
enable_service q-meta
enable_service q-metering
RECLONE=yes
# and other options as set in the trunk's local
2) Create a few networks:
$> neutron net-create testnet1
$> neutron net-create testnet2
$> neutron net-create testnet3
$> neutron subnet-create testnet1 192.168.1.0/24
$> neutron subnet-create testnet2 192.168.2.0/24
$> neutron subnet-create testnet3 192.168.3.0/24
3) Create a VM named testvm in testnet1:
$> nova boot --flavor m1.tiny --image cirros-
4) Run the following shell script, which repeatedly attaches interfaces to this VM on the two remaining networks and detaches them again, until the issue occurs:
--------
#!/bin/bash
# Stress test: attach two interfaces to testvm, detach them again, and stop
# when the detached interfaces have not disappeared after 60 seconds.
c=10000
netid1=$(neutron net-list | grep testnet2 | cut -f 2 -d ' ')
netid2=$(neutron net-list | grep testnet3 | cut -f 2 -d ' ')
while [ "$c" -gt 0 ]
do
    echo "Round: $c"
    echo -n "Attaching two interfaces... "
    nova interface-attach --net-id "$netid1" testvm
    nova interface-attach --net-id "$netid2" testvm
    echo "Done"
    echo "Sleeping until both those show up in interfaces"
    # With three interfaces, the interface-list table is 7 lines long
    # (3 border lines, 1 header line, 3 rows).
    waittime=0
    while [ "$waittime" -lt 60 ]
    do
        count=$(nova interface-list testvm | wc -l)
        if [ "$count" -eq 7 ]
        then
            break
        fi
        sleep 2
        (( waittime += 2 ))
    done
    echo "Waited for $waittime seconds"
    echo "Detaching both... "
    # Field 4 of each matching table row is the port ID.
    nova interface-list testvm | grep "$netid1" | awk '{print "deleting ",$4; system("nova interface-detach testvm "$4 " ; sleep 2");}'
    nova interface-list testvm | grep "$netid2" | awk '{print "deleting ",$4; system("nova interface-detach testvm "$4 " ; sleep 2");}'
    echo "Done; check interfaces are gone in a minute."
    # Back to one interface: the table shrinks to 5 lines.
    waittime=0
    while [ "$waittime" -lt 60 ]
    do
        count=$(nova interface-list testvm | wc -l)
        echo "line count: $count"
        if [ "$count" -eq 5 ]
        then
            break
        fi
        sleep 2
        (( waittime += 2 ))
    done
    if [ "$waittime" -ge 60 ]
    then
        echo "bad case"
        exit 1
    fi
    echo "Interfaces are gone"
    (( c-- ))
done
---------
Eventually the script stops with a failure ("bad case"), and the remaining interface (from either testnet2 or testnet3) cannot be detached at all.
Changed in nova: | |
status: | New → In Progress |
assignee: | nobody → Praveen Yalagandula (ypraveen-5) |
tags: | added: icehouse-backport-potential |
Changed in nova: | |
assignee: | Praveen Yalagandula (ypraveen-5) → Sean Dague (sdague) |
Changed in nova: | |
assignee: | Sean Dague (sdague) → Dan Smith (danms) |
Changed in nova: | |
milestone: | none → juno-rc1 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | juno-rc1 → 2014.2 |
Changed in nova: | |
assignee: | nobody → wangpan (hzwangpan) |
I traced it to a non-atomic update of the instance info cache. It seems that the _heal_instance_info_cache periodic task runs in parallel with these API calls, and because of a race condition between the two, the info cache gets corrupted.
I added a debug statement to print a stack trace for each cache update. In the following failed case you can see that the _heal_instance_info_cache periodic task overwrote the info cache incorrectly and thereby corrupted it.
------
aviuser@ubuntu-1204-server:~/devstack$ grep -A 1 "Updating cache" /opt/stack/logs/screen/screen-n-cpu.log | grep -A 4 1aea47a5-8677-4a68-aec6-74c8573dde52
2014-06-03 17:13:27.870 DEBUG nova.network.base_api [req-8a4531a0-1b8c-4c2d-815d-0e126515d446 demo demo] Updating cache with info: [VIF({'ovs_interfaceid': u'9a697bd9-d2fb-4924-8298-b76c8d1c7c4f', 'network': Network({'bridge': 'br-int', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'192.168.1.2'})], 'version': 4, 'meta': {'dhcp_server': u'192.168.1.3'}, 'dns': [], 'routes': [], 'cidr': u'192.168.1.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': 'gateway', 'address': u'192.168.1.1'})})], 'meta': {'injected': False, 'tenant_id': u'adeb5c20087c42ccb7f4561a7d9cba6e'}, 'id': u'a8159f6d-dc4a-4eab-a6e2-b9d3a626f4f2', 'label': u'testnet1'}), 'devname': u'tap9a697bd9-d2', 'qbh_params': None, 'meta': {}, 'details': {u'port_filter': True, u'ovs_hybrid_plug': True}, 'address': u'fa:16:3e:d1:50:3f', 'active': True, 'type': u'ovs', 'id': u'9a697bd9-d2fb-4924-8298-b76c8d1c7c4f', 'qbg_params': None}), VIF({'ovs_interfaceid': u'1aea47a5-8677-4a68-aec6-74c8573dde52', 'network': Network({'bridge': 'br-int', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'192.168.2.143'})], 'version': 4, 'meta': {'dhcp_server': u'192.168.2.3'}, 'dns': [], 'routes': [], 'cidr': u'192.168.2.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': 'gateway', 'address': u'192.168.2.1'})})], 'meta': {'injected': False, 'tenant_id': u'adeb5c20087c42ccb7f4561a7d9cba6e'}, 'id': u'1f411a5a-4664-4383-a44a-9d83bef7c1ca', 'label': u'testnet2'}), 'devname': u'tap1aea47a5-86', 'qbh_params': None, 'meta': {}, 'details': {u'port_filter': True, u'ovs_hybrid_plug': True}, 'address': u'fa:16:3e:12:24:fc', 'active': False, 'type': u'ovs', 'id': u'1aea47a5-8677-4a68-aec6-74c8573dde52', 'qbg_params': None})] from (pid=5206) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/base_api.py:38
2014-06-03 17:13:27.871 DEBUG nova.network.base_api [req-8a4531a0-1b8c-4c2d-815d-0e126515d446 demo demo] traceback: [('/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py', 194, 'main', 'result = function(*args, **kwargs)'), ('/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py', 129, '<lambda>', 'yield lambda: self._dispatch_and_reply(incoming)'), ('/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py', 134, '_dispatch_and_reply', 'incoming.message))'), ('/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py', 177, '_dispatch', 'return self._do_dispatch(endpoint, method, ctxt, args)'), ('/u...
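The failure mode described in this comment, where a periodic task writes back network info it read before a concurrent API call changed it, can be replayed deterministically. The sketch below is illustrative only: a temporary file stands in for the info cache, the `vif-testnet*` entries are hypothetical, and the three steps are run in the bad order by hand rather than by real concurrent threads.

```bash
#!/bin/bash
# Illustrative sketch only: replays, step by step, how an unsynchronized
# read-modify-write can undo a concurrent change (a "lost update").
# The cache file and entry names are hypothetical; this is not nova code.
set -eu

cache=$(mktemp)
trap 'rm -f "$cache" "$cache.new"' EXIT

# Initial cache: one interface to keep, one that is about to be detached.
printf '%s\n' vif-testnet1 vif-testnet2 > "$cache"

# 1) The periodic task takes a snapshot of the network info.
heal_snapshot=$(cat "$cache")

# 2) Meanwhile, the detach call removes vif-testnet2 and rewrites the cache.
grep -v '^vif-testnet2$' "$cache" > "$cache.new"
mv "$cache.new" "$cache"

# 3) The periodic task writes back its now-stale snapshot.
printf '%s\n' "$heal_snapshot" > "$cache"

# Prints the stale list again: vif-testnet1 and vif-testnet2.
echo "cache after race:"
cat "$cache"
```

The detach in step 2 is silently undone, which mirrors the symptom in this report: the cache keeps listing an interface that the detach call had already removed.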