[MOS 9.2] OSTF "Launch instance with file injection" test fails

Bug #1655602 reported by Alexander Gromov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Oleksiy Molchanov
Milestone: (none listed)

Bug Description

MOS 9.2 snapshot #744

Configuration: 3 controller+mongo nodes, 2 compute+cinder nodes, VLAN, DVR, Ceilometer, Sahara

Steps to reproduce:
Deploy env using fuel-qa and the following template:
https://github.com/Mirantis/mos-ci-deployment-scripts/blob/stable/9.0/templates/stepler_tempest/ironic_cinder.yaml

Expected results:
Deployment succeeds and all OSTF tests pass.

Actual result:
OSTF test "Launch instance with file injection" failed.

Reproduced on CI:
http://cz7776.bud.mirantis.net:8080/jenkins/view/Stepler/job/9.x_Stepler_Cinder_LVM/89/consoleFull
http://cz7776.bud.mirantis.net:8080/jenkins/view/Stepler/job/9.x_Stepler_Nova_LVM/61/consoleFull

Alexander Gromov (agromov) wrote :
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 9.2
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Oleksiy Molchanov (omolchanov)
Oleg Bondarev (obondarev) wrote :

The failure is caused by OVS being stuck on node-3 after 07:24: http://paste.openstack.org/show/594670/

The sequence leading to the failure:
 - VM is first scheduled to node-3
 - Neutron port is created and bound to node-3, IP is 10.109.17.9
 - plugging the port fails on node-3 because OVS is stuck
 - instance is rescheduled to node-5, but port 10.109.17.9 is not deleted
 - new port is created and bound to node-5, IP is 10.109.17.10
 - VM now has two ports: 10.109.17.9 (DOWN), 10.109.17.10 (ACTIVE)
 - floating IP gets assigned to port 10.109.17.9, which is DOWN, hence no connectivity

So basically we have 3 issues here:
 - OVS problem to be investigated (similar to: https://bugs.launchpad.net/mos/+bug/1652934 comment #10, https://bugs.launchpad.net/mos/+bug/1606546 comment #16)
 - nova should delete the stale port when rescheduling the VM, or reuse this port
 - the test should assign the floating IP to the ACTIVE VM port rather than to the first one.
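
For the third point, here is a rough sketch of how the test could pick the port. This is not the actual OSTF code. The helper name pick_active_port and the port IDs are hypothetical, and the dicts only mimic the "status", "binding:host_id" and "fixed_ips" fields that Neutron returns for a port:

```python
def pick_active_port(ports):
    """Return the first port whose status is ACTIVE, or None if there is none.

    `ports` is assumed to be a list of Neutron-style port dicts for one VM.
    """
    for port in ports:
        if port.get("status") == "ACTIVE":
            return port
    return None


# Port list mirroring the state described above (port IDs are made up).
ports = [
    {"id": "port-a", "status": "DOWN", "binding:host_id": "node-3",
     "fixed_ips": [{"ip_address": "10.109.17.9"}]},
    {"id": "port-b", "status": "ACTIVE", "binding:host_id": "node-5",
     "fixed_ips": [{"ip_address": "10.109.17.10"}]},
]

# Taking ports[0] would select the stale DOWN port left on node-3;
# filtering on status selects the port that is actually plugged.
target = pick_active_port(ports)
print(target["fixed_ips"][0]["ip_address"])
```

With something like this the floating IP would go to 10.109.17.10 instead of the stale 10.109.17.9. It only works around the symptom in the test; the stale port itself is still left behind by the reschedule.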

Oleksiy Molchanov (omolchanov) wrote :

<6>Jan 11 07:27:21 node-3 kernel: [ 3480.092183] handler15 D ffff8802b0fd3e90 0 11316 25155 0x00000000
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.092195] ffff8802b0fd3e90 ffff880260056200 ffff8802b2771c00 ffff8802b0fd4000
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.092198] 0000000000000e30 ffffffff82107fb0 0000000000000000 0000000001c6d490
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.092201] ffff8802b0fd3ea8 ffffffff817ff415 ffff8802b0fd3ef0 ffff8802b0fd3f38
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.092204] Call Trace:
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.095012] [<ffffffff817ff415>] schedule+0x35/0x80
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.095981] [<ffffffff81061aa3>] kvm_async_pf_task_wait+0x1a3/0x1f0
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.096165] [<ffffffff810bf4d0>] ? prepare_to_wait_event+0xf0/0xf0
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.096622] [<ffffffff810eba21>] ? posix_ktime_get_ts+0x11/0x20
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.096630] [<ffffffff81061c65>] do_async_page_fault+0x75/0x80
<4>Jan 11 07:27:21 node-3 kernel: [ 3480.096639] [<ffffffff81804e28>] async_page_fault+0x28/0x30
<6>Jan 11 07:29:21 node-3 kernel: [ 3600.104094] handler15 D ffff8802b0fd3e90 0 11316 25155 0x00000000
<4>Jan 11 07:29:21 node-3 kernel: [ 3600.104100] ffff8802b0fd3e90 ffff880260056200 ffff8802b2771c00 ffff8802b0fd4000
<4>Jan 11 07:29:21 node-3 kernel: [ 3600.104104] 0000000000000e30 ffffffff82107fb0 0000000000000000 0000000001c6d490
<4>Jan 11 07:29:21 node-3 kernel: [ 3600.104108] ffff8802b0fd3ea8 ffffffff817ff415 ffff8802b0fd3ef0 ffff8802b0fd3f38
