ServerPersonalityTestJSON:test_rebuild_server_with_personality fails intermittent (linuxbridge)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Fix Released
|
Critical
|
Kevin Benton |
Bug Description
24 Hits in the last 30 days - all of them after the 31st of December! [1]
2016-01-07 11:08:30.061 | Captured traceback:
2016-01-07 11:08:30.061 | ~~~~~~~~~~~~~~~~~~~
2016-01-07 11:08:30.061 | Traceback (most recent call last):
2016-01-07 11:08:30.062 | File "tempest/
2016-01-07 11:08:30.062 | waiters.
2016-01-07 11:08:30.062 | File "tempest/
2016-01-07 11:08:30.062 | raise exceptions.
2016-01-07 11:08:30.062 | tempest.
2016-01-07 11:08:30.062 | Details: (ServerPersonal
Example Log: http://
no longer affects: | nova |
Changed in neutron: | |
milestone: | none → mitaka-rc1 |
I'm not 100% sure what causes this issue. But I think it's like this.
- Instance gets started, rebuilt triggert manager [req-d99131b4- 6ad7-4d60- 8d1f-5c44075c71 9c tempest- ServerPersonali tyTestJSON- 1787158816 tempest- ServerPersonali tyTestJSON- 675574703] [instance: 5cc0ffad- aa65-4780- 8cdc-a7c5040824 fe] Preparing to wait for external event network- vif-plugged- 7094cb28- f242-4e85- 85a2-7da2a147e4 f0 prepare_ for_instance_ event /opt/stack/ new/nova/ nova/compute/ manager. py:514
- Start destroy
- Agent loop detects that related tap device exists
- Instance destroyed, tap device, too
- Instances start triggered, nova starts listening on network-vif-plugged event
2016-01-11 04:00:46.887 DEBUG nova.compute.
- Instance created and tap device created manager [req-a7841493- f6c3-485d- a9e2-720d5386dc f2 tempest- ServerPersonali tyTestJSON- 1787158816 tempest- ServerPersonali tyTestJSON- 675574703] [instance: 5cc0ffad- aa65-4780- 8cdc-a7c5040824 fe] Events pending at deletion: network- vif-plugged- 7094cb28- f242-4e85- 85a2-7da2a147e4 f0 _delete_instance /opt/stack/ new/nova/ nova/compute/ manager. py:2391
- Agent loop detects that device is still there and assumes it active - it did not realize, that it dissappeared and appeard again
---> so the device up message is missing --> the port will never become Active!
- Instance stays in paused state until it gets deleted
2016-01-11 04:03:37.974 DEBUG nova.compute.
long time after the instance got deleted, the network-vif-timeout occurs
So probably the test case did trigger the delete...
I can not verify this assumption from the logs to 100%, as I can not figure out the exact point of time when the tap device got created. But at least the agent did never detect it as down. It's like this:
agent-loop scan: [6] 2016-01-11 04:00:41.727 (device in current + added due to resync)
libvirt: [2] 2016-01-11 04:00:45.875+0000: 16554: VM destroyed
nova log: [5] 2016-01-11 04:00:45.928 - vm destroyd
agent-loop scan: [7] 2016-01-11 04:00:47.560 Loop iteration exceeded interval (2 vs. 5.83269119263)! daemon_loop /opt/stack/
agent-loop scan: [7] 2016-01-11 04:00:47.560 (device in current)
libvirt: [1] 2016-01-11 04:00:47.731+0000: 16552: instance stareted (or to be started?)
nova log [4] 2016-01-11 04:00:47.957 instance started
agent-loop scan: [8] 2016-01-11 04:00:51.560 (device in current)
nova log [9] deletion triggered although still waiting for event
nova log [3] vif timeout
--Device should have been detected as deleted by the agent somewhere in between!
[1] http:// logs.openstack. org/99/ 233499/ 16/check/ gate-tempest- dsvm-neutron- linuxbridge/ d2c2c00/ logs/libvirt/ libvirtd. txt.gz# _2016-01- 11_04_00_ 47_731 logs.openstack. org/99/ 233499/ 16/check/ gate-tempest- dsvm-neutron- linuxbridge/ d2c2c00/ logs/libvirt/ libvirtd. txt.gz# _2016-01- 11_04_00_ 45_875 logs.openstack. org/99/ 233499/ 16/check/ gate-tempest- dsvm-neutron- linuxbridge/ d2c2c00/ logs/screen- n-cpu.txt. gz#_2016- 01-11_04_ 05_47_959 logs.openstack. org/99/ 233499/ 16/check/ gate-tempest- dsvm-neutron- linuxbridge/ d2c2c00/ logs/screen- n-cpu.txt. gz#_2016- 01-11_04_ 0...
[2] http://
[3] http://
[4] http://