OSTF tests failed by VM getting into error state

Bug #1663658 reported by Alexander Ignatov
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Confirmed
Importance: High
Assigned to: MOS Nova
Milestone: (none listed)

Bug Description

This issue is observed in the following BVT community job for 10.0:

https://ci.fuel-infra.org/job/10.0-community.main.ubuntu.bvt_2/1186/console

Traceback produced by the failure:

======================================================================
FAIL: Deploy ceph HA with RadosGW for objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/tests/test_ceph.py", line 532, in ceph_rados_gw
    test_sets=['ha', 'smoke', 'sanity'])
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/core/helpers/log_helpers.py", line 204, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 1387, in run_ostf
    failed_test_name=failed_test_name, test_sets=test_sets)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/core/helpers/log_helpers.py", line 204, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 306, in assert_ostf_run
    indent=1)))
AssertionError: Failed 1 OSTF tests; should fail 0 tests. Names of failed tests:
  - Launch instance, create snapshot, launch instance from snapshot (failure) Failed to get to expected status. In error state. Please refer to OpenStack logs for more details.

In the snapshot you can find the following error in the Nova logs on node-4:

http://paste.openstack.org/show/598415/

Tags: area-nova
Changed in mos:
status: New → Confirmed
tags: added: area-nova
Roman Podoliaka (rpodolyaka) wrote :

The problem here is that the `qemu-img info ...` call takes more than 8 seconds of CPU time (note that this is not wall-clock time; it is the time the CPU spends executing the process in both user and kernel space). Interestingly, in successful runs of the CI job the timings are much lower: this call usually takes <= 0.3s (the same is true for upstream CI). Since we are using the same image (cirros) and the same qemu-img version across job runs, the only difference is the environment state (RAM/swap usage, disk load, etc.). Unfortunately, atop logs for this period are not available, so we can't tell for sure what exactly makes qemu-img slow.

I glanced over the CI job failures and it looks like this is only reproduced in Community builds. As we are using the same package versions (both qemu-utils and the cirros-testvm image), the problem is most likely in the CI node resources: specifically, we run out of RAM and start using swap.
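
To make the CPU-time versus wall-clock distinction concrete, here is a minimal sketch using only the Python standard library (Python 3, Unix only). The `measure` helper is a hypothetical name introduced for illustration and is not part of Nova or the CI tooling; on an affected node, the same approach could be pointed at a `qemu-img info` command to compare timings between runs.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion and return (cpu_seconds, wall_seconds).

    CPU time is user + system time charged to waited-for children,
    taken as the difference of two RUSAGE_CHILDREN samples.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu, wall


if __name__ == "__main__":
    # A child that mostly sleeps: wall time is high, CPU time stays low.
    cpu, wall = measure([sys.executable, "-c", "import time; time.sleep(0.5)"])
    print("cpu=%.3fs wall=%.3fs" % (cpu, wall))
```

A process that waits (on sleep, disk, or swap) accumulates wall-clock time without accumulating CPU time, which is why the two numbers have to be read separately when diagnosing a limit like this one.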

Roman Podoliaka (rpodolyaka) wrote :

Note that we already increased the timeout from 2s to 8s in https://bugs.launchpad.net/mos/10.0.x/+bug/1643609 (a backport from upstream).
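
As a rough illustration of how a CPU-time limit of this kind behaves, the sketch below applies `RLIMIT_CPU` to a child process using only the Python standard library (Python 3, Unix only). It is an assumption-laden stand-in, not Nova's actual code: `run_with_cpu_limit` is a hypothetical helper, and the commands are placeholder Python one-liners rather than `qemu-img`.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(cmd, cpu_seconds):
    """Run cmd with a CPU-time limit and return its exit status.

    A negative return value means the child was killed by a signal,
    which is what happens once it exceeds the CPU-time limit.
    """
    def apply_limit():
        # Runs in the child just before exec; sets soft and hard limits.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode


if __name__ == "__main__":
    busy = [sys.executable, "-c", "while True: pass"]
    idle = [sys.executable, "-c", "import time; time.sleep(1.5)"]
    # The busy loop burns CPU and is killed after about 1 CPU-second.
    print("busy:", run_with_cpu_limit(busy, 1))
    # The sleeper outlives 1 second of wall time but uses little CPU.
    print("idle:", run_with_cpu_limit(idle, 1))
```

The limit counts CPU seconds, so a process that merely waits is not affected by it, while one that actually consumes CPU beyond the limit is terminated by the kernel.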
