[OSTF] Sahara test failed: Cluster state == 'Error'

Bug #1332087 reported by Artem Panchenko
This bug affects 1 person
Affects             Status        Importance  Assigned to          Milestone
Fuel for OpenStack  Fix Released  Medium      Dmitry Mescheryakov
Mirantis OpenStack  Fix Released  High        Dmitry Mescheryakov

Bug Description

api: '1.0'
astute_sha: a7eac46348dc77fc2723c6fcc3dbc66cc1a83152
build_id: 2014-06-18_03-01-14
build_number: '60'
fuellib_sha: d218a2683ec64e79b28fc95d7904888ee3fdfea8
fuelmain_sha: 313a0877c5e15d02e8b1393858912fbef4f0bf3c
mirantis: 'yes'
nailgun_sha: d55caa28311ff3fb9dbbfece5f58cd9eb71da47d
ostf_sha: 8c977e8423ad3aa3b75c50143c8baecb1caadaed
production: docker
release: '5.0'

The issue was reproduced during CI tests:

http://jenkins-product.srt.mirantis.net:8080/view/5.0_swarm/job/5.0_fuelmain.system_test.centos.services_simple/16/testReport/(root)/deploy_savanna_simple/deploy_savanna_simple/

The relevant part of the OSTF log is here:

http://paste.openstack.org/show/84498/

A full log snapshot is attached.

Tags: ostf sahara
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
tags: added: sahara
Changed in mos:
importance: Undecided → High
assignee: nobody → MOS Sahara (mos-sahara)
milestone: none → 5.1
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Further examination by Vadim showed that Sahara failed with the following exception:
2014-06-18T19:40:58.014365+00:00 debug: 2014-06-18 19:40:52.026 20439 DEBUG sahara.utils.ssh_remote [-] [ostf-cluster-338148882-ostf-test-master-001] _execute_command took 302.8 seconds to complete _log_command /usr/lib/python2.6/site-packages/sahara/utils/ssh_remote.py:407
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 ERROR sahara.context [-] Thread 'wait-for-ssh-ostf-cluster-338148882-ostf-test-master-001' fails with exception: '300 seconds'
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context Traceback (most recent call last):
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/context.py", line 124, in _wrapper
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context func(*args, **kwargs)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/service/engine.py", line 87, in _wait_until_accessible
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context "ls .ssh/authorized_keys", raise_when_error=False)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/utils/ssh_remote.py", line 367, in execute_command
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context get_stderr, raise_when_error)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/utils/ssh_remote.py", line 342, in _run_s
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context return self._run_with_log(func, timeout, *args, **kwargs)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/utils/ssh_remote.py", line 334, in _run_with_log
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context return self._run(func, *args, **kwargs)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/utils/ssh_remote.py", line 323, in _run
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context return procutils.run_in_subprocess(proc, func, args, kwargs)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib/python2.6/site-packages/sahara/utils/procutils.py", line 49, in run_in_subprocess
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context result = pickle.load(proc.stdout)
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context File "/usr/lib64/python2.6/pickle.py", line 1370, in load
2014-06-18T19:40:58.015759+00:00 debug: 2014-06-18 19:40:52.027 20439 TRACE sahara.context...
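
The traceback ends in pickle.load(proc.stdout) inside run_in_subprocess, which suggests the parent was blocked reading a pickled result from a child process when the time limit expired. The following is a minimal sketch of that general pattern, not Sahara's code: run_in_child, the CHILD program and the use of subprocess.communicate(timeout=...) are all assumptions made for illustration, and it targets Python 3 rather than the Python 2.6 shown in the log.

```python
import pickle
import subprocess
import sys

# Hypothetical child program: read a pickled (delay, value) request from
# stdin, sleep for `delay` seconds, then write the pickled value to stdout.
CHILD = (
    "import pickle, sys, time\n"
    "delay, value = pickle.load(sys.stdin.buffer)\n"
    "time.sleep(delay)\n"
    "pickle.dump(value, sys.stdout.buffer)\n"
)


def run_in_child(delay, value, timeout):
    """Run the child, wait at most `timeout` seconds for its pickled reply.

    Raises subprocess.TimeoutExpired if the child does not answer in time.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        out, _ = proc.communicate(pickle.dumps((delay, value)), timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is still busy: stop it and reap it before re-raising.
        proc.kill()
        proc.communicate()
        raise
    return pickle.loads(out)
```

With a slow child (for example a remote command on an overloaded node), the parent gives up as soon as the limit is reached, no matter how close the child was to finishing. That matches the log above, where the command took 302.8 seconds against a 300 second limit.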

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Dmitry, please confirm the issue.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Nastya: confirmed, the issue does exist in both community Sahara and the Fuel one. It can be avoided by using more powerful hardware, which is actually why we didn't experience it before.

Changed in mos:
status: New → Confirmed
Changed in mos:
status: Confirmed → Incomplete
status: Incomplete → New
status: New → Confirmed
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Hardening Team (fuel-hardening)
Ilya Shakhat (shakhat)
Changed in fuel:
assignee: Fuel Hardening Team (fuel-hardening) → MOS Sahara (mos-sahara)
Changed in mos:
assignee: MOS Sahara (mos-sahara) → Dmitry Mescheryakov (dmitrymex)
Changed in fuel:
assignee: MOS Sahara (mos-sahara) → Dmitry Mescheryakov (dmitrymex)
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

Reproduced on the latest system test run.
Logs are attached.

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: High → Medium
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The issue is 'fixed' by increasing the timeout fivefold in the following commit: https://gerrit.mirantis.com/#/c/16820/
That should reduce the probability of future failures caused by insufficient resources.
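
To illustrate why a longer limit helps on slow hardware, here is a rough sketch of a wait loop with a configurable timeout. It is not the code from the commit above: wait_until_accessible, FakeClock, simulate and the 5 second polling interval are hypothetical, and the 300 second base value is only assumed from the '300 seconds' exception in the log (fivefold would then be 1500 seconds).

```python
import time

BASE_TIMEOUT = 300                 # seconds; assumed from the log message
RAISED_TIMEOUT = BASE_TIMEOUT * 5  # 1500 seconds after a fivefold increase


def wait_until_accessible(check, timeout, interval=5.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(ready_after, timeout):
    """Return True if a node that becomes ready after `ready_after`
    simulated seconds is reached within `timeout` seconds."""
    fake = FakeClock()
    return wait_until_accessible(
        lambda: fake.now >= ready_after,
        timeout,
        clock=fake.time,
        sleep=fake.sleep,
    )
```

In this simulation a node that needs 302.8 seconds is missed with BASE_TIMEOUT but reached with RAISED_TIMEOUT. A node that needs longer than 1500 seconds would still fail, which is why the fix only lowers the probability of the error.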

Changed in fuel:
status: Confirmed → Fix Committed
Changed in mos:
status: Confirmed → Fix Committed
Revision history for this message
Egor Kotko (ykotko) wrote :

{u'build_id': u'2014-07-09_02-14-20', u'ostf_sha': u'f6f7cee46a85ca3e758f629c2df8b370e9de494a', u'build_number': u'305', u'auth_required': False, u'nailgun_sha': u'2001a30884f2c24d18a62fc9f9c76c6ed66691e3', u'production': u'docker', u'api': u'1.0', u'fuelmain_sha': u'9e441d9035fa852bdb00be1031355f0f89823231', u'astute_sha': u'c0ffd4fa1b1ea16931f174a7f4efeac701ec23e6', u'feature_groups': [u'mirantis'], u'release': u'5.1', u'fuellib_sha': u'fd5fd0d3f74c5a084adfc951a4ee24e3dc27e09c'}

Changed in mos:
status: Fix Committed → Fix Released
Changed in mos:
status: Fix Released → Fix Committed
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Need to merge this fix to the master branch and apply it for Fuel 5.1.

Changed in mos:
status: Fix Committed → Confirmed
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Timur: the fix was merged before the branch for 5.1 was created, so it is there as well.

Changed in mos:
status: Confirmed → Fix Committed
Changed in mos:
status: Fix Committed → Confirmed
status: Confirmed → Fix Committed
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :
Changed in fuel:
status: Fix Committed → Fix Released
Changed in mos:
status: Fix Committed → Fix Released