[OSTF][QEMU] Sahara test for launching a simple Vanilla2 cluster has "Error" state

Bug #1486681 reported by Artem Hrechanychenko
This bug affects 1 person
Affects             Status   Importance  Assigned to      Milestone
Fuel for OpenStack  Invalid  High        Evgeny Sikachev
7.0.x               Invalid  High        MOS Sahara

Bug Description

The OSTF Sahara test for launching a simple Vanilla2 cluster failed with the error:
"Cluster failed to build and is in "Error" status. Please refer to OpenStack logs for more details."

Steps to reproduce:

KVM/Ubuntu

1) Create a cluster: Neutron VLAN, Ubuntu, Ceph for volumes and images, Sahara
2) Add 1 controller, 1 compute, 1 compute+ceph, 2 ceph
3) Verify network
4) Deploy cluster
5) Verify network
6) Add the sahara-kilo-vanila2.6.0.qcow2 image to Glance and register it in Sahara
7) Run OSTF tests

Actual result:
(nose_storage_plugin) fuel_health.tests.tests_platform.test_sahara.VanillaTwoClusterTest.test_vanilla_two_cluster
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/unittest2/case.py", line 340, in run
    testMethod()
  File "/usr/lib/python2.6/site-packages/fuel_health/tests/tests_platform/test_sahara.py", line 146, in test_vanilla_two_cluster
    self.poll_cluster_status, 3, fail_msg, msg, cluster_id)
  File "/usr/lib/python2.6/site-packages/fuel_health/common/test_mixins.py", line 183, in verify
    " Please refer to OpenStack logs for more details.")
  File "/usr/lib/python2.6/site-packages/unittest2/case.py", line 415, in fail
    raise self.failureException(msg)
AssertionError: Step 3 failed: Cluster failed to build and is in "Error" status. Please refer to OpenStack logs for more details.

fuel_health.saharamanager: DEBUG: Currently cluster is in "Error" status.
fuel_health.common.test_mixins: DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/fuel_health/common/test_mixins.py", line 177, in verify
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/fuel_health/saharamanager.py", line 141, in poll_cluster_status
    self.fail('Cluster failed to build and is in "Error" status.')
  File "/usr/lib/python2.6/site-packages/unittest2/case.py", line 415, in fail
    raise self.failureException(msg)
AssertionError: Cluster failed to build and is in "Error" status.
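
For readers unfamiliar with the failing step: judging by the traceback, it polls the cluster status and fails as soon as the status becomes "Error". The following is a minimal sketch of that polling pattern, not the actual fuel_health code. The `get_status` callable, the exception class, and the timeout and interval values are hypothetical stand-ins.

```python
import time


class ClusterBuildError(Exception):
    """Raised when the cluster reports the terminal "Error" status."""


def poll_cluster_status(get_status, timeout=1200, interval=10, sleep=time.sleep):
    """Poll get_status() until it returns "Active".

    get_status is a hypothetical callable returning the current status string.
    Raises ClusterBuildError on "Error" and TimeoutError when time runs out.
    """
    waited = 0
    while waited < timeout:
        status = get_status()
        if status == "Active":
            return status
        if status == "Error":
            # Fail fast: "Error" is terminal, further waiting cannot help.
            raise ClusterBuildError(
                'Cluster failed to build and is in "Error" status.')
        sleep(interval)
        waited += interval
    raise TimeoutError("Cluster did not become Active in %d seconds" % timeout)
```

Failing fast on "Error" explains why the test can stop well before its timeout: the interesting cause is then in the Sahara, Nova, or Heat logs rather than in the test itself.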

{"build_id": "2015-08-14_17-24-26", "build_number": "176", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-14_17-24-26", "build_number": "176", "api": "1.0", "fuel-library_sha": "db1c04c9de65c8e107475967d86c7a5791ef3853", "nailgun_sha": "eaaf1233c613c6e08bcf99ee09ebb1bdb7dd4e31", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "4c74a60aa60c06c136d9197c7d09fa4f8c8e2863", "astute_sha": "e24ca066bf6160bc1e419aaa5d486cad1aaa937d", "fuel-ostf_sha": "17786b86b78e5b66d2b1c15500186648df10c63d", "release": "7.0", "fuelmain_sha": "d8c726645be087bc67e2eeca134f0f9747cfeacd"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "db1c04c9de65c8e107475967d86c7a5791ef3853", "nailgun_sha": "eaaf1233c613c6e08bcf99ee09ebb1bdb7dd4e31", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "4c74a60aa60c06c136d9197c7d09fa4f8c8e2863", "astute_sha": "e24ca066bf6160bc1e419aaa5d486cad1aaa937d", "fuel-ostf_sha": "17786b86b78e5b66d2b1c15500186648df10c63d", "release": "7.0", "fuelmain_sha": "d8c726645be087bc67e2eeca134f0f9747cfeacd"}

Artem Hrechanychenko (agrechanichenko) wrote :
Changed in fuel:
assignee: nobody → MOS Sahara (mos-sahara)
Sergey Reshetnyak (sreshetniak) wrote :

In the Sahara logs I see a problem:

2015-08-19T17:32:01.103911+00:00 err: 17:32:01.099 4412 ERROR sahara.service.ops [-] Error during operating on cluster sahara-cluster-1817035818 (reason: Node sahara-cluster-1817035818-worker-001 has error status

Looks like a problem in Nova.

Sergey Reshetnyak (sreshetniak) wrote :

In the Nova logs I see a problem:

3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 86, in select_destinations
    filter_properties)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 80, in select_destinations
    raise exception.NoValidHost(reason=reason)

The problem is a lack of resources on the cluster.

Changed in fuel:
assignee: MOS Sahara (mos-sahara) → nobody
status: New → Incomplete
Changed in fuel:
status: Incomplete → Confirmed
assignee: nobody → MOS Core Components (mos-core-components)
Artem Hrechanychenko (agrechanichenko) wrote :

I repeated this case with 6 GB RAM per slave and 9 slaves in the env, and the issue is still present.

Vasyl Saienko (vsaienko) wrote :

I found the following in the logs:

2015-08-19T17:16:28.804201+00:00 info: 2015-08-19 17:16:28.782 25565 INFO nova.filters [req-39df66ff-efe0-4b5b-8eeb-81a8fd556f67 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
2015-08-19T17:16:55.166587+00:00 info: 2015-08-19 17:16:55.151 25565 INFO nova.filters [req-b59a14cf-1d43-45a8-9387-2880ddea7315 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
2015-08-19T17:17:02.346678+00:00 info: 2015-08-19 17:17:02.344 25565 INFO nova.scheduler.host_manager [req-f1313a29-2b02-478a-a9b8-efb199665113 - - - - -] Successfully synced instances from host 'node-2.test.domain.local'.
2015-08-19T17:17:07.244282+00:00 info: 2015-08-19 17:17:07.236 25565 INFO nova.filters [req-6ec15abc-8723-4dbb-95f0-c954b21a5082 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
2015-08-19T17:17:22.193073+00:00 info: 2015-08-19 17:17:22.189 25565 INFO nova.filters [req-08dff2e4-0cf4-4abd-9d41-720e827d1b2a 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
2015-08-19T17:17:48.071645+00:00 info: 2015-08-19 17:17:48.063 25565 INFO nova.filters [req-6ef97072-af46-4229-9352-ff645fd67717 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
2015-08-19T17:18:28.122503+00:00 info: 2015-08-19 17:18:28.121 25565 INFO nova.filters [req-c69e34fc-fe1a-4f23-80b9-a4148f432b69 3d640a5cb7cd4a788e9cb690a6280ac4 4ece2c5b2a904721bcec0f3a52dc3586 - - -] Filter RamFilter returned 0 hosts
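
To see at a glance which scheduler filter is rejecting all hosts, log lines like the ones above can be tallied. This is a minimal sketch that assumes only the "Filter <name> returned 0 hosts" message format visible in this thread; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches messages such as "... INFO nova.filters [...] Filter RamFilter returned 0 hosts"
FILTER_RE = re.compile(r"Filter (\w+) returned 0 hosts")


def count_rejecting_filters(lines):
    """Count, per filter name, how often a filter left no candidate hosts."""
    counts = Counter()
    for line in lines:
        match = FILTER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (for example `count_rejecting_filters(open(path))`, with `path` pointing at the scheduler log) returns a `Counter` keyed by filter name.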

Changed in fuel:
status: Confirmed → Invalid
Maksym Strukov (unbelll) wrote :

Reproduced on 7.0-288 (rc2)

Scenario:
1. Create new environment
2. Choose Neutron, VLAN
3. Choose Ceph for images
4. Choose Sahara
5. Add 3 controller+ceph+cinder
6. Add 2 compute+ceph
7. Add 1 cinder
8. Move Storage network to eth1
9. Move Management network to eth2 and untag it
10. Change disk configuration for Cinder node. Change 'Cinder' volume for vdc
11. Verify networks
12. Deploy the environment
13. Verify networks
14. Run OSTF tests

Both computes have 6 cores, 20 GB of RAM and 150 GB HDDs.

Actual:
"Sahara test for launching a simple Vanilla2 cluster" test failed with message: "Cluster failed to build and is in "Error" status. Please refer to OpenStack logs for more details." on step "3. Wait for the cluster to build and get to "Active" status"

The healthcheck log contains:

fuel_health.common.test_mixins: DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/fuel_health/common/test_mixins.py", line 177, in verify
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/fuel_health/saharamanager.py", line 141, in poll_cluster_status
    self.fail('Cluster failed to build and is in "Error" status.')
  File "/usr/lib/python2.6/site-packages/unittest2/case.py", line 415, in fail
    raise self.failureException(msg)
AssertionError: Cluster failed to build and is in "Error" status.

The nova-conductor log on the controllers contains:

nova.scheduler.utils [req-2df9450c-5dd9-4078-baec-fe39f61d72ce b59d42d8e38a4633a7587a0a3732de25 a01be31a13684c7b85ba75879f5b3439 - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 86, in select_destinations
    filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 80, in select_destinations
    raise exception.NoValidHost(reason=reason)
NoValidHost: No valid host was found. There are not enough hosts available.

What are the minimal requirements for this test? Why does it run when the cluster is not valid?
Snapshot: https://drive.google.com/a/mirantis.com/file/d/0B1yfbgZlRKfxWWxtR0Y5RTJlV00/view?usp=sharing

{"build_id": "288", "build_number": "288", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "288", "build_number": "288", "api": "1.0", "fuel-library_sha": "121016a09b0e889994118aa3ea42fa67eabb8f25", "nailgun_sha": "93477f9b42c5a5e0506248659f40bebc9ac23943", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "1ce8ecd8beb640f2f62f73435f4e18d1469979ac", "astute_sha": "a717657232721a7fafc67ff5e1c696c9dbeb0b95", "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045", "release": "7.0", "fuelmain_sha": "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "121016a09b0e889994118aa3ea42fa67eabb8f25", "nailgun_sha": "93477f9b42c5a5e050...


Changed in fuel:
status: Invalid → New
assignee: Registry Administrators (registry) → MOS Sahara (mos-sahara)
Maksym Strukov (unbelll)
Changed in fuel:
importance: Undecided → High
Sergey Reshetnyak (sreshetniak) wrote :

In the nova-scheduler log I see the following lines:

2015-09-15T15:51:32.218813+00:00 info: 2015-09-15 15:51:32.217 4591 INFO nova.filters [req-11330288-0caa-4333-b678-d3cfb5057892 b59d42d8e38a4633a7587a0a3732de25 a01be31a13684c7b85ba75879f5b3439 - - -] Filter RamFilter returned 0 hosts
2015-09-15T15:52:13.306556+00:00 info: 2015-09-15 15:52:13.304 4591 INFO nova.filters [req-c91e2d04-e4d5-4668-abfa-4c36004289c0 b59d42d8e38a4633a7587a0a3732de25 a01be31a13684c7b85ba75879f5b3439 - - -] Filter RamFilter returned 0 hosts

The Sahara OSTF test uses 2 VMs with the following flavor:

RAM: 1024
VCPU: 1
Disk: 20
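
To relate this flavor to the "Filter RamFilter returned 0 hosts" messages, here is a simplified sketch of the RAM check as I understand Nova's RamFilter to apply it per host: the host passes if its total RAM times the allocation ratio, minus the RAM already in use, covers the requested RAM. The function name, the default ratio of 1.0, and the host figures in the usage note are assumptions for illustration, not values taken from this environment.

```python
def ram_filter_passes(total_mb, free_mb, requested_mb, ram_allocation_ratio=1.0):
    """Simplified RamFilter-style decision for a single host (illustrative only)."""
    limit_mb = total_mb * ram_allocation_ratio  # RAM the scheduler may hand out
    used_mb = total_mb - free_mb                # RAM already claimed on the host
    usable_mb = limit_mb - used_mb
    return usable_mb >= requested_mb


FLAVOR_RAM_MB = 1024  # flavor RAM from the comment above
INSTANCES = 2         # the test boots 2 VMs

# RAM the test needs across all compute hosts, in MB.
total_needed_mb = FLAVOR_RAM_MB * INSTANCES
```

For example, a hypothetical 5120 MB compute with only 512 MB reported free fails the check for a 1024 MB instance at ratio 1.0, while the same host with 2048 MB free passes. The free RAM the scheduler sees can be much lower than the VM size suggests once reserved host memory and other instances are counted.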

Changed in fuel:
status: New → Incomplete
Sergey Reshetnyak (sreshetniak) wrote :

Please provide more information about resource utilisation

Dennis Dmitriev (ddmitriev) wrote :

The same issue.

Steps to reproduce:

1. Create a cluster (neutron/vlan, cinder/lvm for volumes enabled)
2. Add 3 controllers (1 CPU, 4 GB of RAM)
3. Add 1 compute (4 CPU, 16 GB of RAM)
4. Add 1 cinder (1 CPU, 4 GB of RAM, 300 GB disk space)
5. Deploy changes
6. Run OSTF

"Sahara test for launching a simple Vanilla2 cluster" failed: "Cluster failed to build and is in "Error" status."

Sahara image was taken from here: http://sahara-files.mirantis.com/mos70/sahara-kilo-vanilla-2.6.0-ubuntu-14.04.qcow2

In the diagnostic snapshot (see the attachment):

### ./node-5.test.domain.local/**NO MATCH**.log

2015-09-15T22:07:30.331865+00:00 err: 22:07:30.328 10561 ERROR sahara.service.ops [-] Error during operating on cluster sahara-cluster-1856647386 (reason: An error occurred in thread 'configure-ntp-sahara-cluster-1856647386-worker-001': RemoteCommandException: Error during command execution: "ntpdate -u pool.ntp.org"
Return code: 1
STDERR:
Error resolving pool.ntp.org: Name or service not known (-2)
15 Sep 22:07:28 ntpdate[1829]: Can't find host pool.ntp.org: Name or service not known (-2)
15 Sep 22:07:28 ntpdate[1829]: no servers can be used, exiting

Error ID: dc625c30-25e3-407a-bb78-ccf5393b4295
Error ID: 0edec0b1-7eca-48f1-9000-95bda8506f90)

2015-09-15T22:07:31.016863+00:00 info: 22:07:31.014 10561 INFO sahara.utils.general [-] Cluster status has been changed: id=51ee7873-7c20-40cf-be64-5c9c1d81eb78, New status=Error
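
The ntpdate failure above is a name-resolution error, so a quick way to confirm it from inside an instance is to test whether the hostname resolves at all. A minimal sketch using only the standard library; the function name and the injectable `resolver` parameter are illustrative choices, not part of Sahara or OSTF.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a name-resolution error."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Same class of failure as "Name or service not known (-2)".
        return False
```

Calling `can_resolve("pool.ntp.org")` on the worker instance would return False in the situation logged above, which points at DNS or external connectivity rather than at NTP itself.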

Changed in fuel:
status: Incomplete → Confirmed
Changed in mos:
status: New → Triaged
assignee: nobody → Sergey Reshetnyak (sreshetniak)
importance: Undecided → High
Sergey Reshetnyak (sreshetniak) wrote :

Dennis,

thanks, I created a new bug for the NTP problem.

https://bugs.launchpad.net/mos/+bug/1496246

no longer affects: mos
Vitaly Sedelnik (vsedelnik) wrote :

Moving the bug to 7.0-updates because of High importance (we accept only Critical bugfixes to 7.0 RC in HCF). Confirmed for 8.0.

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Sergey Reshetnyak (sreshetniak) wrote :

Not reproduced

Changed in fuel:
status: Confirmed → Incomplete
Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
Changed in fuel:
status: Incomplete → Invalid
Mikhail Samoylov (msamoylov) wrote :

The same problem occurs in this Fuel version:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "429"
  build_id: "429"
  fuel-nailgun_sha: "12b15b2351e250af41cc0b10d63a50c198fe77d8"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "df16d41cd7a9445cf82ad9fd8f0d53824711fcd8"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "3eaf4f4a9b88b287a10cc19e9ce6a62298cc4013"
  fuel-ostf_sha: "214e794835acc7aa0c1c5de936e93696a90bb57a"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "85de57080a18fda18e5325f06eaf654b1b931592"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "e8e36cff332644576d7853c80b8a53d5b955420a"

Sahara error: http://paste.openstack.org/show/484569/
Heat-engine error: http://paste.openstack.org/show/484571/
OSTF error: http://paste.openstack.org/show/484655/
Fuel dump: https://drive.google.com/a/mirantis.com/file/d/0B2SenDuhfXPlaE85WUR0azJwNnc

Changed in fuel:
status: Invalid → Confirmed
Mikhail Samoylov (msamoylov) wrote :

Also, we need docs about the hardware requirements for Sahara; related bug: https://bugs.launchpad.net/fuel/+bug/1536982

Evgeny Sikachev (esikachev) wrote :

Hi! In sahara-all.log I see http://paste.openstack.org/show/484656/ . Which image are you using? The current images are at http://sahara-files.mirantis.com/mos80 . Also, I think the instance has no internet access.

Changed in fuel:
status: Confirmed → Incomplete
Mikhail Samoylov (msamoylov) wrote :
Evgeny Sikachev (esikachev) wrote :

This is the image for Vanilla 2.4.1; the current version is 2.7.1.

Evgeny Sikachev (esikachev) wrote :
Vitalii Gridnev (vgridnev) wrote :

Could you please re-run OSTF tests using images for 8.0 release? Image that you have used is too old.

Mikhail Samoylov (msamoylov) wrote :
Changed in fuel:
status: Incomplete → Confirmed
Evgeny Sikachev (esikachev) wrote :

From sahara-all.log on node-1 I see that the datanodes did not start. I guess the problem is QEMU.

Changed in fuel:
status: Confirmed → Incomplete
tags: added: area-sahara
removed: sahara
Mikhail Samoylov (msamoylov) wrote :

With Fuel ISO 478 the result is the same with the Sahara Vanilla 2.7.1 image.
We are using QEMU KVM on local machines.
Controller log plus memory and CPU info (2 CPUs and 6 GB of memory):
http://paste.openstack.org/show/484978/
Compute settings (2 CPUs, 5 GB of memory):
http://paste.openstack.org/show/484980/
Snapshot:
https://drive.google.com/a/mirantis.com/file/d/0B2SenDuhfXPlRlUwQXloMFlRRTA

Test scenario:
1. Create Env
2. Add 3 controllers with cinder
3. Add 2 computes
4. Add sahara
5. Verify networks
6. Deploy cluster
7. Add sahara vanilla image 2.7.1
8. Run OSTF test "Sahara test for launching a simple Vanilla2 cluster"

Expected result: test passed
Actual result: test failed on step 3 "Wait for the cluster to build and get to "Active" status"

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "478"
  build_id: "478"
  fuel-nailgun_sha: "ae949905142507f2cb446071783731468f34a572"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "481ed135de2cb5060cac3795428625befdd1d814"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "420c6fa5f8cb51f3322d95113f783967bde9836e"
  fuel-ostf_sha: "ab5fd151fc6c1aa0b35bc2023631b1f4836ecd61"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "fac143f4dfa75785758e72afbdc029693e94ff2b"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "6c6b088a3d52dd0eaf43d59f3a3a149c93a07e7e"

Changed in fuel:
status: Incomplete → Confirmed
Evgeny Sikachev (esikachev) wrote :

In sahara-engine.log I see that the tests started at 9:12 and finished at 9:53, a duration of ~40 min. But the OSTF test timeout is 20 min: https://github.com/openstack/fuel-ostf/blob/master/fuel_health/tests/tests_platform/test_sahara.py#L132

I still insist that the problem is a lack of resources in Nova to start the 2 instances.
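
The comparison between the observed run time and the test timeout can be checked with a few lines. This is a sketch using the clock times and the 20-minute timeout quoted in the comment above; the function name is made up, and it assumes both times fall on the same day.

```python
from datetime import datetime, timedelta

OSTF_TIMEOUT = timedelta(minutes=20)  # timeout quoted in the comment above


def exceeded_timeout(start, finish, timeout=OSTF_TIMEOUT, fmt="%H:%M"):
    """Return (duration, exceeded) for two same-day clock times."""
    duration = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return duration, duration > timeout


duration, exceeded = exceeded_timeout("9:12", "9:53")
print(duration, exceeded)  # 0:41:00 True
```

At 41 minutes the cluster build took roughly twice the quoted timeout, which is consistent with instances that are starved of resources and slow to come up.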

Changed in fuel:
status: Confirmed → Incomplete
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
status: Incomplete → Confirmed
assignee: MOS Sahara (mos-sahara) → MOS QA Team (mos-qa)
Evgeny Sikachev (esikachev) wrote :

Hi! In the sahara-engine log I see:
reason: Heat stack failed with status Resource CREATE failed: resources.master: InternalServerError: resources[0].resources.sahara-cluster-1974747439-master-2fea9cf8: Request Failed: internal server error while processing your request.

I think this is a heat-engine problem.

But on ISO 496 everything works fine.

Changed in fuel:
status: Confirmed → Incomplete
Tatyanka (tatyana-leontovich) wrote :

Feel free to revert 493 and find the RCA; until then it must stay Confirmed (you have logs and an env at least).

Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
assignee: MOS QA Team (mos-qa) → Evgeny Sikachev (esikachev)
Evgeny Sikachev (esikachev) wrote :

I reverted the snapshot and reran the tests. The Sahara tests passed. I think this is a random failure.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
status: Incomplete → Invalid
summary: - [OSTF] Sahara test for launching a simple Vanilla2 cluster error
+ [OSTF][QEMU] Sahara test for launching a simple Vanilla2 cluster has
+ "Error" state
tags: added: area-ostf
removed: ostf