Deployment fails with ProgrammingError in cinder-scheduler when ceph OSD is used

Bug #1561960 reported by Alexander Gromov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Kyrylo Galanov

Bug Description

Reproduced on CI: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ceph_ha_one_controller/57/console (the Fuel UI does not work there, so the environment with errors cannot be selected)
Steps to reproduce:
1. Create cluster
2. Add 1 node with controller and ceph OSD roles
3. Add 2 nodes with compute and ceph OSD roles
4. Deploy the cluster

Actual results:
Deployment fails and the following error is displayed in the console:
2016-03-24 22:56:22,815 - ERROR __init__.py:66 -- assert_task_success raised: AssertionError('Task \'deploy\' has incorrect status. error != ready, \'Deployment has failed. Method task_deploy. Validation of node:\n{"uid"=>"1",\n "status"=>"error",\n "error_type"=>"deploy",\n "error_msg"=>\n "Critical nodes failed: Node[1]. Stopping the deployment process!",\n "task"=>"ceph_ready_check",\n "task_status"=>"failed"}\n for report failed: Task name is not provided.\nInspect Astute logs for the details\'',)
Traceback: Traceback (most recent call last):
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/models/fuel_web_client.py", line 317, in assert_task_success
    task["name"], task['status'], 'ready', _message(task)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Method task_deploy. Validation of node:
{"uid"=>"1",
 "status"=>"error",
 "error_type"=>"deploy",
 "error_msg"=>
  "Critical nodes failed: Node[1]. Stopping the deployment process!",
 "task"=>"ceph_ready_check",
 "task_status"=>"failed"}
 for report failed: Task name is not provided.
Inspect Astute logs for the details'

The following cinder-scheduler CRITICAL log can be found:
2016-03-24 22:08:34.061 13710 CRITICAL cinder [req-0d95d8c0-bc38-4a61-ae41-fb35c7e88323 - - - - -] ProgrammingError: (_mysql_exceptions.ProgrammingError) (1146, "Table 'cinder.services' doesn't exist") [SQL: u'SELECT services.created_at AS services_created_at, services.updated_at AS services_updated_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.host AS services_host, services.`binary` AS services_binary, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.availability_zone AS services_availability_zone, services.disabled_reason AS services_disabled_reason, services.modified_at AS services_modified_at, services.rpc_current_version AS services_rpc_current_version, services.object_current_version AS services_object_current_version, services.replication_status AS services_replication_status, services.active_backend_id AS services_active_backend_id, services.frozen AS services_frozen \nFROM services \nWHERE services.deleted = false AND services.`binary` = %s'] [parameters: ('cinder-scheduler',)]
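
For triage, the MySQL error number and message can be pulled out of a log line like the one above with a small script. This is only a sketch: the regular expression and the function name `extract_mysql_error` are illustrative assumptions and are not part of cinder or Fuel, and the sample line is a shortened copy of the log entry above.

```python
import re

# Matches the '(code, "message")' tuple that the MySQL driver error is rendered as.
MYSQL_ERROR_RE = re.compile(r'\((\d{4}), "([^"]*)"\)')


def extract_mysql_error(log_line):
    """Return (code, message) from a log line, or None if no error tuple is found."""
    match = MYSQL_ERROR_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Shortened copy of the cinder-scheduler line from this report (SQL text omitted).
sample = (
    "2016-03-24 22:08:34.061 13710 CRITICAL cinder "
    "[req-0d95d8c0-bc38-4a61-ae41-fb35c7e88323 - - - - -] "
    "ProgrammingError: (_mysql_exceptions.ProgrammingError) "
    "(1146, \"Table 'cinder.services' doesn't exist\")"
)

print(extract_mysql_error(sample))
```

The same pattern should also match the `(2013, "...")` and `(2003, "...")` tuples quoted later in this report, since none of those messages contain a double quote.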

The following error also appears in cinder-volume.log:
/var/log/cinder/cinder-volume.log:2016-03-24 22:10:55.134 33213 ERROR oslo_service.service VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Error connecting to ceph cluster.

Alexander Gromov (agromov) wrote :
tags: added: ceph
description: updated
description: updated
description: updated
Dmitry Klenov (dklenov)
tags: added: area-library
Changed in fuel:
milestone: none → 9.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote :

This fails because the cinder database has a problem, probably because this task runs in parallel with the non-primary database tasks. We saw a similar issue with keystone.
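
The ordering constraint described here (a service must not start querying a table before the database tasks have created it) can be illustrated with a polling check. This is a minimal sketch under stated assumptions, not the fix applied in fuel-library: `table_exists` and `wait_for_table` are hypothetical helpers, and `sqlite3` stands in for MySQL only so that the example is self-contained.

```python
import sqlite3
import time


def table_exists(conn, table):
    """Check for a table; sqlite3 is used here only as a stand-in for MySQL."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def wait_for_table(conn, table, attempts=5, delay=0.1):
    """Poll until the table appears; return False if it never does."""
    for _ in range(attempts):
        if table_exists(conn, table):
            return True
        time.sleep(delay)
    return False


conn = sqlite3.connect(":memory:")
# The schema has not been created yet, so a dependent service should not start.
print(wait_for_table(conn, "services", attempts=2, delay=0.01))
# Simulate the database task finishing and creating the table.
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT)")
print(wait_for_table(conn, "services"))
```

In a real deployment the more robust approach is usually to declare the dependency between tasks rather than to poll, but the sketch shows the condition that has to hold before cinder-scheduler runs its first query.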

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Bilunov (dbilunov)
Alexander Gromov (agromov) wrote :

It looks like the same problem was also reproduced in the 'ceph_rados_gw' test on CI: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_tun/60/console

Alexander Gromov (agromov) wrote :
Dmitry Belyaninov (dbelyaninov) wrote :

One more test case fails with the same issue:
https://patching-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.ha_neutron_tun_scale/25/consoleFull

https://mirantis.testrail.com/index.php?/tests/view/3805137

2016-03-31 16:12:48.681 28268 TRACE cinder.service raise original_exception
2016-03-31 16:12:48.681 28268 TRACE cinder.service DBConnectionError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
2016-03-31 16:12:48.681 28268 TRACE cinder.service
.....
2016-03-31 16:05:05.658 28268 TRACE cinder.service raise original_exception
2016-03-31 16:05:05.658 28268 TRACE cinder.service DBConnectionError: (OperationalError) (2003, "Can't connect to MySQL server on '10.109.17.3' (111)") None None
2016-03-31 16:05:05.658 28268 TRACE cinder.service
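
The MySQL error numbers quoted in this report fall into two groups: a missing table (1146) and failed or dropped connections (2003, 2013). The sketch below groups them for triage. The symbolic names are taken from the MySQL error references; the `KNOWN_ERRORS` table, the category labels, and the `classify` helper are illustrative assumptions.

```python
# MySQL error numbers seen in this report, with their symbolic names.
# The "schema" / "connectivity" categories are labels chosen for this sketch.
KNOWN_ERRORS = {
    1146: ("ER_NO_SUCH_TABLE", "schema"),
    2003: ("CR_CONN_HOST_ERROR", "connectivity"),
    2013: ("CR_SERVER_LOST", "connectivity"),
}


def classify(code):
    """Return (symbol, category); unknown codes fall into the 'other' category."""
    return KNOWN_ERRORS.get(code, ("UNKNOWN", "other"))


for code in (1146, 2003, 2013):
    symbol, category = classify(code)
    print(code, symbol, category)
```

Both groups are consistent with a database that was not ready when the cinder services started, although the traces alone do not prove a common root cause.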

Changed in fuel:
assignee: Dmitry Bilunov (dbilunov) → Kyrylo Galanov (kgalanov)