Fuel for OpenStack

Deployment fails with ProgrammingError in cinder-scheduler when ceph OSD is used

Bug #1561960 reported by Alexander Gromov on 2016-03-25

This bug report is a duplicate of: Bug #1548271: Access denied for user 'root'@'localhost'/Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock errors during cluster deployment.

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Confirmed	High	Kyrylo Galanov	Fuel for OpenStack 9.0

Bug Description

Reproduced on CI: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ceph_ha_one_controller/57/console (the Fuel UI does not work in this run, so the failed environment cannot be selected there)
Steps to reproduce:
1. Create cluster
2. Add 1 node with controller and ceph OSD roles
3. Add 2 nodes with compute and ceph OSD roles
4. Deploy the cluster

Actual results:
Deployment fails and the following error is displayed in the console:
2016-03-24 22:56:22,815 - ERROR __init__.py:66 -- assert_task_success raised: AssertionError('Task \'deploy\' has incorrect status. error != ready, \'Deployment has failed. Method task_deploy. Validation of node:\n{"uid"=>"1",\n "status"=>"error",\n "error_type"=>"deploy",\n "error_msg"=>\n "Critical nodes failed: Node[1]. Stopping the deployment process!",\n "task"=>"ceph_ready_check",\n "task_status"=>"failed"}\n for report failed: Task name is not provided.\nInspect Astute logs for the details\'',)
Traceback: Traceback (most recent call last):
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/models/fuel_web_client.py", line 317, in assert_task_success
    task["name"], task['status'], 'ready', _message(task)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Method task_deploy. Validation of node:
{"uid"=>"1",
"status"=>"error",
"error_type"=>"deploy",
"error_msg"=>
  "Critical nodes failed: Node[1]. Stopping the deployment process!",
"task"=>"ceph_ready_check",
"task_status"=>"failed"}
for report failed: Task name is not provided.
Inspect Astute logs for the details'
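
To see at a glance which task failed, the node report embedded in the assertion message can be parsed. The following is a minimal sketch, not part of fuelweb_test: the helper name parse_node_report and the regular expression are assumptions made for illustration, and it only handles the flat "key"=>"value" pairs seen in the report above.

```python
import re

# Matches Ruby-hash style pairs such as "task"=>"ceph_ready_check".
# Whitespace (including a line break) is allowed around "=>".
PAIR_RE = re.compile(r'"(\w+)"\s*=>\s*"([^"]*)"')


def parse_node_report(message):
    """Return the flat key/value pairs found in a node report string.

    Hypothetical helper for illustration; nested values are not supported.
    """
    return dict(PAIR_RE.findall(message))


# Node report copied from the error message in this bug.
report = '''{"uid"=>"1",
"status"=>"error",
"error_type"=>"deploy",
"error_msg"=>
  "Critical nodes failed: Node[1]. Stopping the deployment process!",
"task"=>"ceph_ready_check",
"task_status"=>"failed"}'''

fields = parse_node_report(report)
print(fields["task"], fields["task_status"])
```

For this report the failing task is ceph_ready_check on node 1, which matches the "Error connecting to ceph cluster" message in the cinder-volume log.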

The following CRITICAL entry can be found in the cinder-scheduler log:
2016-03-24 22:08:34.061 13710 CRITICAL cinder [req-0d95d8c0-bc38-4a61-ae41-fb35c7e88323 - - - - -] ProgrammingError: (_mysql_exceptions.ProgrammingError) (1146, "Table 'cinder.services' doesn't exist") [SQL: u'SELECT services.created_at AS services_created_at, services.updated_at AS services_updated_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.host AS services_host, services.`binary` AS services_binary, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.availability_zone AS services_availability_zone, services.disabled_reason AS services_disabled_reason, services.modified_at AS services_modified_at, services.rpc_current_version AS services_rpc_current_version, services.object_current_version AS services_object_current_version, services.replication_status AS services_replication_status, services.active_backend_id AS services_active_backend_id, services.frozen AS services_frozen \nFROM services \nWHERE services.deleted = false AND services.`binary` = %s'] [parameters: ('cinder-scheduler',)]

The following error can also be found in cinder-volume.log:
/var/log/cinder/cinder-volume.log:2016-03-24 22:10:55.134 33213 ERROR oslo_service.service VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Error connecting to ceph cluster.

Tags:

Alexander Gromov (agromov) wrote on 2016-03-25:

fuel-snapshot-2016-03-25_11-05-11.tar.xz (55.6 MiB, application/octet-stream)

Alexander Gromov (agromov) on 2016-03-25

tags:	added: ceph
description:	updated
description:	updated

Alexander Gromov (agromov) on 2016-03-25

description:	updated

Dmitry Klenov (dklenov) on 2016-03-25

tags:	added: area-library
Changed in fuel:
milestone:	none → 9.0
assignee:	nobody → Fuel Library Team (fuel-library)
importance:	Undecided → High
status:	New → Confirmed

Matthew Mosesohn (raytrac3r) wrote on 2016-03-28:

This fails because the cinder database is not ready: the 'cinder.services' table does not exist when cinder-scheduler starts. It is probably because this task runs in parallel with the non-primary database tasks. We saw a similar problem with keystone.
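
If the cause is indeed a race between the service start and the database tasks, one generic mitigation is to poll for readiness before starting the service. The sketch below only illustrates that idea; it is not the fix adopted in Fuel. The names wait_until and schema_ready are hypothetical, and schema_ready is a stand-in for a real check such as "does the cinder.services table exist yet?".

```python
import time


def wait_until(check, timeout=60.0, interval=1.0):
    """Poll check() until it returns True or `timeout` seconds elapse.

    Returns True if the check succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness check (assumption): succeeds on the third attempt,
# imitating a schema that appears shortly after the first probe.
attempts = []


def schema_ready():
    attempts.append(1)
    return len(attempts) >= 3


ready = wait_until(schema_ready, timeout=5.0, interval=0.01)
print(ready, len(attempts))
```

Ordering the tasks so that the schema is created before dependent services start would remove the race itself; polling only tolerates it.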

Dmitry Bilunov (dbilunov) on 2016-03-28

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Dmitry Bilunov (dbilunov)

Alexander Gromov (agromov) wrote on 2016-03-28:

It looks like the same problem was also reproduced in the 'ceph_rados_gw' test on CI: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_tun/60/console

Alexander Gromov (agromov) wrote on 2016-03-30:

The same error is reproduced on CI for MOS 7.0 MU3:
https://patching-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.huge_ha_neutron/9/consoleFull

Dmitry Belyaninov (dbelyaninov) wrote on 2016-04-01:

One more test case fails with the same issue:
https://patching-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.ha_neutron_tun_scale/25/consoleFull

https://mirantis.testrail.com/index.php?/tests/view/3805137

2016-03-31 16:12:48.681 28268 TRACE cinder.service raise original_exception
2016-03-31 16:12:48.681 28268 TRACE cinder.service DBConnectionError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
2016-03-31 16:12:48.681 28268 TRACE cinder.service
.....
2016-03-31 16:05:05.658 28268 TRACE cinder.service raise original_exception
2016-03-31 16:05:05.658 28268 TRACE cinder.service DBConnectionError: (OperationalError) (2003, "Can't connect to MySQL server on '10.109.17.3' (111)") None None
2016-03-31 16:05:05.658 28268 TRACE cinder.service
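
The cinder log lines quoted in this report carry three different MySQL error codes (1146, 2013, 2003), which point at different stages of the same database problem. The following sketch groups such lines by code; the helper name mysql_error_code and the regular expression are assumptions for illustration, and the descriptions are short paraphrases of the standard MySQL meanings of these codes.

```python
import re

# Meanings of the MySQL error codes that appear in this report.
MYSQL_ERRORS = {
    1146: "table does not exist (schema not created yet)",
    2003: "cannot connect to the MySQL server",
    2013: "lost connection to the MySQL server",
}

# Matches the '(code, "message")' tuple as SQLAlchemy prints it in the logs.
CODE_RE = re.compile(r'\((\d{4}), "')


def mysql_error_code(line):
    """Return the MySQL error code found in a log line, or None."""
    match = CODE_RE.search(line)
    return int(match.group(1)) if match else None


# Shortened excerpts of the log lines quoted in this bug.
lines = [
    'ProgrammingError: (_mysql_exceptions.ProgrammingError) '
    '(1146, "Table \'cinder.services\' doesn\'t exist")',
    'DBConnectionError: (OperationalError) (2013, "Lost connection to '
    'MySQL server at \'reading initial communication packet\', '
    'system error: 0") None None',
    'DBConnectionError: (OperationalError) (2003, "Can\'t connect to '
    'MySQL server on \'10.109.17.3\' (111)") None None',
]

codes = [mysql_error_code(line) for line in lines]
for code in codes:
    print(code, MYSQL_ERRORS.get(code, "unknown"))
```

Codes 2003 and 2013 mean the server was unreachable or dropped the connection, while 1146 means the server answered but the cinder schema had not been created.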

Matthew Mosesohn (raytrac3r) on 2016-04-01