Ironic_deploy_ceph deployment failed with tasks: Task[ceph_ready_check/1], Task[enable_rados/3], Task[enable_rados/2]

Bug #1592483 reported by Kyrylo Romanenko
This bug affects 1 person
Affects             Status     Importance  Assigned to       Milestone
Mirantis OpenStack  Won't Fix  High        Kyrylo Romanenko
10.0.x              Won't Fix  High        Kyrylo Romanenko
9.x                 Won't Fix  High        Kyrylo Romanenko

Bug Description

Steps to reproduce:
1. Create cluster
2. Add 1 node with controller+ceph-osd role
3. Add 2 nodes with controller+ironic+ceph-osd role
4. Add 1 node with compute role
5. Add 1 node with ironic role
6. Deploy the cluster

Failure traceback:
http://paste.openstack.org/show/516039/

Automated test job log: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ironic_deploy_ceph/134/console

Note that due to bug LP#1573125 the test runs twice in the job, but it failed only on the second run.
Logs snapshot attached.

Environment: CI with iso fuel-9.0-mos-465-2016-06-09_22-51-38.iso

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Dina Belova (dbelova) wrote :

Because this bug has High priority and only Critical bugs are fixed after HCF, I'm moving it to 9.0-updates.

Changed in mos:
status: New → Confirmed
milestone: 9.0 → 9.0-updates
tags: added: move-to-mu
Serge Kovaleff (serge-kovaleff) wrote :

@Kirill, I am not really sure how Ironic can help with the Ceph deployment architecture.

We definitely need this working. Who can fix the Ceph deployment?

Alexei Sheplyakov (asheplyakov) wrote :

Dear colleagues,

please provide a basic description of the Ceph cluster, such as:

- a brief description of the hardware (CPUs, RAM, data and journal drives)
- the output of "ceph -s"
- the output of "ceph osd tree"
- the output of "ceph mon_status"
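
The three command outputs requested above could be gathered into one report with a small script. This is only a minimal sketch: the helper names `run_command` and `collect_report` are hypothetical, and it assumes the `ceph` CLI is on the PATH of the node where it runs.

```python
import subprocess

# The three commands requested in the comment above.
CEPH_COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "osd", "tree"],
    ["ceph", "mon_status"],
]


def run_command(argv, timeout=30):
    """Run one command and return its combined stdout/stderr as text.

    If the command cannot be started or does not finish in time,
    return a short error note instead of raising.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        return result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "failed to run: {}\n".format(exc)


def collect_report(commands=CEPH_COMMANDS, runner=run_command):
    """Build one text report with a '$ command' header per section."""
    sections = []
    for argv in commands:
        sections.append("$ {}\n{}".format(" ".join(argv), runner(argv)))
    return "\n".join(sections)
```

Calling `print(collect_report())` on a controller node would print all three outputs in order; the `runner` parameter exists only so the report formatting can be exercised without a live cluster.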

> 2. Add 1 node with controller+ceph-osd role

Deploying a monitor and an OSD on the same node is not supported.

Alexei Sheplyakov (asheplyakov) wrote :

> fail_error_ironic_deploy_ceph-fuel-snapshot-2016-06-14_01-02-50.tar.gz

fuel-snapshot-2016-06-14_01-02-50$ tail node-1/var/log/ceph/ceph-osd.0.log
2016-06-14 00:13:10.353636 7f6be94b5700 0 osd.0 0 ignoring osdmap until we have initialized
2016-06-14 00:13:10.353776 7f6bfa7df800 0 osd.0 0 done with init, starting boot process
2016-06-14 00:13:10.495677 7f6be94b5700 0 osd.0 4 crush map has features 1107558400, adjusting msgr requires for clients
2016-06-14 00:13:10.495783 7f6be94b5700 0 osd.0 4 crush map has features 1107558400 was 33825281, adjusting msgr requires for mons
2016-06-14 00:13:10.495819 7f6be94b5700 0 osd.0 4 crush map has features 1107558400, adjusting msgr requires for osds
2016-06-14 00:13:21.811949 7f6bd5c8e700 -1 osd.0 5 *** Got signal Terminated ***
2016-06-14 00:13:21.811983 7f6bd5c8e700 0 osd.0 5 prepare_to_stop telling mon we are shutting down
2016-06-14 00:13:21.813849 7f6be94b5700 0 monclient: hunting for new mon

So osd.0 was shut down normally by a termination signal (perhaps at the user's request?).
By the way, the log looks truncated: there should have been more messages during the shutdown sequence.
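
To see how long osd.0 ran before it was terminated, the two relevant timestamps from the excerpt above can be compared. The following is a minimal sketch, not part of any Ceph tooling; the function names are hypothetical and the two log lines are copied from the excerpt.

```python
from datetime import datetime

# Two lines copied from the ceph-osd.0.log excerpt above.
LOG_EXCERPT = """\
2016-06-14 00:13:10.353776 7f6bfa7df800 0 osd.0 0 done with init, starting boot process
2016-06-14 00:13:21.811949 7f6bd5c8e700 -1 osd.0 5 *** Got signal Terminated ***
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' fields of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S.%f")


def find_event(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            return parse_timestamp(line)
    return None


def seconds_between_boot_and_signal(text):
    """Seconds from 'starting boot process' to 'Got signal', or None."""
    lines = text.splitlines()
    boot = find_event(lines, "starting boot process")
    stop = find_event(lines, "Got signal")
    if boot is None or stop is None:
        return None
    return (stop - boot).total_seconds()
```

For the excerpt above, `seconds_between_boot_and_signal(LOG_EXCERPT)` gives roughly 11.5 seconds between the start of the boot process and the termination signal.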

fuel-snapshot-2016-06-14_01-02-50$ find . -type f -name '*osd.?.log' | xargs ls -l
-rw-r--r-- 1 asheplyakov adm 6756 Jun 14 04:04 ./node-1/var/log/ceph/ceph-osd.0.log
-rw-r--r-- 1 asheplyakov adm 33326 Jun 14 04:04 ./node-2/var/log/ceph/ceph-osd.1.log
-rw-r--r-- 1 asheplyakov adm 9768 Jun 14 04:04 ./node-3/var/log/ceph/ceph-osd.2.log

There is at most ~50 KB of useful data in the 58 MB tarball. The signal-to-noise ratio is remarkably low.

Alexei Sheplyakov (asheplyakov) wrote :

Fuel starts osd.0 on node-1 immediately after starting the monitor on the same node (at 00:13:21).
However, the monitors on node-2 and node-3 had not been started yet (let alone established a quorum).
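
The ordering problem described above can be illustrated with a quorum check that a deployment step might run before starting an OSD. This is a hedged sketch, not Fuel's actual logic: the function names are hypothetical, and it assumes the JSON printed by `ceph quorum_status --format json` carries a `quorum_names` list of the monitors currently in quorum.

```python
import json


def monitors_in_quorum(quorum_status):
    """Return the monitor names listed under 'quorum_names' (assumed field)."""
    return list(quorum_status.get("quorum_names", []))


def has_quorum(quorum_status, expected_monitors):
    """True if a strict majority of the expected monitors is in quorum."""
    needed = expected_monitors // 2 + 1
    return len(monitors_in_quorum(quorum_status)) >= needed


def safe_to_start_osd(quorum_status_json, expected_monitors):
    """Decide from raw JSON text whether enough monitors are up."""
    return has_quorum(json.loads(quorum_status_json), expected_monitors)
```

With three planned monitors, as in this deployment, a status that lists only node-1 would fail the check, while one that lists two or three monitors would pass.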

Kyrylo Romanenko (kromanenko) wrote :

Since deploying a monitor and an OSD on the same node is not supported, I can discard this bug.

Changed in mos:
status: Incomplete → Won't Fix