Deployment fails on ceph-osd nodes

Bug #1529841 reported by Michael Semenov
This bug affects 1 person
Affects             Status        Importance  Assigned to        Milestone
Mirantis OpenStack  Fix Released  High        Alexei Sheplyakov
8.0.x               Fix Released  High        Alexei Sheplyakov
9.x                 Fix Released  High        MOS Ceph

Bug Description

Found on Fuel 8.0 #355. Environment: perf-3 (https://172.16.52.114:8443).

Deployment failed, ceph-osd nodes were not deployed. Logs on the nodes show:
2015-12-28 18:57:41 ERR ceph-deploy osd prepare node-7:/dev/sdd3:/dev/disk/by-id/ata-INTEL_SSDSC2BW240A4_PHDA410301C22403GN-part5 returned 1 instead of one of [0]

When running the failed command manually:
root@node-9:~# ceph-deploy osd prepare node-9:/dev/sdc3:/dev/disk/by-id/ata-INTEL_SSDSC2BW240A4_PHDA410301812403GN-part3

It fails here:
[node-9][WARNIN] DEBUG:ceph-disk:Journal /dev/disk/by-id/ata-INTEL_SSDSC2BW240A4_PHDA410301812403GN-part3 was previously prepared with ceph-disk. Reusing it.
[node-9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk -i 2 /dev/disk/by-id/ata-INTEL_SSDSC
[node-9][WARNIN] Problem opening /dev/disk/by-id/ata-INTEL_SSDSC for reading! Error is 2.

So the problem is that Fuel passes the journal device as a "by-id" symlink (rather than /dev/sda3), and the current version of ceph-disk (v0.94.5) does not parse it correctly. When I run:
ceph-deploy -v osd prepare node-9:/dev/sdc3:/dev/sda3

it works.
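
As a stopgap, the symlink could be resolved to its real device node before it is handed to ceph-deploy, which is what the manual command above does by hand. The following is only a minimal sketch of that idea: `resolve_device` and `osd_prepare_arg` are hypothetical helper names, not part of Fuel or ceph-deploy, and the only real call used is the standard library's `os.path.realpath`.

```python
import os


def resolve_device(path):
    """Follow symlinks (e.g. /dev/disk/by-id/...) and return the canonical path."""
    return os.path.realpath(path)


def osd_prepare_arg(node, data_dev, journal_dev):
    """Build a node:data:journal argument with both devices resolved.

    Hypothetical helper: it only formats the string, it does not run ceph-deploy.
    """
    return '{}:{}:{}'.format(node,
                             resolve_device(data_dev),
                             resolve_device(journal_dev))
```

On a node where the by-id link points at /dev/sda3, `osd_prepare_arg('node-9', '/dev/sdc3', <by-id link>)` would produce the same `node-9:/dev/sdc3:/dev/sda3` argument that worked above.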

There is a bug in ceph-disk (it should work with symlinks): http://tracker.ceph.com/issues/13438

A diagnostic snapshot will be attached soon.

Tags: area-ceph
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: nobody → Alexei Sheplyakov (asheplyakov)
Alexei Sheplyakov (asheplyakov) wrote :

> http://tracker.ceph.com/issues/13438

That is a different problem (caused by parted: it resolves the link and prints its destination, which confuses ceph-disk).
The problem here is caused by extremely naive splitting of the device node path into the base device and the partition number: https://github.com/ceph/ceph/blob/v0.94.5/src/ceph-disk#L2361-L2365
The bug has been silently fixed in the master branch by these commits:
https://github.com/ceph/ceph/commit/0e34742b968e72aa6ce4a0c95a885dced435b3bc
https://github.com/ceph/ceph/commit/3bc95dfc1b88c01e16c3df04e96acced777b344a
https://github.com/ceph/ceph/commit/77ff7c3dc6dd6861b094e5a53d329de0802f3032

I'm working on backporting those to hammer.
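
The naive splitting described in this comment can be illustrated with a short sketch. The regex below is an assumption chosen to reproduce the log in the bug description (it splits at the first run of digits); it is not copied from the ceph-disk source, and `naive_split` is a hypothetical name.

```python
import re


def naive_split(dev):
    """Split a device path at the FIRST run of digits (assumed behaviour)."""
    match = re.match(r'(\D+)(\d+)', dev)
    return match.group(1), match.group(2)


# Plain kernel names contain no digits before the partition number, so this works.
print(naive_split('/dev/sda3'))

# A by-id name has digits inside the model/serial string, so the split stops early.
print(naive_split(
    '/dev/disk/by-id/ata-INTEL_SSDSC2BW240A4_PHDA410301812403GN-part3'))
```

For the by-id path the sketch yields the base `/dev/disk/by-id/ata-INTEL_SSDSC` and the partition number `2`, which matches the failing `sgdisk -i 2 /dev/disk/by-id/ata-INTEL_SSDSC` call in the log.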

Changed in fuel:
status: Confirmed → In Progress
tags: added: area-ceph
removed: ceph
affects: fuel → mos
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/ceph (8.0)

Fix proposed to branch: 8.0
Change author: Alexei Sheplyakov <email address hidden>
Review: https://review.fuel-infra.org/15607

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/ceph (8.0)

Reviewed: https://review.fuel-infra.org/15607
Submitter: Pkgs Jenkins <email address hidden>
Branch: 8.0

Commit: 1779febd5f6740a844b147291c3cfd3e24b76ae2
Author: Alexei Sheplyakov <email address hidden>
Date: Tue Dec 29 14:02:17 2015

ceph-disk: improve base device/partition number calculation

split_dev_base_partnum fails to handle device node names containing
digits (like those in /dev/disk/by-id generated by udev), as a result

ceph-deploy osd prepare $node:/dev/sdc3:/dev/disk/by-id/wwn-0x50014ee00386a548-part3

fails, although it works fine this way

ceph-deploy osd prepare $node:/dev/sdc3:/dev/sde3

where /dev/sde3 is the very same partition (i.e. it's the destination of
the /dev/disk/by-id/wwn-0x50014ee00386a548-part3 symlink).

The problem has been silently fixed in the master branch by commits

https://github.com/ceph/ceph/commit/0e34742b968e
https://github.com/ceph/ceph/commit/3bc95dfc1b88
https://github.com/ceph/ceph/commit/77ff7c3dc6dd

Cherry pick those, and https://github.com/ceph/ceph/commit/1c5fea67
which fixes yet another symlink related problem.

Closes-Bug: #1529841
Change-Id: I37f83266efc4468103afd3fe82336b83f91fbe58

Aleksei Stepanov (penguinolog) wrote :

fuel-8.0-525-2016-02-05_01-55-59.iso - ceph-osd role has been deployed, OSTF passed.

Alexei Sheplyakov (asheplyakov) wrote :

Fixed in MOS 9.0 by commit https://review.fuel-infra.org/17407; updating the bug status accordingly.

Sofiia Andriichenko (sandriichenko) wrote :

[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 143
cat /etc/fuel_build_number:
 143
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-library9.0-9.0.0-1.mos8252.noarch
 fuel-agent-9.0.0-1.mos272.noarch
 nailgun-mcagents-9.0.0-1.mos732.noarch
 fuel-misc-9.0.0-1.mos8252.noarch
 shotgun-9.0.0-1.mos87.noarch
 python-packetary-9.0.0-1.mos129.noarch
 fuel-bootstrap-cli-9.0.0-1.mos272.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8610.noarch
 fuel-mirror-9.0.0-1.mos129.noarch
 fuel-openstack-metadata-9.0.0-1.mos8610.noarch
 fuel-notify-9.0.0-1.mos8252.noarch
 fuel-ostf-9.0.0-1.mos920.noarch
 fuel-setup-9.0.0-1.mos6324.noarch
 python-fuelclient-9.0.0-1.mos301.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-9.0.0-1.mos6324.noarch
 fuel-utils-9.0.0-1.mos8252.noarch
 fuel-nailgun-9.0.0-1.mos8610.noarch
 fuel-release-9.0.0-1.mos6324.noarch
 rubygem-astute-9.0.0-1.mos732.noarch
 fuelmenu-9.0.0-1.mos263.noarch
 fuel-ui-9.0.0-1.mos2635.noarch
 fuel-migrate-9.0.0-1.mos8252.noarch
[root@nailgun ~]#
