Ceph install fails when it tries to use removable devices

Bug #1420094 reported by Drew
This bug affects 8 people
Affects                          Status     Importance  Assigned to  Milestone
Ceph OSD Charm                   Triaged    Wishlist    Unassigned
OpenStack Ceph Charm (Retired)   Won't Fix  Medium      Unassigned
ceph (Juju Charms Collection)    Invalid    Medium      Unassigned

Bug Description

It appears that ceph-install detects and attempts to use removable devices (CD, SD card) as valid block storage:

unit-ceph-0: 2015-02-10 04:05:31 INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdc']' returned non-zero exit status 1
unit-ceph-0: 2015-02-10 04:05:31 ERROR juju.worker.uniter uniter.go:486 hook failed: exit status 1

This causes the landscape install to fail at 94%.

Storage devices on all boxes in the cluster:
/dev/sda - 136GB RAID0
/dev/sdb - 136GB RAID0
/dev/sdc - CD drive
/dev/sdd - SD card

This could be due to how they're being reported in the debian installer (?). During install time the devices are reported as sdc and sdd respectively, but post install they are sda and sdb. However, ceph does appear to correctly provision /dev/sdb for itself.

Attached is the log set from landscape.

Drew (drew-6) wrote :
Adam Collard (adam-collard) wrote :

unit-ceph-2: 2015-02-10 04:05:20 INFO mon-relation-changed #015Reading state information... 0%#015#015Reading state information... 0%#015#015Reading state information... Done
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed ***************************************************************
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed Found invalid GPT and valid MBR; converting MBR to GPT format
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed in memory.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed ***************************************************************
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed Warning: The kernel is still using the old partition table.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed The new table will be used at the next reboot.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed GPT data structures destroyed! You may now partition the disk using fdisk or
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed other utilities.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed Warning: The kernel is still using the old partition table.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed The new table will be used at the next reboot.
unit-ceph-2: 2015-02-10 04:05:22 INFO mon-relation-changed The operation has completed successfully.
unit-ceph-2: 2015-02-10 04:05:23 INFO mon-relation-changed Warning: The kernel is still using the old partition table.
unit-ceph-2: 2015-02-10 04:05:23 INFO mon-relation-changed The new table will be used at the next reboot.
unit-ceph-2: 2015-02-10 04:05:23 INFO mon-relation-changed The operation has completed successfully.
unit-ceph-2: 2015-02-10 04:05:24 INFO mon-relation-changed Error: Partition(s) 5 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
unit-ceph-2: 2015-02-10 04:05:25 INFO mon-relation-changed Warning: The kernel is still using the old partition table.
unit-ceph-2: 2015-02-10 04:05:25 INFO mon-relation-changed The new table will be used at the next reboot.
unit-ceph-2: 2015-02-10 04:05:25 INFO mon-relation-changed The operation has completed successfully.
unit-ceph-2: 2015-02-10 04:05:26 INFO mon-relation-changed Error: Error informing the kernel about modifications to partition /dev/sdb1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
unit-ceph-2: 2015-02-10 04:05:26 INFO mon-relation-changed Error: Failed to add partition 1 (Device or resource busy)
unit-ceph-2: 2015-02-10 04:05:26 INFO mon-relation-changed ceph-disk: Error: partition 1 for /dev/sdb does not appear to exist
unit-ceph-2: 2015-02-10 04:05:26 ERROR juju-log mon:60: Unable to initialize device: /dev/sdb
unit-ceph-2: 2015-02-10 04:05:26 INFO mon-relation-changed Traceback (most re...

Drew (drew-6) wrote :

For unit-ceph-2 I removed all existing partitions on /dev/sdb (which had the swap partition on it and an old MBR) and re-deployed landscape. This time ceph-disk-prepare correctly partitions /dev/sdb. However it still errors out when trying to partition /dev/sdc (the CD drive), which causes the whole build to fail:

2015-02-10 20:59:24 INFO juju-log mon:60: Making dir /var/lib/charm/ceph root:root 555
2015-02-10 20:59:24 WARNING juju-log mon:60: Not a valid ipv6 address: 192.168.1.6
2015-02-10 20:59:24 WARNING juju-log mon:60: Not a valid ipv6 address: 192.168.1.7
2015-02-10 20:59:24 WARNING juju-log mon:60: Not a valid ipv6 address: 192.168.1.8
2015-02-10 20:59:24 INFO juju-log mon:60: Making dir /var/run/ceph root:root 755
2015-02-10 20:59:24 INFO juju-log mon:60: Making dir /var/lib/ceph/mon/ceph-portly-dust root:root 555
2015-02-10 20:59:24 INFO mon-relation-changed creating /var/lib/ceph/tmp/portly-dust.mon.keyring
2015-02-10 20:59:24 INFO mon-relation-changed added entity mon. auth auth(auid = 18446744073709551615 key=AQBAZ9pUaDONHRAAAvuwKk8AZY4TH2i2dGjSOQ== with 0 caps)
2015-02-10 20:59:24 INFO mon-relation-changed ceph-mon: mon.noname-a 192.168.1.6:6789/0 is local, renaming to mon.portly-dust
2015-02-10 20:59:24 INFO mon-relation-changed ceph-mon: set fsid to f288da7d-e3e4-441f-98f3-b2820e6aa776
2015-02-10 20:59:24 INFO mon-relation-changed ceph-mon: created monfs at /var/lib/ceph/mon/ceph-portly-dust for mon.portly-dust
2015-02-10 20:59:24 INFO mon-relation-changed ceph-mon-all stop/waiting
2015-02-10 20:59:24 INFO mon-relation-changed ceph-mon-all start/running
2015-02-10 20:59:34 INFO juju-log mon:60: Looks like /dev/sda is in use, skipping.
2015-02-10 20:59:34 INFO juju-log mon:60: Path /dev/vda does not exist - bailing
Reading package lists... Donerelation-changed
Building dependency tree lation-changed
Reading state information... Donetion-changed
2015-02-10 20:59:38 INFO mon-relation-changed Caution: invalid backup GPT header, but valid main header; regenerating
2015-02-10 20:59:38 INFO mon-relation-changed backup header from main header.
2015-02-10 20:59:38 INFO mon-relation-changed
2015-02-10 20:59:40 INFO mon-relation-changed ****************************************************************************
2015-02-10 20:59:40 INFO mon-relation-changed Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-02-10 20:59:40 INFO mon-relation-changed verification and recovery are STRONGLY recommended.
2015-02-10 20:59:40 INFO mon-relation-changed ****************************************************************************
2015-02-10 20:59:40 INFO mon-relation-changed GPT data structures destroyed! You may now partition the disk using fdisk or
2015-02-10 20:59:40 INFO mon-relation-changed other utilities.
2015-02-10 20:59:40 INFO mon-relation-changed The operation has completed successfully.
2015-02-10 20:59:41 INFO mon-relation-changed The operation has completed successfully.
2015-02-10 20:59:42 INFO mon-relation-changed The operation has completed successfully.
2015-02-10 20:59:43 INFO mon-relation-changed meta-data=/dev/sdb1 isize=2048 agcount=4, agsize=8855487 blks
2015-02-10 20:59:...

Andreas Hasenack (ahasenack) wrote :

This is an issue with the ceph charm. It shouldn't treat the CD-ROM as a disk it can partition and format, and it probably also shouldn't get confused by an existing partitioning scheme on a real disk.

information type: Proprietary → Public
affects: landscape → ceph (Juju Charms Collection)
Andreas Hasenack (ahasenack) wrote :

Drew, this was your paste, right?

http://pastebin.ubuntu.com/9943987/

In that case, the above is the lshw information for the machine in question.

tags: added: cloud-installer landscape
James Page (james-page) wrote :

We need to bake some better intelligence into the code that currently checks whether a configured block device is usable for ceph; anything read-only should be ignored.
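
A minimal sketch of such a check, assuming the Linux sysfs attributes /sys/block/<dev>/ro and /sys/block/<dev>/removable are what the charm would consult. The helper names and the sysfs_root parameter are hypothetical and are not taken from the charm's code; whether these flags are actually set for the devices in this report has not been confirmed.

```python
from pathlib import Path


def read_sysfs_flag(device, attribute, sysfs_root="/sys/block"):
    """Return True if <sysfs_root>/<device>/<attribute> contains "1"."""
    path = Path(sysfs_root) / device / attribute
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # A missing attribute is treated as "flag not set".
        return False


def is_writable_fixed_disk(device, sysfs_root="/sys/block"):
    """Hypothetical helper: reject unknown, read-only and removable devices."""
    if not (Path(sysfs_root) / device).is_dir():
        return False  # the kernel does not know this block device
    if read_sysfs_flag(device, "ro", sysfs_root):
        return False  # read-only device
    if read_sysfs_flag(device, "removable", sysfs_root):
        return False  # removable media such as a CD drive or SD card reader
    return True
```

With a check like this, a device name such as sdc would be skipped before any partitioning command is run, instead of failing the hook afterwards.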

Changed in ceph (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Medium
Chris Holcombe (xfactor973) wrote :

The way I did this in the ceph manager code I wrote was to check some udev attributes and rule out everything that didn't fit. I actually ran into the same problem in my early code, which saw CD-ROM drives, ramdisks, etc. as disks that were fit for use.
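
One way this kind of udev-based filtering could look, as a sketch only. It assumes the KEY=VALUE text printed by `udevadm info --query=property --name=<device>` and uses the DEVTYPE, ID_CDROM and DEVNAME properties. The function names and the rule set are hypothetical and are not the code referred to above.

```python
def parse_udev_properties(text):
    """Parse KEY=VALUE lines as printed by `udevadm info --query=property`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def looks_like_osd_candidate(props):
    """Hypothetical rule set: keep whole disks, drop optical drives and ramdisks."""
    if props.get("DEVTYPE") != "disk":
        return False  # partitions, or entries without a device type
    if props.get("ID_CDROM") == "1":
        return False  # optical drive, as tagged by udev
    if props.get("DEVNAME", "").startswith("/dev/ram"):
        return False  # kernel ramdisk
    return True
```

The property text could be obtained with subprocess.check_output(["udevadm", "info", "--query=property", "--name=/dev/sdc"], text=True) and passed to parse_udev_properties.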

David Britton (dpb)
Changed in ceph (Juju Charms Collection):
assignee: nobody → Chris Holcombe (xfactor973)
David Britton (dpb) wrote :

Note also that this presents itself when other disk-like things are presented to the OS that cannot be used:

http://askubuntu.com/questions/612691/dell-servers-openstack-and-autopilot-beta-can-they-work-together

Chris Holcombe (xfactor973) wrote :

I think that shouldn't be a problem so long as we filter on disks that are SSD or spinning and have a size > 0. I've not encountered disks with zero size before but I've also never used Dell hardware.
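
The filter described here could be sketched as follows, assuming the sysfs attributes /sys/block/<dev>/size (counted in 512-byte sectors) and /sys/block/<dev>/queue/rotational. The helper names are hypothetical, and labelling every non-rotational device "ssd" is a simplification made for this example.

```python
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/<dev>/size is counted in 512-byte sectors


def device_size_bytes(device, sysfs_root="/sys/block"):
    """Return the device size in bytes, or 0 if it cannot be read."""
    try:
        sectors = int((Path(sysfs_root) / device / "size").read_text().strip())
    except (OSError, ValueError):
        return 0
    return sectors * SECTOR_BYTES


def media_kind(device, sysfs_root="/sys/block"):
    """Return "spinning", "ssd", or None, based on queue/rotational."""
    try:
        flag = (Path(sysfs_root) / device / "queue" / "rotational").read_text().strip()
    except OSError:
        return None
    return {"1": "spinning", "0": "ssd"}.get(flag)


def passes_size_and_media_filter(device, sysfs_root="/sys/block"):
    """Hypothetical filter: known media kind and a size greater than zero."""
    return (media_kind(device, sysfs_root) is not None
            and device_size_bytes(device, sysfs_root) > 0)
```

Note that the rotational flag alone would not rule out an optical drive, so in this sketch it is the size check that rejects devices reporting zero sectors.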

David Britton (dpb)
tags: added: kanban-cross-team
David Britton (dpb)
tags: removed: kanban-cross-team
Thiago (thisab) wrote :

I found this entry on one of the broken (ceph-osd/4) containers:
2016-06-19 00:57:33 INFO mon-relation-changed Traceback (most recent call last):
2016-06-19 00:57:33 INFO mon-relation-changed File "/usr/sbin/ceph-disk", line 3019, in <module>
2016-06-19 00:57:33 INFO mon-relation-changed main()
2016-06-19 00:57:33 INFO mon-relation-changed File "/usr/sbin/ceph-disk", line 2997, in main
2016-06-19 00:57:33 INFO mon-relation-changed args.func(args)
2016-06-19 00:57:33 INFO mon-relation-changed File "/usr/sbin/ceph-disk", line 1507, in main_prepare
2016-06-19 00:57:33 INFO mon-relation-changed zap(args.data)
2016-06-19 00:57:33 INFO mon-relation-changed File "/usr/sbin/ceph-disk", line 1052, in zap
2016-06-19 00:57:33 INFO mon-relation-changed with file(dev, 'wb') as dev_file:
2016-06-19 00:57:33 INFO mon-relation-changed IOError: [Errno 123] No medium found: '/dev/sdc'
2016-06-19 00:57:33 ERROR juju-log mon:32: Unable to initialize device: /dev/sdc

I even opened another thread: https://bugs.launchpad.net/landscape/+bug/1593802 .
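
The "[Errno 123] No medium found" in the traceback above is ENOMEDIUM on Linux. A device could be probed for it before any zap is attempted, roughly as in this sketch. The has_medium name and the injectable opener parameter are hypothetical and exist only to keep the example testable; this is not how ceph-disk itself behaves.

```python
import errno
import os


def has_medium(device_path, opener=os.open):
    """Return False if opening the device reports ENOMEDIUM, True if it opens."""
    try:
        fd = opener(device_path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == errno.ENOMEDIUM:
            return False  # drive present, but no disc or card inserted
        raise  # any other failure is left for the caller to handle
    os.close(fd)
    return True
```

A caller could log and skip a path for which has_medium returns False, rather than passing it to ceph-disk.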

Thiago (thisab) wrote :

As can be seen here: http://paste.ubuntu.com/17978269/
On some machines the output is always the same for ceph-osd:
2016-06-27 15:32:46 INFO juju-log mon:27: Path /dev/vdb does not exist - bailing
2016-06-27 15:32:46 INFO mon-relation-changed Problem opening /dev/sdc for reading! Error is 123.

Chad Smith (chad.smith) wrote :

As of Landscape 16.05, ceph-osd['osd-devices'] will be set only to available, formattable, unpartitioned and non-RAID devices. Landscape will no longer blanket ceph-osd with all known device paths; it will limit osd-devices to the specific disk-ids of supported disks, to avoid ceph-osd incorrectly selecting unusable devices.
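
As an illustration of the selection described here, the sketch below builds an osd-devices value from a device inventory. It assumes osd-devices is a space-separated string of device paths. The inventory schema (path, formattable, partitioned, raid_member) and the function name are hypothetical and do not reflect Landscape's actual data model.

```python
def build_osd_devices(inventory):
    """Join the paths of usable disks into a space-separated osd-devices value."""
    usable = [
        disk["path"]
        for disk in inventory
        # Keep only disks that can be formatted and carry no partitions or RAID.
        if disk["formattable"] and not disk["partitioned"] and not disk["raid_member"]
    ]
    return " ".join(sorted(usable))


# Illustrative inventory only; the values are made up for this example.
example_inventory = [
    {"path": "/dev/sda", "formattable": True, "partitioned": True, "raid_member": False},
    {"path": "/dev/sdb", "formattable": True, "partitioned": False, "raid_member": False},
    {"path": "/dev/sdc", "formattable": False, "partitioned": False, "raid_member": False},
]
osd_devices = build_osd_devices(example_inventory)
```

The point of the design is that the charm receives an explicit allow-list instead of every known path, so unusable devices are never offered to it.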

James Page (james-page)
Changed in charm-ceph:
assignee: nobody → Chris Holcombe (xfactor973)
importance: Undecided → Medium
status: New → Triaged
Changed in ceph (Juju Charms Collection):
status: Triaged → Invalid
Chris MacNaughton (chris.macnaughton) wrote :

Marking the charm-ceph task Won't Fix, as the ceph charm has been out of support for a while now.

Changed in charm-ceph:
assignee: Chris Holcombe (xfactor973) → nobody
status: Triaged → Won't Fix
Changed in charm-ceph-osd:
importance: Undecided → Wishlist
Changed in ceph (Juju Charms Collection):
assignee: Chris Holcombe (xfactor973) → nobody
Changed in charm-ceph-osd:
status: New → Triaged