ceph-disk: Error: Device is mounted: /dev/sdb1 (Unable to initialize device: /dev/sdb)

Bug #1506287 reported by Ryan Beisner on 2015-10-15
Affects: ceph (Juju Charms Collection)
Status: Invalid
Importance: Undecided
Assigned to: James Page

Bug Description

In back-to-back bare-metal test automation, I've encountered multiple instances of this failure on Vivid-Kilo (stable charms) and Wily-Liberty (next charms). I've found several similar and potentially related bugs, and this may be a duplicate, but none appeared to match precisely.

The hardware units are all identical, with single spindles as sda and sdb.

# all 3 ceph units fail with:
2015-10-14 20:37:37 INFO mon-relation-changed ceph-disk: Error: Device is mounted: /dev/sdb1
2015-10-14 20:37:37 INFO worker.uniter.jujuc server.go:158 running hook tool "juju-log" ["-l" "ERROR" "Unable to initialize device: /dev/sdb"]
2015-10-14 20:37:37 ERROR juju-log mon:3: Unable to initialize device: /dev/sdb
2015-10-14 20:37:37 INFO mon-relation-changed Traceback (most recent call last):
2015-10-14 20:37:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-0/charm/hooks/mon-relation-changed", line 389, in <module>
2015-10-14 20:37:37 INFO mon-relation-changed hooks.execute(sys.argv)
2015-10-14 20:37:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-0/charm/hooks/charmhelpers/core/hookenv.py", line 672, in execute
2015-10-14 20:37:37 INFO mon-relation-changed self._hooks[hook_name]()
2015-10-14 20:37:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-0/charm/hooks/mon-relation-changed", line 217, in mon_relation
2015-10-14 20:37:37 INFO mon-relation-changed reformat_osd(), config('ignore-device-errors'))
2015-10-14 20:37:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-0/charm/hooks/ceph.py", line 345, in osdize
2015-10-14 20:37:37 INFO mon-relation-changed osdize_dev(dev, osd_format, osd_journal, reformat_osd, ignore_errors)
2015-10-14 20:37:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-0/charm/hooks/ceph.py", line 393, in osdize_dev
2015-10-14 20:37:37 INFO mon-relation-changed raise e
2015-10-14 20:37:37 INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']' returned non-zero exit status 1
2015-10-14 20:37:37 INFO juju.worker.uniter.context context.go:543 handling reboot
2015-10-14 20:37:37 ERROR juju.worker.uniter.operation runhook.go:103 hook "mon-relation-changed" failed: exit status 1
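
For context, the failing call reduces to roughly the following (a minimal sketch reconstructed from the traceback above, not the charm's literal code):

    import subprocess

    def osdize_dev(dev, osd_format='xfs'):
        # ceph-disk-prepare exits non-zero when a partition on `dev` is
        # still mounted ("Error: Device is mounted: /dev/sdb1"), and
        # check_call surfaces that as the CalledProcessError seen above.
        subprocess.check_call(['ceph-disk-prepare', '--fs-type',
                               osd_format, '--zap-disk', dev])

    osdize_dev('/dev/sdb')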

See paste for juju stat, charm revno info, deployer output, mojo output:
http://paste.ubuntu.com/12785725/

http://10.245.162.77:8080/view/Dashboards/view/Mojo/job/mojo_runner_baremetal/561/artifact/

Ryan Beisner (1chb1n) wrote:

Shortly after posting this bug (originally filed for Vivid-Kilo/stable), I observed the same failure on Wily-Liberty using the next charms on metal:

2015-10-14 19:06:17 INFO mon-relation-changed ceph-disk: Error: Device is mounted: /dev/sdb1
2015-10-14 19:06:17 INFO worker.uniter.jujuc server.go:158 running hook tool "juju-log" ["-l" "ERROR" "Unable to initialize device: /dev/sdb"]
2015-10-14 19:06:17 ERROR juju-log mon:3: Unable to initialize device: /dev/sdb
2015-10-14 19:06:17 INFO mon-relation-changed Traceback (most recent call last):
2015-10-14 19:06:17 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-2/charm/hooks/mon-relation-changed", line 432, in <module>
2015-10-14 19:06:17 INFO mon-relation-changed hooks.execute(sys.argv)
2015-10-14 19:06:17 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-2/charm/hooks/charmhelpers/core/hookenv.py", line 672, in execute
2015-10-14 19:06:17 INFO mon-relation-changed self._hooks[hook_name]()
2015-10-14 19:06:17 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-2/charm/hooks/mon-relation-changed", line 235, in mon_relation
2015-10-14 19:06:17 INFO mon-relation-changed reformat_osd(), config('ignore-device-errors'))
2015-10-14 19:06:17 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-2/charm/hooks/ceph.py", line 346, in osdize
2015-10-14 19:06:17 INFO mon-relation-changed osdize_dev(dev, osd_format, osd_journal, reformat_osd, ignore_errors)
2015-10-14 19:06:17 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-2/charm/hooks/ceph.py", line 395, in osdize_dev
2015-10-14 19:06:17 INFO mon-relation-changed raise e
2015-10-14 19:06:17 INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']' returned non-zero exit status 1
2015-10-14 19:06:17 INFO juju.worker.uniter.context context.go:543 handling reboot
2015-10-14 19:06:17 ERROR juju.worker.uniter.operation runhook.go:103 hook "mon-relation-changed" failed: exit status 1
2015-10-14 19:06:17 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] failed: run relation-changed (3; ceph/1) hook

http://paste.ubuntu.com/12785825/

http://10.245.162.77:8080/view/Dashboards/view/Mojo/job/mojo_runner_baremetal/559/artifact/

summary: - vivid-kilo (stable charm on metal) ceph-disk: Error: Device is mounted:
- /dev/sdb1 (Unable to initialize device: /dev/sdb)
+ ceph-disk: Error: Device is mounted: /dev/sdb1 (Unable to initialize
+ device: /dev/sdb)
description: updated
James Page (james-page) on 2015-10-16
Changed in ceph (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Ryan Beisner (1chb1n) wrote:

It is worth noting:

With the automated tests referenced above, the ceph fsid has historically always been the same static uuid value:
    fsid: 6547bd3e-1397-11e2-82e5-53567c8d32dc

That approach passes for all upstart-based test targets but fails on all systemd-based test targets.

When we alter the test to generate a new fsid for each test run, the install hook no longer fails on systemd-based deploys (Vivid-Kilo, Wily-Liberty).

It is as if the new deployment sees the previous environment's fsid on disk, which happens to match the new environment's fsid, and fails instead of re-using the device.
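
For illustration, the fix on the test side amounts to something like this (a minimal sketch; the charm's 'fsid' config option is real, the surrounding test plumbing is not shown):

    import uuid

    # Generate a fresh fsid for every deployment so the new cluster's
    # fsid can never match OSD metadata left on disk by a previous run.
    fsid = str(uuid.uuid4())
    print(fsid)  # pass this value to the charm's 'fsid' config option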

Vivid-Kilo (next):
ceph/0 active idle 1.24.6.1 1 international-steam.dellstack Unit is ready and clustered
ceph/1 active idle 1.24.6.1 2 decisive-punishment.dellstack Unit is ready and clustered
ceph/2 active executing 1.24.6.1 3 downright-acoustics.dellstack Unit is ready and clustered

Looking into this

Lirim (lirim-osmani) wrote:

I'm on site trying to deploy ceph (a vivid-kilo based system) and am able to reproduce the same error multiple times now, even when changing the fsid.

INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/charmhelpers/core/hookenv.py", line 672, in execute
INFO mon-relation-changed self._hooks[hook_name]()
INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/mon-relation-changed", line 199, in mon_relation
INFO mon-relation-changed config('ignore-device-errors'))
INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/ceph.py", line 314, in osdize
INFO mon-relation-changed osdize_dev(dev, osd_format, osd_journal, reformat_osd, ignore_errors)
INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/ceph.py", line 363, in osdize_dev
INFO mon-relation-changed raise e
INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdc']' returned non-zero exit status 1
ERROR juju.worker.uniter.operation runhook.go:103 hook "mon-relation-changed" failed: exit status 1
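
As a quick diagnostic, the condition ceph-disk is enforcing can be approximated before the hook runs (a hypothetical helper, not part of the charm):

    def is_mounted(dev):
        # True if any mounted source device in /proc/mounts starts with
        # `dev`, e.g. /dev/sdc1 when dev is '/dev/sdc'.
        with open('/proc/mounts') as mounts:
            return any(line.split()[0].startswith(dev) for line in mounts)

    print(is_mounted('/dev/sdc'))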

Ryan Beisner (1chb1n) wrote:

FWIW, since implementing unique fsids per new deployment in iterative test automation, we've not re-encountered this issue.

Ryan Beisner (1chb1n) on 2016-04-15
Changed in ceph (Juju Charms Collection):
status: New → Invalid