Error: Device is mounted
Affects: Ceph OSD Charm
Status: Fix Released
Importance: High
Assigned to: Chris MacNaughton
Bug Description
It looks like the ceph-osd charm is trying to initialize my storage twice:
2018-01-29 22:35:21 INFO juju-log mon:43: osdize cmd: ['ceph-disk', 'prepare', '--fs-type', 'xfs', '--zap-disk', '--filestore', '/dev/sdb']
2018-01-29 22:36:03 INFO juju-log mon:43: osdize cmd: ['ceph-disk', 'prepare', '--fs-type', 'xfs', '--zap-disk', '--filestore', '/dev/sdb']
The second time it fails with an error:
2018-01-29 22:36:03 DEBUG mon-relation-
ceph-osd/5 log: http://
Jason Hobbs (jason-hobbs) wrote : | #1 |
description: | updated |
Christian Reis (kiko) wrote : | #2 |
Nobuto Murata (nobuto) wrote : | #3 |
Somehow the "is_device_mounted" check didn't work before running "ceph-disk prepare"?
def osdize_dev(dev, osd_format, osd_journal, reformat_osd=False,
               ignore_errors=False, encrypt=False, bluestore=False):
    if not os.path.exists(dev):
        log('Path {} does not exist - bailing'.format(dev))
        return
    if not is_block_device(dev):
        log('Path {} is not a block device - bailing'.format(dev))
        return
    if is_osd_disk(dev) and not reformat_osd:
        log('Looks like {} is already an'
            ' OSD data or journal, skipping.'.format(dev))
        return
    if is_device_mounted(dev):
        log('Looks like {} is in use, skipping.'.format(dev))
        return
    status_set('maintenance', 'Initializing device {}'.format(dev))
    cmd = ['ceph-disk', 'prepare']
tags: added: foundations-engine; removed: cpe-foundations
Christian Reis (kiko) wrote : | #4 |
Looks like the easy way to solve this is to check for is_device_mounted at the end of osdize_dev() and raise an error if it's not.
Christian Reis (kiko) wrote : | #5 |
We believe the race here is the sysfs nodes not being set up in time for lsblk to return the right data, which is_device_mounted() uses to confirm the mount is present.
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master) | #6 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit eeacba1614859c2
Author: Chris MacNaughton <email address hidden>
Date: Thu Feb 15 15:03:07 2018 +0100
Sync in charms.ceph change for udev settle
Change-Id: Ideb8dbe8e6e439
Closes-Bug: #1746118
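The "udev settle" approach the commit message refers to can be sketched as follows (helper names are illustrative; the actual charms.ceph change may differ in detail):

```python
import subprocess

def settle_cmd(timeout=10):
    # Build the udevadm invocation; separated out so it can be inspected.
    return ['udevadm', 'settle', '--timeout', str(timeout)]

def udevadm_settle(timeout=10):
    # Block until udev has drained its event queue, so sysfs (and
    # therefore lsblk) reflects freshly created partitions before the
    # charm re-checks mounts or runs ceph-disk.
    subprocess.check_call(settle_cmd(timeout))
```

Settling before probing the device closes the window in which lsblk reports stale data.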
Changed in charm-ceph-osd:
status: New → Fix Committed
Jason Hobbs (jason-hobbs) wrote : | #7 |
- juju-crashdump-cf00a592-15a4-442a-ac16-510ad1fcdbf8.tar.gz (11.0 MiB, application/x-tar)
We hit this bug again yesterday running against the tip version of the charms.
Changed in charm-ceph-osd:
status: Fix Committed → New
Jason Hobbs (jason-hobbs) wrote : | #8 |
Marked back to 'new' as the commit above didn't seem to fix this issue.
Chris Gregan (cgregan) wrote : | #9 |
Escalated to Field High to increase visibility
Christian Reis (kiko) wrote : | #10 |
Dupe of bug 1751127?
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1746118] Re: Error: Device is mounted | #11 |
No, Chris said they are separate issues.
Ashley Lai (alai) wrote : | #12 |
- juju-crashdump-5c1eda5d-c237-44b2-b8c3-2603964eb219.tar.gz (10.7 MiB, application/x-tar)
Our test run pointed to cs:~chris.
(truncated mon-relation-changed DEBUG log lines from 2018-03-07 12:45:39)
https:/
Changed in charm-ceph-osd:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
importance: Undecided → High
milestone: none → 18.05
Chris MacNaughton (chris.macnaughton) wrote : | #13 |
cs:~chris.
Ashley Lai (alai) wrote : | #14 |
The charm at cs:~chris.
@Chris - let us know if you need a crash dump to take a look at the log. Also let us know when we should switch back to testing -next charm. Thanks!!
Chris MacNaughton (chris.macnaughton) wrote : | #15 |
The version in my namespace shouldn't fix anything, it should only break things more noisily if it breaks. The only thing it adds is some more output around lsblk, as well as an assert to verify that the white-listed devices are mounted
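An assertion of that kind might look like the following sketch (names are illustrative; this is not the code from that branch):

```python
def assert_whitelisted_mounted(devices, is_device_mounted):
    # Fail loudly, rather than silently skipping, when a device the
    # charm was told to use does not show up as mounted. The mount
    # check is passed in so any lsblk-based helper can be used.
    unmounted = [d for d in devices if not is_device_mounted(d)]
    assert not unmounted, 'devices not mounted: {}'.format(unmounted)
```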
Chris MacNaughton (chris.macnaughton) wrote : | #16 |
For reference, the changes that are in my namespace will not be going into openstack-
Ashley Lai (alai) wrote : | #17 |
It is possible that it hits other issues before getting to here. We will keep monitoring.
Ashley Lai (alai) wrote : | #18 |
We still have not seen the issue as of today. @Chris if there are any changes in the -next charm, please rebase the patch on top of -next. That way our test will cover the new updates. Thanks !!
Changed in charm-ceph-osd:
status: New → Incomplete
status: Incomplete → In Progress
Chris MacNaughton (chris.macnaughton) wrote : | #19 |
@alai there haven't been any changes to the ceph-osd charm in 29 days; the fact that you're not hitting this bug is somewhat interesting; however, nothing in the charm changes in my namespace should have any effect on that.
As a longer term solution to this issue, and other similar things that can cause the OSD to fail to be (re)created, we could wrap the ceph-disk command in some exception handling, and log failures with more information; however, I suspect that doing the above would have ended up with this bug being filed with: "No OSD devices detected with current configuration" ;-)
I'll work on updating charms.ceph to more gracefully (and verbosely) handle errors in the ceph-disk commands, at which point re-targeting to openstack-
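The error-handling idea described above could be sketched like this (a hedged illustration, not the actual charms.ceph change; `log` stands in for the charm's juju-log helper):

```python
import subprocess

def log(msg):
    # Stand-in for the charm's juju-log helper.
    print(msg)

def run_logged(cmd):
    # Run a command, capturing stdout+stderr; on failure, log the
    # return code and full output before re-raising, so a ceph-disk
    # failure leaves more than "exit status 1" in the unit log.
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        log('command {} failed (rc={}): {}'.format(
            cmd, e.returncode, e.output.decode('utf-8', 'replace')))
        raise
```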
Chris MacNaughton (chris.macnaughton) wrote : | #20 |
It looks like this has been reproduced in a crashdump attached to https:/
From the juju status output in that crashdump, it looks like the ceph-osd in use is _not_ the one from my namespace, and, as such, doesn't have the additional logging added to try to diagnose this issue :-/
On the other hand, the time between successfully calling `ceph-disk` and failing the same is around 40 seconds, so not likely too fast for the kernel to see it as mounted.
Jason Hobbs (jason-hobbs) wrote : | #21 |
Right, that crashdump came from march 2nd, before we got your updated code.
Chris MacNaughton (chris.macnaughton) wrote : | #22 |
@jhobbs then why was it posted on 22-March?
Jason Hobbs (jason-hobbs) wrote : | #23 |
Because you asked for a crashdump with load logs.
Chris MacNaughton (chris.macnaughton) wrote : | #24 |
@jason-hobbs, @alai is this bug still occurring? If so, can we get a fresh crashdump where this bug has been reproduced along with the new load information in the crashdump?
In addition to waiting on more information for this bug, there are reviews in progress to improve logging under error conditions during disk changes in addition to improving disk management in general.
Jason Hobbs (jason-hobbs) wrote : | #25 |
Chris, we've already attached a crashdump with load information. Does
that not have what you're looking for?
Jason Hobbs (jason-hobbs) wrote : | #26 |
Load information in the reproducer is here: https:/
Chris MacNaughton (chris.macnaughton) wrote : | #27 |
The reproducer linked does _not_ have the additional logging from the charm in my namespace, meaning it is still an incomplete picture.
Chris MacNaughton (chris.macnaughton) wrote : | #28 |
The change in https:/
Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Ashley Lai (alai) wrote : | #29 |
- juju-crashdump-17ce8e45-053d-4eeb-a0fc-f8fbf1eea5be.tar.gz (16.4 MiB, application/x-tar)
We just hit the same issue again. ceph-osd points to cs:~chris.
(truncated mon-relation-changed DEBUG log lines from 2018-04-14 06:06:05)
https:/
Ashley Lai (alai) wrote : | #30 |
Ashley Lai (alai) wrote : | #31 |
It's running Xenial Queens.
Chris MacNaughton (chris.macnaughton) wrote : | #32 |
While this is an interesting crashdump, with an interesting bit in the logs, especially:
NAME="sdb" MAJ:MIN="8:16" RM="0" SIZE="931.5G" RO="0" TYPE="disk" MOUNTPOINT=""
NAME="sdb1" MAJ:MIN="8:17" RM="0" SIZE="930.5G" RO="0" TYPE="part" MOUNTPOINT=""
NAME="sdb2" MAJ:MIN="8:18" RM="0" SIZE="1G" RO="0" TYPE="part" MOUNTPOINT=""
this bug is marked fix-committed as a result of https:/ and will go to stable after the next release where this bug is resolved.
Jason Hobbs (jason-hobbs) wrote : | #33 |
Chris,
Actually, the fix for this should be backported to stable. Why hasn't it been?
Jason
Ryan Beisner (1chb1n) wrote : | #34 |
This is a tricky backport, as it changes the type of a config option if we do a straight backport. We will have to look further into how/if we can selectively backport a change which is also backward-compatible with current stable config types.
Chris Gregan (cgregan) wrote : | #35 |
This issue continues to plague our Pike deployments. The field will continue to deploy Pike on Xenial. Is there a reasonable workaround for this on Pike? Otherwise any deployment going up in the next month will be affected.
Jason Hobbs (jason-hobbs) wrote : | #36 |
It seems to me the fix could have been done in a backportable way if
it was planned on being backported. You could introduce a new config
option and deprecate the old one, for example, rather than changing
the existing one.
Chris MacNaughton (chris.macnaughton) wrote : | #37 |
This change, excepting the config change, has been backported in https:/
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
Chris MacNaughton (chris.macnaughton) wrote : | #38 |
When is the last time that a failure was experienced on this bug? The fix was made to the charms at master a little over 1 week ago, and backported to the stable charms 6 days ago.
If we have another occurrence of this bug with the fix linked above applied, can we get a new crashdump to evaluate where that is failing?
Jason Hobbs (jason-hobbs) wrote : | #39 |
Hey Chris, we hit it again within the last couple of days, but that's because we were still running from your test branch (cs:~chris. ) instead of the charms. Sorry about that!
2018-01-29 22:36:04 ERROR juju-log mon:43: Unable to initialize device: /dev/sdb
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ceph-osd-5/charm/hooks/mon-relation-changed", line 559, in <module>
    hooks.execute(sys.argv)
  File "/var/lib/juju/agents/unit-ceph-osd-5/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
    self._hooks[hook_name]()
  File "/var/lib/juju/agents/unit-ceph-osd-5/charm/hooks/mon-relation-changed", line 486, in mon_relation
    prepare_disks_and_activate()
  File "/var/lib/juju/agents/unit-ceph-osd-5/charm/hooks/mon-relation-changed", line 389, in prepare_disks_and_activate
    config('bluestore'))
  File "lib/ceph/utils.py", line 1436, in osdize
    bluestore)
  File "lib/ceph/utils.py", line 1504, in osdize_dev
    subprocess.check_call(cmd)
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ceph-disk', 'prepare', '--fs-type', 'xfs', '--zap-disk', '--filestore', '/dev/sdb']' returned non-zero exit status 1
2018-01-29 22:36:04 ERROR juju.worker.uniter.operation runhook.go:114 hook "mon-relation-changed" failed: exit status 1