1/3 nodes error on ceph deployment with message 'ceph-osd/0* hook failed: "config-changed"'

Bug #1946504 reported by Mia Altieri
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: 22.04

Bug Description

While deploying Ceph on Juju with an OpenStack cloud via this tutorial (https://ubuntu.com/kubernetes/docs/storage#relate-to-charmed-kubernetes), one of the three nodes was not able to deploy (command used: juju deploy -n 3 ceph-osd --storage osd-devices=32G,2 --storage osd-journals=8G,1). Instead, the unit's workload status and the application's status were both "error", with the same message: hook failed: "config-changed". After juju ssh-ing into that node and looking at the log file (/var/log/juju/unit-ceph-osd-0.log), there is the following error message:
2021-10-08 07:55:30 ERROR juju.worker.uniter.operation runhook.go:146 hook "config-changed" (via explicit, bespoke hook script) failed: exit status 1

2021-10-08 07:55:30 WARNING unit.ceph-osd/0.config-changed logger.go:60 ValueError: `osd-journal` and `osd-devices` options must notoverlap.
To understand this error message better, you can view the source code here (https://opendev.org/openstack/charm-ceph-osd/src/commit/352d6993870be2547f37463e4e3cffc7605f749c/hooks/ceph_hooks.py). Essentially, this error gets thrown when the following if statement is satisfied (the missing space in "notoverlap" comes from the two string literals below being concatenated without a separator):
    if not osd_journal.isdisjoint(set(get_devices())):
        raise ValueError('`osd-journal` and `osd-devices` options must not'
                         'overlap.')
What this error message is saying is that the osd-journal and one of the osd-devices resolve to the same block device.
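
To see the check in isolation, here is a minimal, runnable sketch; the device paths are hypothetical stand-ins, and the real charm derives these sets from juju storage and the charm config:

    # Minimal sketch of the overlap check with hypothetical device paths.
    osd_journal = {'/dev/vdb'}   # journal volume attached by juju storage
    osd_devices = {'/dev/vdb'}   # charm default for osd-devices

    try:
        if not osd_journal.isdisjoint(osd_devices):
            raise ValueError('`osd-journal` and `osd-devices` options must not'
                             'overlap.')
    except ValueError as exc:
        # Prints "... must notoverlap." because the two string literals
        # are concatenated without a space, matching the log line above.
        print(exc)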

We can validate that this is indeed the case after further inspection: the osd-journal for this unit is on the same device as the default for osd-devices.
Default for osd-devices:
juju config ceph-osd | grep -i -C 5 devices
Location of osd-journal:
juju run -u ceph-osd/0 storage-list
juju run -u ceph-osd/0 'storage-get -s osd-journals/2'

The workaround here is to configure osd-devices to point at a different device with: juju config ceph-osd osd-devices=/dev/sdb (although you may want to check first that /dev/sdb is actually available on the node).

However, a more permanent solution would be for the charm to manage this during deployment and make sure that osd-journals and osd-devices are never allocated to the same resource, as sketched below.
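
One way the charm could do that is to drop any device already claimed as a journal instead of raising. This is only a rough sketch, not the charm's actual API; the argument lists stand in for the charm's real device lookups:

    # Hypothetical reconciliation: filter journal devices out of the OSD
    # device list rather than aborting the config-changed hook.
    def resolve_osd_devices(devices, journal_devices):
        journals = set(journal_devices)
        return [d for d in devices if d not in journals]

    # With the conflicting defaults from this bug, the journal volume is
    # simply excluded from the OSD device list:
    print(resolve_osd_devices(['/dev/vdb', '/dev/vdc'], ['/dev/vdb']))
    # -> ['/dev/vdc']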

Mia Altieri (miaaltieri)
description: updated
Revision history for this message
Felipe Reyes (freyes) wrote :

This is an issue with the ceph-osd charm's default for osd-devices, which is '/dev/vdb': when using --storage (juju storage) there is no guarantee of which device will be associated with osd-devices and which with osd-journals.

A workaround to make this work at deploy time is to pass `--config osd-devices=""`; this passes an empty list of devices in the juju config.

We should re-evaluate if it's a good idea to use /dev/vdb as the default.
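
An empty default sidesteps the overlap because an empty device set is disjoint from any journal set, so the config-changed check can never fire. A simplified sketch of that behaviour (splitting the option on whitespace here is an assumption that mirrors the space-separated format of osd-devices):

    # Sketch: an empty osd-devices value yields an empty set, which is
    # disjoint from anything, so the overlap check passes.
    def parse_devices(config_value):
        # Assumes osd-devices is a whitespace-separated list of paths.
        return set(config_value.split())

    print(parse_devices('/dev/vdb').isdisjoint({'/dev/vdb'}))  # False: overlap
    print(parse_devices('').isdisjoint({'/dev/vdb'}))          # True: no error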

Changed in charm-ceph-osd:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)
Changed in charm-ceph-osd:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-osd/+/813241
Committed: https://opendev.org/openstack/charm-ceph-osd/commit/c5a2f2f7761804cf60ad494054884b15be17bf1a
Submitter: "Zuul (22348)"
Branch: master

commit c5a2f2f7761804cf60ad494054884b15be17bf1a
Author: Felipe Reyes <email address hidden>
Date: Fri Oct 8 15:35:30 2021 -0300

    Clear the default value for osd-devices

    Using /dev/vdb as default can go in conflict when using juju storage to
    attach devices dynamically as OSD and journal, because juju may be
    attaching as the first disk a volume that's meant to be used as a
    journal and making the list of devices overlap.

    Change-Id: I97c7657a82ea463aa090fc6266a4988c8c6bfeb4
    Closes-Bug: #1946504

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Changed in charm-ceph-osd:
milestone: none → 22.04
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released