Ceph OSD Charm

Hammer -> Jewel resulted in osd re-init + rocksdb corruption

Bug #1832418 reported by Dan Hill on 2019-06-11

This bug affects 2 people

Affects:	Ceph OSD Charm
Status:	Confirmed
Importance:	High
Assigned to:	Unassigned
Milestone:	(none)

Bug Description

Ceph uses a 'ready' touchfile to indicate that an OSD has already been
initialized by mkfs [0]. The ceph-osd charm overloads the 'ready' touchfile to
also indicate whether an OSD is enabled/disabled for upstart [1].

Ceph's upstart config prevents ceph-osd from being started when 'ready' doesn't
exist. However, it does not prevent calls to ceph-disk [2].

During an upgrade from Hammer -> Jewel, we encountered a scenario where ceph-disk
was started while the 'ready' touchfile was missing from an existing OSD. This
resulted in re-initialization of the OSD. Normally re-init would be benign, but
on Jewel 10.2.11 new OSDs use rocksdb by default [3]. The OSD re-init
corrupted the existing leveldb by partially converting it to rocksdb.

Recommend modifying the charm to use an upstart method to police the service.
For example, this would disable a service:
`echo manual | sudo tee /etc/init/<SERVICE>.override`

This will let ceph manage the ready touchfile and keep it tightly bound to
whether the OSD should mkfs.
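
The override approach recommended above could be sketched in Python roughly as
follows. This is only an illustration: the function names and the `init_dir`
parameter are hypothetical and are not taken from the charm. The default of
/etc/init matches the shell command above.

```python
import os


def disable_upstart_job(job, init_dir="/etc/init"):
    """Write <job>.override containing 'manual' so upstart does not auto-start the job."""
    path = os.path.join(init_dir, job + ".override")
    with open(path, "w") as f:
        f.write("manual\n")
    return path


def enable_upstart_job(job, init_dir="/etc/init"):
    """Remove <job>.override if it exists; a missing file is not an error.

    Assumption: the override file holds nothing except the 'manual' stanza.
    Any other stanzas in the file would be lost by deleting it.
    """
    path = os.path.join(init_dir, job + ".override")
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
    return path
```

Neither function touches the 'ready' touchfile, so it stays under ceph's
control.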

[0] https://github.com/ceph/ceph/blob/jewel/src/ceph-disk/ceph_disk/main.py#L3360
[1] https://github.com/openstack/charm-ceph-osd/blob/master/lib/ceph/utils.py#L2424
[2] https://github.com/ceph/ceph/blob/jewel/src/upstart/ceph-osd-all-starter.conf#L11
[3] https://github.com/ceph/ceph/pull/18010

Dan Hill (hillpd) wrote on 2019-06-12:

Discussed the proposed solution to use upstart's service 'override' with Felipe.
Unfortunately, this would not allow granular control of which OSDs get enabled/disabled.

Stepping back and looking at the code flow in this scenario: the only reason the
charm needs granular control to disable an OSD is that file ownership changes
within the osd directory may take a long time to complete.

Some thoughts here:
* Why does the OSD need to be stopped while ownership updates are completed?
* Why aren't the OSDs disabled prior to updating code?

It seems like the flow should be:
1. Disable all OSDs
2. Update Version
3. Fix Ownership
4. Enable all OSDs
5. Restart all OSDs

The disable should be on the leading edge in case the update or ownership
changes get interrupted or fail. If we can't guarantee the OSDs are sourcing a
stable update with proper permissions, they shouldn't be allowed to start.
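
A rough Python sketch of the leading-edge disable described above. The function
and its five callables are hypothetical placeholders, not charm code. The point
is only the ordering: if the update or the ownership fix raises, the exception
propagates and the enable step is never reached, so the OSDs stay disabled.

```python
def upgrade_osds(disable_all, update_version, fix_ownership, enable_all, restart_all):
    """Run the five steps in order; any failure leaves the OSDs disabled."""
    # 1. Disable first, before anything that can be interrupted or fail.
    disable_all()
    # 2-3. If either of these raises, we exit here with OSDs still disabled.
    update_version()
    fix_ownership()
    # 4-5. Only reached once the update and ownership changes have completed.
    enable_all()
    restart_all()
```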

Dan Hill (hillpd) wrote on 2019-06-14:

Billy clarified that stopping the OSDs is required because Hammer ran ceph-osd
processes as root. Jewel fixed this security flaw by changing the user to ceph.

The proposed flow for Hammer -> Jewel when an ownership update is required:
    Disable all OSDs [0]
    Update Version
    For each OSD
        Stop OSD
        Fix Ownership
        Enable OSD [1]
        Start OSD
    Enable all OSDs [2]

[0] Systemd: disable all. Upstart: create ceph-osd-all manual override file.
[1] Systemd: enable osd. Upstart: no-op (individual disable is not supported).
[2] Systemd: no-op (already enabled). Upstart: remove the ceph-osd-all override.
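
To make the systemd/upstart differences in notes [0]-[2] concrete, here is a
small Python sketch that only builds the ordered list of actions for a given
init system. It executes nothing. `plan_upgrade` and the action strings are
made up for illustration and do not correspond to charm functions.

```python
def plan_upgrade(init_system, osd_ids):
    """Return the ordered actions for the proposed Hammer -> Jewel flow."""
    if init_system not in ("systemd", "upstart"):
        raise ValueError("unknown init system: %s" % init_system)
    systemd = init_system == "systemd"

    plan = []
    # [0] Leading-edge disable.
    if systemd:
        plan.append("disable all osds")
    else:
        plan.append("create ceph-osd-all manual override")
    plan.append("update version")

    for osd in osd_ids:
        plan.append("stop osd %s" % osd)
        plan.append("fix ownership osd %s" % osd)
        # [1] Per-OSD enable exists only on systemd; upstart has nothing to do.
        if systemd:
            plan.append("enable osd %s" % osd)
        plan.append("start osd %s" % osd)

    # [2] On systemd every OSD is already enabled; upstart lifts the override.
    if not systemd:
        plan.append("remove ceph-osd-all manual override")
    return plan
```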

I want to clarify that this bug is tracking two issues:
1. Charm should not modify 'ready' touchfiles.
2. OSDs should be disabled on the leading edge of version updates.

I'm proposing these issues be fixed together because Trusty is not accepting any
updates. By requiring a leading-edge disable, the charm can remove reliance on
the touchfile without making any changes to the ceph upstart config files.

Alex Kavanagh (ajkavanagh) on 2019-10-28

Changed in charm-ceph-osd:
status:	New → Confirmed
importance:	Undecided → High

Alex Kavanagh (ajkavanagh) on 2019-11-08

tags:	added: ceph-upgrade
