check_for_upgrade resume logic needs improvement

Bug #1823822 reported by Trent Lloyd
This bug affects 2 people
Affects: Ceph OSD Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

During config_changed, check_for_upgrade() currently tries to resume an unfinished upgrade by checking whether any directories still need an ownership change.

It then re-enters the upgrade process, waiting for a K/V store signal that the last host finished its upgrade. But we never actually confirm that an upgrade was in progress in the first place. Furthermore, the upgrade logic that runs later is faulty, because it then thinks it is upgrading from the current version to the current version.

In Bug #1779828 we changed dirs_need_ownership_update() to ignore empty OSD directories, so that it does not attempt to resume the upgrade if an OSD is not mounted or not in use. While that fixes the main issue, where a hook runs before an OSD is started or after an OSD is stopped, this logic is not really correct.
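
For illustration, a minimal sketch of the kind of check described above: look for OSD directories with the wrong owner, skipping empty ones. This is not the charm's actual code. The function name dirs_needing_chown, its parameters, and the directory layout are assumptions made for the example.

```python
import os


def dirs_needing_chown(osd_root, expected_uid):
    """Return non-empty directories under osd_root not owned by expected_uid.

    Hypothetical helper, not the charm's dirs_need_ownership_update().
    """
    pending = []
    for name in sorted(os.listdir(osd_root)):
        path = os.path.join(osd_root, name)
        if not os.path.isdir(path):
            continue
        if not os.listdir(path):
            # An empty directory is treated as an OSD that is not mounted
            # or not in use, so it is ignored.
            continue
        if os.stat(path).st_uid != expected_uid:
            pending.append(path)
    return pending
```

Note that a check like this only says that some directory has the wrong owner. It says nothing about whether an upgrade was ever started, which is the gap this bug describes.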

We need to better track when an upgrade has actually started, both cluster-wide and on a specific ceph-osd host. We may also skip doing the chown on an OSD if it is not mounted at the time of the upgrade.
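
One possible shape for that tracking, as a rough sketch only: record an explicit cluster-wide marker and a per-host marker in the K/V store, and resume only when both say an upgrade is really in flight. The UpgradeTracker class, its key names, and the dict-like store are all hypothetical and do not correspond to any existing charm API.

```python
class UpgradeTracker:
    """Hypothetical explicit upgrade-state record kept in a dict-like K/V store."""

    def __init__(self, kv, host):
        self.kv = kv      # any mapping; stands in for the real K/V store
        self.host = host  # name of this ceph-osd host

    def start_cluster_upgrade(self, from_version, to_version):
        # Refuse the "current version to current version" case outright.
        if from_version == to_version:
            raise ValueError("source and target versions are the same")
        self.kv["upgrade/cluster"] = {"from": from_version, "to": to_version}

    def start_host(self):
        self.kv["upgrade/host/" + self.host] = "started"

    def finish_host(self):
        self.kv["upgrade/host/" + self.host] = "done"

    def defer_chown(self, path):
        # Remember an OSD directory whose chown was skipped (for example
        # because it was not mounted) so the skip is not silently lost.
        pending = self.kv.setdefault("upgrade/pending-chown/" + self.host, [])
        if path not in pending:
            pending.append(path)

    def should_resume(self):
        # Resume only if a cluster-wide upgrade was recorded AND this host
        # started its part without finishing it.
        if self.kv.get("upgrade/cluster") is None:
            return False
        return self.kv.get("upgrade/host/" + self.host) == "started"
```

With markers like these, a config_changed hook on a host that never began an upgrade would see should_resume() return False, regardless of directory ownership.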

Trent Lloyd (lathiat)
Changed in charm-ceph-osd:
status: New → Confirmed