None is a valid upgrade path

Bug #2071875 reported by Samuel Allan
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ceph OSD Charm | New | Undecided | Unassigned | |

Bug Description

During an OpenStack upgrade from focal/ussuri to focal/victoria, the ceph-osd charm at channel octopus/stable crashed and went into error status:

```
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log old_version: octopus
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log new_version: None
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log octopus to None is a valid upgrade path. Proceeding.
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log Making dir /var/lib/charm/ceph-osd ceph:ceph 555
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log Monitor hosts are ['10.REDACTED', '10.REDACTED', '10.REDACTED']
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log AZ Info: rack=zone1
unit-ceph-osd-0: 03:37:49 INFO unit.ceph-osd/0.juju-log roll_osd_cluster called with None
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log osd_sorted_list: [<charms_ceph.utils.CrushLocation object at 0x7fc7c1dde940>, <charms_ceph.utils.CrushLocation object at 0x7fc7c1dde0a0>, <charms_ceph.utils.CrushLocation object at 0x7fc7c1dde370>, <charms_ceph.utils.CrushLocation object at 0x7fc7c1ddeaf0>, <charms_ceph.utils.CrushLocation object at 0x7fc7c206e730>]
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log upgrade position: 0
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log monitor_key_set osd_pc6a-rb3-n1_None_start 1720064270.0629272
unit-ceph-osd-0: 03:37:50 WARNING unit.ceph-osd/0.config-changed set osd_pc6a-rb3-n1_None_start
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log Rolling
unit-ceph-osd-0: 03:37:50 WARNING unit.ceph-osd/0.juju-log DEPRECATION WARNING: Function one_shot_log is being removed : Support for use of upstream ``apt_pkg`` module in conjunctionwith charm-helpers is deprecated since 2019-06-25
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log Current ceph version is 15.2
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log Upgrading to: None
unit-ceph-osd-0: 03:37:50 INFO unit.ceph-osd/0.juju-log Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed set osd_pc6a-rb3-n1_None_alive
unit-ceph-osd-0: 03:37:52 INFO unit.ceph-osd/0.juju-log Installing ['ceph', 'gdisk', 'radosgw', 'xfsprogs', 'lvm2', 'parted', 'smartmontools', 'btrfs-progs'] with options: ['--option=Dpkg::Options::=--force-confold']
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed Traceback (most recent call last):
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/config-changed", line 907, in <module>
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed hooks.execute(sys.argv)
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 962, in execute
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed self._hooks[hook_name]()
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed return f(*args, **kwargs)
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/config-changed", line 472, in config_changed
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed check_for_upgrade()
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/config-changed", line 160, in check_for_upgrade
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed ceph.roll_osd_cluster(new_version=new_version,
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 2633, in roll_osd_cluster
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed lock_and_roll(upgrade_key=upgrade_key,
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 2326, in lock_and_roll
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed upgrade_osd(version, kick_function=dog.kick_the_dog)
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 2693, in upgrade_osd
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed with maintain_all_osd_states():
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed return next(self.gen)
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 2962, in maintain_all_osd_states
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed osd_states = get_all_osd_states()
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 2928, in get_all_osd_states
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed for osd_num in get_local_osd_ids():
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 698, in get_local_osd_ids
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed osd_id = osd_dir.split('-')[1]
unit-ceph-osd-0: 03:37:52 WARNING unit.ceph-osd/0.config-changed IndexError: list index out of range
unit-ceph-osd-0: 03:37:53 ERROR juju.worker.uniter.operation hook "config-changed" (via explicit, bespoke hook script) failed: exit status 1
```

NOTE: The traceback itself may be related to https://bugs.launchpad.net/charm-ceph-osd/+bug/2072920 ; the issue reported here is visible in the log lines above the traceback, in the references to None.
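For context on the `IndexError` at the bottom of the traceback: `osd_dir.split('-')[1]` fails whenever a directory name contains no hyphen. The sketch below is only an illustration. The directory names are hypothetical (the logs do not show which entry triggered the error), and `osd_id_from_dir` is a made-up helper, not a function from charms_ceph.

```python
def osd_id_from_dir(osd_dir):
    """Return the numeric id from a name like 'ceph-3', or None if it does not match."""
    _prefix, sep, osd_id = osd_dir.partition("-")
    if not sep or not osd_id.isdigit():
        return None
    return osd_id


# Hypothetical directory listing; "lost+found" stands in for any entry without a hyphen.
entries = ["ceph-0", "ceph-12", "lost+found"]

# Unguarded parsing, as in the traceback: split('-') yields a one-element
# list for "lost+found", so indexing [1] raises IndexError.
try:
    [entry.split("-")[1] for entry in entries]
except IndexError as exc:
    unguarded_error = str(exc)

# Guarded parsing skips entries that do not look like OSD directories.
guarded_ids = [i for i in map(osd_id_from_dir, entries) if i is not None]
print(unguarded_error, guarded_ids)
```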

Some config must have been set to an invalid or unknown value, but the None filtered through and somehow the ceph-osd charm determined that "octopus to None" was a valid upgrade path.

Note that there should be no upgrade here, since ceph octopus is the officially supported release for both openstack ussuri and victoria.

The charm should support this transition.

description: updated
Samuel Allan (samuelallan) wrote :

This is with the charm on channel stable/octopus, during an upgrade from ussuri to victoria.

I think the issue is:

1. the `source` config option was set to `cloud:focal-victoria`
2. octopus/stable charm channel has no information about openstack victoria --or ceph pacific--
3. `new_version` resolves to `None` on this line: https://opendev.org/openstack/charm-ceph-osd/src/commit/b4642b0e2f3c145252a3e29ff5f5a1453656abed/hooks/ceph_hooks.py#L138-L139
4. `ceph.UPGRADE_PATHS.get(old_version)` resolves to `None` on this line: https://opendev.org/openstack/charm-ceph-osd/src/commit/b4642b0e2f3c145252a3e29ff5f5a1453656abed/hooks/ceph_hooks.py#L150
5. the conditional then evaluates to true, entering the path to an invalid upgrade from 'octopus' to None.
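The steps above can be sketched as follows. This is a minimal reproduction under assumptions, not the charm's code: `SOURCE_TO_CEPH`, `resolve_ceph_release`, and the contents of `UPGRADE_PATHS` are hypothetical stand-ins for the lookups behind the linked lines.

```python
# Hypothetical stand-ins for the charm's lookup tables; the real tables
# are not reproduced here.
SOURCE_TO_CEPH = {"cloud:focal-ussuri": "octopus"}  # assumed: no focal-victoria entry
UPGRADE_PATHS = {"nautilus": "octopus"}             # assumed: no entry keyed on octopus


def resolve_ceph_release(source):
    """Return the Ceph release for a `source` value, or None if it is unknown."""
    return SOURCE_TO_CEPH.get(source)


old_version = "octopus"
new_version = resolve_ceph_release("cloud:focal-victoria")  # step 3: None
expected = UPGRADE_PATHS.get(old_version)                    # step 4: None

# Step 5: None == None is True, so the comparison passes.
path_is_valid = new_version == expected
print(f"{old_version} to {new_version} is a valid upgrade path: {path_is_valid}")
```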

Not sure what a fix should be? Perhaps the conditional should be improved to avoid this None == None case. Or information about upgrade paths and openstack release -> ceph release mappings updated to include victoria --and pacific--?
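One possible shape for the improved conditional, as a sketch only: `is_valid_upgrade` is a hypothetical helper and the `paths` dict is example data, so this is not necessarily what any proposed review does. The idea is to reject `None` on either side before comparing.

```python
def is_valid_upgrade(old_version, new_version, upgrade_paths):
    """Return True only when both versions are known and form a listed path."""
    if old_version is None or new_version is None:
        # An unresolved release must never count as a valid upgrade target.
        return False
    return upgrade_paths.get(old_version) == new_version


# Example data only; assumed to mirror a table with no entry keyed on octopus.
paths = {"nautilus": "octopus"}

print(is_valid_upgrade("octopus", None, paths))        # unresolved target
print(is_valid_upgrade("nautilus", "octopus", paths))  # listed path
```

With this guard, an unresolved `new_version` falls through to whatever "no upgrade" handling the caller already has, instead of entering the rolling-upgrade path.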

EDIT: pacific is actually not relevant to this; my mistake. This is about the OpenStack upgrade from ussuri to victoria. Ceph should remain at octopus for both OpenStack releases.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/21.10)

Fix proposed to branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-ceph-osd/+/923538

Peter Sabaini (peter-sabaini) wrote :

The stable/pacific ceph-osd charm should have the required mapping for victoria:octopus -- you'd need to upgrade to that. The stable/21.10 branch is EOL

Samuel Allan (samuelallan) wrote :

Ok thanks for that. Is this documented somewhere? It doesn't seem intuitive that one would need to upgrade to the pacific charm while keeping ceph at octopus.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/octopus)

Fix proposed to branch: stable/octopus
Review: https://review.opendev.org/c/openstack/charm-ceph-osd/+/923605

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-osd (stable/21.10)

Change abandoned by "Samuel Allan <email address hidden>" on branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-ceph-osd/+/923538
Reason: stable/21.10 is EOL

Samuel Allan (samuelallan) wrote :

Also could you confirm the difference between stable/21.10 and stable/octopus branches? They both currently point to the same revision.

Peter Sabaini (peter-sabaini) wrote :

Hi Sam,

You referred to upgrading Ceph from octopus to pacific above. To do this you will first need to upgrade the charms. Charms are always backward compatible with at least the previous release, i.e. you can always run charm release N with Ceph release N-1.

Here is some documentation around the upgrade order:

https://static.openstack.org/docs/project-deploy-guide/charm-deployment-guide/yoga/upgrade-overview.html#upgrade-order

There's no code difference between stable/21.10 and stable/octopus for the ceph-osd charm (however from a maintenance perspective the difference is that stable/octopus would receive bug fixes, while stable/21.10 is unmaintained as it's EOL).

Samuel Allan (samuelallan) wrote :

Ooh sorry, I see what happened... let me update the issue description. This is not for upgrading ceph from octopus to pacific, but for upgrading the openstack cloud from ussuri to victoria (via changing the `source` config option). Ceph should be on octopus for both ussuri and victoria.

Please see my comment (#1) for the explanation of what is happening when upgrading openstack from ussuri to victoria with ceph on stable/octopus.

description: updated
Peter Sabaini (peter-sabaini) wrote :

For future travellers, discussed this with Sam directly.

There indeed appears to be a bug in the stable/21.10 charms (missing the mapping for focal-victoria). However, this charm release is EOL and won't receive any updates, and upgrading to it is not recommended. The next supported charm release is stable/pacific, and this release should be used for purposes of upgrading Ceph software.

On the other hand, the focal-victoria UCA does not have any updates for Ceph, so updating to this release would not have any benefits. The latest available point release for Octopus is 15.2.17, which is available via the focal-updates distro archive.

I will close this as wontfix.

Changed in charm-ceph-osd:
status: New → Won't Fix
Samuel Allan (samuelallan) wrote :

Hi Peter, I thought we discussed that stable/21.10 was EOL, but stable/octopus is still supported? Can we reopen and fix the bug in stable/octopus?

I don't know where the stable/21.10 charm is used; I don't want to address anything on that. I'm looking to get it fixed for octopus.

description: updated
Samuel Allan (samuelallan) wrote :

I would like to re-open this bug please and request review for https://review.opendev.org/c/openstack/charm-ceph-osd/+/923605

This bug was observed on ceph-osd charm channel stable/octopus (this is not related to 21.10; that was an accident while submitting to gerrit).

The octopus charm track should support running on victoria, and the transition from openstack ussuri to victoria. The stable/octopus channel is still maintained, and octopus is the official ceph release for both ussuri and victoria.

I updated the bug report description to be clearer. :)

description: updated
Changed in charm-ceph-osd:
status: Won't Fix → New