Failed to upgrade ceph-mon from pacific (cloud:focal-xena) to quincy (cloud:focal-yoga)

Bug #2040327 reported by Gabriel Cocenza
This bug affects 3 people
Affects                  | Status      | Importance | Assigned to | Milestone
Ceph Monitor Charm       | In Progress | Medium     | Unassigned  |
Ceph RADOS Gateway Charm | New         | Undecided  | Unassigned  |

Bug Description

This bug looks similar to #2007976, but it is slightly different.

I tried to upgrade ceph-mon from pacific to quincy and the following problem happened:

2023-10-23 20:45:26 DEBUG juju.worker.uniter agent.go:22 [AGENT-STATUS] executing: running config-changed hook
2023-10-23 20:45:26 DEBUG juju.worker.uniter.runner runner.go:728 starting jujuc server {unix @/var/lib/juju/agents/unit-ceph-mon-0/agent.socket <nil>}
2023-10-23 20:45:26 DEBUG unit.ceph-mon/0.juju-log server.go:316 Hardening function 'config_changed'
2023-10-23 20:45:26 DEBUG unit.ceph-mon/0.juju-log server.go:316 No hardening applied to 'config_changed'
2023-10-23 20:45:26 INFO unit.ceph-mon/0.juju-log server.go:316 old_version: pacific
2023-10-23 20:45:26 INFO unit.ceph-mon/0.juju-log server.go:316 new_version: pacific
2023-10-23 20:45:26 ERROR unit.ceph-mon/0.juju-log server.go:316 Invalid upgrade path from pacific to pacific. Valid paths are: ['firefly -> hammer', 'hammer -> jewel', 'jewel -> luminous', 'luminous -> mimic', 'mimic -> nautilus', 'nautilus -> octopus', 'octopus -> pacific', 'pacific -> quincy']
2023-10-23 20:45:26 INFO unit.ceph-mon/0.juju-log server.go:316 Monitor hosts are ['10.18.4.21', '10.18.4.23', '10.18.4.26']
2023-10-23 20:45:26 DEBUG unit.ceph-mon/0.juju-log server.go:316 Updating sysctl_file: /etc/sysctl.d/50-ceph-charm.conf values: {'kernel.pid_max': 2097152, 'vm.max_map_count': 524288, 'kernel.threads-max': 2097152}
2023-10-23 20:45:26 WARNING unit.ceph-mon/0.config-changed logger.go:60 sysctl: permission denied on key "kernel.pid_max", ignoring
2023-10-23 20:45:26 WARNING unit.ceph-mon/0.config-changed logger.go:60 sysctl: permission denied on key "vm.max_map_count", ignoring
2023-10-23 20:45:26 WARNING unit.ceph-mon/0.config-changed logger.go:60 sysctl: permission denied on key "kernel.threads-max", ignoring
2023-10-23 20:45:26 INFO unit.ceph-mon/0.juju-log server.go:316 Installing ['python-dbus', 'lockfile-progs'] with options: ['--option=Dpkg::Options::=--force-confold']
2023-10-23 20:45:26 DEBUG unit.ceph-mon/0.config-changed logger.go:60 Reading package lists...
2023-10-23 20:45:27 DEBUG unit.ceph-mon/0.config-changed logger.go:60 Building dependency tree...
2023-10-23 20:45:27 DEBUG unit.ceph-mon/0.config-changed logger.go:60 Reading state information...
2023-10-23 20:45:27 DEBUG unit.ceph-mon/0.config-changed logger.go:60 lockfile-progs is already the newest version (0.1.18)

As the logs show, both old_version and new_version are reported as pacific, which does not make sense, and the ceph packages were not upgraded.
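
Judging by the error message, the charm accepts only the single-step upgrade paths it lists, so a pair that resolves to "pacific -> pacific" can never match. The Python sketch below models that check under that assumption; `UPGRADE_PATHS`, `is_valid_upgrade_path` and `describe_paths` are hypothetical names, not code taken from charm-ceph-mon.

```python
from typing import List

# Hypothetical model of the supported single-step paths, copied from the
# "Valid paths are: [...]" list in the log above. Not the charm's real code.
UPGRADE_PATHS = {
    "firefly": "hammer",
    "hammer": "jewel",
    "jewel": "luminous",
    "luminous": "mimic",
    "mimic": "nautilus",
    "nautilus": "octopus",
    "octopus": "pacific",
    "pacific": "quincy",
}


def is_valid_upgrade_path(old_version: str, new_version: str) -> bool:
    """Return True only if new_version is the direct successor of old_version."""
    return UPGRADE_PATHS.get(old_version) == new_version


def describe_paths() -> List[str]:
    """Format the paths the same way the error message lists them."""
    return [f"{old} -> {new}" for old, new in UPGRADE_PATHS.items()]


if __name__ == "__main__":
    # What the log reported: both versions resolved to pacific.
    print(is_valid_upgrade_path("pacific", "pacific"))  # False
    # What switching cloud:focal-xena -> cloud:focal-yoga should have produced.
    print(is_valid_upgrade_path("pacific", "quincy"))  # True
```

Under this model, the pacific/pacific pair is rejected simply because no listed path starts and ends at the same release.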

The workaround was to set the source back to cloud:focal-xena and then to cloud:focal-yoga again. The second time, new_version correctly resolved to quincy and the packages were upgraded.

Alan Baghumian (alanbach) wrote:

I was also upgrading from Focal/Xena to Focal/Yoga and encountered a very similar issue.

Partial logs:

2023-08-18 22:42:50 WARNING unit.ceph-mon-nvme/8.juju-log server.go:316 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
2023-08-18 22:42:50 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 old_version: quincy
2023-08-18 22:42:50 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 new_version: quincy
2023-08-18 22:42:50 ERROR unit.ceph-mon-nvme/8.juju-log server.go:316 Invalid upgrade path from quincy to quincy. Valid paths are: ['firefly -> hammer', 'hammer -> jewel', 'jewel -> luminous', 'luminous -> mimic', 'mimic -> nautilus', 'nautilus -> octopus', 'octopus -> pacific', 'pacific -> quincy']
2023-08-18 22:42:50 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 Monitor hosts are ['10.1.15.140', '10.1.15.199', '10.1.15.222']

I'm attaching a more verbose version here.

Please let me know if you need anything else.

Best,
Alan

Luciano Lo Giudice (lmlogiudice) wrote:

Just to be sure: Have you performed ceph-osd upgrades before running the upgrade path for ceph-mon? The procedure needs to be in that order for it to work properly.

Alan Baghumian (alanbach) wrote:

According to our official Ceph upgrade KB, the upgrade order is MON > OSD > RGW, and that is what I followed.

The upstream Charmed OpenStack upgrade guide (1) gives the same order.

Best,
Alan

(1) https://docs.openstack.org/charm-guide/latest/admin/upgrades/openstack.html

Luciano Lo Giudice (lmlogiudice) wrote:

Sorry, I meant setting the `require-osd-release` option, not upgrading the charm itself. The documentation was recently updated to reflect that. See: https://docs.openstack.org/charm-guide/latest/project/issues/upgrade-issues.html#ceph-require-osd-release

Alan Baghumian (alanbach) wrote:

@Luciano Thank you very much for the link; I'll use it to update the KB. Looking deeper, though, I also noticed these traces:

2023-08-18 22:40:40 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 old_version: None
2023-08-18 22:40:40 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 new_version: None
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 Traceback (most recent call last):
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/config-changed", line 1351, in <module>
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 hooks.execute(sys.argv)
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/charmhelpers/core/hookenv.py", line 962, in execute
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 self._hooks[hook_name]()
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 return f(*args, **kwargs)
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/config-changed", line 248, in config_changed
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 check_for_upgrade()
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/config-changed", line 144, in check_for_upgrade
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 old_version_os < new_version_os):
2023-08-18 22:40:41 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 TypeError: '<' not supported between instances of 'str' and 'NoneType'
2023-08-18 22:40:41 ERROR juju.worker.uniter.operation runhook.go:140 hook "config-changed" (via explicit, bespoke hook script) failed: exit status 1
2023-08-18 22:40:41 INFO juju.worker.uniter resolver.go:145 awaiting error resolution for "config-changed" hook
2023-08-18 22:40:46 INFO juju.worker.uniter resolver.go:145 awaiting error resolution for "config-changed" hook
2023-08-18 22:40:46 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 old_version: None
2023-08-18 22:40:46 INFO unit.ceph-mon-nvme/8.juju-log server.go:316 new_version: None
2023-08-18 22:40:46 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 Traceback (most recent call last):
2023-08-18 22:40:46 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/config-changed", line 1351, in <module>
2023-08-18 22:40:46 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 hooks.execute(sys.argv)
2023-08-18 22:40:46 WARNING unit.ceph-mon-nvme/8.config-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-mon-nvme-8/charm/hooks/charmhelpers/core/hookenv.py", line 962, in execute
2023-08-18 22:40:46 W...
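
The final TypeError in this traceback comes from evaluating `old_version_os < new_version_os` while `old_version_os` is a string and `new_version_os` is `None`; Python 3 does not define `<` between those types. Below is a minimal sketch of a None guard, assuming the values are release codenames. `unguarded_compare` and `should_upgrade` are hypothetical helpers, not the charm's `check_for_upgrade()`.

```python
from typing import Optional


def unguarded_compare(old_version_os: Optional[str], new_version_os: Optional[str]) -> bool:
    # Mirrors the failing expression from the traceback; raises TypeError
    # when one side is a str and the other is None.
    return old_version_os < new_version_os


def should_upgrade(old_version_os: Optional[str], new_version_os: Optional[str]) -> bool:
    """Hypothetical guarded version: treat an unresolved release as "no upgrade".

    Plain string comparison is kept to mirror the original expression; it
    only matches release order because ussuri..yoga happen to be alphabetical.
    """
    if old_version_os is None or new_version_os is None:
        return False
    return old_version_os < new_version_os


if __name__ == "__main__":
    try:
        unguarded_compare("xena", None)
    except TypeError as exc:
        print(exc)  # '<' not supported between instances of 'str' and 'NoneType'
    print(should_upgrade("xena", None))    # False
    print(should_upgrade("xena", "yoga"))  # True
```

Returning False is only one option; failing with a clear message about the unresolved source might be preferable in a real charm.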

Gabriel Cocenza (gabrielcocenza) wrote:

> Just to be sure: Have you performed ceph-osd upgrades before running the upgrade path for ceph-mon? The procedure needs to be in that order for it to work properly.

I believe so. In my case I'm performing the pacific to quincy upgrade, so according to the documentation we should start with ceph-mon, and require_osd_release should be pacific:

$ sudo ceph osd dump | grep require_osd_release
require_osd_release pacific
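
The precondition described here can be checked mechanically: for a pacific to quincy upgrade, `require_osd_release` should still read pacific before ceph-mon is upgraded. The Python sketch below parses the plain-text output of `ceph osd dump` (such as the line shown above) and compares it with the release currently running. The function names are hypothetical, and parsing the text output rather than a structured format is an assumption of this sketch.

```python
from typing import Optional


def parse_require_osd_release(osd_dump_output: str) -> Optional[str]:
    """Return the value of the require_osd_release line, or None if absent."""
    for line in osd_dump_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "require_osd_release":
            return fields[1]
    return None


def ready_for_mon_upgrade(osd_dump_output: str, current_release: str) -> bool:
    """Hypothetical precondition: require_osd_release equals the running release."""
    return parse_require_osd_release(osd_dump_output) == current_release


if __name__ == "__main__":
    sample = "require_osd_release pacific\n"  # output quoted in the comment above
    print(parse_require_osd_release(sample))         # pacific
    print(ready_for_mon_upgrade(sample, "pacific"))  # True
```

In practice the input would come from running `ceph osd dump` on a monitor host; the sketch deliberately leaves that step out.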

Gabriel Cocenza (gabrielcocenza) wrote:

This also happened when upgrading from octopus to pacific.

https://pastebin.canonical.com/p/sNvN4Ks9Z2/

The workaround is the same: point the source to the previous release (N-1) and then back to the target (N). In this case, change it from cloud:focal-wallaby to cloud:focal-victoria, and then back to cloud:focal-wallaby.
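
The workaround amounts to setting the charm's `source` option to the previous Ubuntu Cloud Archive pocket and then back to the target one (for example `juju config ceph-mon source=cloud:focal-xena`, then `juju config ceph-mon source=cloud:focal-yoga`). The Python sketch below computes that pair for a given target and the Ceph release each pocket is expected to provide. The pocket-to-release table is an assumption limited to the four focal pockets mentioned in this report (victoria and wallaby from this comment, xena and yoga from the bug title); `workaround_sources` and `ceph_release_for` are hypothetical helpers.

```python
from typing import List

# Assumed focal cloud-archive pocket -> Ceph release mapping, limited to the
# pockets mentioned in this bug report and listed in release order.
UCA_CEPH_RELEASE = {
    "victoria": "octopus",
    "wallaby": "pacific",
    "xena": "pacific",
    "yoga": "quincy",
}


def workaround_sources(target_source: str) -> List[str]:
    """Return [N-1 source, N source] for a value like 'cloud:focal-yoga'.

    Raises ValueError for pockets outside the table or with no predecessor.
    """
    prefix, _, release = target_source.rpartition("-")
    order = list(UCA_CEPH_RELEASE)  # dicts keep insertion order (Python 3.7+)
    idx = order.index(release)
    if idx == 0:
        raise ValueError(f"no earlier pocket known for {release!r}")
    return [f"{prefix}-{order[idx - 1]}", target_source]


def ceph_release_for(source: str) -> str:
    """Look up the Ceph release expected from a 'cloud:<series>-<release>' source."""
    return UCA_CEPH_RELEASE[source.rpartition("-")[2]]


if __name__ == "__main__":
    for src in workaround_sources("cloud:focal-wallaby"):
        print(src, ceph_release_for(src))
    # cloud:focal-victoria octopus
    # cloud:focal-wallaby pacific
```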

Luciano Lo Giudice (lmlogiudice) wrote:

Managed to reproduce on a fresh deployment. The fix appears to be the same as was applied to the main branch, so I'm backporting it.

Changed in charm-ceph-mon:
importance: Undecided → Medium

OpenStack Infra (hudson-openstack) wrote: Fix proposed to charm-ceph-mon (stable/pacific)

Fix proposed to branch: stable/pacific
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/919940

Changed in charm-ceph-mon:
status: New → In Progress

OpenStack Infra (hudson-openstack) wrote: Fix merged to charm-ceph-mon (stable/pacific)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/919940
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/6bf91d11d78777e7d3ea867fcfef0cd2c4c68e41
Submitter: "Zuul (22348)"
Branch: stable/pacific

commit 6bf91d11d78777e7d3ea867fcfef0cd2c4c68e41
Author: Luciano Lo Giudice <email address hidden>
Date: Fri May 17 08:28:58 2024 -0300

    Fix upgrade path for Ceph on pacific

    Change-Id: I5032f43e521e64e9e0d42cc51c3f692bb3c11eb6
    Closes-Bug: 2040327

tags: added: in-stable-pacific