ceph-mon charm is in error state after charm upgrade from revision 161 to 167

Bug #2024253 reported by Yevhenii Preobrazhenskyi
This bug affects 3 people
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

All three ceph-mon units stay in an error state after upgrading the ceph-mon charm in the quincy/stable channel from revision 161 to 167.

The unit-ceph-mon-X.log files contain the following lines:

2023-06-16 17:36:53 ERROR unit.ceph-mon/0.juju-log server.go:316 Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/./src/charm.py", line 308, in <module>
    main(CephMonCharm)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops_openstack/core.py", line 258, in _on_config
    self.on_config(event)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/./src/charm.py", line 85, in on_config
    if hooks.config_changed():
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/charmhelpers/contrib/hardening/harden.py", line 90, in _harden_inner2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/src/ceph_hooks.py", line 269, in config_changed
    check_for_upgrade()
  File "/var/lib/juju/agents/unit-ceph-mon-0/charm/src/ceph_hooks.py", line 140, in check_for_upgrade
    old_version_os < new_version_os):
TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'
2023-06-16 17:36:54 ERROR juju.worker.uniter.operation runhook.go:153 hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
2023-06-16 17:36:54 INFO juju.worker.uniter resolver.go:155 awaiting error resolution for "config-changed" hook
2023-06-16 17:38:37 INFO juju.worker.uniter resolver.go:155 awaiting error resolution for "config-changed" hook
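
From the traceback alone, the failure appears to come from ordering two values that have both resolved to None; as a minimal illustration (variable names taken from the traceback, values assumed), the same error can be reproduced in a plain Python interpreter:

>>> old_version_os = None  # assumed: the version lookup returned nothing
>>> new_version_os = None
>>> old_version_os < new_version_os
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'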

Could you please help with the issue?
Best regards,
Yevhenii

Revision history for this message
Luciano Lo Giudice (lmlogiudice) wrote :

Hello Yevhenii,

Can you tell me if you see lines in the ceph-mon logs of the form:

"old_version: "
"new_version: "

and what they actually say?

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Also, just for completeness' sake, can I ask you to run `lsb_release -a` and add the output?

Revision history for this message
Yevhenii Preobrazhenskyi (melkin) wrote :

Dear Luciano,

thank you very much for your prompt reply. Yes, I can see the following lines:

2023-06-16 15:22:50 INFO unit.ceph-mon/0.juju-log server.go:316 old_version: None
2023-06-16 15:22:50 INFO unit.ceph-mon/0.juju-log server.go:316 new_version: None

As far as I can see, the Ceph cluster is working stably, but I worry whether this state is safe for cluster health.

Sure, Peter

Here is the command output:

root@juju-9bd73c-0-lxd-3:/var/log/juju# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy

Sorry for the disturbance,

Best regards,

Yevhenii

Revision history for this message
Luciano Lo Giudice (lmlogiudice) wrote :

OK, I think I see the issue. Just so we can reproduce fully, we would also need the commands you used to deploy the charm and how you upgraded it.

Revision history for this message
Yevhenii Preobrazhenskyi (melkin) wrote :

I deployed Charmed OpenStack exactly as described in this guide, step by step: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/zed/install-openstack.html
My ceph-mon.yaml and ceph-osd.yaml were:
maasadm@maascont:~/openstack$ cat ceph-mon.yaml
ceph-mon:
  expected-osd-count: 4
  monitor-count: 3
maasadm@maascont:~/openstack$ cat ceph-osd.yaml
ceph-osd:
  osd-devices: /dev/sdb /dev/sdc /dev/sdd

And after a successful deployment one month ago, I tried to upgrade the charm with the simple command 'juju upgrade-charm ceph-mon'.

Revision history for this message
Luciano Lo Giudice (lmlogiudice) wrote :

Thanks for the info, Yevhenii, we'll see what we can do about this. FWIW, your cluster should be fine, as the charms didn't really do anything with it in the upgrade path that they took.

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
David Negreira (dnegreira) wrote :

I have submitted a patch [0] that fixes this issue.

[0] - https://review.opendev.org/c/openstack/charm-ceph-mon/+/887148

Revision history for this message
David Negreira (dnegreira) wrote :

Alas, my PR was super wrong and I have cancelled it.

Yevhenii, can you also show the output of:
juju config ceph-mon source

Revision history for this message
Yevhenii Preobrazhenskyi (melkin) wrote :

Here you are:

maasadm@maascont:~$ juju config ceph-mon source
quincy

BR

Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

Setting source to "distro" can be a workaround until this is fixed.
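
For reference, that would be `juju config ceph-mon source=distro` (assuming the application is named ceph-mon, as in the config output above).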

Revision history for this message
Yevhenii Preobrazhenskyi (melkin) wrote :

Hi everyone,
I need some advice on how to proceed with the issue. Do you have a workaround already? I don't want to risk breaking anything in the production OpenStack, so I've stopped upgrading it for now.

Thanks for your help.

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Hi Yevhenii,

we're working on a fix in
https://review.opendev.org/c/openstack/charm-ceph-mon/+/887733

Meanwhile, you could try setting source=distro as a workaround, as Ponnuvel suggested above.
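
Purely as an illustration of the shape of the problem (this is not the actual change under review, and the helper name below is made up), the upgrade check needs to tolerate versions that did not resolve, along these lines:

# Illustrative sketch only -- not the actual patch linked above.
def upgrade_needed(old_version_os, new_version_os):
    """Only consider an OS-level upgrade when both versions actually resolved."""
    if old_version_os is None or new_version_os is None:
        return False
    return old_version_os < new_version_os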

Changed in charm-ceph-mon:
status: Triaged → Fix Released