mon api incompatibility during upgrade breaks upgrade

Bug #1897594 reported by Edward Hope-Morley
This bug affects 1 person
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: High
Assigned to: Edward Hope-Morley

Bug Description

The ceph-mon charm uses the mon api to coordinate upgrades (using it to get/set keys that represent upgrade state). There is a known issue in Ceph where older clients sometimes cannot reach mons that have already been upgraded, and when this happens the charm cannot progress with the upgrade. This is avoidable: if the charm simply went ahead and installed the new client it would not hit this problem, and the part of an upgrade that really requires coordination - the restarting of mon daemons - would be allowed to continue.
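
To make the dependency concrete, here is a minimal sketch of key get/set through the ceph CLI. It is not the charm's actual code: the function names, the use of the admin id and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def mon_key_set(key, value, run=subprocess.check_call):
    """Store an upgrade-state key in the mon key store via the ceph CLI."""
    # Assumption: the admin keyring is usable on this unit.
    run(["ceph", "--id", "admin", "config-key", "set", str(key), str(value)])


def mon_key_get(key, run=subprocess.check_output):
    """Return the stored value, or None if the ceph client call fails."""
    try:
        out = run(["ceph", "--id", "admin", "config-key", "get", str(key)])
    except subprocess.CalledProcessError:
        # A client that cannot talk to the upgraded mons exits non-zero
        # and lands here, so the upgrade state can no longer be read.
        return None
    return out.decode("utf-8").strip()
```

Every read or write of upgrade state goes through the locally installed client, so a client that cannot parse the upgraded mons' responses blocks coordination even though the mons themselves are healthy.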

For example, with 2 of 3 mon units already running Nautilus and the remaining unit still running Mimic, the Mimic unit gets the following when it tries to reach the mon api to read its keys:

root@juju-985145-sf00292484-test-2:~# ceph -s
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1241, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1165, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 728, in parse_funcsig
    raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
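
The last line of the traceback above is a recognizable signature. As a rough sketch (the helper name is hypothetical and this is not something the charm is known to do), captured client stderr could be checked for it to tell a client/mon version mismatch apart from other failures:

```python
import re

# Matches the final traceback line emitted by the older client, e.g.
# "ceph_argparse.JsonFormat: unknown type CephBool".
_JSONFORMAT_RE = re.compile(r"ceph_argparse\.JsonFormat: unknown type (\w+)")


def unknown_arg_type(stderr_text):
    """Return the argument type the client did not recognize, or None."""
    match = _JSONFORMAT_RE.search(stderr_text)
    return match.group(1) if match else None
```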

To fix this we manually upgraded the client to Nautilus, which allowed the charm to proceed. So, one solution would be to have the charm install the new apt source and upgrade packages in an uncoordinated way, i.e. just do it, and then use its existing rolling upgrade mechanism to coordinate mon daemon restarts.
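
The proposed ordering can be sketched as follows. This is an illustration only, not the change that was merged: the function, the `<unit>_done` key naming and the dict standing in for the mon key store are all assumptions, and package installation is assumed to be safe to repeat.

```python
def upgrade_unit(unit, peers_in_order, key_store, install_packages, restart_mon):
    """Upgrade packages right away, then restart the mon only on our turn.

    Returns True once this unit's mon has been restarted, False if it
    still has to wait for units ahead of it.
    """
    # Step 1: uncoordinated. Install the new packages (client included)
    # first, so that later mon api calls use a compatible client.
    install_packages(unit)

    # Step 2: coordinated. Restart only once every unit ahead of us has
    # recorded that it finished.
    position = peers_in_order.index(unit)
    for peer in peers_in_order[:position]:
        if key_store.get(f"{peer}_done") is None:
            return False  # not our turn yet; the caller retries later
    restart_mon(unit)
    key_store[f"{unit}_done"] = "1"
    return True
```

Only the restart depends on the shared state, so a unit waiting for its turn is already running the new client and can keep reading keys from the upgraded mons.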

Changed in charm-ceph-mon:
assignee: nobody → Edward Hope-Morley (hopem)
milestone: none → 20.10
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)

Fix proposed to branch: master
Review: https://review.opendev.org/754743

Changed in charm-ceph-mon:
status: New → In Progress
Changed in charm-ceph-mon:
importance: Undecided → High
Edward Hope-Morley (hopem) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-mon (master)

Change abandoned by James Page (<email address hidden>) on branch: master
Review: https://review.opendev.org/754743
Reason: Sync done under 754512

David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 20.10 → 21.01
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 21.01 → none
David Negreira (dnegreira) wrote :

Just a note that I have hit this same issue during an upgrade from Mimic to Nautilus. I am manually upgrading the packages as we speak to unblock the unit in error.

tags: added: openstack-upgrade series-upgrade
Changed in charm-ceph-mon:
milestone: none → 20.10
status: In Progress → Fix Released