Upgrade from Nautilus to Octopus does not restart services, leaving them running Nautilus versions

Bug #1943854 reported by Chris Johnston
This bug affects 1 person
Affects                  Status   Importance  Assigned to  Milestone
Ceph Monitor Charm       Triaged  Medium      Unassigned
OpenStack Ceph-FS Charm  Triaged  Medium      Unassigned

Bug Description

$ juju export-bundle >> ~/juju_export_bundle_before_ceph_upgrade.txt # [1]
$ juju status >> ~/juju_status_before_ceph_upgrade.txt # [1]
$ juju run -u ceph-mon/leader -- sudo ceph -s && juju run -u ceph-mon/leader -- sudo ceph versions
  cluster:
    id: f2b72582-1703-11ec-82f3-fa163e15a8b3
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum juju-0d931d-ck-3 (age 66m)
    mgr: juju-0d931d-ck-3(active, since 66m)
    mds: ceph-fs:1 {0=juju-0d931d-ck-1=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 65m), 3 in (since 65m)

  task status:
    scrub status:
        mds.juju-0d931d-ck-1: idle

  data:
    pools: 2 pools, 16 pgs
    objects: 22 objects, 2.2 KiB
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 16 active+clean

{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "overall": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 8
    }
}
$ juju upgrade-charm ceph-mon --revision 58
Added charm-store charm "ceph-mon", revision 58 in channel stable, to the model
Leaving endpoints in "alpha": admin, bootstrap-source, client, cluster, mds, mon, nrpe-external-master, osd, prometheus, public, radosgw, rbd-mirror
$ juju run -u ceph-mon/leader -- sudo ceph -s && juju run -u ceph-mon/leader -- sudo ceph versions
  cluster:
    id: f2b72582-1703-11ec-82f3-fa163e15a8b3
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum juju-0d931d-ck-3 (age 77m)
    mgr: juju-0d931d-ck-3(active, since 77m)
    mds: ceph-fs:1 {0=juju-0d931d-ck-1=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 76m), 3 in (since 76m)

  task status:
    scrub status:
        mds.juju-0d931d-ck-1: idle

  data:
    pools: 2 pools, 16 pgs
    objects: 22 objects, 2.2 KiB
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 16 active+clean

{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "overall": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 8
    }
}
$ juju config ceph-mon source=cloud:bionic-ussuri
$ juju run -u ceph-mon/leader -- sudo ceph -s && juju run -u ceph-mon/leader -- sudo ceph versions
  cluster:
    id: f2b72582-1703-11ec-82f3-fa163e15a8b3
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum juju-0d931d-ck-3 (age 85m)
    mgr: juju-0d931d-ck-3(active, since 6m)
    mds: ceph-fs:1 {0=juju-0d931d-ck-1=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 84m), 3 in (since 84m)

  task status:
    scrub status:
        mds.juju-0d931d-ck-1: idle

  data:
    pools: 2 pools, 16 pgs
    objects: 22 objects, 2.2 KiB
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 16 active+clean

{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
    },
    "mgr": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "overall": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 7,
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    }
}
$ juju status >> ~/juju_status_after_ceph_mon_upgrade.txt # [2]
$ juju export-bundle >> ~/juju_export_bundle_after_ceph_mon_upgrade.txt # [2]

### Note:
The 'mon' version still reports 14.2.18, while the mgr now reports 15.2.13.
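One way to confirm that the packages were upgraded but the running daemon was not restarted (a sketch only; the mon hostname is taken from the output above, and the expected results are inferred rather than captured from this deployment):

$ juju run -u ceph-mon/leader -- dpkg -l ceph-mon    # installed package should already show 15.2.x
$ juju run -u ceph-mon/leader -- sudo ceph tell mon.juju-0d931d-ck-3 version    # running daemon still reports 14.2.18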

[1] https://pastebin.canonical.com/p/ZWKMMRJN9F/
[2] https://pastebin.canonical.com/p/yN4SqPthwm/

Chris Johnston (cjohnston) wrote :

Manually restarting ceph-mon results in the new version running:
$ juju run -u ceph-mon/0 -- sudo systemctl restart ceph-mon.target
$ juju run -u ceph-mon/leader -- sudo ceph -s && juju run -u ceph-mon/leader -- sudo ceph versions
  cluster:
    id: f2b72582-1703-11ec-82f3-fa163e15a8b3
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mon is allowing insecure global_id reclaim
            2 pools have too few placement groups

  services:
    mon: 1 daemons, quorum juju-0d931d-ck-3 (age 114s)
    mgr: juju-0d931d-ck-3(active, since 108s)
    mds: ceph-fs:1 {0=juju-0d931d-ck-1=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 99m), 3 in (since 99m)

  task status:
    scrub status:
        mds.juju-0d931d-ck-1: idle

  data:
    pools: 3 pools, 17 pgs
    objects: 22 objects, 2.7 KiB
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 17 active+clean

  io:
    client: 170 B/s wr, 0 op/s rd, 0 op/s wr

{
    "mon": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "mgr": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "overall": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6,
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 2
    }
}

Chris Johnston (cjohnston) wrote :

Also seeing very similar results with ceph-fs:

$ juju status >> ~/juju_status_before_ceph_fs_upgrade.txt # [3]
$ juju export-bundle >> ~/juju_export_bundle_before_ceph_fs_upgrade.txt # [3]
$ juju upgrade-charm ceph-fs --revision 43
Added charm-store charm "ceph-fs", revision 43 in channel stable, to the model
Adding endpoint "certificates" to default space "alpha"
Leaving endpoints in "alpha": ceph-mds, public
$ juju config ceph-fs source=cloud:bionic-ussuri
$ juju run -u ceph-mon/leader -- sudo ceph -s && juju run -u ceph-mon/leader -- sudo ceph versions
  cluster:
    id: f2b72582-1703-11ec-82f3-fa163e15a8b3
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mon is allowing insecure global_id reclaim
            3 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            2 pools have too few placement groups

  services:
    mon: 1 daemons, quorum juju-0d931d-ck-3 (age 20m)
    mgr: juju-0d931d-ck-3(active, since 20m)
    mds: ceph-fs:1 {0=juju-0d931d-ck-0=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 9m), 3 in (since 118m)

  task status:
    scrub status:
        mds.juju-0d931d-ck-0: idle

  data:
    pools: 3 pools, 17 pgs
    objects: 22 objects, 2.7 KiB
    usage: 3.0 GiB used, 27 GiB / 30 GiB avail
    pgs: 17 active+clean

{
    "mon": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "mgr": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 1
    },
    "osd": {
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1,
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1,
        "ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)": 7
    }
}

### Note:
It looks like one of the three mds units didn't get properly restarted. This appears to be the active mds.
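
A likely manual workaround, mirroring the ceph-mon restart in the previous comment (a sketch only; which ceph-fs unit runs on juju-0d931d-ck-0 is an assumption and should be checked in juju status first, and restarting the active mds will fail it over to a standby):

$ juju run -u ceph-fs/0 -- sudo systemctl restart ceph-mds.target    # assumes ceph-fs/0 is the unit on juju-0d931d-ck-0
$ juju run -u ceph-mon/leader -- sudo ceph versions    # confirm all mds daemons now report 15.2.13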

$ juju status >> ~/juju_status_after_ceph_fs_upgrade.txt # [4]
$ juju export-bundle >> ~/juju_export_bundle_after_ceph_fs_upgrade.txt # [4]

$ sudo ceph fs status
ceph-fs - 0 clients
=======
RANK  STATE   MDS               ACTIVITY    DNS  INOS
 0    active  juju-0d931d-ck-0  Reqs: 0 /s  10   13
POOL              TYPE      USED   AVAIL
ceph-fs_metadata  metadata  1536k  8693M
ceph-fs_data      data      0      8693M
STANDBY MDS
juju-0d931d-ck-1
juju-0d931d-ck-2
VERSION                                                                             DAEMONS
ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)   juju-0d931d-ck-0
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)    juju-0d931d-ck-1, juju-0d931d-ck-2

[3] https://pastebin.canonical.com/p/TqqGxQ2n9z/
[4] https://pastebin.canonical.com/p/CcH6PMkp8G/

summary:
- Upgrade from Nautilus to Octopus does not restart services, leaving the
-   versions running Nautilus versions
+ Upgrade from Nautilus to Octopus does not restart services, leaving them
+   running Nautilus versions
tags: added: openstack-upgrade
Changed in charm-ceph-mon:
importance: Undecided → Medium
Changed in charm-ceph-fs:
importance: Undecided → Medium
Changed in charm-ceph-mon:
status: New → Triaged
Changed in charm-ceph-fs:
status: New → Triaged