[FFe] Update Ceph to new 18.2.0 (Reef) version

Bug #2033428 reported by Luciano Lo Giudice
Affects        Status        Importance  Assigned to  Milestone
ceph (Ubuntu)  Fix Released  High        Unassigned
Mantic         Fix Released  High        Unassigned

Bug Description

Hello,
The release of Ceph Reef [0] was delayed and didn't happen before the feature freeze in Mantic.

Reef includes fixes and features that are important to our community and us, so we would like to include it in the Ubuntu Mantic release.

If the FFe is accepted and the package and sources pass review, our plan is to upload a package based on this PPA [1], corresponding to the following git branch [2].

0: https://docs.ceph.com/en/latest/releases/reef/
1: https://launchpad.net/~lmlogiudice/+archive/ubuntu/ceph-reef-mantic
2: https://git.launchpad.net/~lmlogiudice/ubuntu/+source/ceph/log/?h=mantic-reef-18.2.0

Please let us know if you need any additional details.

James Page (james-page)
Changed in ceph (Ubuntu):
importance: Undecided → High
Steve Langasek (vorlon) wrote:

Although this is a major new upstream version vs 17.2.6-0ubuntu1 currently in mantic, provided the test plan described at https://wiki.ubuntu.com/OpenStack/StableReleaseUpdates is applied here and this is landed before beta, I'm ok with this going in.

Changed in ceph (Ubuntu Mantic):
status: New → Triaged
Luciano Lo Giudice (lmlogiudice) wrote:

Test plan:

We start by cloning the openstack-tester repo:

git clone https://github.com/openstack-charmers/charmed-openstack-tester
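The subsequent commands assume we are working from inside the checkout (the directory name follows from the clone URL above):

```
cd charmed-openstack-tester
```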

Afterwards, we apply the following diff to the file "tests/distro-regression/tests/bundles/jammy-antelope.yaml":

```
diff --git a/tests/distro-regression/tests/bundles/jammy-antelope.yaml b/tests/distro-regression/tests/bundles/jammy-antelope.yaml
index 3b80ab9..4ded2a5 100644
--- a/tests/distro-regression/tests/bundles/jammy-antelope.yaml
+++ b/tests/distro-regression/tests/bundles/jammy-antelope.yaml
@@ -3,13 +3,15 @@ variables:
   openstack-origin: &openstack-origin cloud:jammy-antelope/proposed
   retrofit-uca-pocket: &retrofit-uca-pocket antelope
   openstack-channel: &openstack-channel 2023.1/edge
- ceph-channel: &ceph-channel quincy/edge
+ ceph-channel: &ceph-channel latest/edge
   ovn-channel: &ovn-channel 23.03/edge
   mysql-channel: &mysql-channel 8.0/edge
   rabbitmq-channel: &rabbitmq-channel 3.9/edge
   memcached-channel: &memcached-channel latest/edge
   vault-channel: &vault-channel 1.8/edge

+local_overlay_enabled: false
+
 series: &series jammy
 applications:
   aodh:
@@ -46,21 +48,21 @@ applications:
     num_units: 1
     charm: ch:ceph-fs
     options:
- source: *source
+ source: 'ppa:lmlogiudice/ceph-reef-jammy'
     channel: *ceph-channel
   ceph-mon:
     charm: ch:ceph-mon
     num_units: 3
     options:
       expected-osd-count: 3
- source: *source
+ source: 'ppa:lmlogiudice/ceph-reef-jammy'
     constraints: mem=1024
     channel: *ceph-channel
   ceph-osd:
     charm: ch:ceph-osd
     num_units: 3
     options:
- source: *source
+ source: 'ppa:lmlogiudice/ceph-reef-jammy'
     storage:
       osd-devices: cinder,10G
     constraints: mem=4096
```

These changes make the Ceph units use the Reef packages from our PPA.

We then deploy the model by running:

`tox -e func-target -- jammy-antelope`

Eventually, the model will settle and run the tests, resulting in something like:

======
Totals
======
Ran: 179 tests in 729.1815 sec.
 - Passed: 167
 - Skipped: 12
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 841.2089 sec.

We can further check that Reef (18.2.0) is installed by running the following:

```
juju ssh ceph-mon/0
sudo ceph -v
```

The output will be something like the following:

ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)

Luciano Lo Giudice (lmlogiudice) wrote:

It's been pointed out that the above test plan isn't sufficient since it uses jammy-reef instead of mantic-reef. Since deploying Mantic machines on Juju isn't supported yet, here's an alternative test plan using cephadm:

First, we create a Mantic VM using LXD.

```
# The Mantic images aren't included by default, so we download one first
lxc image copy images:ubuntu/mantic local: --copy-aliases --vm
lxc launch ubuntu/23.10 admin --vm -c limits.cpu=4 -c limits.memory=16GiB
# Jump into the VM
lxc exec admin /bin/bash
```
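On slower hosts the VM agent can take a few seconds to come up, so `lxc exec` may fail at first; checking that the instance is RUNNING before retrying is a simple workaround (an optional step, not part of the original plan):

```
lxc list admin
```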

Now that we are in the VM, we install the necessary packages and add our PPA:

```
apt install -y openssh-server software-properties-common && systemctl enable ssh --now
add-apt-repository ppa:lmlogiudice/ceph-reef-mantic
apt update
```
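At this point, a quick sanity check (optional, not part of the original plan) confirms that apt will prefer the PPA build over the archive version:

```
apt policy ceph
```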

With the PPA in place, we now install Ceph and cephadm:

```
apt install ceph cephadm
```

Before proceeding, we verify that the Ceph version is the expected one:

```
root@admin:~# ceph -v
ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
```

Afterwards, we bootstrap the Ceph cluster:

`cephadm bootstrap --mon-ip 10.122.104.220 --single-host-defaults --cluster-network=10.122.104.0/24`

(Replace with your IP where necessary)
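If the VM's address isn't known offhand, it can be read from inside the guest, for example (interface names will vary):

```
ip -4 -brief addr show
```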

With this done, we have a Ceph cluster, but no OSDs. We can add one with the following steps:

```
touch loop.img
truncate --size 3G ./loop.img
losetup -fP ./loop.img
losetup -a
```

Assuming the loop device has been attached at `/dev/loop0`, we proceed with the following:

`ceph orch daemon add osd $(hostname):/dev/loop0 raw`

And with that, we have an OSD ready, which we can verify by running:

`ceph -s`

which should show something like:

  cluster:
    id: 37ff43ae-4d08-11ee-b0f8-00163e4c7d50
    health: HEALTH_WARN
            OSD count 1 < osd_pool_default_size 2

  services:
    mon: 1 daemons, quorum admin (age 11m)
    mgr: admin.ybcvya(active, since 9m), standbys: admin.aksuhv
    osd: 1 osds: 0 up, 1 in (since 26s)

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs:
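As an optional smoke test beyond the original plan, a pool write/read round-trip exercises the new OSD; with a single OSD the PGs will report as undersized, but writes should still complete since min_size defaults to 1 for a size-2 pool:

```
ceph osd pool create smoke 8
echo reef-test > /tmp/obj
rados -p smoke put obj1 /tmp/obj
rados -p smoke get obj1 -
```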

Steve Langasek (vorlon) wrote:

The alternative test plan looks fine, thanks.

Steve Langasek (vorlon) wrote:

--- ceph-17.2.6/debian/cephadm.postinst 2023-05-15 13:39:45.000000000 +0000
+++ ceph-18.2.0/debian/cephadm.postinst 2023-09-21 08:45:43.000000000 +0000
@@ -25,7 +25,7 @@
        # 1. create user if not existing
        if ! getent passwd | grep -q "^cephadm:"; then
          echo -n "Adding system user cephadm.."
- adduser --quiet --system --disabled-password --gecos 'Ceph-dameon user for cephadm' --shell /bin/bash cephadm 2>/dev/null || true
+ adduser --quiet --system --disabled-password --home /home/cephadm --gecos 'Ceph-dameon user for cephadm' --shell /bin/bash cephadm 2>/dev/null || true
          echo "..done"
        fi

Please do not use paths under /home as home directories for system users.

Luciano Lo Giudice (lmlogiudice) wrote (last edit):

Alright, that can be done. However, the rest of the script [1] assumes that `/home/cephadm` has been created by that command. Is it OK to create the directory, but _not_ through the `adduser` command? Or should every path in the script be modified so it refers to somewhere else that's acceptable?

To give a bit more context: the change was introduced because the post-installation script of cephadm would otherwise fail, and every invocation of `dpkg` would continuously warn that `cephadm` was broken.

[1]: https://git.launchpad.net/~lmlogiudice/ubuntu/+source/ceph/tree/debian/cephadm.postinst?h=mantic-reef-18.2.0#n41

Steve Langasek (vorlon) wrote:

The script should be modified to use a more appropriate path for a system user (/var/lib/cephadm?)
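For illustration, the postinst hunk quoted above would then presumably end up looking something like this (a sketch only, assuming /var/lib/cephadm is the agreed location):

```
if ! getent passwd | grep -q "^cephadm:"; then
  echo -n "Adding system user cephadm.."
  # Keep the home directory out of /home for a system user
  adduser --quiet --system --disabled-password --home /var/lib/cephadm \
    --gecos 'Ceph daemon user for cephadm' --shell /bin/bash cephadm 2>/dev/null || true
  echo "..done"
fi
```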

BTW it appears this update has also added a dependency on thrift, which is not in main. https://ubuntu-archive-team.ubuntu.com/component-mismatches-proposed.svg

thrift would need to go through the MainInclusionProcess to get into main, and fixes would be needed to avoid pulling all of Qt into main with it.

James Page (james-page) wrote:

Putting thrift through the MIR process so late in the cycle is not feasible, so we need to figure out what's pulling that in and disable those features.

I missed the switch to /home/cephadm in the maintainer scripts - will get that resolved in the next upload as well.

Launchpad Janitor (janitor) wrote:

This bug was fixed in the package ceph - 18.2.0-0ubuntu2

---------------
ceph (18.2.0-0ubuntu2) mantic; urgency=medium

  * d/control,rules: Disable Jaeger support to avoid dependency on Thrift
    which is not in Ubuntu main.
  * d/cephadm.postinst: Switch back to default location for system user
    cephadm and update all references to /home -> /var/lib.

 -- James Page <email address hidden> Tue, 26 Sep 2023 16:25:14 +0100
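For those following along, disabling Jaeger presumably amounts to flipping the relevant CMake switch from d/rules; a hedged sketch of such a change (the exact variable name and rules layout in the actual upload may differ):

```
# debian/rules (illustrative sketch, not the actual upload)
extraopts += -DWITH_JAEGER=OFF
```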

Changed in ceph (Ubuntu Mantic):
status: Triaged → Fix Released