stx-openstack: Ceph-related alarm is triggered after app is applied

Bug #2021887 reported by Luan Nunes Utimura
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Luan Nunes Utimura

Bug Description

Brief Description
-----------------
After applying stx-openstack and creating images/volumes, a Ceph alarm is triggered shortly afterwards because the pools are not associated with the applications that use them.

Severity
--------
Major.

Steps to Reproduce
------------------
1) Upload/apply stx-openstack;
2) Create images/volumes;
3) Verify that a Ceph alarm was triggered.
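
Step 3 can be scripted: `fm alarm-list` reports the Ceph condition as alarm ID 800.001. Below is a minimal sketch that uses a sample alarm-list row (assumed format, taken from the log further down) so it runs standalone; on a live system you would pipe `fm alarm-list` into the check instead:

```shell
# Returns 0 if the Ceph storage alarm (ID 800.001) appears in the
# alarm-list output read from stdin.
check_ceph_alarm() {
  grep -q '| 800\.001 |'
}

# Sample `fm alarm-list` row (assumed format, copied from this report).
sample='| <uuid> | 800.001 | Storage Alarm Condition: HEALTH_WARN. ... |'

if printf '%s\n' "$sample" | check_ceph_alarm; then
  echo "Ceph storage alarm 800.001 raised"
fi
```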

Expected Behavior
-----------------
Ceph should be healthy before/after the app is applied and the images/volumes are created.

Actual Behavior
---------------
Ceph is unhealthy after the app is applied and the images/volumes are created.

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
AIO-SX, but should be observable in all configurations.

Branch/Pull Time/Commit
-----------------------
Branch `master`.

Last Pass
---------
N/A.

Timestamp/Logs
--------------

sysadmin@controller-0:~$ fm alarm-list
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Management Affecting | Severity | Time Stamp |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------------------+----------+----------------------------+
| <uuid> | 800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=<cluster-uuid> | True | warning | 2023-05-11T07:05:29.285634 |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------------------+----------+----------------------------+

sysadmin@controller-0:~$ ceph -s
  cluster:
    id: <uuid>
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 3 daemons, quorum controller-0,controller-1,compute-0 (age 12h)
    mgr: controller-0(active, since 20h), standbys: controller-1
    mds: kube-cephfs:1 {0=controller-0=up:active} 2 up:standby
    osd: 2 osds: 2 up (since 20h), 2 in (since 22h)

  data:
    pools: 7 pools, 704 pgs
    objects: 634 objects, 2.5 GiB
    usage: 28 GiB used, 3.2 TiB / 3.3 TiB avail
    pgs: 704 active+clean

  io:
    client: 440 KiB/s wr, 0 op/s rd, 49 op/s wr

Test Activity
-------------
Developer Testing.

Workaround
----------
To work around this, manually enable the application on each affected pool:

* ceph osd pool application enable [...];

(Check the output of `ceph health detail` to identify the affected pools.)
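
A minimal sketch of that workaround, assuming the `ceph health detail` output format shown below (the health text and pool names are illustrative samples; on a live system you would capture `health=$(ceph health detail)` instead):

```shell
# Sample `ceph health detail` output (assumed format; pool names illustrative).
health="POOL_APP_NOT_ENABLED application not enabled on 2 pool(s)
    application not enabled on pool 'cinder-volumes'
    application not enabled on pool 'images'
    use 'ceph osd pool application enable <pool-name> <app-name>' to enable"

# Extract the affected pool names and print the enable command for each.
# Here the application name is assumed to equal the pool name, matching
# what the stx-openstack storage-init jobs do.
echo "$health" \
  | sed -n "s/.*application not enabled on pool '\([^']*\)'.*/\1/p" \
  | while read -r pool; do
      echo "ceph osd pool application enable $pool $pool"
    done
```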

description: updated
Changed in starlingx:
assignee: nobody → Luan Nunes Utimura (lutimura)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)
Changed in starlingx:
status: New → In Progress
Revision history for this message
Luan Nunes Utimura (lutimura) wrote :

With the proposed fix, the `storage-init` jobs are successfully associating pools with their corresponding applications, e.g., Cinder's `storage-init` job:

```
[...]
+ CEPH_RELEASE_NAME=nautilus
+ CEPH_RELEASES_PRIOR_TO_LUMINOUS=(kraken jewel infernalis hammer giant firefly emperor dumpling)
+ [[ Development -eq Development ]]
+ [[ kraken jewel infernalis hammer giant firefly emperor dumpling =~ nautilus ]]
+ ceph osd pool application enable cinder-volumes cinder-volumes
enabled application 'cinder-volumes' on pool 'cinder-volumes'
[...]
```
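
The failing check itself is easy to sketch. The following is a simplified, illustrative version of the logic in the openstack-helm storage-init template (the function name is made up, and the version strings are the two cases from the commit message); the real script queries `ceph mgr versions` at runtime:

```shell
# Returns 0 if pools must be associated with an application, i.e. the
# cluster is Luminous (major version 12) or newer. Debian builds report
# "Development (no_version)" instead of a numeric version, which breaks
# a purely numeric check; the fix treats "Development" as new enough.
needs_pool_app_association() {
  # $1: the "ceph version ..." string reported by the cluster
  local version_word major
  version_word=$(echo "$1" | awk '{print $3}')
  if [ "$version_word" = "Development" ]; then
    return 0
  fi
  major=${version_word%%.*}
  [ "$major" -ge 12 ]
}

needs_pool_app_association \
  "ceph version 14.2.15-2-g7407245e7b (7407245e7b) nautilus (stable)" \
  && echo "nautilus: associate pools"

needs_pool_app_association \
  "ceph version Development (no_version) nautilus (stable)" \
  && echo "development build: associate pools"
```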

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/884762
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/b70da7c6217c8ce5d5e72566fd399808eb1a86b7
Submitter: "Zuul (22348)"
Branch: master

commit b70da7c6217c8ce5d5e72566fd399808eb1a86b7
Author: Luan Nunes Utimura <email address hidden>
Date: Tue May 30 14:56:41 2023 -0300

    Support ceph dev version during pool creation

    It has been observed that, right after applying stx-openstack, an alarm
    related to Ceph is being triggered by the platform due to pools not
    being associated with the applications using them.

    According to the official documentation ([1] and [2]), the
    pool/application association is mandatory for Ceph releases equal to
    or greater than the Luminous release (12.2.13).

    In theory, this is already handled in openstack-helm's helm charts, such
    as in Cinder's `storage-init` job [3]. One can even see that it only
    performs the association for Ceph major versions >= 12, which matches
    the official documentation's requirements.

    However, the problem is that the code in [3] assumes that `ceph mgr
    versions` will always report a numeric version, e.g.:

    - `ceph version 14.2.15-2-g7407245e7b \
        (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)`

    This is not always the case, though: after the platform was migrated
    to Debian, Ceph started to report:

    - `ceph version Development (no_version) nautilus (stable)`

    As a result, version checks like the one done in [3] are failing and
    consequently pools are being created without the necessary associations.

    Therefore, this change updates the storage init scripts for Cinder and
    Glance to account for the scenario where a development version of Ceph
    is used.

    [1] https://docs.ceph.com/en/latest/rados/operations/pools/#create-a-pool
    [2] https://docs.ceph.com/en/latest/rados/operations/pools/#associate-pool-to-application
    [3] https://opendev.org/openstack/openstack-helm/src/commit/7803000a545687ec40b0ddc41d46a6b377dea45f/cinder/templates/bin/_storage-init.sh.tpl#L32-L34

    Test Plan:
    PASS - Build openstack-helm package
    PASS - Build stx-openstack-helm-fluxcd package
    PASS - Build stx-openstack helm charts
    PASS - Upload/apply stx-openstack
    PASS - Verify that Ceph is healthy

    Closes-Bug: 2021887

    Change-Id: I11291f220cb15fe616fc5e555c69f872254cf2c9
    Signed-off-by: Luan Nunes Utimura <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.distro.openstack