HPE health warn after AIO-SX system restore operation

Bug #2009227 reported by Felipe Sanches Zanoni
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Felipe Sanches Zanoni

Bug Description

Brief Description
-----------------
After a backup and restore operation on a simplex (AIO-SX) HPE system, Ceph health is in the HEALTH_WARN state.

Severity
--------
Critical

Steps to Reproduce
------------------
- Fresh install AIO-SX with Ceph backend
- Make a backup
- Perform a new install and restore the backup without wiping the OSD disk.

Expected Behavior
------------------
System should be restored with no alarms

Actual Behavior
----------------
The system reports Ceph alarms:

$ ceph -s
  cluster:
    id: fb81f294-8364-4cc0-bf3c-96890e22ecf5
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            Reduced data availability: 192 pgs inactive
            21 slow ops, oldest one blocked for 2003 sec, osd.0 has slow ops

  services:
    mon: 1 daemons, quorum controller-0 (age 53m)
    mgr: controller-0(active, since 51m)
    mds: kube-cephfs:1/1 {0=controller-0=up:replay}
    osd: 1 osds: 1 up (since 51m), 1 in (since 27h)

  data:
    pools: 3 pools, 192 pgs
    objects: 0 objects, 0 B
    usage: 7.2 GiB used, 1015 GiB / 1022 GiB avail
    pgs: 100.000% pgs unknown
             192 unknown
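
For scripted verification after a restore, the health line of the `ceph -s` output above can be checked mechanically. The following is a minimal Python sketch that parses the plain-text status in the layout shown in this report. The function name `parse_ceph_health` is hypothetical and is not part of StarlingX or Ceph; the sample text is copied from the output above.

```python
import re


def parse_ceph_health(status_text):
    """Return the health token (e.g. 'HEALTH_WARN') found in plain-text
    `ceph -s` output, or None if no 'health:' line is present.

    Assumption: the output uses the 'health: <TOKEN>' layout shown in
    this bug report.
    """
    match = re.search(r"^\s*health:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


# Sample copied from the report (truncated to the cluster section).
SAMPLE = """\
  cluster:
    id: fb81f294-8364-4cc0-bf3c-96890e22ecf5
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
"""

if __name__ == "__main__":
    health = parse_ceph_health(SAMPLE)
    print("restore check passed" if health == "HEALTH_OK" else f"ceph health is {health}")
```

A restore test could treat anything other than `HEALTH_OK` as a failure, which matches the expected behavior stated in this report.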

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX with Ceph backend

Branch/Pull Time/Commit
-----------------------
Master

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Feature Testing

Workaround
----------
N/A

Changed in starlingx:
assignee: nobody → Felipe Sanches Zanoni (fsanches)
status: New → In Progress
Felipe Sanches Zanoni (fsanches) wrote :
Changed in starlingx:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/877174
Committed: https://opendev.org/starlingx/integ/commit/0de8da2116a46405221ac6965c1f8b77c6ad47c8
Submitter: "Zuul (22348)"
Branch: master

commit 0de8da2116a46405221ac6965c1f8b77c6ad47c8
Author: Felipe Sanches Zanoni <email address hidden>
Date: Sat Mar 11 11:47:02 2023 -0300

    Fix puppet-ceph multipath osd disk partition detection

    The puppet-ceph module is not correctly checking the OSD
    partition when it belongs to a multipath disk or any /dev/dm-X
    device.

    This fix changes the parsing string when running ceph-disk list
    command to verify osd disk is already created.

    Without multipath disk, the readlink command will return,
    for example, '/dev/sdb' for any partition of that disk.
    The output of ceph-disk is like:

    /dev/sdb :
      /dev/sdb1 ceph data, prepared, cluster ceph, osd.0, osd uuid
    e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/sdb2
      /dev/sdb2 ceph journal, for /dev/sdb1

    This way when grepping '/dev/sdb.*ceph data', it will detect
    the line with the partition '/dev/sdb1' with no errors.

    But with multipath disk the readlink command returns /dev/dm-X
    for disks and partitions. For example, it will return /dev/dm-6
    when using
    /dev/dm-6 :
      /dev/dm-7 ceph data, prepared, cluster ceph, osd.0, osd uuid
    e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/dm-8
      /dev/dm-8 ceph journal, for /dev/dm-7

    This way when grepping '/dev/dm-6.*ceph data', it will not
    detect the line with the partition /dev/dm-7.

    Test-Plan:
      PASS: Fresh install AIO-SX with ceph backend and verify ceph
            is HEALTH_OK (with multipath disks)
      PASS: Lock/Unlock controller-0 and verify ceph is HEALTH_OK
            (with multipath disks)
      PASS: Fresh install AIO-SX with ceph backend and verify ceph
            is HEALTH_OK (with regular disks)
      PASS: Lock/Unlock controller-0 and verify ceph is HEALTH_OK
            (with regular disks)

    Closes-bug: 2009227

    Signed-off-by: Felipe Sanches Zanoni <email address hidden>
    Change-Id: Iad11c803b68983ad70fb1edfce5a9acc156a10f4
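
The matching problem described in the commit message can be reproduced outside of puppet. The sketch below is illustrative only: it mimics the line-by-line grep with Python's `re` module on the two `ceph-disk list` samples quoted above (with the wrapped "osd uuid" line rejoined, on the assumption that the wrap comes from the commit message formatting). The helper `has_ceph_data_in_block` is a hypothetical alternative check and is not claimed to be what the merged change does.

```python
import re

# Samples taken from the commit message above.
REGULAR = """\
/dev/sdb :
  /dev/sdb1 ceph data, prepared, cluster ceph, osd.0, osd uuid e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/sdb2
  /dev/sdb2 ceph journal, for /dev/sdb1
"""

MULTIPATH = """\
/dev/dm-6 :
  /dev/dm-7 ceph data, prepared, cluster ceph, osd.0, osd uuid e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/dm-8
  /dev/dm-8 ceph journal, for /dev/dm-7
"""


def has_ceph_data(listing, device):
    """Mimic grepping '<device>.*ceph data' against each output line."""
    pattern = re.compile(re.escape(device) + r".*ceph data")
    return any(pattern.search(line) for line in listing.splitlines())


def has_ceph_data_in_block(listing, device):
    """Hypothetical alternative: find the '<device> :' header line and look
    for a 'ceph data' entry among the indented lines that follow it."""
    in_block = False
    for line in listing.splitlines():
        if not line.startswith(" "):
            # Unindented lines are disk headers such as '/dev/dm-6 :'.
            in_block = line.split(" :")[0].strip() == device
        elif in_block and "ceph data" in line:
            return True
    return False


if __name__ == "__main__":
    print(has_ceph_data(REGULAR, "/dev/sdb"))              # partition name starts with the disk name
    print(has_ceph_data(MULTIPATH, "/dev/dm-6"))           # partition is /dev/dm-7, so no line matches
    print(has_ceph_data_in_block(MULTIPATH, "/dev/dm-6"))  # matches via the header block
```

The prefix-based pattern works for `/dev/sdb` only because partition names share the disk name as a prefix; device-mapper names such as `/dev/dm-7` have no such relationship to `/dev/dm-6`.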

Changed in starlingx:
status: Fix Committed → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.storage