StarlingX

HPE health warn after AIO-SX system restore operation

Bug #2009227 reported by Felipe Sanches Zanoni on 2023-03-03

Affects    Status        Importance  Assigned to            Milestone
StarlingX  Fix Released  Medium      Felipe Sanches Zanoni

Bug Description

Brief Description
-----------------
After a backup and restore operation on a simplex (AIO-SX) HPE system, Ceph health is in the HEALTH_WARN state.

Severity
--------
Critical

Steps to Reproduce
------------------
- Fresh install AIO-SX with Ceph backend
- Make a backup
- Do a new install and restore the backup without wiping the OSD disk.

Expected Behavior
------------------
System should be restored with no alarms

Actual Behavior
----------------
System reports ceph alarms

$ ceph -s
  cluster:
    id: fb81f294-8364-4cc0-bf3c-96890e22ecf5
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            Reduced data availability: 192 pgs inactive
            21 slow ops, oldest one blocked for 2003 sec, osd.0 has slow ops

  services:
    mon: 1 daemons, quorum controller-0 (age 53m)
    mgr: controller-0(active, since 51m)
    mds: kube-cephfs:1/1 {0=controller-0=up:replay}
    osd: 1 osds: 1 up (since 51m), 1 in (since 27h)

  data:
    pools: 3 pools, 192 pgs
    objects: 0 objects, 0 B
    usage: 7.2 GiB used, 1015 GiB / 1022 GiB avail
    pgs: 100.000% pgs unknown
             192 unknown
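
When checking for this condition from a script, the health line and its warnings can be pulled out of the plain-text status shown above. The following is a minimal Python sketch, assuming the output of `ceph -s` has already been captured as a string. The function name `parse_health` is hypothetical, and the parsing only handles the layout seen in this report.

```python
import re

# Excerpt of the `ceph -s` output from this report.
STATUS = """\
  cluster:
    id: fb81f294-8364-4cc0-bf3c-96890e22ecf5
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            Reduced data availability: 192 pgs inactive
            21 slow ops, oldest one blocked for 2003 sec, osd.0 has slow ops

  services:
    mon: 1 daemons, quorum controller-0 (age 53m)
"""


def parse_health(status_text):
    """Return (health, warnings) from the 'cluster:' section (hypothetical helper)."""
    health = None
    warnings = []
    in_health = False
    for line in status_text.splitlines():
        match = re.match(r"\s*health:\s*(\S+)", line)
        if match:
            health = match.group(1)
            in_health = True
            continue
        if in_health:
            # The warning list ends at the first blank line.
            if not line.strip():
                break
            warnings.append(line.strip())
    return health, warnings


health, warnings = parse_health(STATUS)
print(health)
for warning in warnings:
    print(" -", warning)
```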

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX with Ceph backend

Branch/Pull Time/Commit
-----------------------
Master

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Feature Testing

Workaround
----------
N/A

Tags: stx.9.0 stx.storage

Felipe Sanches Zanoni (fsanches) on 2023-03-03

Changed in starlingx:
assignee:	nobody → Felipe Sanches Zanoni (fsanches)
status:	New → In Progress

Felipe Sanches Zanoni (fsanches) wrote on 2023-03-09:

Commit: https://github.com/starlingx-staging/stx-ceph/pull/51

Changed in starlingx:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2023-03-13: Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/877174
Committed: https://opendev.org/starlingx/integ/commit/0de8da2116a46405221ac6965c1f8b77c6ad47c8
Submitter: "Zuul (22348)"
Branch: master

commit 0de8da2116a46405221ac6965c1f8b77c6ad47c8
Author: Felipe Sanches Zanoni <email address hidden>
Date: Sat Mar 11 11:47:02 2023 -0300

Fix puppet-ceph multipath osd disk partition detection

    The puppet-ceph module is not correctly checking the OSD
    partition when it belongs to a multipath disk or any /dev/dm-X
    device.

    This fix changes the parsing string when running ceph-disk list
    command to verify osd disk is already created.

    Without multipath disk, the readlink command will return,
    for example, '/dev/sdb' for any partition of that disk.
    The output of ceph-disk is like:

    /dev/sdb :
      /dev/sdb1 ceph data, prepared, cluster ceph, osd.0, osd uuid
    e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/sdb2
      /dev/sdb2 ceph journal, for /dev/sdb1

    This way when grepping '/dev/sdb.*ceph data', it will detect
    the line with the partition '/dev/sdb1' with no errors.

    But with multipath disk the readlink command returns /dev/dm-X
    for disks and partitions. For example, it will return /dev/dm-6
    when using
    /dev/dm-6 :
      /dev/dm-7 ceph data, prepared, cluster ceph, osd.0, osd uuid
    e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/dm-8
      /dev/dm-8 ceph journal, for /dev/dm-7

    This way when grepping '/dev/dm-6.*ceph data', it will not
    detect the line with the partition /dev/dm-7.

    Test-Plan:
      PASS: Fresh install AIO-SX with ceph backend and verify ceph
            is HEALTH_OK (with multipath disks)
      PASS: Lock/Unlock controller-0 and verify ceph is HEALTH_OK
            (with multipath disks)
      PASS: Fresh install AIO-SX with ceph backend and verify ceph
            is HEALTH_OK (with regular disks)
      PASS: Lock/Unlock controller-0 and verify ceph is HEALTH_OK
            (with regular disks)

    Closes-bug: 2009227

    Signed-off-by: Felipe Sanches Zanoni <email address hidden>
    Change-Id: Iad11c803b68983ad70fb1edfce5a9acc156a10f4
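
The matching problem described in the commit message can be reproduced outside Puppet. The following is a minimal Python sketch, not the puppet-ceph code: it applies the two grep patterns quoted in the commit message to the sample `ceph-disk list` output, using `re.search` line by line as an approximation of `grep`. The helper name `grep_lines` and the rejoining of the wrapped "osd uuid" lines onto one line are assumptions made for illustration.

```python
import re

# Sample `ceph-disk list` output for a regular disk, taken from the commit
# message, with the wrapped "osd uuid" line joined back together (assumption).
REGULAR = """\
/dev/sdb :
  /dev/sdb1 ceph data, prepared, cluster ceph, osd.0, osd uuid e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/sdb2
  /dev/sdb2 ceph journal, for /dev/sdb1
"""

# Same output for a multipath disk: disk and partitions are separate dm nodes.
MULTIPATH = """\
/dev/dm-6 :
  /dev/dm-7 ceph data, prepared, cluster ceph, osd.0, osd uuid e3c08a72-c755-4dec-b353-e4df4b4690c4, journal /dev/dm-8
  /dev/dm-8 ceph journal, for /dev/dm-7
"""


def grep_lines(pattern, text):
    """Return the lines of text that match pattern (hypothetical grep stand-in)."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# The disk name is a prefix of the partition name, so the data line matches.
regular_hits = grep_lines(r"/dev/sdb.*ceph data", REGULAR)

# /dev/dm-6 is not a prefix of /dev/dm-7, so no line matches.
multipath_hits = grep_lines(r"/dev/dm-6.*ceph data", MULTIPATH)

print(len(regular_hits), len(multipath_hits))
```

With regular disks the pattern works only because the partition name starts with the disk name; device-mapper nodes have no such relationship, which is why the existing OSD went undetected.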

Changed in starlingx:
status:	Fix Committed → Fix Released

Ghada Khalil (gkhalil) on 2023-08-04

Changed in starlingx:
importance:	Undecided → Medium
tags:	added: stx.9.0 stx.storage
