Ceph health warn after backup and restore with replication=3

Bug #1789908 reported by Frank Miller
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Daniel Badea
Milestone: (none)

Bug Description

Brief Description
-----------------
The following is taken from Maria Yousaf's testing:

During backup and restore, I noticed Ceph was in a HEALTH_WARN state as follows and appeared to be stuck:

[wrsroot@controller-0 scratch(keystone_admin)]$ ceph -s
    cluster 2d62cbb0-2f6c-4382-a4ea-a024c0dc166e
     health HEALTH_WARN
            555 pgs degraded
            555 pgs stuck degraded
            1536 pgs stuck unclean
            555 pgs stuck undersized
            555 pgs undersized
     monmap e1: 3 mons at {controller-0=192.168.215.103:6789/0,controller-1=192.168.215.104:6789/0,storage-0=192.168.215.105:6789/0}
            election epoch 6, quorum 0,1,2 controller-0,controller-1,storage-0
     osdmap e82: 12 osds: 12 up, 12 in; 981 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v449: 1920 pgs, 10 pools, 1588 bytes data, 1116 objects
            460 MB used, 11383 GB / 11384 GB avail
                 561 active+remapped
                 555 active+undersized+degraded
                 420 active
                 384 active+clean
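As a sanity check on these numbers (values transcribed from the output above), the four PG state buckets sum to the pgmap total of 1920, and the 1536 "stuck unclean" PGs are exactly the PGs not yet in active+clean (a PG counts as unclean until it reaches active+clean):

```python
# PG state counts transcribed from the `ceph -s` output above.
pg_states = {
    "active+remapped": 561,
    "active+undersized+degraded": 555,
    "active": 420,
    "active+clean": 384,
}

total = sum(pg_states.values())
unclean = total - pg_states["active+clean"]  # PGs not yet active+clean

print(total)    # matches the pgmap total of 1920 PGs
print(unclean)  # matches "1536 pgs stuck unclean"
```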

ceph osd tree reports the following:

[wrsroot@controller-0 scratch(keystone_admin)]$ ceph osd tree
ID  WEIGHT   TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7  8.21172 root default
 -6  1.45279     host storage-2
  4  0.72639         osd.4              up  1.00000          1.00000
  5  0.72639         osd.5              up  1.00000          1.00000
 -8  2.25298     host storage-3
  9  1.81749         osd.9              up  1.00000          1.00000
  8  0.43549         osd.8              up  1.00000          1.00000
 -9  2.25298     host storage-5
 11  1.81749         osd.11             up  1.00000          1.00000
 10  0.43549         osd.10             up  1.00000          1.00000
-10  2.25298     host storage-4
  7  1.81749         osd.7              up  1.00000          1.00000
  6  0.43549         osd.6              up  1.00000          1.00000
 -2        0 root cache-tier
 -1  2.90558 root storage-tier
 -3  2.90558     chassis group-0
 -4  1.45279         host storage-0
  0  0.72639             osd.0          up  1.00000          1.00000
  1  0.72639             osd.1          up  1.00000          1.00000
 -5  1.45279         host storage-1
  2  0.72639             osd.2          up  1.00000          1.00000
  3  0.72639             osd.3          up  1.00000          1.00000
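One plausible reading of this tree (illustration only; the report was later closed as Invalid): after the restore, hosts storage-2 through storage-5 sit under `root default`, while only storage-0 and storage-1 remain under `root storage-tier`. If the pools' CRUSH rule draws replicas from `storage-tier`, only two hosts are eligible, so with replication=3 the affected PGs cannot place their third replica and stay undersized. A hypothetical sketch (helper names are mine, data transcribed from the output above) that flags the misplaced hosts:

```python
# Host -> CRUSH root, transcribed from the `ceph osd tree` output above.
# (Hypothetical diagnostic helper for illustration; not part of StarlingX.)
osd_tree = {
    "storage-0": "storage-tier",
    "storage-1": "storage-tier",
    "storage-2": "default",
    "storage-3": "default",
    "storage-4": "default",
    "storage-5": "default",
}

EXPECTED_ROOT = "storage-tier"  # root assumed to be targeted by the pools' CRUSH rule
REPLICATION = 3

def misplaced_hosts(tree, expected_root):
    """Return hosts whose CRUSH root differs from the one the rule expects."""
    return sorted(h for h, root in tree.items() if root != expected_root)

bad = misplaced_hosts(osd_tree, EXPECTED_ROOT)
eligible = len(osd_tree) - len(bad)

print(bad)                       # storage-2 .. storage-5 are under the wrong root
print(eligible < REPLICATION)    # only 2 eligible hosts < 3 replicas -> undersized PGs
```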

Severity
--------
Major: backup and restore (B&R) fails when using more than 2 storage nodes

Steps to Reproduce
------------------
With more than 2 storage nodes, execute a B&R

Expected Behavior
------------------
No CEPH health warning should occur

Actual Behavior
----------------
Ceph remains in HEALTH_WARN with degraded/undersized PGs, as shown in the `ceph -s` output above.

Reproducibility
---------------
100% reproducible with >2 storage nodes

System Configuration
--------------------
Dedicated storage config with >2 storage nodes

Branch/Pull Time/Commit
-----------------------
Any StarlingX

Timestamp/Logs
--------------
n/a

Frank Miller (sensfan22)
Changed in starlingx:
assignee: nobody → Daniel Badea (daniel.badea)
Frank Miller (sensfan22)
description: updated
Ghada Khalil (gkhalil)
tags: added: stx.2018.10 stx.config
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Ghada Khalil (gkhalil) wrote :

After further investigation, it was concluded that this is not an issue in starlingx master. Issue was opened in error. Marking as Invalid based on review with Frank Miller.

Changed in starlingx:
status: Triaged → Invalid
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10