Ceph health warn after backup and restore with replication=3

Bug #1789908 reported by Frank Miller
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Daniel Badea
Milestone: (none)

Bug Description

Brief Description
-----------------
The following is taken from Maria Yousaf's testing:

During backup and restore, I noticed Ceph was in a HEALTH_WARN state as follows and appeared to be stuck:

[wrsroot@controller-0 scratch(keystone_admin)]$ ceph -s
    cluster 2d62cbb0-2f6c-4382-a4ea-a024c0dc166e
     health HEALTH_WARN
            555 pgs degraded
            555 pgs stuck degraded
            1536 pgs stuck unclean
            555 pgs stuck undersized
            555 pgs undersized
     monmap e1: 3 mons at {controller-0=192.168.215.103:6789/0,controller-1=192.168.215.104:6789/0,storage-0=192.168.215.105:6789/0}
            election epoch 6, quorum 0,1,2 controller-0,controller-1,storage-0
     osdmap e82: 12 osds: 12 up, 12 in; 981 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v449: 1920 pgs, 10 pools, 1588 bytes data, 1116 objects
            460 MB used, 11383 GB / 11384 GB avail
                 561 active+remapped
                 555 active+undersized+degraded
                 420 active
                 384 active+clean
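As a sanity check on these numbers (values transcribed from the output above), the four PG state buckets sum to the pgmap total of 1920, and the 1536 "stuck unclean" PGs are exactly the PGs not yet in active+clean (a PG counts as unclean until it reaches active+clean):

```python
# PG state counts transcribed from the `ceph -s` output above.
pg_states = {
    "active+remapped": 561,
    "active+undersized+degraded": 555,
    "active": 420,
    "active+clean": 384,
}

total = sum(pg_states.values())
unclean = total - pg_states["active+clean"]  # PGs not yet active+clean

print(total)    # matches the pgmap total of 1920 PGs
print(unclean)  # matches "1536 pgs stuck unclean"
```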

ceph osd tree reports the following:

[wrsroot@controller-0 scratch(keystone_admin)]$ ceph osd tree
ID  WEIGHT   TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7  8.21172 root default
 -6  1.45279     host storage-2
  4  0.72639         osd.4              up  1.00000          1.00000
  5  0.72639         osd.5              up  1.00000          1.00000
 -8  2.25298     host storage-3
  9  1.81749         osd.9              up  1.00000          1.00000
  8  0.43549         osd.8              up  1.00000          1.00000
 -9  2.25298     host storage-5
 11  1.81749         osd.11             up  1.00000          1.00000
 10  0.43549         osd.10             up  1.00000          1.00000
-10  2.25298     host storage-4
  7  1.81749         osd.7              up  1.00000          1.00000
  6  0.43549         osd.6              up  1.00000          1.00000
 -2        0 root cache-tier
 -1  2.90558 root storage-tier
 -3  2.90558     chassis group-0
 -4  1.45279         host storage-0
  0  0.72639             osd.0          up  1.00000          1.00000
  1  0.72639             osd.1          up  1.00000          1.00000
 -5  1.45279         host storage-1
  2  0.72639             osd.2          up  1.00000          1.00000
  3  0.72639             osd.3          up  1.00000          1.00000
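One plausible reading of this tree (illustration only; the report was later closed as Invalid): after the restore, hosts storage-2 through storage-5 sit under `root default`, while only storage-0 and storage-1 remain under `root storage-tier`. If the pools' CRUSH rule draws replicas from `storage-tier`, only two hosts are eligible, so with replication=3 the affected PGs cannot place their third replica and stay undersized. A hypothetical sketch (helper names are mine, data transcribed from the output above) that flags the misplaced hosts:

```python
# Host -> CRUSH root, transcribed from the `ceph osd tree` output above.
# (Hypothetical diagnostic helper for illustration; not part of StarlingX.)
osd_tree = {
    "storage-0": "storage-tier",
    "storage-1": "storage-tier",
    "storage-2": "default",
    "storage-3": "default",
    "storage-4": "default",
    "storage-5": "default",
}

EXPECTED_ROOT = "storage-tier"  # root assumed to be targeted by the pools' CRUSH rule
REPLICATION = 3

def misplaced_hosts(tree, expected_root):
    """Return hosts whose CRUSH root differs from the one the rule expects."""
    return sorted(h for h, root in tree.items() if root != expected_root)

bad = misplaced_hosts(osd_tree, EXPECTED_ROOT)
eligible = len(osd_tree) - len(bad)

print(bad)                       # storage-2 .. storage-5 are under the wrong root
print(eligible < REPLICATION)    # only 2 eligible hosts < 3 replicas -> undersized PGs
```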

Severity
--------
Major: backup and restore (B&R) fails when using more than 2 storage nodes

Steps to Reproduce
------------------
With more than 2 storage nodes, execute a B&R

Expected Behavior
------------------
No CEPH health warning should occur

Actual Behavior
----------------
Ceph remains in HEALTH_WARN with degraded/undersized PGs, as shown in the `ceph -s` output above.

Reproducibility
---------------
100% reproducible with >2 storage nodes

System Configuration
--------------------
Dedicated storage config with >2 storage nodes

Branch/Pull Time/Commit
-----------------------
Any StarlingX

Timestamp/Logs
--------------
n/a

Frank Miller (sensfan22)
Changed in starlingx:
assignee: nobody → Daniel Badea (daniel.badea)
Frank Miller (sensfan22)
description: updated
Ghada Khalil (gkhalil)
tags: added: stx.2018.10 stx.config
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Ghada Khalil (gkhalil) wrote :

After further investigation, it was concluded that this is not an issue in starlingx master. Issue was opened in error. Marking as Invalid based on review with Frank Miller.

Changed in starlingx:
status: Triaged → Invalid
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10