2018-08-30 14:46:46 | Frank Miller | description
Brief Description
-----------------
The below is taken from Maria Yousaf's testing. During backup and restore, Ceph was in a HEALTH_WARN state as follows and appeared to be stuck:
[wrsroot@controller-0 scratch(keystone_admin)]$ ceph -s
    cluster 2d62cbb0-2f6c-4382-a4ea-a024c0dc166e
     health HEALTH_WARN
            555 pgs degraded
            555 pgs stuck degraded
            1536 pgs stuck unclean
            555 pgs stuck undersized
            555 pgs undersized
     monmap e1: 3 mons at {controller-0=192.168.215.103:6789/0,controller-1=192.168.215.104:6789/0,storage-0=192.168.215.105:6789/0}
            election epoch 6, quorum 0,1,2 controller-0,controller-1,storage-0
     osdmap e82: 12 osds: 12 up, 12 in; 981 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v449: 1920 pgs, 10 pools, 1588 bytes data, 1116 objects
            460 MB used, 11383 GB / 11384 GB avail
                 561 active+remapped
                 555 active+undersized+degraded
                 420 active
                 384 active+clean
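The stuck counts above can be broken down further with standard Ceph CLI commands. A minimal diagnostic sketch (standard Jewel-era commands, not part of the original report):

    # Show the health warnings with per-PG detail
    ceph health detail
    # Dump the PGs stuck in each problem state, with their up/acting OSD sets
    ceph pg dump_stuck unclean
    ceph pg dump_stuck undersized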
ceph osd tree reports the following:
[wrsroot@controller-0 scratch(keystone_admin)]$ ceph osd tree
ID  WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
-7  8.21172 root default
-6  1.45279     host storage-2
 4  0.72639         osd.4                up  1.00000          1.00000
 5  0.72639         osd.5                up  1.00000          1.00000
-8  2.25298     host storage-3
 9  1.81749         osd.9                up  1.00000          1.00000
 8  0.43549         osd.8                up  1.00000          1.00000
-9  2.25298     host storage-5
11  1.81749         osd.11               up  1.00000          1.00000
10  0.43549         osd.10               up  1.00000          1.00000
-10 2.25298     host storage-4
 7  1.81749         osd.7                up  1.00000          1.00000
 6  0.43549         osd.6                up  1.00000          1.00000
-2        0 root cache-tier
-1  2.90558 root storage-tier
-3  2.90558     chassis group-0
-4  1.45279         host storage-0
 0  0.72639             osd.0            up  1.00000          1.00000
 1  0.72639             osd.1            up  1.00000          1.00000
-5  1.45279         host storage-1
 2  0.72639             osd.2            up  1.00000          1.00000
 3  0.72639             osd.3            up  1.00000          1.00000
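Note that hosts storage-2 through storage-5 sit under the root "default", while only storage-0 and storage-1 remain under chassis group-0 in the "storage-tier" root. If the CRUSH rules place replicas within storage-tier, this split would explain the undersized/degraded PGs, since only two hosts are eligible for placement. A possible manual correction (a sketch only, not a verified StarlingX restore step) would be to move the misplaced hosts back:

    # Assumption: all storage hosts belong under chassis group-0 in root
    # storage-tier, matching storage-0/storage-1. Repeat for each host.
    ceph osd crush move storage-2 chassis=group-0
    ceph osd crush move storage-3 chassis=group-0
    ceph osd crush move storage-4 chassis=group-0
    ceph osd crush move storage-5 chassis=group-0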
Severity
--------
Major: B&R fails when using more than 2 storage nodes
Steps to Reproduce
------------------
With more than 2 storage nodes, execute a B&R
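A reproduction sketch, assuming the config_controller-based B&R procedure of that StarlingX era (the command names and paths are assumptions, not taken from the original report):

    # On the active controller: generate the backup archives
    sudo config_controller --backup mybackup
    # Reinstall controller-0 from ISO, then restore the system data
    sudo config_controller --restore-system /opt/backups/mybackup_system.tgz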
Expected Behavior
------------------
No Ceph health warning should occur; after the restore the cluster should return to HEALTH_OK with all PGs active+clean
Actual Behavior
----------------
See the HEALTH_WARN output above: 555 PGs remain undersized/degraded and 1536 PGs are stuck unclean, and the cluster does not recover
Reproducibility
---------------
100% reproducible with >2 storage nodes
System Configuration
--------------------
Dedicated storage config with >2 storage nodes
Branch/Pull Time/Commit
-----------------------
Any StarlingX
Timestamp/Logs
--------------
n/a