2018-08-30 14:46:46 | Frank Miller | description
Brief Description
-----------------
The below is taken from Maria Yousaf's testing. During backup and restore, Ceph was in a HEALTH_WARN state as follows and appeared to be stuck:
[wrsroot@controller-0 scratch(keystone_admin)]$ ceph -s
    cluster 2d62cbb0-2f6c-4382-a4ea-a024c0dc166e
     health HEALTH_WARN
            555 pgs degraded
            555 pgs stuck degraded
            1536 pgs stuck unclean
            555 pgs stuck undersized
            555 pgs undersized
     monmap e1: 3 mons at {controller-0=192.168.215.103:6789/0,controller-1=192.168.215.104:6789/0,storage-0=192.168.215.105:6789/0}
            election epoch 6, quorum 0,1,2 controller-0,controller-1,storage-0
     osdmap e82: 12 osds: 12 up, 12 in; 981 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v449: 1920 pgs, 10 pools, 1588 bytes data, 1116 objects
            460 MB used, 11383 GB / 11384 GB avail
                 561 active+remapped
                 555 active+undersized+degraded
                 420 active
                 384 active+clean
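The stuck counts above can be broken down further with standard Ceph CLI commands. A minimal diagnostic sketch (standard Jewel-era commands, not part of the original report):

    # Show the health warnings with per-PG detail
    ceph health detail
    # Dump the PGs stuck in each problem state, with their up/acting OSD sets
    ceph pg dump_stuck unclean
    ceph pg dump_stuck undersized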
ceph osd tree reports the following:
[wrsroot@controller-0 scratch(keystone_admin)]$ ceph osd tree
ID  WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
-7  8.21172 root default
-6  1.45279     host storage-2
 4  0.72639         osd.4                up  1.00000          1.00000
 5  0.72639         osd.5                up  1.00000          1.00000
-8  2.25298     host storage-3
 9  1.81749         osd.9                up  1.00000          1.00000
 8  0.43549         osd.8                up  1.00000          1.00000
-9  2.25298     host storage-5
11  1.81749         osd.11               up  1.00000          1.00000
10  0.43549         osd.10               up  1.00000          1.00000
-10 2.25298     host storage-4
 7  1.81749         osd.7                up  1.00000          1.00000
 6  0.43549         osd.6                up  1.00000          1.00000
-2        0 root cache-tier
-1  2.90558 root storage-tier
-3  2.90558     chassis group-0
-4  1.45279         host storage-0
 0  0.72639             osd.0            up  1.00000          1.00000
 1  0.72639             osd.1            up  1.00000          1.00000
-5  1.45279         host storage-1
 2  0.72639             osd.2            up  1.00000          1.00000
 3  0.72639             osd.3            up  1.00000          1.00000
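Note that hosts storage-2 through storage-5 sit under the root "default", while only storage-0 and storage-1 remain under chassis group-0 in the "storage-tier" root. If the CRUSH rules place replicas within storage-tier, this split would explain the undersized/degraded PGs, since only two hosts are eligible for placement. A possible manual correction (a sketch only, not a verified StarlingX restore step) would be to move the misplaced hosts back:

    # Assumption: all storage hosts belong under chassis group-0 in root
    # storage-tier, matching storage-0/storage-1. Repeat for each host.
    ceph osd crush move storage-2 chassis=group-0
    ceph osd crush move storage-3 chassis=group-0
    ceph osd crush move storage-4 chassis=group-0
    ceph osd crush move storage-5 chassis=group-0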
Severity
--------
Major: B&R fails when using more than 2 storage nodes
Steps to Reproduce
------------------
With more than 2 storage nodes, execute a B&R
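A reproduction sketch, assuming the config_controller-based B&R procedure of that StarlingX era (the command names and paths are assumptions, not taken from the original report):

    # On the active controller: generate the backup archives
    sudo config_controller --backup mybackup
    # Reinstall controller-0 from ISO, then restore the system data
    sudo config_controller --restore-system /opt/backups/mybackup_system.tgz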
Expected Behavior
------------------
No Ceph health warning should occur; after the restore the cluster should return to HEALTH_OK with all PGs active+clean
Actual Behavior
----------------
See the HEALTH_WARN output above: 555 PGs remain undersized/degraded and 1536 PGs are stuck unclean, and the cluster does not recover
Reproducibility
---------------
100% reproducible with >2 storage nodes
System Configuration
--------------------
Dedicated storage config with >2 storage nodes
Branch/Pull Time/Commit
-----------------------
Any StarlingX
Timestamp/Logs
--------------
n/a