2nd time, the disk was replaced with the one that was originally removed, and the activation failed.
puppet.log
2019-12-03T18:35:26.489 Notice: 2019-12-03 18:35:26 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/returns: mount_activate: Failed to activate
2019-12-03T18:35:26.490 Notice: 2019-12-03 18:35:26 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/returns: '['ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', '-i', '-', 'osd', 'new', u'c9027940-c8cd-4b9c-a73a-c7d367e6ed50']' failed with status code 17
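
Status code 17 maps to errno EEXIST, which points at the OSD uuid on the re-inserted disk already being registered in the cluster. A minimal diagnostic sketch, assuming the uuid in the log above is the on-disk OSD fsid and that osd.0 is the affected OSD:

$ ceph osd dump | grep -i c9027940-c8cd-4b9c-a73a-c7d367e6ed50   # is this uuid already claimed by an existing OSD id?
$ ceph-disk list                                                 # data partitions as prepared/activated by ceph-disk
$ cat /var/lib/ceph/osd/ceph-0/fsid                              # fsid recorded on the osd.0 data directory, if mounted
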
$ fm alarm-list
+-------+----------------------------------------------------------------------+---------------------------------------+----------+---------------+
| Alarm | Reason Text | Entity ID | Severity | Time Stamp |
| ID | | | | |
+-------+----------------------------------------------------------------------+---------------------------------------+----------+---------------+
| 200. | controller-1 is degraded due to the failure of its 'ceph (osd.0, )' | host=controller-1.process=ceph (osd.0 | major | 2019-12-03T18 |
| 006 | process. Auto recovery of this major process is in progress. | , ) | | :38:32.749026 |
| | | | | |
| 200. | controller-1 experienced a service-affecting failure. Auto-recovery | host=controller-1 | critical | 2019-12-03T18 |
| 004 | in progress. Manual Lock and Unlock may be required if auto-recovery | | | :30:25.259416 |
| | is unsuccessful. | | | |
| | | | | |
| 800. | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or | cluster=0c80fb8a-d97e-411b-b6ed- | warning | 2019-12-03T18 |
| 001 | undersized]. Please check 'ceph -s' for more details. | 5192273a17c6 | | :11:36.558925 |
| | | | | |
| 800. | Loss of replication in replication group group-0: OSDs are down | cluster=0c80fb8a-d97e-411b-b6ed- | major | 2019-12-03T18 |
| 011 | | 5192273a17c6.peergroup=group-0.host= | | :10:35.973510 |
| | | controller-1 | | |
| | | | | |
| 400. | Service group directory-services loss of redundancy; expected 2 | service_domain=controller. | major | 2019-12-03T18 |
| 002 | active members but only 1 active member available | service_group=directory-services | | :10:15.111380 |
| | | | | |
| 400. | Service group web-services loss of redundancy; expected 2 active | service_domain=controller. | major | 2019-12-03T18 |
| 002 | members but only 1 active member available | service_group=web-services | | :10:14.899447 |
| | | | | |
| 400. | Service group storage-services loss of redundancy; expected 2 active | service_domain=controller. | major | 2019-12-03T18 |
| 002   | members but only 1 active member available                           | service_group=storage-services        |          | :10:14.444396 |
+-------+----------------------------------------------------------------------+---------------------------------------+----------+---------------+
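
As alarm 800.001 suggests, the cluster state can be inspected directly; for example:

$ ceph -s          # overall health plus degraded/undersized PG details
$ ceph osd tree    # up/down status per OSD, e.g. osd.0 down (per alarm 800.011)
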
BUILD_ID="2019-11-06_10-52-51"
yow-ironpass-18543
1st time, the disk was replaced with a new one without issue: the manifest applied, and the host is unlocked, enabled, and available with health OK. The replacement SSD:
/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0  SSD  167.68  0.0  N/A  CVLT626405AL180BGN  INTEL SSDSC2KW18
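
The unlocked/enabled/available state can be confirmed with the standard CLI; a minimal check, assuming the usual sysinv field names:

$ system host-show controller-1 | grep -E 'administrative|operational|availability'
$ fm alarm-list    # no storage alarms expected after the 1st replacement
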
2nd time, the disk was replaced with the one that was originally removed and activation failed (see the puppet.log excerpt above). The re-inserted disk shows as /dev/sdb:
$ system host-disk-list controller-1
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
| uuid                                 | device_node | device_num | device_type | size_gib | available_gib | rpm          | serial_id            | device_path                                                     |
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
| 33d64f76-b367-4d9c-a7a9-4aca0ccd61ec | /dev/sda    | 2048       | HDD         | 838.362  | 0.0           | Undetermined | S0N1RS0G0000B444BGA1 | /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50076366105-lun-0 |
| 91f0d50d-30a0-4087-a841-da223e492985 | /dev/sdb    | 2064       | SSD         | 223.57   | 0.0           | N/A          | CVTR528101E8240CGN   | /dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0 |
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
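
A quick sanity check that the re-inserted disk enumerates at the same device_path the OSD was provisioned on:

$ ls -l /dev/disk/by-path/ | grep 0x5001e6738bc90001   # should resolve to /dev/sdb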