Comment 6 for bug 1851585

Wendy Mitchell (wmitchellwr) wrote :

BUILD_ID="2019-11-06_10-52-51"
yow-ironpass-18543

1st attempt: the disk was replaced with the new disk listed below without issue; the manifest applied, and the host is unlocked, enabled and available with health OK.

/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0 SSD 167.68 0.0 N/A CVLT626405AL180BGN INTEL SSDSC2KW18

2nd attempt: replacing the disk with the one that was originally removed failed.
puppet.log:
2019-12-03T18:35:26.489 Notice: 2019-12-03 18:35:26 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/returns: mount_activate: Failed to activate
2019-12-03T18:35:26.490 Notice: 2019-12-03 18:35:26 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0]/returns: '['ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', '-i', '-', 'osd', 'new', u'c9027940-c8cd-4b9c-a73a-c7d367e6ed50']' failed with status code 17
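
Status code 17 from the 'ceph osd new' call is errno EEXIST, which suggests the cluster map already holds an entry conflicting with the OSD UUID carried over from the disk's previous provisioning. As a hedged diagnostic sketch (these commands are not part of the original report), this could be confirmed from the active controller with:

$ ceph osd dump | grep -i c9027940-c8cd-4b9c-a73a-c7d367e6ed50   # is this OSD UUID already registered in the OSD map?
$ ceph osd tree                                                   # existing OSD IDs and whether osd.0 is down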

$ system host-disk-list controller-1
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
| uuid                                 | device_node | device_num | device_type | size_gib | available_gib | rpm          | serial_id            | device_path                                                     |
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
| 33d64f76-b367-4d9c-a7a9-4aca0ccd61ec | /dev/sda    | 2048       | HDD         | 838.362  | 0.0           | Undetermined | S0N1RS0G0000B444BGA1 | /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50076366105-lun-0 |
| 91f0d50d-30a0-4087-a841-da223e492985 | /dev/sdb    | 2064       | SSD         | 223.57   | 0.0           | N/A          | CVTR528101E8240CGN   | /dev/disk/by-path/pci-0000:04:00.0-sas-0x5001e6738bc90001-lun-0 |
+--------------------------------------+-------------+------------+-------------+----------+---------------+--------------+----------------------+-----------------------------------------------------------------+
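
Since /dev/sdb is the originally-removed disk being re-inserted, it presumably still carries the ceph partitions and OSD metadata written when it was first provisioned. A hedged way to inspect that (not captured in the original report):

$ lsblk /dev/sdb        # do the old ceph partitions still exist on the re-inserted disk?
$ sgdisk -p /dev/sdb    # GPT layout, including ceph partition type GUIDs (requires gdisk/sgdisk)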

$ fm alarm-list
+-------+----------------------------------------------------------------------+---------------------------------------+----------+---------------+
| Alarm | Reason Text | Entity ID | Severity | Time Stamp |
| ID | | | | |
+-------+----------------------------------------------------------------------+---------------------------------------+----------+---------------+
| 200. | controller-1 is degraded due to the failure of its 'ceph (osd.0, )' | host=controller-1.process=ceph (osd.0 | major | 2019-12-03T18 |
| 006 | process. Auto recovery of this major process is in progress. | , ) | | :38:32.749026 |
| | | | | |
| 200. | controller-1 experienced a service-affecting failure. Auto-recovery | host=controller-1 | critical | 2019-12-03T18 |
| 004 | in progress. Manual Lock and Unlock may be required if auto-recovery | | | :30:25.259416 |
| | is unsuccessful. | | | |
| | | | | |
| 800. | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or | cluster=0c80fb8a-d97e-411b-b6ed- | warning | 2019-12-03T18 |
| 001 | undersized]. Please check 'ceph -s' for more details. | 5192273a17c6 | | :11:36.558925 |
| | | | | |
| 800. | Loss of replication in replication group group-0: OSDs are down | cluster=0c80fb8a-d97e-411b-b6ed- | major | 2019-12-03T18 |
| 011 | | 5192273a17c6.peergroup=group-0.host= | | :10:35.973510 |
| | | controller-1 | | |
| | | | | |
| 400. | Service group directory-services loss of redundancy; expected 2 | service_domain=controller. | major | 2019-12-03T18 |
| 002 | active members but only 1 active member available | service_group=directory-services | | :10:15.111380 |
| | | | | |
| 400. | Service group web-services loss of redundancy; expected 2 active | service_domain=controller. | major | 2019-12-03T18 |
| 002 | members but only 1 active member available | service_group=web-services | | :10:14.899447 |
| | | | | |
| 400. | Service group storage-services loss of redundancy; expected 2 active | service_domain=controller. | major | 2019-12-03T18 |
| 002 | members but only 1 active member available | service_group=storage-services | | :10:14.444396 |
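
For the storage alarms above (800.001 and 800.011), the alarm text itself points at 'ceph -s'; a hedged follow-up, not captured in this comment, would be:

$ ceph -s               # overall health and degraded/undersized PG counts
$ ceph health detail    # which PGs are degraded/stuck and which OSDs are down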