Failed to reinstall controller on AIO-DX system

Bug #1860165 reported by David Sullivan on 2020-01-17
This bug affects 1 person
Affects: starlingx
Status: Triaged
Importance: Medium
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
Controller-0 enters a failed state after a host-reinstall on AIO-DX. Controller-1 is likely affected as well. The failure appears to be related to Ceph journals.


Steps to Reproduce
Install AIO-DX system
Swact to controller-1
Lock controller-0
Issue system host-reinstall controller-0
Unlock controller-0
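
The steps above can be sketched as a shell sequence. This is a minimal sketch, assuming the StarlingX `system` CLI is run on the active controller with platform credentials already loaded, and that controller-0 is the active controller at the start. The `run` helper, the `DRY_RUN` variable and the `reproduce_reinstall` function are hypothetical names introduced here so the sequence can be previewed without changing the system. The sketch does not wait or poll between steps; in practice each step has to complete before the next one is issued.

```shell
#!/bin/sh
# Sketch of the reproduction sequence on an AIO-DX system.
# DRY_RUN, run() and reproduce_reinstall() are illustrative helpers,
# not part of StarlingX.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview only: print the command instead of executing it.
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_reinstall() {
    host="${1:-controller-0}"
    # Swact away from the host so the other controller becomes active.
    run system host-swact "$host"
    run system host-lock "$host"
    run system host-reinstall "$host"
    run system host-unlock "$host"
}

reproduce_reinstall controller-0
```

With the default `DRY_RUN=1` the script only prints the four commands in order. Setting `DRY_RUN=0` would execute them back to back, which is not a faithful reproduction unless waits are added between the steps.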

Expected Behavior
Controller-0 unlocks and becomes available

Actual Behavior
Controller-0 enters a failed state

Reproducibility
100% on AIO-DX

System Configuration
Seen on AIO-DX systems. Not seen on 2+2 systems. Have not tested on dedicated storage systems.

Branch/Pull Time/Commit
cengn 20200107T000000Z

Last Pass

Timestamp/Logs
Controller-0 Puppet
2020-01-16T04:32:38.851 ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1
Controller-1 sysinv
2020-01-16 04:28:23.981 1131440 WARNING ceph_client ... [{u'outb': u'{"checks":{"OSD_DOWN":{"severity":"HEALTH_WARN","summary":{"message":"1 osds down"},"detail":[{"message":"osd.0 (root=storage-tier,chassis=group-0,host=controller-0) is down"}]

Test Activity
Developer Testing

Workaround
Lock the node before reinstalling, identify all Ceph OSD disks, manually wipe the journal on the node you want to reinstall, and then reinstall.
controller-0# dd if=/dev/zero of=/dev/sdb2 bs=1M
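
The workaround can be laid out as an ordered checklist. This is a minimal sketch that only prints the commands and never runs them, because they execute in different places: the `system` commands on the active controller and `dd` on the node being reinstalled. The function name `workaround_plan` is hypothetical. The use of `system host-stor-list` to find the OSDs and their journals is an assumption about how the disks would be identified, and `/dev/sdb2` is taken from the command above; the journal device differs per system and must be confirmed before wiping anything.

```shell
#!/bin/sh
# Print-only sketch of the manual workaround.
# workaround_plan() is an illustrative helper, not part of StarlingX.
workaround_plan() {
    host="$1"
    journal="$2"

    # Guard against typos: only accept an explicit /dev path.
    case "$journal" in
        /dev/*) ;;
        *)
            echo "refusing: '$journal' is not a /dev path" >&2
            return 1
            ;;
    esac

    echo "# On the active controller:"
    echo "system host-lock $host"
    echo "system host-stor-list $host   # assumed way to list OSDs and journals"
    echo "# On $host (destructive, confirm the device first):"
    echo "dd if=/dev/zero of=$journal bs=1M"
    echo "# On the active controller:"
    echo "system host-reinstall $host"
}

workaround_plan controller-0 /dev/sdb2
```

Because `dd` is given no `count`, it writes zeros until the partition is full, so it is expected to stop with a "No space left on device" message once the whole partition has been overwritten.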

Ghada Khalil (gkhalil) wrote:

stx.4.0 / medium priority - workaround exists

Changed in starlingx:
importance: Undecided → Medium
tags: added:
Changed in starlingx:
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
status: New → Triaged
tags: added: stx.4.0