Activity log for bug #1852065

Date Who What changed Old value New value Message
2019-11-11 11:10:59 Ovidiu Poncea bug added bug
2019-11-11 11:11:07 Ovidiu Poncea starlingx: assignee Ovidiu Poncea (ovidiu.poncea)
2019-11-11 12:14:31 Ovidiu Poncea description

Old value:

Bug Description:
On duplex deployments, after reinstalling controller-1, the sysinv agent does not report inventory, nor does it connect to the rabbitmq of controller-0.

When wipe_ceph_osds=true, the partitions of the OSD drives are wiped and this wipe is reported to sysinv-conductor through sysinv-agent. Without this report, the OSD partitions are still in the database and, on unlock, the puppet manifests try to create them and fail.

The problem is caused by https://review.opendev.org/691713, merged on 04.11.2019, because on restore it does not recreate /opt/platform/sysinv/.../sysinv.conf.default. Without this file, nodes other than controller-0 will not be able to start their sysinv-agents correctly (the service starts but does nothing).

Two solutions:
A. Copy the file from backup on restore (this file is backed up)
B. Fix the code in https://review.opendev.org/691713 so that it recreates this file on restore, same as before the commit

Severity
--------
Major - B&R no longer works with wipe_ceph_osds=true on DX (tested). Also, on standard, reinstalling new hosts will be denied because, if sysinv.conf.default is not present, new nodes will not report their inventory (assumption)

Steps to Reproduce
------------------
1. Install an AIO-DX deployment, do a backup
2. Reinstall controller-0
3. Run ansible restore with wipe_ceph_osds=true
4. Unlock controller-0 & wait for it to be available
5. Re-install controller-1
6. Unlock controller-1 => it fails to apply manifests as it tries to create the ceph osd partitions, which are no longer present

Expected Behavior
------------------
When wipe_ceph_osds is set to true we should see that the partitions for the OSD nodes are removed from the database.

Actual Behavior
----------------
As per description

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
StarlingX_Upstream_build release branch build as of 2018-11-04

New value:

Bug Description:
On duplex deployments, after reinstalling controller-1, the sysinv agent does not report inventory, nor does it connect to the rabbitmq of controller-0.

When wipe_ceph_osds=true, the partitions of the OSD drives are wiped and this wipe is reported to sysinv-conductor through sysinv-agent. Without this report, the OSD partitions are still in the database and, on unlock, the puppet manifests try to create them and fail.

The problem is caused by https://review.opendev.org/691713, merged on 04.11.2019, because on restore it does not recreate /opt/platform/sysinv/.../sysinv.conf.default. Without this file, nodes other than controller-0 will not be able to start their sysinv-agents correctly (the service starts but does nothing).

Two solutions:
A. Copy the file from backup on restore (this file is backed up)
B. Fix the code in https://review.opendev.org/691713 so that it recreates this file on restore, same as before the commit

Severity
--------
Major - B&R no longer works with wipe_ceph_osds=true on DX (tested). Also, on standard, reinstalling new hosts will be denied because, if sysinv.conf.default is not present, newly installed nodes will not report their inventory => we won't be able to install new nodes at all on restored setups (supposition)

Steps to Reproduce
------------------
1. Install an AIO-DX deployment, do a backup
2. Reinstall controller-0
3. Run ansible restore with wipe_ceph_osds=true
4. Unlock controller-0 & wait for it to be available
5. Re-install controller-1
6. Unlock controller-1 => it fails to apply manifests as it tries to create the ceph osd partitions, which are no longer present

Expected Behavior
------------------
When wipe_ceph_osds is set to true we should see that the partitions for the OSD nodes are removed from the database.

Actual Behavior
----------------
As per description

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
StarlingX_Upstream_build release branch build as of 2018-11-04
2019-11-13 19:00:57 Ghada Khalil bug added subscriber Bill Zvonar
2019-11-13 19:01:01 Ghada Khalil starlingx: importance Undecided Medium
2019-11-13 19:01:02 Ghada Khalil starlingx: status New Triaged
2019-11-15 19:39:26 Ghada Khalil tags stx.3.0 stx.update
2019-11-18 11:04:05 Ovidiu Poncea starlingx: status Triaged In Progress
2019-11-19 17:49:08 OpenStack Infra starlingx: status In Progress Fix Released