Activity log for bug #1614914

Date Who What changed Old value New value Message
2016-08-19 10:35:13 Yury Tregubov bug added bug
2016-08-19 10:37:22 Yury Tregubov attachment added Diagnostic snapshot of MOS9.1 #153 https://bugs.launchpad.net/mos/+bug/1614914/+attachment/4723940/+files/fuel-snapshot-2016-08-19_10-13-56.tar.gz
2016-08-19 13:10:36 Denis Meltsaykin mos: assignee MOS Ceph (mos-ceph)
2016-08-19 13:10:44 Denis Meltsaykin mos: status New Confirmed
2016-08-19 13:10:54 Denis Meltsaykin mos: milestone 9.1
2016-08-19 13:10:56 Denis Meltsaykin mos: importance Undecided High
2016-08-19 13:11:29 Denis Meltsaykin tags area-mos
2016-08-24 11:51:39 Sergey Shevorakov tags area-mos area-mos blocker-for-qa
2016-08-24 13:09:42 Alexei Sheplyakov mos: assignee MOS Ceph (mos-ceph)
2016-08-24 13:35:42 Denis Meltsaykin mos: assignee Fuel QA Team (fuel-qa)
2016-08-24 13:35:52 Denis Meltsaykin mos: status Confirmed New
2016-08-24 13:36:16 Denis Meltsaykin tags area-mos blocker-for-qa area-qa blocker-for-qa
2016-08-24 14:57:41 Nastya Urlapova mos: status New Incomplete
2016-08-24 14:57:59 Nastya Urlapova mos: assignee Fuel QA Team (fuel-qa) Yury Tregubov (ytregubov)
2016-09-05 14:09:31 Timur Nurlygayanov mos: status Incomplete Confirmed
2016-09-07 11:29:53 Alexei Sheplyakov description

Old value:
The issue is seen on MOS 9.1 somewhere after snapshot #107. It is not 100% reproducible, but it is quite stable: we have caught it on roughly every third CI run during the last two weeks. The problem is that OSTF fails after 'restart ceph-all' is executed on all controllers and ceph nodes, as follows: Test "Check state of haproxy backends on controllers" status is failure; "Some haproxy backend has down state. Please refer to OpenStack logs for more details." No errors were found in the logs. Diagnostic snapshot is attached.
To reproduce the fault:
- deploy an env with 3 controllers and 2 ceph+compute nodes
- revert it
- run 'restart ceph-all' on each node in the env
- run the OSTF tests

New value:
Same description as above, with the following appended:
The root cause is that fuel-qa restarts the whole ceph cluster at once and launches the OSTF tests immediately after restarting the cluster. However, ceph is NOT designed to withstand a *whole cluster* outage, so there is a time interval during which the ceph cluster cannot serve clients' requests. fuel-qa should either:
- tolerate a temporarily unavailable cluster, or
- restart the ceph daemons one by one, giving each instance (monitor, OSD) enough time to rejoin the cluster (see the sketch after this log).
2016-09-07 11:44:00 Timur Nurlygayanov mos: status Confirmed In Progress
2016-09-07 13:36:02 Timur Nurlygayanov mos: status In Progress Fix Committed
2016-09-12 11:57:17 Timur Nurlygayanov mos: status Fix Committed Fix Released
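
The fix proposed in the description change above amounts to serializing the ceph restart and waiting for the cluster to recover before launching OSTF. Below is a minimal sketch of that approach; it is not fuel-qa's actual code, and it assumes ssh access from the test runner plus hypothetical node names node-1 ... node-5.

#!/usr/bin/env bash
# Hypothetical sketch (not fuel-qa code): restart ceph node by node and wait
# for the cluster to report HEALTH_OK before running the OSTF tests.
set -e

NODES="node-1 node-2 node-3 node-4 node-5"   # assumed controller and ceph+compute node names

wait_for_health_ok() {
    # Poll 'ceph health' on the first controller until the cluster recovers
    # or the timeout (in seconds) expires.
    local timeout=${1:-300} waited=0
    until ssh node-1 ceph health 2>/dev/null | grep -q HEALTH_OK; do
        sleep 10
        waited=$((waited + 10))
        if [ "$waited" -ge "$timeout" ]; then
            echo "ceph did not recover within ${timeout}s" >&2
            return 1
        fi
    done
}

for node in $NODES; do
    # 'restart ceph-all' is the upstart job named in the bug report.
    ssh "$node" restart ceph-all
    # Give the monitors/OSDs on this node time to rejoin before touching the next one.
    wait_for_health_ok 300
done

# Only now launch the OSTF tests.

Either variant from the report (tolerating a temporarily unavailable cluster by polling for HEALTH_OK after a whole-cluster restart, or restarting the daemons one by one as above) avoids running OSTF while ceph is still recovering.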