Activity log for bug #1484080

Date Who What changed Old value New value Message
2015-08-12 11:22:24 Dennis Dmitriev bug added bug
2015-08-12 11:25:16 Dennis Dmitriev description

Old value:
There are several types of issues encountered when using qemu2.0.0:
1) The Fuel admin node freezes after a suspend / resume sequence (even without taking a snapshot), consuming 100% CPU: [1], [2]
Reproduced in ~70% of system tests when a snapshot is taken after the cluster is deployed. We use a workaround for this issue in system tests: destroy and start the Fuel admin node if it is inaccessible after revert. This issue can increase the duration of each job that performs system tests by more than an hour.
2) Docker containers on the Fuel master node can be completely broken after destroying a Fuel admin node that froze because of issue №1: [3]
There is no workaround because the data loss is unrecoverable. The tests fail.
3) Sometimes libvirt is unable to take a snapshot because a volume is busy, for an unknown reason, at the moment the snapshot is taken: [4]
There is no workaround. The tests fail.
To investigate the issue, the following actions were performed:
- Stop docker containers before taking a snapshot. Issue №1 is reproduced.
- Reduce IO on the Fuel admin node by stopping docker containers and rsyslogd. Issue №1 is reproduced.
- Snapshot slave nodes first so that the Fuel admin node is snapshotted when there is no other activity on the host. Issue №1 is reproduced.
- Force-close all SSH sessions on the Fuel admin node to reduce the number of ESTABLISHED connections before taking a snapshot. Issue №1 is reproduced.
- All of the above methods at the same time. No success.
On the same host, with no other activity performed on it, issue №1 may either not reproduce at all or reproduce constantly.
--------------
As a possible solution, we can test on Jenkins qemu v2.3, which doesn't show issues with snapshotting or suspending on local hosts, and upgrade to qemu2.3 if the tests do not detect new problems.
[1] https://bugs.launchpad.net/fuel/+bug/1418204
[2] https://bugs.launchpad.net/fuel/+bug/1450508
[3] https://bugs.launchpad.net/fuel/+bug/1457802
[4] https://bugs.launchpad.net/fuel/+bug/1415079

New value:
There are several types of issues encountered when using qemu2.0.0:
1) The Fuel admin node freezes after a suspend / resume sequence (even without taking a snapshot), consuming 100% CPU: [1], [2]
Reproduced in ~70% of system tests when a snapshot is taken after the cluster is deployed. We use a workaround for this issue in system tests: destroy and start the Fuel admin node if it is inaccessible after revert. This issue can increase the duration of each job that performs system tests by more than an hour.
2) Docker containers on the Fuel master node can be completely broken after destroying a Fuel admin node that froze because of issue №1: [3]
There is no workaround because the data loss is unrecoverable. The tests fail.
3) Sometimes libvirt is unable to take a snapshot because a volume is busy, for an unknown reason, at the moment the snapshot is taken: [4]
There is no workaround. The tests fail.
To investigate the issue, the following actions were performed:
  - Stop docker containers before taking a snapshot. Issue №1 is reproduced.
  - Reduce IO on the Fuel admin node by stopping docker containers and rsyslogd. Issue №1 is reproduced.
  - Snapshot slave nodes first so that the Fuel admin node is snapshotted when there is no other activity on the host. Issue №1 is reproduced.
  - Force-close all SSH sessions on the Fuel admin node to reduce the number of ESTABLISHED connections before taking a snapshot. Issue №1 is reproduced.
  - All of the above methods at the same time. No success.
On the same host, with no other activity performed on it, issue №1 may either not reproduce at all or reproduce constantly.
--------------
As a possible solution, we can test qemu v2.3 on Jenkins, which doesn't show issues with snapshotting or suspending on local hosts, and upgrade to qemu2.3 if the tests do not detect new problems.
[1] https://bugs.launchpad.net/fuel/+bug/1418204
[2] https://bugs.launchpad.net/fuel/+bug/1450508
[3] https://bugs.launchpad.net/fuel/+bug/1457802
[4] https://bugs.launchpad.net/fuel/+bug/1415079

2015-08-12 11:25:43 Dennis Dmitriev information type Public Private
2015-08-12 11:25:55 Dennis Dmitriev information type Private Public
2015-08-12 12:45:38 Andrey Nikitin fuel: status New Triaged
2015-08-28 10:45:24 Igor Shishkin fuel: assignee Fuel DevOps (fuel-devops) Fuel build team (fuel-build)
2015-09-03 10:53:24 Aleksandra Fedorova fuel: importance Medium High
2015-09-04 10:12:46 Fuel Devops McRobotson fuel: status Triaged In Progress
2015-09-04 16:25:27 Sergey Otpuschennikov fuel: assignee Fuel build team (fuel-build) Sergey Otpuschennikov (sotpuschennikov)
2015-09-10 11:45:16 Dennis Dmitriev fuel: assignee Sergey Otpuschennikov (sotpuschennikov) Dennis Dmitriev (ddmitriev)
2015-09-29 11:00:07 Dennis Dmitriev summary qemu2.0.0 should be upgraded to 2.3 on Jenkins Product CI qemu2.0.0 should be upgraded to 2.4 on Jenkins Product CI
2015-09-29 11:06:32 Dennis Dmitriev fuel: assignee Dennis Dmitriev (ddmitriev) Fuel DevOps (fuel-devops)
2015-09-29 11:06:44 Dennis Dmitriev fuel: status In Progress Confirmed
2015-10-12 13:57:44 Igor Shishkin fuel: assignee Fuel DevOps (fuel-devops) Fuel build team (fuel-build)
2015-10-12 15:17:50 Sergey Otpuschennikov fuel: status Confirmed In Progress
2015-10-12 15:17:55 Sergey Otpuschennikov fuel: assignee Fuel build team (fuel-build) Sergey Otpuschennikov (sotpuschennikov)
2015-10-12 17:36:16 Fuel Devops McRobotson fuel: status In Progress Fix Committed
2015-10-12 17:39:11 Sergey Otpuschennikov fuel: assignee Sergey Otpuschennikov (sotpuschennikov) Fuel DevOps (fuel-devops)
2015-10-12 17:39:20 Sergey Otpuschennikov fuel: status Fix Committed Confirmed
2015-10-12 18:54:36 Nastya Urlapova tags swarm-blocker
2015-10-16 10:53:57 Dmitry Pyzhov tags swarm-blocker devops swarm-blocker
2015-10-22 03:42:32 Dmitry Pyzhov tags devops swarm-blocker area-devops devops swarm-blocker
2015-11-05 15:16:47 Fuel Devops McRobotson fuel: status Confirmed In Progress
2015-11-09 11:09:35 Igor Shishkin fuel: status In Progress Triaged
2015-11-09 11:18:40 Mateusz Matuszkowiak fuel: assignee Fuel DevOps (fuel-devops) Mateusz Matuszkowiak (mmatuszkowiak)
2015-11-09 15:30:34 Mateusz Matuszkowiak fuel: status Triaged In Progress
2015-11-10 17:31:16 Mateusz Matuszkowiak fuel: status In Progress Fix Committed
2015-11-10 17:32:53 Mateusz Matuszkowiak fuel: status Fix Committed Fix Released