Fuel for OpenStack

Activity log for bug #1629031

Date	Who	What changed	Old value	New value	Message
2016-09-29 17:24:43	Alexander Gordeev	bug			added bug
2016-09-29 17:28:09	Alexander Gordeev	attachment added		example of astute.log https://bugs.launchpad.net/fuel/+bug/1629031/+attachment/4751173/+files/delayed_provisioning.log
2016-09-29 17:29:03	Alexander Gordeev	tags		module-astute scale
2016-09-29 17:42:50	Alexander Gordeev	description	Detailed bug description: IBP provisioning requires specific file with provisioning data to be uploaded on every node before the actual provisioning script would be executed. This upload data task executes synchronously for every node in a row; one by one. Unless the task accomplished for one node, it won't start to upload the data for next node. If one node for some reasons becomes irresponsible, than it would take 11.5minutes to recognize the failure. Steps to reproduce: 0. Emulate mcollective outage for some slave nodes: stop mcollective service on some of nodes. 1. Start provisioning of the nodes Expected results: Provisioning task provisions all nodes on which mcollective is still operatable. No significant delay due to some nodes being unavailable for mcollective. Actual result: Provisioning task provisions all nodes on which mcollective is still operatable. Every unavailable node add additional 11.5mins of delay to provisioning. Eg.: for 8 unavailable nodes it would be 1.5H Reproducibility: Always Workaround: ??? Impact: UX, abnormally increased provisioning time on large scale. Description of the environment: Fuel 9.0 Additional information: Perhaps, somebody should add more shorter timeout for non-responding mcollective agents for UploadTask. Since UploadTask for provisioning usually takes few secs, than the timeout should be adjusted to a dozen of seconds. Not a dozen of minutes. https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/image_provision.rb#L48-L58	Detailed bug description: IBP provisioning requires specific file with provisioning data to be uploaded on every node before the actual provisioning script would be executed. This upload data task is executed synchronously for every node in a row; one by one. Unless the task accomplished for one node, it won't start to upload the data for next node. 
If one node for some reasons becomes irresponsible, then it would take 11.5minutes to recognize the failure. Steps to reproduce: 0. Emulate mcollective outage for some slave nodes: stop mcollective service on some of nodes. 1. Start provisioning of the nodes Expected results: Provisioning task provisions all nodes on which mcollective is still operatable. No significant delay due to some nodes are being unavailable for mcollective. Actual result: Provisioning task provisions all nodes on which mcollective is still operatable. Every unavailable node add additional 11.5mins of delay to provisioning. Eg.: for 8 unavailable nodes it would be 1.5H Reproducibility: Always Workaround: ??? Impact: UX, abnormally increased provisioning time on large scale. Description of the environment: Fuel 9.0 Additional information: Perhaps, somebody should add more shorter timeout for non-responding mcollective agents for UploadTask. Since UploadTask for provisioning usually takes few secs, therefore the timeout should be adjusted to a dozen of seconds. Not a dozen of minutes. https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/image_provision.rb#L48-L58
2016-09-29 17:43:56	Vladimir Sharshov	fuel: milestone		9.2
2016-09-29 17:43:58	Vladimir Sharshov	fuel: assignee		Vladimir Sharshov (vsharshov)
2016-09-29 17:44:00	Vladimir Sharshov	fuel: importance	Undecided	High
2016-09-29 17:44:03	Vladimir Sharshov	fuel: status	New	Confirmed
2016-09-29 17:54:38	Alexander Gordeev	tags	module-astute scale	area-python module-astute scale
2016-09-30 13:39:51	Alexander Gordeev	description	Detailed bug description: IBP provisioning requires specific file with provisioning data to be uploaded on every node before the actual provisioning script would be executed. This upload data task is executed synchronously for every node in a row; one by one. Unless the task accomplished for one node, it won't start to upload the data for next node. If one node for some reasons becomes irresponsible, then it would take 11.5minutes to recognize the failure. Steps to reproduce: 0. Emulate mcollective outage for some slave nodes: stop mcollective service on some of nodes. 1. Start provisioning of the nodes Expected results: Provisioning task provisions all nodes on which mcollective is still operatable. No significant delay due to some nodes are being unavailable for mcollective. Actual result: Provisioning task provisions all nodes on which mcollective is still operatable. Every unavailable node add additional 11.5mins of delay to provisioning. Eg.: for 8 unavailable nodes it would be 1.5H Reproducibility: Always Workaround: ??? Impact: UX, abnormally increased provisioning time on large scale. Description of the environment: Fuel 9.0 Additional information: Perhaps, somebody should add more shorter timeout for non-responding mcollective agents for UploadTask. Since UploadTask for provisioning usually takes few secs, therefore the timeout should be adjusted to a dozen of seconds. Not a dozen of minutes. https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/image_provision.rb#L48-L58	Detailed bug description: IBP provisioning requires specific file with provisioning data to be uploaded on every node before the actual provisioning script would be executed. This upload data task is executed synchronously for every node in a row; one by one. Unless the task accomplished for one node, it won't start to upload the data for next node. 
If one node for some reasons becomes irresponsible, then it would take 11.5minutes to recognize the failure. Steps to reproduce: 0. Emulate mcollective outage for some slave nodes: stop mcollective service on some of nodes. 1. Start provisioning of the nodes Expected results: Provisioning task provisions all nodes on which mcollective is still operatable. No significant delay due to some nodes are being unavailable for mcollective. Actual result: Provisioning task provisions all nodes on which mcollective is still operatable. Every unavailable node add additional 11.5mins of delay to provisioning. Eg.: for 8 unavailable nodes it would be 1.5H Reproducibility: Always Workaround: ??? Impact: UX, abnormally increased OS provisioning time on large scale could lead to provisioning task failure by timeout. Description of the environment: Fuel 9.0 Additional information: Perhaps, somebody should add more shorter timeout for non-responding mcollective agents for UploadTask. Since UploadTask for provisioning usually takes few secs, therefore the timeout should be adjusted to a dozen of seconds. Not a dozen of minutes. https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/image_provision.rb#L48-L58
2016-12-27 15:11:18	OpenStack Infra	fuel: status	Confirmed	In Progress
2016-12-28 14:49:06	Vladimir Sharshov	nominated for series		fuel/newton
2016-12-28 14:49:06	Vladimir Sharshov	bug task added		fuel/newton
2016-12-28 14:49:14	Vladimir Sharshov	fuel/newton: status	New	In Progress
2016-12-28 14:49:17	Vladimir Sharshov	fuel/newton: importance	Undecided	High
2016-12-28 14:49:19	Vladimir Sharshov	fuel/newton: assignee		Vladimir Sharshov (vsharshov)
2016-12-28 14:49:32	Vladimir Sharshov	fuel/newton: milestone		10.x-updates
2016-12-28 14:50:24	OpenStack Infra	fuel: status	In Progress	Fix Committed
2016-12-28 16:46:47	OpenStack Infra	fuel/newton: status	In Progress	Fix Committed
2016-12-28 20:19:02	OpenStack Infra	tags	area-python module-astute scale	area-python in-stable-mitaka module-astute scale
2017-02-03 10:06:33	Andrew Kalach	fuel: status	Fix Committed	Fix Released
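
Note on the delay estimate in the description: the "1.5H for 8 unavailable nodes" figure follows from the upload task running one node at a time, so each unresponsive node costs a full 11.5-minute timeout. The sketch below only reproduces that arithmetic and compares it with the shorter timeout the reporter suggests. The function name and the 12-second value (one reading of "a dozen of seconds") are illustrative assumptions and are not taken from fuel-astute.

```python
# Rough model of the provisioning delay described in the bug report.
# Names and the 12-second figure are hypothetical; only the 11.5-minute
# timeout and the 8-node example come from the report itself.

REPORTED_TIMEOUT_MIN = 11.5      # time to detect one unresponsive node (from the report)
SUGGESTED_TIMEOUT_MIN = 12 / 60  # assumed "dozen of seconds", expressed in minutes


def sequential_delay_minutes(unavailable_nodes, timeout_min):
    """Uploads run one by one, so every unresponsive node adds one full timeout."""
    return unavailable_nodes * timeout_min


current = sequential_delay_minutes(8, REPORTED_TIMEOUT_MIN)
shortened = sequential_delay_minutes(8, SUGGESTED_TIMEOUT_MIN)

print(f"current timeout:   {current:.1f} min ({current / 60:.2f} h)")
print(f"shortened timeout: {shortened:.1f} min")
```

With the reported timeout, 8 unavailable nodes add 92 minutes (about 1.5 hours). With a timeout of roughly a dozen seconds, the same 8 nodes would add under two minutes.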