primary-rabbitmq task hangs causing deployment due to task timeout

Bug #1592842 reported by Nikolay Starodubtsev
This bug affects 1 person
Affects            | Status  | Importance | Assigned to     | Milestone
Fuel for OpenStack | Invalid | High       | Fuel Sustaining |
  Mitaka           | Invalid | High       | Fuel Sustaining |

Bug Description

During a Fuel deployment all nodes finished, but Task[primary-rabbitmq/1] failed.
In astute.log we found the following:

2016-06-12 00:26:32 DEBUG [32182] Node[1]: Node 1: task primary-rabbitmq, task status running
2016-06-12 00:26:32 WARNING [32182] Puppet agent 1 didn't respond within the allotted time
2016-06-12 00:26:32 DEBUG [32182] Task time summary: primary-rabbitmq with status failed on node 1 took 00:15:00

This looks like a Puppet agent timeout.

Failed build is here: https://product-ci.infra.mirantis.net/view/10.0/job/10.0.main.ubuntu.bvt_2/295/

Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 10.0
importance: Undecided → High
status: New → Confirmed
tags: added: area-python module-astute
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Alex Schultz (alex-schultz) wrote : Re: Puppet agent timeout

Reproduced on BVT#334. Logs indicate rabbitmq_user took excessively long. From the last puppet report:
- - rabbitmq_user
  - "Rabbitmq user"
  - 12531.535754044999

This points to rabbitmq_user creation hanging and not finishing until the environment was reverted.
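For scale, the sketch below converts the elapsed time from the puppet report into hours, minutes and seconds and compares it with a 15-minute limit. It assumes that the third field of the report entry is elapsed seconds and that 15 minutes is astute's task timeout, as suggested by the "took 00:15:00" summary line in the description. The `hms` helper is hypothetical, written only for this illustration.

```python
# Hypothetical helper: format a duration given in seconds as HH:MM:SS.
def hms(seconds: float) -> str:
    total = int(seconds)  # drop the fractional part
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Elapsed time reported for the "Rabbitmq user" resource (assumed to be seconds).
rabbitmq_user_elapsed = 12531.535754044999
# Task timeout implied by astute's "took 00:15:00" summary line (assumption).
task_timeout = 15 * 60

print(hms(rabbitmq_user_elapsed))                      # 03:28:51
print(round(rabbitmq_user_elapsed / task_timeout, 1))  # 13.9
```

In other words, that single resource ran roughly 14 times longer than the whole task was allowed to run.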

Changed in fuel:
importance: High → Critical
Alex Schultz (alex-schultz) wrote :
summary: - Puppet agent timeout
+ primary-rabbitmq task hangs causing deployment due to task timeout
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Changed in fuel:
status: Confirmed → In Progress
Georgy Kibardin (gkibardin) wrote :

RabbitMQ hung on startup; digging further.

Georgy Kibardin (gkibardin) wrote :

rabbitmqctl_report.txt was generated successfully, which means that RabbitMQ did start and the problem is in Puppet.

Georgy Kibardin (gkibardin) wrote :

The most we can get from the logs is that Puppet was about to ensure that RabbitMQ was started. RabbitMQ had started, but Puppet was unaware of it. For some reason Puppet debug logging is off, which makes it hard to tell exactly what happened.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/333211

Changed in fuel:
importance: Critical → High
Dmitry Pyzhov (dpyzhov) wrote :

Not reproducible

Changed in fuel:
status: In Progress → Incomplete
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/336947

Changed in fuel:
assignee: Georgy Kibardin (gkibardin) → Maksim Malchuk (mmalchuk)
status: Incomplete → In Progress
Maksim Malchuk (mmalchuk) wrote :

Reproduced locally on 10.0 ISO #432.
Found that we missed something when upgrading the puppet-rabbitmq module.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/336947
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7c425798d35e226b16aa4c736e644d2837136aae
Submitter: Jenkins
Branch: master

commit 7c425798d35e226b16aa4c736e644d2837136aae
Author: Maksim Malchuk <email address hidden>
Date: Sun Jul 3 23:42:17 2016 +0300

    Configure correct host_ip for rabbitmq_ocf

    This commit fix forgotten change in the rabbitmq_ocf during pinning
    puppetlabs-rabbitmq to 5.4.0 [0]. The host_ip value should be set to
    the service ip address instead of the management ip address which
    used only for management plugin. The appropriate comment about the
    node_ip_address setting added. Also, this commit contains some
    puppet cleanups and fixes noop tests.

    [0] Ic589f5b0e978a3884d80f0fbcbdafc1e62f6c843

    Change-Id: I0cd511c8822f8569e3f2892cc04d7c0ca5660ada
    Closes-Bug: #1592842
    Signed-off-by: Maksim Malchuk <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/333211
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=668287af1c342b419745e51755b2e06d0943dfd6
Submitter: Jenkins
Branch: master

commit 668287af1c342b419745e51755b2e06d0943dfd6
Author: Georgy Kibardin <email address hidden>
Date: Thu Jun 23 12:34:23 2016 +0300

    Puppet debug is turned on via puppet_debug

    Do not lose the puppet debug flag.

    Change-Id: I697b92714f555e406de7214329c287435f26e0c4
    Related-Bug: #1592842
    Related-Bug: #1560505

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (stable/mitaka)

Reviewed: https://review.openstack.org/340973
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=596cab2441a63852e4585633131cf04678ff67e7
Submitter: Jenkins
Branch: stable/mitaka

commit 596cab2441a63852e4585633131cf04678ff67e7
Author: Georgy Kibardin <email address hidden>
Date: Thu Jun 23 12:34:23 2016 +0300

    Puppet debug is turned on via puppet_debug

    Do not lose the puppet debug flag.

    Change-Id: I697b92714f555e406de7214329c287435f26e0c4
    Related-Bug: #1592842
    Related-Bug: #1560505
    (cherry picked from commit 668287af1c342b419745e51755b2e06d0943dfd6)

tags: added: in-stable-mitaka
Artem Hrechanychenko (agrechanichenko) wrote :

2016-07-27 10:10:55 INFO [30084] Cluster[]: Stop deployment by internal reason
2016-07-27 10:10:55 WARNING [30084] Cluster[]: Fault tolerance exceeded the stop conditions [{"fault_tolerance"=>-1, "name"=>"primary-controller", "node_ids"=>[], "failed_node_ids"=>["1"]}]
2016-07-27 10:10:55 DEBUG [30084] Cluster[]: Count faild node 1 for group primary-controller
2016-07-27 10:10:55 DEBUG [30084] Node[1]: Decreasing node concurrency to: 0
2016-07-27 10:10:55 DEBUG [30084] Task[primary-rabbitmq/1]: Decreasing task concurrency to: 0
2016-07-27 10:10:55 DEBUG [30084] Task time summary: primary-rabbitmq with status failed on node 1 took 00:15:00
2016-07-27 10:10:55 WARNING [30084] Puppet agent 1 didn't respond within the allotted time
2016-07-27 10:10:55 DEBUG [30084] Node[1]: Node 1: task primary-rabbitmq, task status running

2016-07-27 10:10:55 INFO [30084] Cluster[]: All nodes are finished. Failed tasks: Task[primary-rabbitmq/1] Stopping the deployment process!
2016-07-27 10:10:55 DEBUG [30084] Graph[5]: Found failed tasks on node 5: openstack-network-agents-l3, ceilometer-controller, keystone, openstack-network-end, openstack-controller, ceilometer-radosgw-user, controller_remaining_tasks, murano-cfapi, vmware-vcenter, rabbitmq, ironic-compute, dns-server, swift-proxy_storage, disable_keystone_service_token, ceph-mon, ceph-radosgw, openstack-network-agents-metadata, horizon, openstack-network-common-config, glance, aodh, swift-rebalance-cron, heat, ironic-api, murano-rabbitmq, openstack-network-start, sahara, murano, openstack-network-plugins-l2, openstack-network-agents-dhcp, openstack-network-server-config, openstack-cinder, openstack-network-server-nova

[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_release:
 10.0
cat /etc/fuel_openstack_version:
 newton-10.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-misc-10.0.0-1.mos8692.git.ffb4152.noarch
 nailgun-mcagents-10.0~b1-1.el7~mos2.noarch
 fuel-openstack-metadata-10.0~b1-1.el7~mos6.noarch
 fuel-migrate-10.0.0-1.mos8692.git.ffb4152.noarch
 fuel-release-10.0~b1-2.el7~mos1.noarch
 python-fuelclient-10.0.0~b1-1.el7~mos2.noarch
 fuelmenu-10.0~b1-1.el7~mos0.noarch
 fuel-utils-10.0.0-1.mos8692.git.ffb4152.noarch
 fuel-nailgun-10.0~b1-1.el7~mos6.noarch
 fuel-ostf-10.0~b1-1.el7~mos2.noarch
 fuel-setup-10.0~b1-2.el7~mos1.noarch
 rubygem-astute-10.0~b1-1.el7~mos2.noarch
 fuel-library10.0-10.0.0-1.mos8692.git.ffb4152.noarch
 shotgun-10.0~b1-1.el7~mos0.noarch
 fuel-agent-10.0~b1-1.el7~mos4.noarch
 fuel-ui-10.0~b1-1.el7~mos10.noarch
 fuel-10.0~b1-2.el7~mos1.noarch
 fuel-bootstrap-cli-10.0~b1-1.el7~mos4.noarch
 fuel-notify-10.0.0-1.mos8692.git.ffb4152.noarch
 network-checker-10.0~b1-1.el7~mos0.noarch

Changed in fuel:
status: Fix Committed → Confirmed
Artem Hrechanychenko (agrechanichenko) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/348226

Changed in fuel:
assignee: Maksim Malchuk (mmalchuk) → Kyrylo Galanov (kgalanov)
status: Confirmed → In Progress
Kyrylo Galanov (kgalanov) wrote :

The Puppet run takes slightly longer than the actual timeout:
2016-07-27T10:11:55.580323+00:00 notice: Finished catalog run in 953.18 seconds

A patch has been proposed to address the issue: https://review.openstack.org/348226
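A quick check of the arithmetic behind this comment, as a minimal sketch. It assumes that the 00:15:00 in astute's task time summary is the configured task timeout. `parse_hms` is a hypothetical helper, not part of astute. The result is consistent with the logs: astute declared the task failed at 10:10:55, and Puppet reported the finished catalog run at 10:11:55, about a minute later.

```python
from datetime import timedelta

# Hypothetical helper: parse an "HH:MM:SS" string such as astute prints.
def parse_hms(text: str) -> timedelta:
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

# From astute.log: "primary-rabbitmq with status failed on node 1 took 00:15:00"
task_timeout = parse_hms("00:15:00")
# From the Puppet log: "Finished catalog run in 953.18 seconds"
catalog_run = timedelta(seconds=953.18)

# How far the Puppet run exceeded the assumed task timeout.
overrun = catalog_run - task_timeout
print(catalog_run > task_timeout, overrun.total_seconds())
```

The overrun is about 53 seconds. This is small enough that the run was simply slow rather than hung, which matches the approach of the proposed patch.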

Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: master
Review: https://review.openstack.org/348226

Kyrylo Galanov (kgalanov) wrote :

Is this issue still reproducible? It should already be fixed by multiple commits.

Changed in fuel:
status: In Progress → Incomplete
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel Sustaining (fuel-sustaining-team)
Stanislaw Bogatkin (sbogatkin) wrote :

I closed this because it no longer seems to reproduce. Please reopen it if it happens again.

Changed in fuel:
status: Incomplete → Invalid