The deployment hangs on provision stage

Bug #1644618 reported by Sergey Novikov
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Georgy Kibardin

Bug Description

Detailed bug description:
The issue was found by the CI job https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.huge_ha_neutron/133/testReport/(root)/huge_ha_neutron_vlan_ceph_ceilometer_rados/huge_ha_neutron_vlan_ceph_ceilometer_rados/

Provisioning takes ~2 hours:
 - about 15 minutes to build the IBP image;
 - about 6 minutes for the OS installation;
 - the rest of the time the process is hanging.
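
For scale, the breakdown above can be turned into a quick calculation. This is only a sketch: the constants are taken from the report, "~2 hours" is rounded to 120 minutes, and the helper name `idle_minutes` is made up for illustration.

```python
# Rough breakdown of the reported provisioning time.
# All values come from the bug description; the total is approximate.
TOTAL_MIN = 120        # "~2 hours"
IMAGE_BUILD_MIN = 15   # build of the IBP image
OS_INSTALL_MIN = 6     # OS installation


def idle_minutes(total, *accounted):
    """Return the time left over once the known stages are subtracted."""
    return total - sum(accounted)


hang_min = idle_minutes(TOTAL_MIN, IMAGE_BUILD_MIN, OS_INSTALL_MIN)
print(f"time spent hanging: about {hang_min} of {TOTAL_MIN} minutes")
```

So roughly 99 of the 120 minutes are spent waiting rather than doing work.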

Relevant part of the astute log: http://paste.openstack.org/show/590361/

Steps to reproduce: https://github.com/openstack/fuel-qa/blob/stable/mitaka/fuelweb_test/tests/tests_strength/test_huge_environments.py#L248-L257

Description of the environment:
snapshot #549

Revision history for this message
Sergey Novikov (snovikov) wrote :
description: updated
description: updated
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
tags: added: area-python
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Georgy Kibardin (gkibardin) wrote :

The /usr/bin/provision script hangs on exit: its last line of code is executed and writes a message to the log, but the script then hangs somewhere else. I cannot reproduce this in my environment, so I need access to the test environment while provisioning is hanging.

Changed in fuel:
status: In Progress → Incomplete
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Changed in fuel:
status: Incomplete → New
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Revision history for this message
Georgy Kibardin (gkibardin) wrote :

Dmitry, I need access to it while provisioning is in progress.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: New → Confirmed
tags: added: swarm-blocker
Revision history for this message
Dmitry Sutyagin (dsutyagin) wrote :
Revision history for this message
Georgy Kibardin (gkibardin) wrote :

Dmitry, apparently not. In this case the provisioning script exits correctly, i.e. after provisioning is complete. The problem is that astute on the master node seems to be unaware of this and keeps waiting until the timeout expires.
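
The failure mode described here (the job finishes, but the waiting side never learns about it and sits out the full timeout) can be illustrated with a minimal sketch. It is not astute code: `wait_for_completion` and the event objects are hypothetical, and a `threading.Event` stands in for the completion message.

```python
import threading
import time


def wait_for_completion(done, timeout_s):
    """Block until `done` is set or `timeout_s` elapses.

    Returns (notified, seconds_waited).
    """
    started = time.monotonic()
    notified = done.wait(timeout=timeout_s)
    return notified, time.monotonic() - started


# Case 1: the completion message arrives, so the wait ends almost at once.
delivered = threading.Event()
threading.Timer(0.05, delivered.set).start()
ok, waited = wait_for_completion(delivered, timeout_s=1.0)

# Case 2: the job finished, but its message was lost. Nothing ever sets
# the event, so the waiter only returns when the timeout expires.
lost = threading.Event()
ok_lost, waited_lost = wait_for_completion(lost, timeout_s=0.3)

print(f"delivered: notified={ok}, waited {waited:.2f}s")
print(f"lost:      notified={ok_lost}, waited {waited_lost:.2f}s")
```

In the second case the work is done long before the waiter gives up, which matches the symptom in this report: a short installation followed by a long idle period.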

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

According to CI, 30-40% of jobs failed due to this bug. I am raising it to Critical.

Changed in fuel:
importance: High → Critical
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/415218

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/mitaka)

Reviewed: https://review.openstack.org/415166
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=682d3492f7526cd63fa2a234fdb25072bb8ea066
Submitter: Jenkins
Branch: stable/mitaka

commit 682d3492f7526cd63fa2a234fdb25072bb8ea066
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:11 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit 898bcca75224ad82fa98a85b77651faaf554e2b6.

    Closes-Bug: #1644618
    Change-Id: I245c2dee68539ec99feb48e2bb60f9600e85b91f
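
The reasoning in this commit message (a lock that cannot be taken is a sign that no heartbeat arrived in time, so it should not be ignored) can be sketched as follows. This is an illustration in Python, not the ruby-stomp implementation: the class `HeartbeatChecker`, its `read_lock`, and the `max_misses` threshold are all assumptions made for the example.

```python
import threading


class HeartbeatChecker:
    """Hypothetical checker that treats a busy lock as a missed heartbeat."""

    def __init__(self, max_misses=2):
        self.read_lock = threading.Lock()
        self.max_misses = max_misses
        self.misses = 0

    def check(self):
        """Return True while the connection is still considered alive."""
        if self.read_lock.acquire(blocking=False):
            try:
                # Lock obtained: the reader is not stuck, reset the counter.
                self.misses = 0
            finally:
                self.read_lock.release()
        else:
            # Lock is held elsewhere: count it as a missed heartbeat
            # instead of silently ignoring the failure.
            self.misses += 1
        return self.misses <= self.max_misses


checker = HeartbeatChecker(max_misses=2)
alive_when_idle = checker.check()

checker.read_lock.acquire()          # simulate a reader that never returns
results_while_stuck = [checker.check() for _ in range(3)]
checker.read_lock.release()

alive_after_recovery = checker.check()
```

With the lock held, the first two checks still report the connection as alive and the third one reports it as dead, so the caller gets a chance to reconnect rather than wait indefinitely.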

tags: added: in-stable-mitaka
Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (master)

Reviewed: https://review.openstack.org/415168
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=f15987c3f2f1e23c260726a108669df7bf1b9b81
Submitter: Jenkins
Branch: master

commit f15987c3f2f1e23c260726a108669df7bf1b9b81
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:45 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit b50241a7b243f553cc35e521ab99bb7f94d8b54a.

    Closes-Bug: #1644618
    Change-Id: I8351abaf0078b094bff2aa20994575c15aec213b

tags: added: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/415450

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/newton)

Reviewed: https://review.openstack.org/415450
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=c02f49f40d3c406de3011c20c39b0f45ce3090e4
Submitter: Jenkins
Branch: stable/newton

commit c02f49f40d3c406de3011c20c39b0f45ce3090e4
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:45 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit b50241a7b243f553cc35e521ab99bb7f94d8b54a.

    Closes-Bug: #1644618
    Change-Id: I8351abaf0078b094bff2aa20994575c15aec213b
    (cherry picked from commit f15987c3f2f1e23c260726a108669df7bf1b9b81)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/415469

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/415218
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=dc47550460741e0a22429662161e4a65509c2dc6
Submitter: Jenkins
Branch: master

commit dc47550460741e0a22429662161e4a65509c2dc6
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 15:38:26 2016 +0300

    Use async shell call for provision

    This change allow to use async shell task based on
    puppet to run provision commands.

    It is transition change between old run way of image
    provision and provision as graph which will also
    used async shell to run.

    It is more fault tolerance way to provision because
    temporary problem with connection between master node
    and provisioning node do not block or fail provision.

    Important notice: it is allow only if bootstrap image
    has puppet and daemonize packages which is true for 9.2
    or higher releases.

    Change-Id: Ie634fae9b63bf0c103ec8926647af75b57cefe23
    Related-Bug: #1644618
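
The idea behind this change (start the command asynchronously, then poll for its result so that a temporary connection problem does not block or fail the run) can be sketched like this. It is a simplified Python illustration, not the astute implementation: `run_async`, `poll_until_done`, and `make_flaky_status` are hypothetical names, and a local subprocess stands in for the command on the provisioning node.

```python
import subprocess
import sys
import time


def run_async(cmd):
    """Start the command and return immediately without waiting for it."""
    return subprocess.Popen(cmd)


def poll_until_done(proc, check_status, interval_s=0.1, timeout_s=10.0):
    """Ask for the exit code until one is available or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            code = check_status(proc)
        except ConnectionError:
            # Transient failure: the job keeps running, so just ask again.
            code = None
        if code is not None:
            return code
        time.sleep(interval_s)
    raise TimeoutError("job did not report a result in time")


def make_flaky_status(failures):
    """Build a status check that fails `failures` times before working."""
    remaining = {"n": failures}

    def status(proc):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ConnectionError("temporary connection loss")
        return proc.poll()  # None while running, exit code once finished

    return status


# A short-lived child process plays the role of the provisioning command.
proc = run_async([sys.executable, "-c", "import time; time.sleep(0.2)"])
exit_code = poll_until_done(proc, make_flaky_status(failures=2))
print(f"job finished with exit code {exit_code}")
```

Because the job runs independently of the status checks, the two simulated connection failures only delay the result; they do not stop the job or mark it as failed.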

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/415471

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (stable/mitaka)

Reviewed: https://review.openstack.org/415471
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=a2e01d91eea068d04f67036fc61bc3387ede9031
Submitter: Jenkins
Branch: stable/mitaka

commit a2e01d91eea068d04f67036fc61bc3387ede9031
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 15:38:26 2016 +0300

    Use async shell call for provision

    This change allow to use async shell task based on
    puppet to run provision commands.

    It is transition change between old run way of image
    provision and provision as graph which will also
    used async shell to run.

    It is more fault tolerance way to provision because
    temporary problem with connection between master node
    and provisioning node do not block or fail provision.

    Important notice: it is allow only if bootstrap image
    has puppet and daemonize packages which is true for 9.2
    or higher releases.

    Change-Id: Ie634fae9b63bf0c103ec8926647af75b57cefe23
    Related-Bug: #1644618
    (cherry picked from commit dc47550460741e0a22429662161e4a65509c2dc6)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (stable/newton)

Reviewed: https://review.openstack.org/415469
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=9d7ba716fc1da84a65a21e6ee64ed74424d6b106
Submitter: Jenkins
Branch: stable/newton

commit 9d7ba716fc1da84a65a21e6ee64ed74424d6b106
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 15:38:26 2016 +0300

    Use async shell call for provision

    This change allow to use async shell task based on
    puppet to run provision commands.

    It is transition change between old run way of image
    provision and provision as graph which will also
    used async shell to run.

    It is more fault tolerance way to provision because
    temporary problem with connection between master node
    and provisioning node do not block or fail provision.

    Important notice: it is allow only if bootstrap image
    has puppet and daemonize packages which is true for 9.2
    or higher releases.

    Change-Id: Ie634fae9b63bf0c103ec8926647af75b57cefe23
    Related-Bug: #1644618
    (cherry picked from commit dc47550460741e0a22429662161e4a65509c2dc6)

Revision history for this message
Vladimir Kozhukalov (kozhukalov) wrote :

Still hasn't been fixed. Re-opening.

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/415167
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=13b4d0f38e2a3e1c4d0ac0d91a7bce1dc23d7c8a
Submitter: Jenkins
Branch: stable/mitaka

commit 13b4d0f38e2a3e1c4d0ac0d91a7bce1dc23d7c8a
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:32 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit 8318d7056556337f17f596edad9d7eed48ec3ca5.

    Partial-Bug: #1644618
    Change-Id: I565f430d17bcee2c50c0ddc8ecc11f3dc8b420ed

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/417023

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/418259

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/417023
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=64ab61d3ddb36cf956c6da0fb52a72c8f6f9590e
Submitter: Jenkins
Branch: master

commit 64ab61d3ddb36cf956c6da0fb52a72c8f6f9590e
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:32 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit 8318d7056556337f17f596edad9d7eed48ec3ca5.

    Partial-Bug: #1644618
    Change-Id: I565f430d17bcee2c50c0ddc8ecc11f3dc8b420ed
    (cherry picked from commit 13b4d0f38e2a3e1c4d0ac0d91a7bce1dc23d7c8a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/418259
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b8e0a12955a938c7103def1f5b204bb557cc364e
Submitter: Jenkins
Branch: stable/newton

commit b8e0a12955a938c7103def1f5b204bb557cc364e
Author: Georgy Kibardin <email address hidden>
Date: Tue Dec 27 09:30:32 2016 +0000

    Revert "Ignore heartbeats lock fails"

    It seems that there are at least two level of ruby-stomp brokenness and
    the fact that the mutex in original commit is locked actually means
    there is no heartbeat received in time and we need to do something about
    this.

    This reverts commit 8318d7056556337f17f596edad9d7eed48ec3ca5.

    Partial-Bug: #1644618
    Change-Id: I565f430d17bcee2c50c0ddc8ecc11f3dc8b420ed
    (cherry picked from commit 13b4d0f38e2a3e1c4d0ac0d91a7bce1dc23d7c8a)
    (cherry picked from commit 64ab61d3ddb36cf956c6da0fb52a72c8f6f9590e)

Revision history for this message
Ekaterina Shutova (eshutova) wrote :
tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-agent 11.0.0.0rc1

This issue was fixed in the openstack/fuel-agent 11.0.0.0rc1 release candidate.
