51-hosts at scale fails to complete and does not report an error

Bug #1674732 reported by Joe Talerico
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Steven Hardy

Bug Description

51-hosts never finishes running, which wedges deployments at scale and prevents them from ever completing.

https://github.com/openstack/os-refresh-config/blob/master/os_refresh_config/os_refresh_config.py#L154

^ however there is no error.d directory on any of my hosts, nor do we see the error referenced.

Things would lock up like this : https://gist.github.com/jtaleric/407cda1329bf104059c71204c177758f

I would look at the host, and review the tmp directory : https://gist.github.com/jtaleric/af4015022d8782b3d0cfb671479dfb1a

Note: we would never see stop_51-hosts

To get around this, I modified the 51-hosts file to simply exit 0 so that my deployment can move forward.

With that change, I do see the deployment succeed:
2017-03-21 15:27:21Z [overcloud-openstack-ComputeAllNodesDeployment-4hwvwrhulljh.51]: SIGNAL_IN_PROGRESS Signal: deployment 72297b9d-aaaf-4684-b3d4-264fdeaf4f4b succeeded

Joe Talerico (jtaleric)
summary: - 51-hosts at scale fails to complete or report a bug
+ 51-hosts at scale fails to complete or report an error
Changed in tripleo:
importance: Critical → High
milestone: none → pike-2
Steven Hardy (shardy)
summary: - 51-hosts at scale fails to complete or report an error
+ 51-hosts at scale fails to complete and does not report an error
Dan Prince (dan-prince) wrote :

Just a heads up that during the Ocata release we stopped using 51-hosts in favor of a heat hook. This was required for the "composable undercloud" effort (for which I did not implement signalling support for these old-style elements).

The new hook is here: http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/scripts/hosts-config.sh

If possible, it might be worth trying the new hook in the old environment that uses 51-hosts, to see whether it gives better error handling and signalling.

Joe Talerico (jtaleric) wrote :

Running :
[root@overcloud-openstack-controller-0 heat-admin]# dib-run-parts /usr/libexec/os-refresh-config/configure.d/ --debug

I see (at the end):
/usr/libexec/os-refresh-config/configure.d//51-hosts: line 18: /bin/awk: Argument list too long

So, that is why 51-hosts is failing.
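
For context, "Argument list too long" is the E2BIG error returned when the arguments handed to a new process exceed the system limit, so awk is never actually started. The sketch below reproduces that failure mode. It assumes the generated hosts entries reach awk as a command-line argument, which the error suggests but which this report does not show. The function names and the sample entry are hypothetical.

```shell
#!/bin/bash
# Illustrative only: these helpers are hypothetical and are not taken from 51-hosts.

# Hand the entries to awk as a command-line argument. The shell has to exec
# awk with the whole string in argv, so a large enough value fails with
# "Argument list too long" before awk runs at all.
count_via_argv() {
    awk 'BEGIN { print length(ARGV[1]) }' "$1"
}

# Stream the same entries over stdin instead. printf is a shell builtin and
# the data travels through a pipe, so the exec argument limit does not apply.
count_via_stdin() {
    printf '%s\n' "$1" | awk 'END { print NR }'
}

# Build a large block of identical sample entries (hypothetical host name).
entries=$(yes '192.0.2.10 overcloud-compute-0.localdomain overcloud-compute-0' | head -n 60000)

if count_via_argv "$entries" >/dev/null 2>&1; then
    echo "argv: awk ran"
else
    echo "argv: awk could not be executed with entries this large"
fi

echo "stdin: awk read $(count_via_stdin "$entries") lines"
```

The point of the comparison is that the failure depends on how the data is delivered to the tool, not on the tool itself: the same awk binary handles the same data without trouble when it arrives on stdin or from a file.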

Changed in tripleo:
assignee: nobody → Kambiz Aghaiepour (kambiz-e)
status: Triaged → In Progress
Changed in tripleo:
assignee: Kambiz Aghaiepour (kambiz-e) → Steven Hardy (shardy)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/454616

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/454645

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/449198
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=7e5e4dc029198d46705c2e8a4f3f3a06899ab450
Submitter: Jenkins
Branch: master

commit 7e5e4dc029198d46705c2e8a4f3f3a06899ab450
Author: Kambiz Aghaiepour <email address hidden>
Date: Thu Mar 23 11:34:15 2017 -0400

    51-hosts fails if given lots of changes

    The issue is how awk is used to update hosts files. When
    os-apply-config produces sufficiently large amounts of lines
    to be added (or ensured in) hosts files, awk will error out.
    To work around it, instead use sed, and reconstruct the
    host file(s) to ensure the entries between the comment delimiters
    of "# HEAT_HOSTS_START" and "# HEAT_HOSTS_END" are swapped with
    the new entries.

    Also get rid of blank lines produced by os-apply-config

    Partial-Bug: #1674732
    Change-Id: Ibe0a9f6ec10d55750e3b0e16301236141f988d69
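
The following is a minimal sketch of the approach this commit message describes, not the merged patch itself. It deletes whatever currently sits between the two markers with sed, then writes a fresh marked block containing the new entries with blank lines removed. The function name update_hosts_block and the choice to read the new entries from stdin are assumptions made for illustration.

```shell
#!/bin/bash
# Illustrative only: update_hosts_block is a hypothetical helper, not code
# from tripleo-image-elements.
#
# Usage: <command producing entries> | update_hosts_block <hosts-file>
update_hosts_block() {
    local file=$1 tmp
    tmp=$(mktemp) || return 1
    {
        # Keep every line outside the managed block. If the start marker is
        # present without an end marker, sed deletes through end of file.
        sed '/^# HEAT_HOSTS_START/,/^# HEAT_HOSTS_END/d' "$file"
        echo '# HEAT_HOSTS_START'
        # New entries arrive on stdin, so their size is not limited by the
        # exec argument limit. Blank and whitespace-only lines are dropped.
        sed '/^[[:space:]]*$/d'
        echo '# HEAT_HOSTS_END'
    } > "$tmp"
    # Overwrite in place rather than mv, so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `printf '192.0.2.1 node-0\n' | update_hosts_block ./hosts.test` rewrites only the marked section of a scratch copy. Note that this sketch always re-appends the managed block at the end of the file, so unmanaged lines that originally followed the end marker end up above it.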

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (stable/ocata)

Reviewed: https://review.openstack.org/454616
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=a1da5fbe3095e055af1a0a520a91938a328a92f1
Submitter: Jenkins
Branch: stable/ocata

commit a1da5fbe3095e055af1a0a520a91938a328a92f1
Author: Kambiz Aghaiepour <email address hidden>
Date: Thu Mar 23 11:34:15 2017 -0400

    51-hosts fails if given lots of changes

    The issue is how awk is used to update hosts files. When
    os-apply-config produces sufficiently large amounts of lines
    to be added (or ensured in) hosts files, awk will error out.
    To work around it, instead use sed, and reconstruct the
    host file(s) to ensure the entries between the comment delimiters
    of "# HEAT_HOSTS_START" and "# HEAT_HOSTS_END" are swapped with
    the new entries.

    Also get rid of blank lines produced by os-apply-config

    Partial-Bug: #1674732
    Change-Id: Ibe0a9f6ec10d55750e3b0e16301236141f988d69

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/454645
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=92b238ea1739340746d93baa73a791c61c3a5e6c
Submitter: Jenkins
Branch: master

commit 92b238ea1739340746d93baa73a791c61c3a5e6c
Author: Steven Hardy <email address hidden>
Date: Fri Apr 7 10:50:39 2017 +0100

    Avoid awk error in hosts-config.sh for large deployments

    This ports the fixes made to the legacy 51-hosts script, which this
    script is derived from, to tht.

    See related t-i-e patch Ibe0a9f6ec10d55750e3b0e16301236141f988d69

    Change-Id: Ide922af93a5d185bd592e220327326f1d244c4e2
    Closes-Bug: #1674732

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b1 development milestone.

David Hill (david-hill-ubisoft) wrote :

Is it possible to backport to newton?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/497487

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (stable/newton)

Reviewed: https://review.openstack.org/497487
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=cb8335c9b17d222071f59b2b1e5aa2dcbb9804d4
Submitter: Jenkins
Branch: stable/newton

commit cb8335c9b17d222071f59b2b1e5aa2dcbb9804d4
Author: Kambiz Aghaiepour <email address hidden>
Date: Thu Mar 23 11:34:15 2017 -0400

    51-hosts fails if given lots of changes

    The issue is how awk is used to update hosts files. When
    os-apply-config produces sufficiently large amounts of lines
    to be added (or ensured in) hosts files, awk will error out.
    To work around it, instead use sed, and reconstruct the
    host file(s) to ensure the entries between the comment delimiters
    of "# HEAT_HOSTS_START" and "# HEAT_HOSTS_END" are swapped with
    the new entries.

    Also get rid of blank lines produced by os-apply-config

    Partial-Bug: #1674732
    Change-Id: Ibe0a9f6ec10d55750e3b0e16301236141f988d69
    (cherry picked from commit a1da5fbe3095e055af1a0a520a91938a328a92f1)

tags: added: in-stable-newton
Big Switch Networks (fuel-bugs-internal) wrote :

Hi, could you please let us know where you added the exit 0?

We are also facing the same issue, where stop_51-hosts never appears.
Thanks
