os-refresh-config needs a timeout argument

Bug #1595722 reported by Steve Baker
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Steve Baker

Bug Description

Currently, any misbehaving script can stall os-refresh-config forever, which halts os-collect-config polling and causes all heat deployment resources to time out.

I propose that an os-refresh-config --timeout <seconds> argument be implemented so that os-refresh-config can terminate itself when the overall run time exceeds <seconds>.
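
To illustrate the proposal, here is a minimal sketch of an overall run-time limit shared across several scripts. It is not the os-refresh-config implementation; the names run_scripts and OverallTimeout are hypothetical, and the sketch assumes each script is launched as a direct child process.

```python
import subprocess
import sys
import time


class OverallTimeout(Exception):
    """Raised when the combined run time of all scripts exceeds the limit."""


def run_scripts(commands, timeout_seconds):
    """Run each command in order, sharing one overall time budget.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_seconds
    for command in commands:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OverallTimeout(f"budget exhausted before running {command!r}")
        try:
            # subprocess.run kills the direct child and raises TimeoutExpired
            # if it is still running after `remaining` seconds. Processes the
            # child itself spawned are not handled by this sketch.
            subprocess.run(command, check=True, timeout=remaining)
        except subprocess.TimeoutExpired as exc:
            raise OverallTimeout(
                f"{command!r} exceeded the overall timeout"
            ) from exc


if __name__ == "__main__":
    # A stand-in for a misbehaving script that would otherwise stall the run.
    stalled = [sys.executable, "-c", "import time; time.sleep(60)"]
    try:
        run_scripts([stalled], timeout_seconds=1)
    except OverallTimeout as exc:
        print(f"terminated: {exc}")
```

Because the deadline is computed once up front, a slow early script reduces the time left for later ones, which matches the "overall run time" wording of the proposal.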

Changed in tripleo:
assignee: nobody → Steve Baker (steve-stevebaker)
Steven Hardy (shardy)
Changed in tripleo:
status: New → Triaged
milestone: none → newton-2
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/337384

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/337385

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/337384
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=072b0670cce6d33cc402185731437df63d1782d6
Submitter: Jenkins
Branch: master

commit 072b0670cce6d33cc402185731437df63d1782d6
Author: Steve Baker <email address hidden>
Date: Tue Jul 5 11:31:39 2016 +1200

    Template param for what command occ runs

    The ConfigCommand parameter overrides the server resource metadata to
    specify what command os-collect-config runs whenever any configuration
    data changes.

    The default is already 'os-refresh-config' so this change has no
    effect but it allows a future change to specify an
    os-refresh-config --timeout argument to fix bug #1595722.

    Change-Id: I8dd35b6724d8c00e5495faca84ee8fee77641b82
    Partial-Bug: #1595722

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/337385
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=301560b35aae7b8861e519c705f131ead53bb7f0
Submitter: Jenkins
Branch: master

commit 301560b35aae7b8861e519c705f131ead53bb7f0
Author: Steve Baker <email address hidden>
Date: Tue Jul 5 11:53:23 2016 +1200

    Set os-refresh-config timeout to 4 hours

    This change uses the new os-refresh-config --timeout argument to set a
    kill timeout for stalled os-refresh-config runs.

    4 hours is a reasonable conservative value since it matches the stack
    timeout - but it can be set shorter in the future based on actual run
    times.

    Change-Id: I433f558515df24736263ec0d50de08ad8f78498f
    Closes-Bug: #1595722
    DependsOn: Ibcbb2090aed126abec8dac49efa53ecbdb2b9b2c
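
As a rough illustration of the 4-hour value, the sketch below converts the duration to seconds and builds the command string that os-collect-config would run. The helper name build_config_command is hypothetical, and the exact string used in the template is not shown in this report; the sketch only assumes that --timeout takes whole seconds, as in the bug description.

```python
import shlex
from datetime import timedelta


def build_config_command(timeout):
    """Return an os-refresh-config command line with a timeout in seconds.

    Hypothetical helper for illustration only.
    """
    seconds = int(timeout.total_seconds())
    return shlex.join(["os-refresh-config", "--timeout", str(seconds)])


command = build_config_command(timedelta(hours=4))
print(command)  # os-refresh-config --timeout 14400
```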

Changed in tripleo:
status: In Progress → Fix Released
Steven Hardy (shardy) wrote :

Some of this landed after the commits we used to cut the n2 release (because periodic promotion has been broken by osc-lib for a few days), so I'll shift this to n3 despite it having landed on trunk.

Changed in tripleo:
milestone: newton-2 → newton-3
Steve Baker (steve-stevebaker) wrote :

Also, this will have upgrade implications: os-refresh-config on the nodes must be updated before a stack update containing the tripleo-heat-templates (tht) changes.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 5.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0b3 development milestone.
