heat-engine starts only one worker on single-core machines

Bug #1526045 reported by Ryan Brown
This bug affects 1 person
Affects           Status         Importance   Assigned to    Milestone
OpenStack Heat    Fix Released   Medium       Steve Baker
  Kilo            Fix Released   Undecided    Unassigned

Bug Description

The configuration default for the engine workers[1] is set to `processutils.get_worker_count()` from oslo.concurrency. I've seen that on single-core machines, having only one worker can lead to stacks timing out on complex operations (especially with TripleO).

I'd like to propose making the default the greater of 4 and the number of cores on the machine, so that even on single-core machines there are enough workers to make progress.

[1]: https://github.com/openstack/heat/blob/master/heat/common/config.py#L84-L86
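As a rough illustration of the proposed default, here is a minimal Python sketch. `default_engine_workers`, `cpu_worker_count` and `MIN_ENGINE_WORKERS` are hypothetical names, not Heat code, and `os.cpu_count()` stands in for oslo.concurrency's `processutils.get_worker_count()`, on the assumption that the latter is essentially based on the CPU count.

```python
import os

# Hypothetical constant: the floor proposed in this bug for the *default*
# number of heat-engine workers.
MIN_ENGINE_WORKERS = 4


def cpu_worker_count() -> int:
    """Stand-in for oslo.concurrency's processutils.get_worker_count().

    Assumption: that helper is essentially the CPU count, falling back to 1
    when the count cannot be determined; os.cpu_count() is used here instead.
    """
    return os.cpu_count() or 1


def default_engine_workers(cores: int) -> int:
    """Proposed default: the greater of 4 and the number of cores."""
    return max(MIN_ENGINE_WORKERS, cores)


for cores in (1, 2, 4, 8, 32):
    print(f"{cores:>2} cores -> {default_engine_workers(cores)} workers")

print("this host ->", default_engine_workers(cpu_worker_count()), "workers")
```

With the floor in place, 1- and 2-core machines get 4 workers, while an 8- or 32-core server still gets one worker per core.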

Changed in heat:
status: New → Triaged
importance: Undecided → Medium
Sergey Kraynev (skraynev) wrote :

@Ryan: I have this patch in review and can add some changes in a follow-up patch.
However, I see two important things:
1. IMO we cannot change the default behavior so quickly. We would probably need to issue a warning first...
2. Also, I am not sure that we should change the default behavior at all :)

Why? Because we plan to make convergence the default, so we need more workers on powerful hardware.
That makes Heat more "scalable" (more cores, more workers).
If someone wants to use a different value, IMO they should specify it in the config file.
How can we guarantee that someone else will not ask us to change the default to 2 because that is more convenient for their deployment...

tags: added: kilo-backport-potential liberty-backport-potential
Steve Baker (steve-stevebaker) wrote :

@Sergey, we're proposing that the default engine count should never be less than 4, but it will still be more than 4 on many-core servers.

It looks like this change only modifies the API process workers, so it is not really related: https://review.openstack.org/#/c/254728/

The issue we're dealing with is deadlocks on complex stacks because no heat-engine worker is available to service RPC calls, so I think setting a minimum worker count is reasonable if heat.conf has no explicit num_engine_workers set.
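The rule described here (apply the floor only when heat.conf leaves num_engine_workers unset) can be sketched as follows. This is a hypothetical helper, not Heat's implementation: `None` models an option that was not set in heat.conf, and the CPU count is probed only when the function is called (that is, at engine startup) rather than when the option is declared.

```python
import os
from typing import Optional

# Hypothetical floor applied only to the computed default.
MIN_ENGINE_WORKERS = 4


def resolve_engine_workers(configured: Optional[int],
                           cores: Optional[int] = None) -> int:
    """Return how many heat-engine workers to spawn (illustrative sketch).

    configured: num_engine_workers from heat.conf, or None if it is unset.
    cores: CPU count; probed at call time if omitted.
    """
    if configured is not None:
        # An operator's explicit choice wins, even if it is below the floor.
        return configured
    if cores is None:
        # Evaluated lazily, so no host-specific number is baked into the
        # option's declared default.
        cores = os.cpu_count() or 1
    return max(MIN_ENGINE_WORKERS, cores)


print(resolve_engine_workers(None, cores=1))   # unset on 1 core -> floor
print(resolve_engine_workers(2, cores=1))      # explicit value kept
print(resolve_engine_workers(None, cores=16))  # unset on 16 cores -> per core
```

Keeping the explicit value untouched means operators who deliberately run fewer workers are not overridden; only the computed default changes.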

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/259172

Changed in heat:
assignee: nobody → Steve Baker (steve-stevebaker)
status: Triaged → In Progress
Sergey Kraynev (skraynev) wrote :

Ah, you were talking about heat-engine; I mixed it up with the API services. Thank you for the clarification. In that case, this change makes sense to me.

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/259172
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=adb21217955e59fce5fb194635b36b5b40d6d8c8
Submitter: Jenkins
Branch: master

commit adb21217955e59fce5fb194635b36b5b40d6d8c8
Author: Steve Baker <email address hidden>
Date: Fri Dec 18 09:10:46 2015 +1300

    Make minimum default num_engine_workers>=4

    Downstream test environments are frequently having failing stacks with
    error messages like:

      MessagingTimeout: resources[0]: Timed out waiting for a reply to
      message ID ...

    These environments generally have 1 or 2 cores, so only spawn one or two
    engine workers. This deadlocks with stacks that have many nested stacks
    due to engine->engine RPC calls.

    Even our own functional tests don't work reliably with less than 4
    workers, and the workaround has been to set that explicitly in
    pre_test_hook.sh.

    This change sets the default minimum number of workers to 4, but still
    matches workers to cores for larger servers.

    This change also moves the default evaluation to heat.cmd.engine so that
    generated configuration doesn't get a inappropriate default value.

    Change-Id: Iae6b3956bad414406d901bb2213c9ec230ff4304
    Closes-Bug: #1526045
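The deadlock described in this commit message can be illustrated with a toy model. This is a simplification under stated assumptions, not Heat's actual RPC stack: each engine worker is modelled as a thread that can serve only one call at a time, creating a nested stack is a blocking call that must be served by another worker from the same pool, and a wait with a timeout stands in for oslo.messaging's reply timeout. `create_stack` and `run` are hypothetical names.

```python
import concurrent.futures as cf


def create_stack(pool: cf.ThreadPoolExecutor, depth: int, timeout: float) -> str:
    """Toy engine task: create a stack with `depth` levels of nested stacks.

    Each level makes a blocking call (submit + wait) that must be served by
    another worker from the same pool, like an engine->engine RPC call.
    """
    if depth == 0:
        return "CREATE_COMPLETE"
    child = pool.submit(create_stack, pool, depth - 1, timeout)
    try:
        # This worker stays occupied while it waits for the nested stack.
        return child.result(timeout=timeout)
    except cf.TimeoutError:
        # Stand-in for oslo.messaging's MessagingTimeout.
        return "MessagingTimeout"


def run(num_workers: int, depth: int, timeout: float = 0.5) -> str:
    pool = cf.ThreadPoolExecutor(max_workers=num_workers)
    try:
        return pool.submit(create_stack, pool, depth, timeout).result()
    finally:
        # Discard nested calls still queued behind the timed-out ones
        # (cancel_futures requires Python 3.9+).
        pool.shutdown(wait=True, cancel_futures=True)


for workers in (1, 2, 4):
    print(workers, "workers:", run(workers, depth=3))
```

In this model a chain of three nested stacks holds one worker per level while waiting, so it needs four workers to complete; with one or two workers the innermost call is never served and the outer call times out. The model also shows that a floor of 4 makes the failure much less likely without being a hard guarantee for arbitrarily deep nesting.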

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/266592

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/266593

OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/liberty)

Change abandoned by Ryan Brown (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/266592
Reason: Uploaded this change twice unintentionally - abandoning this one.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/266593
Reason: Restoring the other one and abandoning this one; the other one has the correct Change-Id.

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/liberty)

Reviewed: https://review.openstack.org/266592
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=775acf52589647014ca916f2f6e43587a20f5f0b
Submitter: Jenkins
Branch: stable/liberty

commit 775acf52589647014ca916f2f6e43587a20f5f0b
Author: Steve Baker <email address hidden>
Date: Fri Dec 18 09:10:46 2015 +1300

    Make minimum default num_engine_workers>=4

    Downstream test environments are frequently having failing stacks with
    error messages like:

      MessagingTimeout: resources[0]: Timed out waiting for a reply to
      message ID ...

    These environments generally have 1 or 2 cores, so only spawn one or two
    engine workers. This deadlocks with stacks that have many nested stacks
    due to engine->engine RPC calls.

    Even our own functional tests don't work reliably with less than 4
    workers, and the workaround has been to set that explicitly in
    pre_test_hook.sh.

    This change sets the default minimum number of workers to 4, but still
    matches workers to cores for larger servers.

    This change also moves the default evaluation to heat.cmd.engine so that
    generated configuration doesn't get a inappropriate default value.

    Change-Id: Iae6b3956bad414406d901bb2213c9ec230ff4304
    Closes-Bug: #1526045
    (cherry picked from commit adb21217955e59fce5fb194635b36b5b40d6d8c8)

tags: added: in-stable-liberty
Thierry Carrez (ttx) wrote : Fix included in openstack/heat 6.0.0.0b2

This issue was fixed in the openstack/heat 6.0.0.0b2 development milestone.

Doug Hellmann (doug-hellmann) wrote :

This issue was fixed in the openstack/heat 6.0.0.0b2 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/heat 5.0.1

This issue was fixed in the openstack/heat 5.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/270586

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/270586
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=4197f02e9e57505a701d15ba8c7dcf2e781f4bbf
Submitter: Jenkins
Branch: stable/kilo

commit 4197f02e9e57505a701d15ba8c7dcf2e781f4bbf
Author: Steve Baker <email address hidden>
Date: Thu Jan 21 16:29:52 2016 +1300

    Make minimum default num_engine_workers>=4

    Downstream test environments are frequently having failing stacks with
    error messages like:

      MessagingTimeout: resources[0]: Timed out waiting for a reply to
      message ID ...

    These environments generally have 1 or 2 cores, so only spawn one or two
    engine workers. This deadlocks with stacks that have many nested stacks
    due to engine->engine RPC calls.

    Even our own functional tests don't work reliably with less than 4
    workers, and the workaround has been to set that explicitly in
    pre_test_hook.sh.

    This change sets the default minimum number of workers to 4, but still
    matches workers to cores for larger servers.

    This change also moves the default evaluation to heat.cmd.engine so that
    generated configuration doesn't get a inappropriate default value.

    Change-Id: Iae6b3956bad414406d901bb2213c9ec230ff4304
    Closes-Bug: #1526045
    (cherry picked from commit adb21217955e59fce5fb194635b36b5b40d6d8c8)

tags: added: in-stable-kilo
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/heat 2015.1.4

This issue was fixed in the openstack/heat 2015.1.4 release.

OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/heat 2015.1.4 release.
