Comment 26 for bug 1474332

Vitaly Sedelnik (vsedelnik) wrote :

It seems that the fix for bug 1474332 (upstream bug 1375156) was not backported correctly to MOS 5/6.

Using vanilla MOS 6.0 with Ceilometer deployed (it is needed for Heat's autoscaling feature):

- one controller, one compute, one mongo
  - did not bother setting up a heat domain etc.; did everything as admin

- Heat's autoscaling working as expected

- installed updated heat packages from MOS 6 updates (http://mirror.fuel-infra.org/fwm/6.0/updates/ubuntu/pool/main/h/heat/)
  - stopped heat services, installed updated heat packages (all of them, python-heat, heat-common, heat-engine, heat-api*)
  - cleaned *.pyc files
  - restarted all heat services

- autoscaling stopped working
  - ASG always thinks it is already scaling_in_progress judging by its metadata
  - grep "NOT performing scaling adjustment, cooldown" /var/log/heat/heat-engine.log
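The log check above can also be scripted, for example to compare the number of skipped adjustments before and after a package swap. This is only a minimal sketch: the message text and log path are taken from the grep command above, while the function name `cooldown_lines` is made up for illustration.

```python
from pathlib import Path

# Message and path as quoted in the grep command above.
COOLDOWN_MSG = "NOT performing scaling adjustment, cooldown"
LOG_PATH = Path("/var/log/heat/heat-engine.log")


def cooldown_lines(lines):
    """Return the lines that contain the cooldown message (hypothetical helper)."""
    return [line.rstrip("\n") for line in lines if COOLDOWN_MSG in line]


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(errors="replace") as log:
            hits = cooldown_lines(log)
        print(f"{len(hits)} skipped scaling adjustments")
        # Show the most recent few matches for context.
        for line in hits[-5:]:
            print(line)
    else:
        print(f"{LOG_PATH} not found")
```

Run it on the controller where heat-engine writes its log.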

Our backport is here: https://review.fuel-infra.org/#/c/10084/
There is now a backport for the same bug in upstream stable/juno: https://review.openstack.org/#/c/231798/
Note that our backport was done before the upstream one, and comparison reveals some differences between them.

When I applied the upstream backport to the env above, Heat's autoscaling started to work again.

Therefore I propose that we revert the backport commits in our branches and cherry-pick the upstream one instead:
6.0 - definitely, confirmed
6.1 - most probably, has to be verified
5.1 - most probably, but extra effort and care will be needed to backport the upstream stable/juno fix to our Icehouse-based Heat in 5.1
7.0 - most probably not (AFAIU the fix in our 7.0 branch is actually cherry-picked from stable/kilo), but this should be verified.

Testing Heat's autoscaling:
- create a stack from provided test template
  - modify parameters to your actual setup
- one nova server will be created
- assign a floating IP to this server
- log in to the server over SSH with the key you specified in the template
- stress server's CPU
- after some time the ceilometer alarm for high CPU usage will fire and another server will be created
  - if the load is not released, a 3rd one will be created too (max size of the autoscaling group in the test template is hard-coded to 3)
- release the CPU load
- after some time only 1 server will be left
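For the "stress server's CPU" step any load generator will do. If nothing suitable is installed in the guest, a minimal Python sketch like the one below could be used instead. It assumes the guest image ships a Python 3 interpreter, which the test template may not guarantee; the function names and the 600-second default are arbitrary choices for illustration, not values from the template.

```python
import multiprocessing
import sys
import time


def burn(seconds):
    """Busy-loop for roughly `seconds` seconds; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def stress(seconds, workers=None):
    """Run one busy-loop process per CPU core (or `workers` processes)."""
    workers = workers or multiprocessing.cpu_count()
    procs = [
        multiprocessing.Process(target=burn, args=(seconds,))
        for _ in range(workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()


if __name__ == "__main__":
    # Duration in seconds; 600 is an arbitrary default, not a template value.
    stress(float(sys.argv[1]) if len(sys.argv) > 1 else 600.0)
```

Interrupting the script (Ctrl-C) or letting it run out releases the CPU load for the scale-in part of the test.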