fuel-ostf heat-autoscaling fails on complicated environments with timeout error

Bug #1584190 reported by Sergey Yudin on 2016-05-20
This bug affects 5 people
Affects              Importance   Assigned to
Fuel for OpenStack   High         Timur Nurlygayanov
8.0.x                High         MOS Maintenance
Mitaka               High         Timur Nurlygayanov
Newton               High         Timur Nurlygayanov

Bug Description

Hello.

In our environments, OSTF fails on the Heat autoscaling test with a timeout. Apparently, when there is load on the server, 720 seconds is not enough for the change in VM workload to be noticed. I'd suggest changing it to 1300 or so.

"""Create an AIC large HA environment with Contrail.
Scenario:
1. Revert snapshot "ready".
2. Install plugins.
3. Create a cluster.
4. Bootstrap 21 nodes.
5. Add 3 nodes with "aic-haproxy" role.
6. Add 3 nodes with "aic-controller", "aic-swift-proxy" and
"cinder" roles.
7. Add 2 nodes with "aic-identity" role.
8. Add 3 nodes with "aic-dbng" role.
9. Add 3 nodes with "mongo" role.
10. Add 3 nodes with "aic-swift-storage" role.
11. Add 1 node with "aic-compute" role.
12. Add 3 nodes with "contrail-config", "contrail-control",
"contrail-db" roles
13. Deploy the created cluster
14. Run OSTF <------ Fails here

Sergey Kreys (skreys) wrote :

Note that it's an intermittent issue, but it's really annoying to hit it "from time to time".

Here is the exact trace from this error:

2016-05-17 22:29:40 FAILURE Check stack autoscaling (fuel_health.tests.tests_platform.test_heat.HeatSmokeTests.test_autoscaling) Time limit exceeded while waiting for terminating the 2nd instance per autoscaling alarm to finish. Please refer to OpenStack logs for more details.
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 67, in testPartExecutor
    yield
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 601, in run
    testMethod()
  File "/usr/lib/python2.7/site-packages/fuel_health/tests/tests_platform/test_heat.py", line 704, in test_autoscaling
    len(instances) + 1, 720, 10, reduced_stack_name
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 183, in verify
    " Please refer to OpenStack logs for more details.")
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
    raise self.failureException(msg)
Step 11 failed: Time limit exceeded while waiting for terminating the 2nd instance per autoscaling alarm to finish. Please refer to OpenStack logs for more details.

Changed in fuel:
milestone: none → 9.0
assignee: nobody → Fuel QA Team (fuel-qa)
importance: Undecided → High
status: New → Confirmed
tags: added: area-qa
tags: added: area-ostf
removed: area-qa

@Sergey, could you please provide more detailed information about your environment configuration?
We already have a bug saying that Heat autoscaling doesn't work with SSL enabled: https://bugs.launchpad.net/fuel/+bug/1576520. If that is your case, please mark this bug as a duplicate; otherwise, attach a snapshot or the Heat and Ceilometer logs.

Hi team, I know about this issue: it shows up with many plugins and on virtual environments, and if we manually change the timeout, the tests pass. So the root cause is the timeouts: on some large virtual environments there is not enough time for the checks to pass.

Max Lvov (usrleon) wrote :

I've got the same error on MOS 8.0 MU1 with 3 controllers, 5 computes, 4 Ceph and 3 Mongo nodes. And, what is probably most important, we have 4 additional external networks configured.

Max Lvov (usrleon) wrote :

And the LDAP plugin is used!

Illia Polliul (ipolliul) wrote :

Hi, it looks like the problem is here:

root@node-3:~# cat /etc/ceilometer/pipeline.yaml
---
sources:
    - name: meter_source
      interval: 600
      meters:
            - "*"
            - "!volume.create.*"
            - "!volume.delete.*"
            - "!volume.update.*"
            - "!volume.resize.*"
            - "!volume.attach.*"
            - "!volume.detach.*"
            - "!snapshot.create.*"
            - "!snapshot.delete.*"
            - "!identity.authenticate.*"
            - "!storage.api.request"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      transformers:
      publishers:
          - notifier://

So the polling interval on the computes is 10 minutes, while the Heat template expects some meters to change within 1 minute.

https://github.com/openstack/fuel-ostf/blob/stable/mitaka/fuel_health/etc/heat_autoscaling_neutron.yaml#L55

One or the other needs to be adjusted.
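The mismatch described above can be spotted with a small check. This is only a sketch: the dictionary copies the relevant fields from the pipeline.yaml shown in this comment, the 60-second alarm period is an assumption taken from the "1 minute" mentioned above, and `slow_sources` is a hypothetical helper, not part of fuel-ostf or Ceilometer.

```python
# Reduced copy of the pipeline.yaml above: only the fields this check needs.
pipeline = {
    "sources": [
        {"name": "meter_source", "interval": 600, "meters": ["*"]},
    ],
}

# Assumption: the alarm in the OSTF Heat template is evaluated over 60 seconds.
ALARM_PERIOD = 60


def slow_sources(pipeline, alarm_period):
    """Return (name, interval) for sources polled less often than the alarm period."""
    return [
        (source["name"], source["interval"])
        for source in pipeline.get("sources", [])
        if source["interval"] > alarm_period
    ]


for name, interval in slow_sources(pipeline, ALARM_PERIOD):
    print("{0}: polled every {1}s, alarm period is {2}s".format(
        name, interval, ALARM_PERIOD))
```

Any source reported here delivers samples less often than the alarm looks for them, which is the situation in this bug.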

Roman Rufanov (rrufanov) wrote :

A customer hit this on MOS 8.0; added the milestone.

Illia, the interval will remain 600 seconds because of the Ceilometer configuration, and it is a bad idea to change it to 60 seconds for old releases because of performance issues in Ceilometer.

We just need to change the timeout for the test. I'm going to prepare the fix.
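For context, here is a rough sketch of the kind of wait loop such a test timeout controls. It is not the fuel-ostf implementation: `wait_for` and its parameters are hypothetical, and reading the `720, 10` in the traceback as "timeout, polling interval" is an assumption.

```python
import time


def wait_for(predicate, timeout, interval, clock=time.time, sleep=time.sleep):
    """Poll predicate() every `interval` seconds until it is true or `timeout` elapses.

    Returns True if the predicate became true in time, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a loop like this, raising the timeout only changes how long the test is willing to wait; an environment that scales quickly still finishes as soon as the predicate is true.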

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/363025

Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/363026
Committed: https://git.openstack.org/cgit/openstack/fuel-ostf/commit/?id=923c1231c4368fb18834c5a5e683dfca0860cd53
Submitter: Jenkins
Branch: master

commit 923c1231c4368fb18834c5a5e683dfca0860cd53
Author: Timur Nurlygayanov <email address hidden>
Date: Tue Aug 30 19:43:14 2016 +0300

    Increased timeout for Heat autoscaling test

    The autoscaling timeout in Ceilometer is 600 seconds,
    we need to make sure autoscaling will make an alarm
    in Ceilometer withing timeout*2 + several seconds to
    pass this test in 100% of cases.

    Change-Id: I6b2a43e4acb4b8d76641ee314ca19e67b999500a
    Closes-Bug: #1584190
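The rule in the commit message ("timeout*2 + several seconds") can be worked through as plain arithmetic. This is a sketch only: the 60-second margin is a hypothetical stand-in for "several seconds", and the value actually committed is not stated in this report.

```python
CEILOMETER_INTERVAL = 600  # seconds, from pipeline.yaml in this report
OLD_TEST_TIMEOUT = 720     # seconds, from the traceback in this report
SAFETY_MARGIN = 60         # seconds, hypothetical value for "several seconds"


def required_timeout(polling_interval, margin):
    """Apply the rule from the commit message: two polling intervals plus a margin."""
    return 2 * polling_interval + margin


needed = required_timeout(CEILOMETER_INTERVAL, SAFETY_MARGIN)
print(needed)                      # 1260
print(OLD_TEST_TIMEOUT >= needed)  # False
```

Under these assumptions the old 720-second limit falls well short, while the 1300 seconds suggested in the bug description would be enough.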

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/363025
Committed: https://git.openstack.org/cgit/openstack/fuel-ostf/commit/?id=f646b843ace9636e5729ae477d4e367de5a05aa1
Submitter: Jenkins
Branch: stable/mitaka

commit f646b843ace9636e5729ae477d4e367de5a05aa1
Author: Timur Nurlygayanov <email address hidden>
Date: Tue Aug 30 19:40:10 2016 +0300

    Increased timeout for Heat autoscaling test

    The autoscaling timeout in Ceilometer is 600 seconds,
    we need to make sure autoscaling will make an alarm
    in Ceilometer withing timeout*2 + several seconds to
    pass this test in 100% of cases.

    Change-Id: I6b2a43e4acb4b8d76641ee314ca19e67b999500a
    Closes-Bug: #1584190

tags: added: on-verification
tags: removed: on-verification

Reviewed: https://review.openstack.org/363021
Committed: https://git.openstack.org/cgit/openstack/fuel-ostf/commit/?id=3fa12c02d6d8e6c1b93c16d10730e773f5b78a00
Submitter: Jenkins
Branch: stable/8.0

commit 3fa12c02d6d8e6c1b93c16d10730e773f5b78a00
Author: Timur Nurlygayanov <email address hidden>
Date: Tue Aug 30 19:34:31 2016 +0300

    Increased timeout for Heat autoscaling test

    The autoscaling timeout in Ceilometer is 600 seconds,
    we need to make sure autoscaling will make an alarm
    in Ceilometer withing timeout*2 + several seconds to
    pass this test in 100% of cases.

    Change-Id: I6b2a43e4acb4b8d76641ee314ca19e67b999500a
    Closes-Bug: #1584190

This issue was fixed in the openstack/fuel-ostf 10.0.0rc1 release candidate.

This issue was fixed in the openstack/fuel-ostf 10.0.0 release.

tags: added: on-verification

Verified on MOS 8.0 + MU4 updates.

tags: removed: on-verification