One SWARM Heat test failed

Bug #1591327 reported by Timur Nurlygayanov on 2016-06-10
This bug affects 1 person
Affects             Status  Importance  Assigned to     Milestone
Fuel for OpenStack          High        MOS Ceilometer
Mitaka                      High        Igor Degtiarov
Newton                      High        MOS Ceilometer

Bug Description

Failed jobs:
https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.services_ha/135/testReport/(root)/deploy_heat_ha/deploy_heat_ha/

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.services_ha_one_controller/136/testReport/(root)/deploy_heat_ha_one_controller_neutron/deploy_heat_ha_one_controller_neutron/

Steps To Reproduce:
1. Take MOS 9.0 ISO
2. Create environment with 1 controller, 1 compute, install Ceilometer, Cinder with LVM, Neutron VLANs
3. Run OSTF test "Heat Autoscaling"

Observed Result:
The test failed; there are many tracebacks in the Ceilometer logs.

The issue affects 2 SWARM tests and was reproduced only once. We can't reproduce it manually in the QA lab.

Changed in fuel:
milestone: none → 9.0
assignee: nobody → Alexander Nagovitsyn (gluk12189)
importance: Undecided → High
status: New → Incomplete
tags: added: swarm-blocker
tags: added: swarm-fail
removed: swarm-blocker

Only one Heat-related SWARM test fails now (on the latest build, #467). We are going to investigate the root cause of the issue.

summary: - Two SWARM Heat tests failed
+ One SWARM Heat test failed
Changed in fuel:
status: Incomplete → Confirmed

I reverted the build and checked the tests and logs.
Autoscaling works fine, and so does rollback (checked multiple times).

fuel --env 1 health --check tests_platform
WARNING: ostf credentials are going to be mandatory in the next release.
[ 1 of 14] [success] 'Ceilometer test to check alarm state and get Nova notifications' (46.05 s)
....
[10 of 14] [success] 'Advanced stack actions: suspend, resume and check' (84.73 s)
[11 of 14] [success] 'Check stack autoscaling' (966.2 s)
....

However, the test constantly fails on SWARM.
I will continue to research this problem with the Heat team.

I think the problem is the load on the SWARM server:
srv31-bud:~$ free -m
             total used free shared buffers cached
Mem: 64397 50296 14101 1135 235 46544
-/+ buffers/cache: 3516 60881
Swap: 15623 0 15623

virsh list
 Id Name State
----------------------------------------------------

Deployed 3 nodes (8 GB x3) + 1 master node (3-4 GB).
When the server is busy, it runs into swap problems and the nodes work slowly.
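
For reference, a minimal sketch of reading the `free -m` output above programmatically, so that page cache is not mistaken for application memory. It assumes the older procps layout that includes the `-/+ buffers/cache` row; the function name `parse_free` and the dictionary keys are made up for this illustration.

```python
def parse_free(output):
    """Parse the older `free -m` layout that has a '-/+ buffers/cache' row."""
    stats = {}
    for line in output.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row and blank lines have no "label:" prefix
        values = [int(v) for v in rest.split()]
        label = label.strip()
        if label == "Mem":
            stats["mem_total"], stats["mem_used"], stats["mem_free"] = values[:3]
        elif label == "-/+ buffers/cache":
            # Memory used/free once buffers and page cache are excluded.
            stats["app_used"], stats["app_free"] = values[:2]
        elif label == "Swap":
            stats["swap_total"], stats["swap_used"], stats["swap_free"] = values[:3]
    return stats


# Sample copied from the srv31-bud output quoted in this comment.
SAMPLE = """\
             total used free shared buffers cached
Mem: 64397 50296 14101 1135 235 46544
-/+ buffers/cache: 3516 60881
Swap: 15623 0 15623
"""

stats = parse_free(SAMPLE)
print("used by applications (MB):", stats["app_used"])
print("swap used (MB):", stats["swap_used"])
```

For this sample the `-/+ buffers/cache` row reports 3516 MB used by applications, and the `Swap` row reports 0 MB of swap in use.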

Hi CI team, we need to run this test on a hardware server with the required amount of RAM.
The server has 64 GB of RAM, but we still use swap, and as a result this test failed.
Could you please make sure that we don't use swap during SWARM test execution? That would help avoid extra failures.

Alexander, could you please describe how much RAM we need to run this SWARM test successfully?

Aleksandra Fedorova (bookwar) wrote :

Timur, please state explicitly how much RAM is required for this test.

From comment https://bugs.launchpad.net/fuel/+bug/1591327/comments/4 I see that the test was run on 64 GB of RAM and there is no swap usage.

Do we have zabbix statistics on that host?
Load seems ok on that machine https://product-ci.infra.mirantis.net/computer/srv32-bud.infra.mirantis.net/load-statistics?type=hour

The latest jobs are green for MOS 9.0 #473 - #495, marked as Invalid.

Sorry, the job is still red:

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.services_ha_one_controller/

On the host where we run the latest Heat tests:

tnurlygayanov@srv31-bud:~$ free -m
             total used free shared buffers cached
Mem: 64397 63979 418 1488 49 29091
-/+ buffers/cache: 34838 29559
Swap: 15623 0 15623

All RAM is full; atop shows SWAP in red.

It looks like the configuration of the servers is correct. Alexander, we need to investigate the root cause of the failed test cases.

I reverted the build and launched all heat_ha tests:

[ 1 of 14] [success] 'Ceilometer test to check alarm state and get Nova notifications' (48.12 s)
[ 2 of 14] [success] 'Ceilometer test to check events and traits' (61.79 s)
[ 3 of 14] [success] 'Ceilometer test to check notifications from Glance' (8.869 s)
[ 4 of 14] [success] 'Ceilometer test to check notifications from Keystone' (10.67 s)
[ 5 of 14] [success] 'Ceilometer test to check notifications from Neutron' (19.44 s)
[ 6 of 14] [skipped] 'Ceilometer test to check events from Cinder' (0.02088 s) There are no storage nodes for volumes.
[ 7 of 14] [success] 'Ceilometer test to create, check and list samples' (3.192 s)
[ 8 of 14] [success] 'Ceilometer test to create, update, check and delete alarm' (109.4 s)
[ 9 of 14] [success] 'Typical stack actions: create, delete, show details, etc.' (60.33 s)
[10 of 14] [success] 'Advanced stack actions: suspend, resume and check' (97.54 s)
[11 of 14] [success] 'Check stack autoscaling' (1.311e+03 s)
[12 of 14] [success] 'Check stack rollback' (120.0 s)
[13 of 14] [success] 'Update stack actions: inplace, replace and update whole template' (155.3 s)
...

I'll try to increase the timers, but the tests work fine.
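
When deciding which timers to raise, it can help to pull the durations out of the OSTF output. The following is only a sketch: the regular expression is inferred from the lines quoted above, and `SLOW_THRESHOLD_S` is a hypothetical value chosen for illustration, not a setting of OSTF or of the SWARM jobs.

```python
import re

# Matches lines such as:
# [11 of 14] [success] 'Check stack autoscaling' (1.311e+03 s)
LINE_RE = re.compile(
    r"\[\s*(?P<index>\d+) of (?P<total>\d+)\] "
    r"\[(?P<status>\w+)\] "
    r"'(?P<name>.+)' "
    r"\((?P<seconds>[0-9.eE+-]+) s\)"
)


def parse_results(text):
    """Return (name, status, seconds) for every OSTF result line in text."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            results.append(
                (match["name"], match["status"], float(match["seconds"]))
            )
    return results


SAMPLE = """\
[10 of 14] [success] 'Advanced stack actions: suspend, resume and check' (97.54 s)
[11 of 14] [success] 'Check stack autoscaling' (1.311e+03 s)
[12 of 14] [success] 'Check stack rollback' (120.0 s)
"""

SLOW_THRESHOLD_S = 600  # hypothetical cut-off, for illustration only

for name, status, seconds in parse_results(SAMPLE):
    flag = "SLOW" if seconds > SLOW_THRESHOLD_S else "ok"
    print(f"{flag:4} {seconds:8.1f} s  {status:8} {name}")
```

In the run quoted above, 'Check stack autoscaling' took about 1311 s, compared with 966.2 s in the earlier run, so it is the test most exposed to a tight timeout.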

From the Ceilometer logs (at the same time as when we ran this test for the first time):

2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client Traceback (most recent call last):
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/ceilometer/nova_client.py", line 52, in with_logging
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client return func(*args, **kwargs)
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/ceilometer/nova_client.py", line 157, in instance_get_all_by_host
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client search_opts=search_opts))
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/v2/servers.py", line 749, in list
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client "servers")
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 242, in _list
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client resp, body = self.api.client.get(url)
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 173, in get
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client return self.request(url, 'GET', **kwargs)
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 89, in request
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client **kwargs)
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 331, in request
2016-07-26 02:49:18.913 23978 ERROR ceilometer.nova_client resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
2016-07-26 02:49:18.913 239
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.914 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6e574d2 admin - - - -] Skip pollster disk.device.read.requests.rate, no resources found this cycle
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.915 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6e574d2 admin - - - -] Skip pollster disk.device.write.requests.rate, no resources found this cycle
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.915 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6e574d2 admin - - - -] Skip pollster cpu_util, no resources found this cycle
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.916 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6e574d2 admin - - - -] Skip pollster disk.read.bytes, no resources found this cycle
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.916 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6e574d2 admin - - - -] Skip pollster disk.read.requests, no resources found this cycle
<134>Jul 26 02:49:18 node-3 ceilometer-polling: 2016-07-26 02:49:18.917 23978 INFO ceilometer.agent.manager [req-bbb6b56b-7eb9-4720-9eba-8583d6...

From /var/log/ceilometer/ceilometer-polling.log:

2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client Traceback (most recent call last):
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/ceilometer/nova_client.py", line 52, in with_logging
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client return func(*args, **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/ceilometer/nova_client.py", line 157, in instance_get_all_by_host
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client search_opts=search_opts))
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/v2/servers.py", line 749, in list
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client "servers")
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 242, in _list
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client resp, body = self.api.client.get(url)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 173, in get
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client return self.request(url, 'GET', **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 89, in request
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 331, in request
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 98, in request
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client return self.session.request(url, method, **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/positional/__init__.py", line 94, in inner
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client return func(*args, **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 391, in request
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client base_url = self.get_endpoint(auth, **endpoint_filter)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 661, in get_endpoint
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client return auth.get_endpoint(self, **kwargs)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dist-packages/keystoneauth1/identity/base.py", line 214, in get_endpoint
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client service_name=service_name)
2016-07-26 02:39:18.915 23978 ERROR ceilometer.nova_client File "/usr/lib/python2.7/dis...

It looks like a ceilometer configuration issue.

Ceilometer team, could you please take a look?

Thank you!

In keystone logs:

<15>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.001 9963 DEBUG keystone.common.manager [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Failed to load 'keystone.identity.backends.sql.Identity' using stevedore: No 'keystone.identity' driver found, looking for 'keystone.identity.backends.sql.Identity' load_driver /usr/lib/python2.7/dist-packages/keystone/common/manager.py:77
<12>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.074 9963 WARNING keystone.common.manager [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Deprecated: Direct import of driver 'keystone.identity.backends.sql.Identity' is deprecated as of Liberty in favor of its entrypoint from 'keystone.identity' and may be removed in N.
<15>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.079 9963 DEBUG keystone.notifications [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Callback: `keystone.identity.core.Manager._domain_deleted` subscribed to event `identity.domain.deleted`. register_event_callback /usr/lib/python2.7/dist-packages/keystone/notifications.py:253
<12>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.081 9963 WARNING keystone.assignment.core [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Deprecated: Use of the identity driver config to automatically configure the same assignment driver has been deprecated, in the "O" release, the assignment driver will need to be expicitly configured if different than the default (SQL).
<15>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.101 9963 DEBUG keystone.notifications [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Callback: `keystone.assignment.core.Manager._delete_domain_assignments` subscribed to event `identity.domain.deleted`. register_event_callback /usr/lib/python2.7/dist-packages/keystone/notifications.py:253
<15>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.285 9963 DEBUG keystone.common.manager [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Failed to load 'keystone.policy.backends.sql.Policy' using stevedore: No 'keystone.policy' driver found, looking for 'keystone.policy.backends.sql.Policy' load_driver /usr/lib/python2.7/dist-packages/keystone/common/manager.py:77
<12>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.307 9963 WARNING keystone.common.manager [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Deprecated: Direct import of driver 'keystone.policy.backends.sql.Policy' is deprecated as of Liberty in favor of its entrypoint from 'keystone.policy' and may be removed in N.
<15>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.330 9963 DEBUG keystone.common.manager [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Failed to load 'keystone.contrib.revoke.backends.sql.Revoke' using stevedore: No 'keystone.revoke' driver found, looking for 'keystone.contrib.revoke.backends.sql.Revoke' load_driver /usr/lib/python2.7/dist-packages/keystone/common/manager.py:77
<12>Jul 26 02:30:06 node-1 keystone-admin: 2016-07-26 02:30:06.351 9963 WARNING oslo_log.versionutils [req-b3ca4261-ddd1-4bbd-b61c-1b2a78dcf345 - - - - -] Deprecated: keystone.contrib.revoke.backends.sql.Revoke is d...

I'm trying to reproduce it manually in my lab.

I've successfully reproduced the issue manually in my lab.

Steps To Reproduce:
1. Take MOS 9.0 ISO
2. Create environment with 1 controller, 1 compute, install Ceilometer, Cinder with LVM, Neutron VLANs
3. Run OSTF test "Heat Autoscaling"

Observed Result:
The test failed; there are many tracebacks in the Ceilometer logs.

Fix proposed to branch: master
Review: https://review.openstack.org/348192

Changed in fuel:
assignee: MOS Ceilometer (mos-ceilometer) → Timur Nurlygayanov (tnurlygayanov)
status: Confirmed → In Progress

So my fix doesn't work: the issue reproduces even after the Ceilometer OSTF tests pass successfully.

Igor Degtiarov (idegtiarov) wrote :

I've done some investigation while the environment is deployed:

1. The ceilometer-polling agent initialized its nova client before Nova had completed its startup, so instance discovery fails with `internalURL not found`.
2. The same error is returned for every request to nova-api for as long as Nova's startup continues.
3. Once Nova is up and working properly, the Ceilometer nova client keeps returning errors; this behavior changes only about one hour after the service was started.

Any manipulation, such as restarting the polling service or reverting the environment, fixes the issue. It seems that explicitly starting the ceilometer-polling agent after all Nova services would be a fix.
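
The "start polling only after Nova is up" idea can be sketched as a small readiness loop. This is an illustration under assumptions, not how Fuel orders services: `wait_until_ready` and `fake_nova_probe` are hypothetical names, and a real probe would have to make an actual request to nova-api.

```python
import time


def wait_until_ready(probe, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns a true value or the timeout expires.

    Exceptions raised by probe() are treated as "not ready yet".
    Returns True if the probe succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # the service is still starting; try again later
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical stand-in for "can we list servers through nova-api yet?".
attempts = {"count": 0}


def fake_nova_probe():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("internalURL not found")
    return True


ready = wait_until_ready(fake_nova_probe, timeout=60, interval=0,
                         sleep=lambda seconds: None)
print("ready:", ready, "after", attempts["count"], "attempts")
```

The polling agent would be started only once such a check returns True. Note that this addresses startup ordering only; it does not help a client that was already created too early.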

Change abandoned by Timur Nurlygayanov (<email address hidden>) on branch: master
Review: https://review.openstack.org/348192
Reason: Fix doesn't help.

Change abandoned by Timur Nurlygayanov (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/348195
Reason: Fix doesn't help.

Fix proposed to branch: stable/mitaka
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/23778

Change abandoned by Igor Degtiarov <email address hidden> on branch: stable/mitaka
Review: https://review.fuel-infra.org/23778

Fix proposed to branch: 9.0/mitaka
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/23795

description: updated

Reviewed: https://review.fuel-infra.org/23795
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: c1c096fa1bf8f9f66cd7f0ee445cbfea39f24334
Author: Igor Degtiarov <email address hidden>
Date: Wed Aug 3 14:52:27 2016

Fixed issue with Heat system tests

We found out that early initialized nova client in ceilometer-polling agent
could contain incorrect parameters and doesn't support communication
with Nova services. New client is initialized if exception from previos
one is received.

Change-Id: If707d7d70d9efbb4ac891f406ad1b58ab5dc599a
Closes-Bug: #1591327
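
The approach described in the commit message (create a new client when the previous one raises) can be sketched roughly as follows. This is not the code from the review above, which is not reproduced in this thread; `ReconnectingClient` and `FakeNova` are hypothetical names used only to show the pattern.

```python
class ReconnectingClient:
    """Wrap a client factory and rebuild the client after a failed call."""

    def __init__(self, factory):
        self._factory = factory  # callable that returns a fresh client
        self._client = None

    def call(self, method_name, *args, **kwargs):
        if self._client is None:
            self._client = self._factory()
        try:
            return getattr(self._client, method_name)(*args, **kwargs)
        except Exception:
            # The cached client may have been created before the service
            # was ready; drop it and retry once with a new instance.
            self._client = self._factory()
            return getattr(self._client, method_name)(*args, **kwargs)


class FakeNova:
    """Hypothetical client: the first instance ever created is broken."""

    created = 0

    def __init__(self):
        FakeNova.created += 1
        self.broken = FakeNova.created == 1

    def list_servers(self):
        if self.broken:
            raise RuntimeError("internalURL not found")
        return ["vm-1"]


client = ReconnectingClient(FakeNova)
servers = client.call("list_servers")
print(servers, "clients created:", FakeNova.created)
```

A single retry keeps the sketch simple: if the fresh client also fails, the exception propagates to the caller and the next polling cycle starts again from the newest client.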

tags: added: on-verification
tags: removed: on-verification

The fix is verified on 9.1. This test still fails, but now because of another issue.

Fix proposed to branch: mcp/newton
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/33185

Fix proposed to branch: 11.0/ocata
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/33774

Fix proposed to branch: mcp/ocata
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/34486

Change abandoned by Ilya Tyaptin <email address hidden> on branch: mcp/ocata
Review: https://review.fuel-infra.org/34486

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/33774
Reason: we don't use 11.0/ocata anymore - mcp/ocata is the correct branch name

Reviewed: https://review.fuel-infra.org/33185
Submitter: Pkgs Jenkins <email address hidden>
Branch: mcp/newton

Commit: ce93ec46b4d81a2ac76688d69ed87827a2abbd9c
Author: Igor Degtiarov <email address hidden>
Date: Tue May 2 11:25:42 2017

Fixed issue with Heat system tests

We found out that early initialized nova client in ceilometer-polling agent
could contain incorrect parameters and doesn't support communication
with Nova services. New client is initialized if exception from previos
one is received.

Change-Id: If707d7d70d9efbb4ac891f406ad1b58ab5dc599a
Closes-Bug: #1591327
