[bvt]After deployment nailgun says that some controllers are offline, that caused ostf failures

Bug #1429807 reported by Egor Kotko
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
--------|--------|------------|-------------|----------
Fuel for OpenStack | Fix Released | High | Aleksandr Didenko |

Bug Description

{
  "build_id": "2015-03-09_07-28-41",
  "ostf_sha": "8df5f2fcdae3bc9ea7d700ffd64db820baf51914",
  "build_number": "145",
  "release_versions": {
    "2014.2-6.1": {
      "VERSION": {
        "build_id": "2015-03-09_07-28-41",
        "ostf_sha": "8df5f2fcdae3bc9ea7d700ffd64db820baf51914",
        "build_number": "145",
        "api": "1.0",
        "nailgun_sha": "a9a6578a649a2a006c4810b3d0aa6876ac6e8b83",
        "production": "docker",
        "python-fuelclient_sha": "4eb787f1ad969bd23c93d192865543dbd45a8626",
        "astute_sha": "2d61ee42ec6dae3181d292c7769d32e40d463893",
        "feature_groups": ["experimental"],
        "release": "6.1",
        "fuelmain_sha": "0e45b31db1677651d6ddb1c852d62ebfd8875dcd",
        "fuellib_sha": "d26f3d60cd509865295652ae9115527ea276ae83"
      }
    }
  },
  "auth_required": true,
  "api": "1.0",
  "nailgun_sha": "a9a6578a649a2a006c4810b3d0aa6876ac6e8b83",
  "production": "docker",
  "python-fuelclient_sha": "4eb787f1ad969bd23c93d192865543dbd45a8626",
  "astute_sha": "2d61ee42ec6dae3181d292c7769d32e40d463893",
  "feature_groups": ["experimental"],
  "release": "6.1",
  "fuelmain_sha": "0e45b31db1677651d6ddb1c852d62ebfd8875dcd",
  "fuellib_sha": "d26f3d60cd509865295652ae9115527ea276ae83"
}

Sometimes the OSTF test "Check RabbitMQ is available" fails with the error:
Number of controllers is not equal to number of cluster nodes
It would be reasonable to add a timeout on the following check in fuel-ostf/fuel_health/tests/ha/test_rabbit.py:
 if len(self._controllers) != self.amqp_clients[0].list_nodes():
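A timeout on that check could look roughly like the following Python sketch, which polls until the counts match instead of failing on the first mismatch. `wait_for_rabbit_nodes` is a hypothetical helper, not part of fuel-ostf, and it assumes `list_nodes()` returns a sequence of node names; the real `amqp_clients` API may return something else (for example a count), in which case the comparison would need adjusting.

```python
import time
from typing import Callable, Sequence


def wait_for_rabbit_nodes(controllers: Sequence[str],
                          list_nodes: Callable[[], Sequence[str]],
                          timeout: float = 60.0,
                          interval: float = 5.0) -> bool:
    """Poll until RabbitMQ reports one cluster node per controller.

    Returns True as soon as the counts match, or False once `timeout`
    seconds have passed without a match. `list_nodes` is assumed
    (hypothetically) to return a sequence of node names.
    """
    deadline = time.monotonic() + timeout
    while True:
        if len(list_nodes()) == len(controllers):
            return True
        if time.monotonic() >= deadline:
            return False
        # Give the cluster time to converge before checking again.
        time.sleep(interval)
```

The test would then assert on the returned boolean rather than on a single snapshot of the cluster membership.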

Paste of error:
http://paste.openstack.org/show/190949/

Tags: ostf
Egor Kotko (ykotko) wrote :
Tatyanka (tatyana-leontovich) wrote :

Sometimes after deployment, Nailgun reports that some controllers are offline. As a result, different tests fail: sometimes with an error saying the proxy cannot be set, with the exception "There is no online controllers", and sometimes as shown here:
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|---------------------|---------|-------------|-------------------|------------|---------------|--------|---------
3 | ready | slave-04_compute | 1 | 10.109.20.6 | 64:db:c0:0d:71:34 | compute | | True | 1
1 | ready | slave-05_compute | 1 | 10.109.20.7 | 64:be:a4:d3:47:73 | compute | | True | 1
4 | ready | slave-01_controller | 1 | 10.109.20.3 | 64:33:fe:e0:61:aa | controller | | False | 1
5 | ready | slave-02_controller | 1 | 10.109.20.4 | 64:2b:8e:a4:2a:66 | controller | | True | 1
2 | ready | slave-03_controller | 1 | 10.109.20.5 | 64:29:63:f4:91:13 | controller | | True | 1
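To pick out the affected node from a listing like the one above without eyeballing it, one could parse it as in this Python sketch. `offline_nodes` is a hypothetical helper; it assumes the exact pipe-separated layout shown here (header row, separator row, then one row per node) and guesses that multiple roles would be comma-separated.

```python
from typing import List

# Node listing copied verbatim from this report.
SAMPLE = """\
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|---------------------|---------|-------------|-------------------|------------|---------------|--------|---------
3 | ready | slave-04_compute | 1 | 10.109.20.6 | 64:db:c0:0d:71:34 | compute | | True | 1
1 | ready | slave-05_compute | 1 | 10.109.20.7 | 64:be:a4:d3:47:73 | compute | | True | 1
4 | ready | slave-01_controller | 1 | 10.109.20.3 | 64:33:fe:e0:61:aa | controller | | False | 1
5 | ready | slave-02_controller | 1 | 10.109.20.4 | 64:2b:8e:a4:2a:66 | controller | | True | 1
2 | ready | slave-03_controller | 1 | 10.109.20.5 | 64:29:63:f4:91:13 | controller | | True | 1
"""


def offline_nodes(table: str, role: str = "controller") -> List[str]:
    """Return the names of nodes with `role` whose `online` column is False."""
    lines = table.strip().splitlines()
    header = [cell.strip() for cell in lines[0].split("|")]
    names = []
    for line in lines[2:]:  # skip the header and separator rows
        row = dict(zip(header, (cell.strip() for cell in line.split("|"))))
        roles = [r.strip() for r in row["roles"].split(",")]
        if role in roles and row["online"] == "False":
            names.append(row["name"])
    return names
```

On the listing above, `offline_nodes(SAMPLE)` returns `["slave-01_controller"]`.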
[root@nailgun ostf]# ssh node-1
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
Last login: Wed Mar 11 13:03:57 2015 from 10.109.20.2
[root@node-1 ~]# rabbitmqctl cluster_status
-bash: rabbitmqctl: command not found
[root@node-1 ~]# exit
logout
Connection to node-1 closed.
[root@nailgun ostf]# ssh node-2
Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
Last login: Wed Mar 11 13:03:29 2015 from 10.109.20.2
[root@node-2 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-4','rabbit@node-5']}]},
 {running_nodes,['rabbit@node-4','rabbit@node-5','rabbit@node-2']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
...done.
[root@node-2 ~]# exit
logout
Connection to node-2 closed.
[root@nailgun ostf]# ssh node-5
Warning: Permanently added 'node-5' (RSA) to the list of known hosts.
Last login: Wed Mar 11 13:03:49 2015 from 10.109.20.2
[root@node-5 ~]# exit
http://jenkins-product.srt.mirantis.net:8080/job/6.1.centos.bvt_1/176/console
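There was no revert, just a deployment followed by an OSTF run, so it is not clear why Nailgun moves some controllers offline. Note that RabbitMQ on node-2 lists node-4 among its running nodes, even though Nailgun reports node 4 (slave-01_controller) as offline.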
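The mismatch between RabbitMQ and Nailgun can be checked mechanically, as in this Python sketch. It assumes the default Fuel hostname scheme `node-<id>`, so that `rabbit@node-4` corresponds to Nailgun node 4; `offline_but_running` is a hypothetical helper that returns the Nailgun IDs that RabbitMQ sees as running but Nailgun marks offline.

```python
import re
from typing import Dict, Iterable, List


def offline_but_running(nailgun_online: Dict[int, bool],
                        running_nodes: Iterable[str]) -> List[int]:
    """Return sorted Nailgun node IDs that are offline in Nailgun but
    listed in RabbitMQ's running_nodes (assumes hostnames node-<id>)."""
    running_ids = set()
    for node in running_nodes:
        match = re.fullmatch(r"rabbit@node-(\d+)", node)
        if match:
            running_ids.add(int(match.group(1)))
    return sorted(node_id for node_id, online in nailgun_online.items()
                  if not online and node_id in running_ids)
```

With the controller rows and the `running_nodes` list from this report, the result is `[4]`, which points at the Nailgun side rather than at RabbitMQ.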

Changed in fuel:
importance: Medium → High
assignee: Fuel QA Team (fuel-qa) → Fuel Python Team (fuel-python)
status: New → Confirmed
summary: - Sometimes OSTF "Check RabbitMQ is available" failed with error -
- reasonable to add timeuot
+ [bvt]After deployment nailgun says that some controllers are offline,
+ that caused ostf failures
Tatyanka (tatyana-leontovich) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/164260

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksandr Didenko (adidenko)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/164260
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=581d8b03b0a013fcbe340c75892105bb0243a78a
Submitter: Jenkins
Branch: master

commit 581d8b03b0a013fcbe340c75892105bb0243a78a
Author: Aleksandr Didenko <email address hidden>
Date: Fri Mar 13 18:46:04 2015 +0200

    Skip downed interfaces in master_ip_and_mac method

    In case we have any interface down (like generic bond0 interface),
    we fail with 'undefined method each for nil:NilClass' error in the
    middle of _master_ip_and_mac method. And it can happen before we
    find needed info thus failing to update Nailgun.

    So we just need to skip interfaces that do not have 'addresses'
    key.

    Change-Id: Ieb1de216931fa9334f07bd21f1a4096bf429afb0
    Closes-bug: #1429807
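The actual fix is in the agent's Ruby code (the error text is a Ruby `NoMethodError` on `nil`), which is not quoted here. The idea of the fix can be illustrated with a Python analogue. Everything in it is an assumption for illustration: the interface data layout (an Ohai-style `addresses` map keyed by address, with a `family` field), the rule of matching the interface whose IPv4 subnet contains the master IP, and the name `master_ip_and_mac` itself. The key point mirrors the commit: an interface without `addresses`, such as a down `bond0`, is skipped instead of aborting the whole search before the admin interface is found, which would presumably leave Nailgun without an update from the node.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def master_ip_and_mac(interfaces: Dict[str, dict],
                      master_ip: str) -> Optional[Tuple[str, str]]:
    """Return (ip, mac) of the interface on the master's subnet, or None."""
    master = ipaddress.ip_address(master_ip)
    for iface in interfaces.values():
        addresses = iface.get("addresses")
        if not addresses:
            # Down interface (e.g. an unconfigured bond0): skip it rather
            # than failing, which is the behaviour the commit introduces.
            continue
        mac = next((addr for addr, info in addresses.items()
                    if info.get("family") == "lladdr"), None)
        for addr, info in addresses.items():
            if info.get("family") != "inet":
                continue
            network = ipaddress.ip_interface(f"{addr}/{info['prefixlen']}").network
            if master in network and mac is not None:
                return addr, mac
    return None
```

Because `bond0` comes first in the example below, the old behaviour would have failed before ever reaching `eth0`.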

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Tyzhnenko (dtyzhnenko) wrote :

CI hasn't caught this error since the fix was committed.

Changed in fuel:
status: Fix Committed → Fix Released