SM provision failures on the computes due to rabbitmq error

Bug #1687451 reported by wenqing liang
This bug affects 1 person
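+-------------------+-------------------------+------------+-------------+-----------+
| Affects | Status | Importance | Assigned to | Milestone |
+-------------------+-------------------------+------------+-------------+-----------+
| Juniper Openstack | Status tracked in Trunk | | | |
| Trunk | Fix Committed | High | Ranjeet R | |
+-------------------+-------------------------+------------+-------------+-----------+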

Bug Description

Build mainline-3064 (Mitaka) on an OpenStack/Contrail HA setup. Provisioning fails on server8 and server9:

root@servermanager:~# server-manager status server

+---------+---------------------+------------+-------------------+
| id | status | ip_address | mac_address |
+---------+---------------------+------------+-------------------+
| server1 | provision_completed | 10.0.0.4 | 02:A7:E6:35:4B:B3 |
| server4 | provision_completed | 10.0.0.7 | 02:E3:91:A6:9A:16 |
| server5 | provision_completed | 10.0.0.8 | 02:50:D8:3D:B2:B8 |
| server8 | provision_failed | 10.0.0.11 | 02:DE:FF:F8:2C:01 |
| server9 | provision_failed | 10.0.0.12 | 02:D1:98:5C:63:7A |
| server2 | provision_completed | 10.0.0.5 | 02:3A:3E:7D:18:64 |
| server6 | provision_completed | 10.0.0.9 | 02:F3:80:97:DC:DA |
| server7 | provision_completed | 10.0.0.10 | 02:1F:A5:19:FD:DC |
| server3 | provision_completed | 10.0.0.6 | 02:4B:F2:39:B6:71 |
+---------+---------------------+------------+-------------------+
root@servermanager:~#

+--------------------------------------+------------------+---------------------+--------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+--------------------------------------+
| 344338c9-e573-48c3-ae3c-77d295622623 | 10.0.0.13 | 10.87.120.14 | 2a406a4d-973a-4cf5-93a8-46be4cc9f57f |
| 854b2880-68e2-4a28-8b19-df0e88028a20 | 10.0.0.9 | 10.87.120.10 | f38097dc-da3a-4ab6-886e-daeb8ebd2b8b |
| 0cbaf9d9-8ac0-47e8-8b2f-ca65f8af0658 | 10.0.0.11 | 10.87.120.5 | defff82c-01ff-4e90-a54f-6469ca9dd342 |
| 45444a52-bd75-4495-8f7d-31e8b4d4ce1f | 10.0.0.5 | 10.87.120.18 | 3a3e7d18-6491-412a-a074-11b97c01a0f4 |
| 85094b65-4a6c-4af6-bbc0-5312df62e4d0 | 10.0.0.8 | 10.87.120.12 | 50d83db2-b8ed-4602-8315-f6b344353ee0 |
| acc94765-7a4f-4ebf-aa31-f44acf27d618 | 10.0.0.12 | 10.87.120.8 | d1985c63-7a93-40b7-bc7e-fc23deae329f |
| 72d25453-cb95-4833-a850-8621de75ce7c | 10.0.0.4 | 10.87.120.19 | a7e6354b-b371-4bd2-abb1-cef71639ae54 |
| 8b354010-8335-46a4-bd71-d3d5d00de4b7 | 10.0.0.10 | 10.87.120.15 | 1fa519fd-dce2-4925-8d69-51681c5b5003 |
| 69853b1e-6783-4487-bdbd-8e26357b5a82 | 10.0.0.7 | 10.87.120.21 | e391a69a-16d0-4040-a347-5b0c9863deb6 |
| f5d263bf-fb4d-4103-a873-5168a576f20c | 10.0.0.6 | 10.87.120.9 | 4bf239b6-71c5-4659-89d8-c0d912018c00 |
+--------------------------------------+------------------+---------------------+--------------------------------------+
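To find the floating IPs of the failed servers, the two tables above can be joined on the fixed IP address. The following is only a minimal sketch: the function names are hypothetical, the input is assumed to be the pasted table text as it appears in this report, and only a few rows are reproduced as sample data.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def failed_with_floating_ip(status_text, floating_text):
    """Return (server id, fixed IP, floating IP) for every failed server."""
    floating_by_fixed = {
        row["fixed_ip_address"]: row["floating_ip_address"]
        for row in parse_ascii_table(floating_text)
    }
    return [
        (row["id"], row["ip_address"], floating_by_fixed.get(row["ip_address"]))
        for row in parse_ascii_table(status_text)
        if row["status"] == "provision_failed"
    ]


# Sample rows copied from the tables in this report (not the full output).
STATUS = """
+---------+---------------------+------------+-------------------+
| id | status | ip_address | mac_address |
+---------+---------------------+------------+-------------------+
| server1 | provision_completed | 10.0.0.4 | 02:A7:E6:35:4B:B3 |
| server8 | provision_failed | 10.0.0.11 | 02:DE:FF:F8:2C:01 |
| server9 | provision_failed | 10.0.0.12 | 02:D1:98:5C:63:7A |
+---------+---------------------+------------+-------------------+
"""

FLOATING = """
| id | fixed_ip_address | floating_ip_address | port_id |
| 0cbaf9d9-8ac0-47e8-8b2f-ca65f8af0658 | 10.0.0.11 | 10.87.120.5 | defff82c-01ff-4e90-a54f-6469ca9dd342 |
| acc94765-7a4f-4ebf-aa31-f44acf27d618 | 10.0.0.12 | 10.87.120.8 | d1985c63-7a93-40b7-bc7e-fc23deae329f |
| 72d25453-cb95-4833-a850-8621de75ce7c | 10.0.0.4 | 10.87.120.19 | a7e6354b-b371-4bd2-abb1-cef71639ae54 |
"""

if __name__ == "__main__":
    for server_id, fixed_ip, floating_ip in failed_with_floating_ip(STATUS, FLOATING):
        print(server_id, fixed_ip, floating_ip)
```

With the sample rows, this pairs server8 (10.0.0.11) with 10.87.120.5 and server9 (10.0.0.12) with 10.87.120.8.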

Please see /cs-shared/bugs/1687451/debug.log.

wenqing liang (wliang)
description: updated
Jeba Paulaiyan (jebap)
tags: added: blocker sanity
Ramprakash R (ramprakash) wrote :

RabbitMQ clustering seems to be failing. It looks like the "hamon" script attempts a restart, which brings the services down, and the cluster is never formed:

=INFO REPORT==== 1-May-2017::17:06:36 ===
accepting AMQP connection <0.1069.0> (192.168.10.22:51604 -> 192.168.10.21:5672)

=ERROR REPORT==== 1-May-2017::17:06:51 ===
** Generic server rabbit_node_monitor terminating
** Last message in was {'DOWN',#Ref<0.0.0.811>,process,
                               {rabbit,rabbit@server3ctrl},
                               normal}
** When Server state == {state,
                            {state,
                                {dict,2,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],
                                      [[{rabbit,rabbit@server2ctrl}|
                                        #Ref<0.0.0.798>],
                                       [{rabbit,rabbit@server3ctrl}|
                                        #Ref<0.0.0.811>]],
                                      [],[],[],[],[]}}},
                                erlang},
                            [],
                            {state,
                                {dict,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                                erlang},
                            undefined,
                            {erlang,#Ref<0.0.0.48590>},
                            not_healing,
                            <<231,60,193,202,20,180,42,137,93,180,223,195,105,
                              11,7,107>>,
                            [{rabbit@server2ctrl,
                                 <<58,225,231,48,0,209,95,224,239,221,27,58,
                                   246,135,70,143>>},
                             {rabbit@server3ctrl,
                                 <<176,117,24,211,8,44,247,194,82,161,181,
                                   115,170,245,238,63>>}]}
** Reason for termination ==
** {bad_return_value,
       {error,
           {{badmatch,
                {error,
                    [{<<19,139,74,182,204,170,56,54,219,165,24,35,209,214,81,
                        223>>,
                      "consoleauth"},
                     {root,none}],
                    ["server2"]}},
            [{rabbit_exchange_type_topic,follow_down_get_path,2,[]},
             {rabbit_exchange_type_topic,'-remove_bindings/3-lc$^1/1-1-',1,[]},
             {rabbit_exchange_type_topic,remove_bindings,3,[]},
             {rabbit_binding,x_callback,4,[]},
             {rabbit_binding,'-process_deletions/1-fun-0-',2,[]},
             {dict,map_bucket,2,[{file,"dict.erl"},{line,460}]},
             {dict,map_bkt_list,2,[{file,"dict.erl"},{line,456}]},
             {dict,map_bkt_list,2,[{file,"dict.erl"},{line,456}]}]}}}

/<email address hidden>

=CRASH REPORT==== 1-May-2017::17:04:23 ===
  cr...

Ramprakash R (ramprakash) wrote :

PLEASE NOTE: the logs above are not from the original setup reported by Wenqing but from another setup where the same issue was observed, so the server names may not match.
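For anyone comparing logs across setups, a small script can pull the ERROR and CRASH reports out of a RabbitMQ log and list the nodes each one mentions. This is a rough sketch only: it assumes the `=KIND REPORT==== timestamp ===` header format seen in the excerpt above, the function names are hypothetical, and the sample text is a shortened copy of that excerpt.

```python
import re

# Header lines look like: =ERROR REPORT==== 1-May-2017::17:06:51 ===
REPORT_HEADER = re.compile(r"^=(?P<kind>[A-Z ]+?) REPORT==== (?P<stamp>\S+) ===\s*$")
NODE_NAME = re.compile(r"rabbit@[\w-]+")


def split_reports(log_text):
    """Split a log into reports, each with its kind, timestamp and body lines."""
    reports = []
    current = None
    for line in log_text.splitlines():
        match = REPORT_HEADER.match(line)
        if match:
            current = {"kind": match.group("kind"),
                       "stamp": match.group("stamp"),
                       "body": []}
            reports.append(current)
        elif current is not None:
            current["body"].append(line)
    return reports


def problem_reports(log_text):
    """Keep only ERROR and CRASH reports."""
    return [r for r in split_reports(log_text) if r["kind"] in {"ERROR", "CRASH"}]


def nodes_mentioned(report):
    """Return the sorted, de-duplicated node names found in a report body."""
    return sorted(set(NODE_NAME.findall("\n".join(report["body"]))))


# Shortened copy of the excerpt quoted in this bug report.
SAMPLE = """\
=INFO REPORT==== 1-May-2017::17:06:36 ===
accepting AMQP connection <0.1069.0> (192.168.10.22:51604 -> 192.168.10.21:5672)

=ERROR REPORT==== 1-May-2017::17:06:51 ===
** Generic server rabbit_node_monitor terminating
** Last message in was {'DOWN',#Ref<0.0.0.811>,process,
                               {rabbit,rabbit@server3ctrl},
                               normal}
"""

if __name__ == "__main__":
    for report in problem_reports(SAMPLE):
        print(report["kind"], report["stamp"], nodes_mentioned(report))
```

Running the same script against the logs from both setups should make it easier to tell whether the same `rabbit_node_monitor` termination appears in each, even though the node names differ.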

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/30917
Committed: http://github.com/Juniper/contrail-provisioning/commit/b3d761ff1023030e15deba145bd8ccda1678c73e
Submitter: Zuul (<email address hidden>)
Branch: master

commit b3d761ff1023030e15deba145bd8ccda1678c73e
Author: Ranjeet R <email address hidden>
Date: Mon May 1 19:14:25 2017 -0700

Fixes: SM provision failures on the computes due to rabbi...

Disabling RMQ monitoring script
Also, fixing the RMQ script to check for the
correct hosts in case it is enabled in the future.

Change-Id: I96ac3588d316350cd4b059888152bc26cea1ac65
Closes-Bug: 1687451
