Receiving "Connected" and "Disconnected" from corosync-notifyd too often

Bug #1439908 reported by Vasilios Tzanoudakis
This bug affects 1 person
Affects             Status         Importance  Assigned to                Milestone
Fuel for OpenStack  Fix Committed  High        Vladimir Kuklin
5.1.x               Won't Fix      High        Fuel Library (Deprecated)
6.0.x               Won't Fix      High        Fuel Library (Deprecated)

Bug Description

Dear Team,

While "Connected" and "Disconnected" events from corosync-notifyd were appearing in /var/log/syslog,
the services on the other nodes were suddenly reported as DOWN.
For some reason, every controller node showed the other nodes' services as DOWN.

Here is a paste from Node1 /var/log/syslog

http://paste.openstack.org/show/198003/

Here is a paste from Node2 /var/log/syslog
http://paste.openstack.org/show/198004/

Here is a paste from Node3 /var/log/syslog
http://paste.openstack.org/show/198005/

During this timeframe [2015-04-03 02:35:52] - [2015-04-03 02:40:22], our Nagios monitoring, which sits outside the infrastructure, noticed some ICMP packet loss. We also received a service DOWN alert for the UDP DNS check against an IP in the OpenStack cluster.
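
To put a number on how often these events occur, the syslog lines can be counted per state. The following is a minimal sketch, not part of the original report. It assumes that the relevant lines contain the string "notifyd" and the word "connected" or "disconnected" in any letter case; the exact message format may differ between corosync versions. The function name `count_notifyd_events` is hypothetical.

```python
import re
from collections import Counter

# Assumption: lines of interest mention "notifyd" and contain the whole word
# "connected" or "disconnected" (any letter case). Adjust to the real format.
EVENT_RE = re.compile(r"\b(connected|disconnected)\b", re.IGNORECASE)


def count_notifyd_events(lines):
    """Count 'connected' and 'disconnected' events on lines mentioning notifyd."""
    counts = Counter()
    for line in lines:
        if "notifyd" not in line:
            continue  # skip messages from other daemons
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

It could be called as `count_notifyd_events(open("/var/log/syslog", errors="replace"))` on each controller. Similar counts on all nodes during the same window would point to a shared network problem rather than a single faulty host.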

All OpenStack controller nodes are connected via LACP to a Dell S4810 switch. During this timeframe the switch showed no connectivity issues and nothing relevant in its logs.

Please let me know whether this needs more attention or whether it is a bug.

Environment:
Fuel 5.1.1
Ubuntu + GRE HA
Ceph for all

Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Library Team (fuel-library)
Vladimir Kuklin (vkuklin) wrote :

Hi Vasilios, I don't think we can investigate this bug without access to the affected environment. From your description, it seems that you had connectivity issues, since several services detected connectivity failures. I will mark this bug as Incomplete. If you think this is wrong, please state your objections.

Changed in fuel:
status: New → Incomplete
importance: Undecided → High
Vasilios Tzanoudakis (vtzanoudakis) wrote :

Thank you for your reply.

I can give you access to the environment if you want.

So do these "Connected" / "Disconnected" events from corosync mean that there is currently a connectivity issue between the nodes?

Thank you.

Vasilios Tzanoudakis (vtzanoudakis) wrote :

The reason I am asking is that I have an OVS LACP setup with 2x10Gbit interfaces. I am also aware of this bug: https://bugs.launchpad.net/fuel/+bug/1272842 and I am trying to determine whether LACP is the cause of the connectivity issue.

Packages installed:
neutron-plugin-openvswitch 1:2014.1.3-fuel5.1.2~mira4
neutron-plugin-openvswitch-agent 1:2014.1.3-fuel5.1.2~mira4
openvswitch-common 1.10.1+git20130823-0ubuntu3~cloud0
openvswitch-datapath-lts-saucy-dkms 1.10.2-0ubuntu2~ubuntu12.04.1
openvswitch-switch 1.10.1+git20130823-0ubuntu3~cloud0

Thank you.

Vladimir Kuklin (vkuklin) wrote :

Vasilios, yes, LACP in OVS may not work - you need to switch to Linux LACP bonds if possible.

Vasilios Tzanoudakis (vtzanoudakis) wrote :

I deployed with Fuel 5.1.1 and LACP Balance TCP.

Nodes were unexpectedly reported as down twice this month, and I could not work out why.

So is the "LACP Balance TCP" option in Fuel not fully reliable? It seems to work as far as connectivity and balancing are concerned. On the 10Gbit S4810p side, LACP shows correct values. Take a look: http://paste.openstack.org/show/208537/

Have you used "LACP Balance TCP" in production on 5.1.x releases before? Did you see the same behavior?

Thank you.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
Vladimir Kuklin (vkuklin) wrote :

Vasilios, this could obviously be caused by buggy hardware or by issues in Open vSwitch. If you see these issues, you will need to reconfigure the bonds as Linux bonds. We are switching to Linux bonds in 6.1, as they do not have these issues. If you need help, feel free to contact us on the #fuel-dev IRC channel or on the openstack-dev mailing list with [Fuel] in the subject.
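
After switching to Linux bonds, the kernel exposes the bond state under /proc/net/bonding/. A minimal sketch for checking the bonding mode and the link state of each slave follows. It is an illustration only, not something Fuel ships. The function name `parse_bond_status` and the bond name `bond0` are hypothetical, and the parser relies only on the "Bonding Mode", "Slave Interface" and "MII Status" fields of that file.

```python
def parse_bond_status(text):
    """Parse the contents of a /proc/net/bonding/<bond> file.

    Returns a tuple (bonding_mode, {slave_name: mii_status}).
    """
    mode = None
    slaves = {}
    current = None  # slave section currently being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None and slaves[current] is None:
            # The first "MII Status" after a "Slave Interface" line belongs
            # to that slave; the bond-level status appears before any slave.
            slaves[current] = value
    return mode, slaves
```

For example, `parse_bond_status(open("/proc/net/bonding/bond0").read())` (with `bond0` as an assumed name) should report a mode containing "802.3ad" for an LACP bond, with every slave showing "up".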

Vladimir Kuklin (vkuklin) wrote :

This issue is fixed in 6.1 and will not be fixed for 5.1 and 6.0.

Changed in fuel:
status: Incomplete → Fix Committed