MySQL backend readiness checks should rely on TCP connectivity instead of UNIX sockets

Bug #1394137 reported by Bogdan Dobrelya
This bug affects 2 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Aleksandr Didenko
5.1.x               Won't Fix      High        MOS Maintenance
6.0.x               Won't Fix      High        MOS Maintenance
6.1.x               Fix Committed  High        Aleksandr Didenko

Bug Description

Currently we use UNIX socket based checks for "wait-for-haproxy-mysql-backend". This can cause random deployment failures when the mysql service is not yet ready to accept TCP connections (there is no "[Note] /usr/sbin/mysqld: ready for connections" message in the logs).

In order to fix it, we should use TCP connect based (management_vip:port) checks for wait-for-haproxy-mysql-backend.
Note that the complete fix will be to move all exec checkers to a puppet native type (here is a PoC https://github.com/dmitryilyin/haproxy_backend_online), but for now we should fix these execs first.
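
To illustrate what a TCP connect based check could look like, here is a minimal Python sketch. It is not the actual exec from fuel-library; the function name `wait_for_tcp` and all timeout values are assumptions made up for this example. It simply retries a TCP connect to host:port until it succeeds or a deadline passes.

```python
import socket
import time


def wait_for_tcp(host, port, timeout=300.0, interval=2.0, connect_timeout=5.0):
    """Return True once a TCP connect to host:port succeeds, False on timeout.

    All default values here are illustrative assumptions, not Fuel settings.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out)
            # when nothing accepts the connection.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Note that a bare connect only proves that something is listening on management_vip:port (possibly just HAProxy itself), so a real check would probably also need to run a query through that connection.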

Bogdan Dobrelya (bogdando) wrote :

Raised to critical due to many duplicates (e.g. bugs with neutron net-list failures or "glance wasn't properly installed" failures could be related to this one)

Changed in fuel:
importance: Undecided → Critical
milestone: none → 6.0
status: New → Triaged
assignee: nobody → Sergii Golovatiuk (sgolovatiuk)

Vladimir Kuklin (vkuklin) wrote :

Bogdan, I have not seen this bug reproduced anywhere. Could you add logs or duplicate bugs, please?

Bogdan Dobrelya (bogdando) wrote :

Here is a source (related) bug for this issue: https://bugs.launchpad.net/fuel/+bug/1391213

Dmitry Borodaenko (angdraug) wrote :

I don't see how this is a Critical in 5.1.1. Do we have a regression that makes Fuel unable to deploy HA environments?

description: updated

Dmitry Ilyin (idv1985) wrote :

Here is a new type with TCP checks support https://github.com/dmitryilyin/haproxy_backend_online/

Mike Scherbakov (mihgen) wrote :

Colleagues,
I doubt that this is a High priority issue for stable/5.1.
The reason is that I do not see a direct connection between this bug and real user experience.
Bogdan, the bug you've mentioned as a possible source/related one has comments saying that these bugs are unrelated. So we need some more clarity and details here.

I'm worried not only that we need to squash bugs as fast as we can to meet the HCF criteria, but also that a first fix for this bug might introduce one or more new bugs which would be hard to catch. So it can be risky to land a fix for this in a maintenance branch.

Bogdan Dobrelya (bogdando) wrote :

Moved to 6.1 due to medium priority and SCF

Sergii Golovatiuk (sgolovatiuk) wrote :

wait-for-haproxy-mysql-backend is important for the first node only; it's actually wrapped in is_primary_controller. Once the first node is deployed, it's not so important, as HAProxy on every instance verifies MySQL using the cluster_check script. We were not able to reproduce this bug, so I think we may close it.
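
For reference, one way to see what HAProxy itself has concluded about a backend is to read its CSV stats output. The sketch below is only an illustration: the helper name `backend_is_up` is made up, the backend name `mysqld` is assumed, and the sample data is hand-written and cut down to three columns (real HAProxy output has many more). It looks up the BACKEND row for a proxy and checks whether its status is UP.

```python
import csv
import io


def backend_is_up(stats_csv, backend):
    """Return True if the BACKEND row of `backend` reports a status of UP."""
    # HAProxy prefixes the CSV header line with "# "; strip it so the
    # first column is named "pxname".
    text = stats_csv.lstrip("# ")
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("pxname") == backend and row.get("svname") == "BACKEND":
            return (row.get("status") or "").startswith("UP")
    return False


# Hand-written, abbreviated sample (hypothetical node and backend names).
SAMPLE = (
    "# pxname,svname,status\n"
    "mysqld,node-1,UP\n"
    "mysqld,BACKEND,UP\n"
    "keystone-1,BACKEND,DOWN\n"
)

print(backend_is_up(SAMPLE, "mysqld"))
print(backend_is_up(SAMPLE, "keystone-1"))
```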

Bogdan Dobrelya (bogdando) wrote :

When this issue was reproduced, mysqld.log contained the "[Note] /usr/sbin/mysqld: ready for connections" message with a *later* timestamp than the "Exec[wait-for-haproxy-mysql-backend]) Evaluated" entry in puppet-apply.log. As a result, OpenStack services considered the wait-for-haproxy-mysql-backend precondition met and started issuing connections to mysql, which failed because it wasn't ready for connections yet.

This bug is intermittent, hence barely reproducible without long-running tests of looped deployments

Bogdan Dobrelya (bogdando) wrote :

I'm just putting the reproduced case from bug #1391213 here, http://paste.openstack.org/show/134709/, so nobody has to search around

Bogdan Dobrelya (bogdando) wrote :

Raised to high due to more duplicates incoming

Bogdan Dobrelya (bogdando) wrote :

This bug looks inactive, so I'm returning it to the Fuel library team

tags: added: low-hanging-fruit

Stanislaw Bogatkin (sbogatkin) wrote :

idv1985, I think it is worth implementing your new type here and adding it to our manifest.

Bogdan Dobrelya (bogdando) wrote :

Thank you for clarification.

So, here is the patch https://review.openstack.org/#/c/136446/

But we should add a check for mysql as well (see the original wait-for-haproxy-mysql-backend exec), modified in a way that the mysql check is executed via TCP connect instead of UNIX sockets

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Dmitry Ilyin (idv1985)
assignee: Dmitry Ilyin (idv1985) → Bogdan Dobrelya (bogdando)
assignee: Bogdan Dobrelya (bogdando) → Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Bartlomiej Piotrowski (bpiotrowski)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Bogdan Dobrelya (bogdando)
Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Aleksandr Didenko (adidenko)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/136446
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=21c4055364dedcddbe19980ec01a2efb2d80f10c
Submitter: Jenkins
Branch: master

commit 21c4055364dedcddbe19980ec01a2efb2d80f10c
Author: Dmitry Ilyin <email address hidden>
Date: Tue Jan 13 19:29:01 2015 +0300

    Replace wait-for-haproxy-backends with puppet type

    * Implement a special puppet type to wait until haproxy backend is online.
    * Wait for mysqld backend as a requirement for nova and keystone backends
    * Switch to TCP connect for cluster check instead of UNIX socket:
      - the host to connect is the node's management IP
      - the port is the mysql HAProxy backend port, 3307
      - the user is clustercheck and its password is the same as wsrep user has
      - the timeout is 10 seconds
    * Parametrise galera cluster check class and script
    * Move galera::cluster check to openstack module (otherwise, there are
      4 more classes to drag all the cluster check parameters through,
      including puppet-mysql module which does not have these params in upstream)
    * Fix docs for cluster check class parameters

    Related-Blueprint: pacemaker-improvements
    Closes-bug: #1394137

    Change-Id: I4eab6af7257270bf3cb2b40a34bcb21c952e8989
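
The TCP parameters listed in the commit message above (management IP, port 3307, user clustercheck, 10 second timeout) could be wired into a check roughly as follows. This is a hedged sketch, not the script from the commit: the function names `clustercheck_command` and `is_synced` are made up for illustration, and the choice of the `wsrep_local_state` status variable is an assumption about what such a check would query. It builds a `mysql` client command line that forces TCP and interprets the result, where a `wsrep_local_state` of 4 means the Galera node is Synced.

```python
def clustercheck_command(host, password, port=3307, user="clustercheck", timeout=10):
    """Build a mysql client invocation that uses TCP instead of the UNIX socket.

    Defaults mirror the values named in the commit message; the password
    is expected to be the same one the wsrep user has.
    """
    return [
        "mysql",
        "--protocol=TCP",                 # never fall back to the UNIX socket
        f"--host={host}",                 # the node's management IP
        f"--port={port}",
        f"--user={user}",
        f"--password={password}",
        f"--connect-timeout={timeout}",
        "--batch",                        # tab-separated output
        "--skip-column-names",
        "--execute=SHOW STATUS LIKE 'wsrep_local_state'",
    ]


def is_synced(mysql_output):
    """Return True if the output reports wsrep_local_state 4 (Synced)."""
    fields = mysql_output.split()
    return len(fields) == 2 and fields[0] == "wsrep_local_state" and fields[1] == "4"
```

The returned list can be handed to `subprocess.run(..., capture_output=True, text=True)` and its stdout passed to `is_synced`. Passing the password on the command line exposes it in the process list, so a real script would more likely read it from an option file.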

Changed in fuel:
status: In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/172843

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/172845

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/5.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/5.1
Review: https://review.openstack.org/172845

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/172843

Bogdan Dobrelya (bogdando) wrote :

The sustaining team should backport this

Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 5.1.1-updates and 6.0-updates as we don't expect new 5.1.1 and 6.0 deployments
