Access denied for user 'root'@'localhost'/Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock errors during cluster deployment

Bug #1548271 reported by Volodymyr Shypyguzov
60
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
Critical
Sergii Golovatiuk

Bug Description

Deployment fails with message: "Deployment has failed. Critical nodes failed: Node[1], Node[2], Node[4], Node[7], Node[9], Node[8]. Stopping the deployment process!"

Here is what I've found in the puppet logs

node-4 2016-02-21T19:07:16.077694 err: (/Stage[main]/Mysql::Server::Root_password/Mysql_user[root@localhost]/ensure) change from absent to present failed: Execution of '/usr/bin/mysql -e CREATE USER 'root'@'localhost' IDENTIFIED BY PASSWORD '*93E6920DE1624AC53BC752BDEE63FCB68CF03D09'' returned 1: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
node-4 2016-02-21T19:07:16.137543 err: Could not prefetch mysql_grant provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Steps To Reproduce:
1. Run bvt_2 SWARM tests:
python run_system_test.py run -q --nologcapture --with-xunit --group=bvt_2

Expected results:
All tests passed

Actual result:
Tests failed with the error:
node-4 2016-02-21T19:07:16.077694 err: (/Stage[main]/Mysql::Server::Root_password/Mysql_user[root@localhost]/ensure) change from absent to present failed: Execution of '/usr/bin/mysql -e CREATE USER 'root'@'localhost' IDENTIFIED BY PASSWORD '*93E6920DE1624AC53BC752BDEE63FCB68CF03D09'' returned 1: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
node-4 2016-02-21T19:07:16.137543 err: Could not prefetch mysql_grant provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Revision history for this message
Volodymyr Shypyguzov (vshypyguzov) wrote :
Revision history for this message
Volodymyr Shypyguzov (vshypyguzov) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Egor Kotko (ykotko) wrote :

Reproduced on:
fuel-9.0-70-2016-03-16_15-05-59.iso

Stop reset cluster in ha mode

        Scenario:
            1. Create cluster
            2. Add 3 node with controller role
            3. Deploy cluster
            4. Stop deployment
            5. Reset settings
            6. Add 2 nodes with compute role
            7. Re-deploy cluster
            8. Run OSTF

Failed with similar error:
2016-03-16 23:29:46 ERR Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
2016-03-16 23:29:46 ERR Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

Changed in fuel:
importance: Undecided → High
milestone: none → 9.0
Revision history for this message
Egor Kotko (ykotko) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
status: New → Confirmed
tags: added: area-library galera
summary: - Access denied for user 'root'@'localhost' error during cluster
+ Access denied for user 'root'@'localhost'/Can't connect to local MySQL
+ server through socket '/var/run/mysqld/mysqld.sock errors during cluster
deployment
Revision history for this message
Pavlo Shchelokovskyy (pshchelo) wrote :

I'm seeing similar problem, but with a bit other error from MySQL (nothing on access denied, just could not connect to socket)

database with status failed on node 2

node-2/var/log/puppet.log:

2016-03-19 09:09:28 +0000 Puppet (err): Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

logs snapshot is attached

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/296712

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
status: Confirmed → In Progress
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

Reproduced on 9.0 iso #101

2016-03-23 23:45:59 ERR Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -e CREATE USER 'root'@'10.109.16.0/255.255.255.0' IDENTIFIED BY PASSWORD '*688296CA65F51E16C367DA15B854FA9246AF9978'' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

2016-03-23 23:45:59 ERR (/Stage[main]/Osnailyfacter::Mysql_user_access/Osnailyfacter::Mysql_grant[240.0.0.0/255.255.0.0]/Mysql_user[root@240.0.0.0/255.255.0.0]/ensure) change from absent to present failed: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -e CREATE USER 'root'@'240.0.0.0/255.255.0.0' IDENTIFIED BY PASSWORD '*688296CA65F51E16C367DA15B854FA9246AF9978'' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

2016-03-23 23:45:59 ERR (/Stage[main]/Osnailyfacter::Mysql_user_access/Osnailyfacter::Mysql_grant[240.0.0.0/255.255.0.0]/Mysql_user[root@240.0.0.0/255.255.0.0]/ensure) change from absent to present failed: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -e CREATE USER 'root'@'240.0.0.0/255.255.0.0' IDENTIFIED BY PASSWORD '*688296CA65F51E16C367DA15B854FA9246AF9978'' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/296712
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0318216c0539d7d0b7ed38df51c11711af70dbcf
Submitter: Jenkins
Branch: master

commit 0318216c0539d7d0b7ed38df51c11711af70dbcf
Author: Alex Schultz <email address hidden>
Date: Wed Mar 23 13:22:58 2016 -0600

    Update keystone db task to wait for all databases

    This change updates the keystone db task to wait for the database tasks
    on all nodes to complete before getting run. I believe that this is
    causing the galera cluster to become corrupt because the following
    condition occurs:

    - primary-db comes up
    - secondary-db comes up and syncs from primary-db
    - teriary-db comes up and syncs from secondary-db while keystone task
      runs on primary-db causing cluster to become desynced
    - primary-db ends up trying to sync from the cluster and everything
      falls apart

    The idea is to wait until the cluster has completely formed before
    attempting to start openstack deployment. Additionally, this change
    removes the one-by-one on the secondary database tasks which was added
    as an attempt to address the cluster creation issues.

    Change-Id: I4d62728922bb3a74576c419a934c1afd9187a35d
    Closes-Bug: #1548271

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
importance: High → Critical
Revision history for this message
Andrey Maximov (maximov) wrote :

Alex,

It looks like your patch will help to workaround this problem in our deployment but it doesn't solve real issue in the mysql or galera.
What if similar conditions happen in the production ? I mean what if cluster becomes desynced for whatever reason, mysql will die again?

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Andrew,

I believe this to be related to the early state of the cluster and not necessarily an issue with mysql if a node is removed/fails and rejoins. I thought we had testing for such scenarios as part of our regression suite. What I witnessed seemed to be partially caused by the order changes introduced as part of the task based deployment switch. This patch puts back in the requirements that all the database tasks be completed before they start being written to which if I remember correctly is more in line with the previous granular deployment ordering.

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

Reproduced again on Neutron vlan, default storages, 3 controllers, 2 compute cluster

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
description: updated
Revision history for this message
Alex Schultz (alex-schultz) wrote :

@Andrey, That failed because of upload_cirros task failure (bug 1556068) not mysql.

2016-03-28 01:40:07 DEBUG [27979] Task time summary: upload_cirros with status failed on node 1 took 00:03:00

Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Aleksey Zvyagintsev (azvyagintsev) wrote :

Looks like issue was not fixed
Reproduced on:
https://packaging-ci.infra.mirantis.net/job/master-pkg-systest-ubuntu/442
https://packaging-ci.infra.mirantis.net/job/master-pkg-systest-ubuntu/438/

node-1 2016-03-31T09:16:36.284837 notice: Compiled catalog for node-1.test.domain.local in environment production in 4.03 seconds
node-1 2016-03-31T09:17:16.802642 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
node-1 2016-03-31T09:17:16.802672 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
node-1 2016-03-31T09:17:16.803843 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) keystone-manage db_sync returned 1 instead of one of [0]
node-1 2016-03-31T09:17:16.803843 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) /usr/lib/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
node-1 2016-03-31T09:17:16.803871 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) keystone-manage db_sync returned 1 instead of one of [0]
node-1 2016-03-31T09:17:16.804124 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) /usr/lib/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
node-1 2016-03-31T09:17:16.804144 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) /usr/lib/ruby/vendor_ruby/puppet/type/exec.rb:160:in `sync'
node-1 2016-03-31T09:17:16.804298 err: (/Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]) /usr/lib/ruby/vendor_ruby/puppet/type/exec.rb:160:in `sync'
...

node-2 2016-03-31T09:26:39.724142 err: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)

Changed in fuel:
status: Fix Committed → Confirmed
tags: added: blocker-for-qa swarm-blocker
Revision history for this message
Aleksey Zvyagintsev (azvyagintsev) wrote :

Filtered ogs from 442 build

Changed in fuel:
assignee: Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/302975

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
Changed in fuel:
assignee: Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/303412

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/302975
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f1c63b04ee1b42ecd1b68ec3136b54b4ae49e634
Submitter: Jenkins
Branch: master

commit f1c63b04ee1b42ecd1b68ec3136b54b4ae49e634
Author: Sergii Golovatiuk <email address hidden>
Date: Thu Apr 7 19:08:30 2016 +0200

    Create users only on primary controller

    We only need to create the users on the primary controller as all of
    /var/lib/mysql is transferred during SST. By restricting the creation of
    the users on the primary controller, we are reducing the likelihood of a
    split brain that can occur when the MyISAM tables become desynced.

    Change-Id: I732400137dd4d6ede72c18a595a3b2fd295621d3
    Closes-Bug: #1548271
    Signed-off-by: Sergii Golovatiuk <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/303412
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ff243e64deb73675e55822a7fee6ae481b0992d4
Submitter: Jenkins
Branch: stable/mitaka

commit ff243e64deb73675e55822a7fee6ae481b0992d4
Author: Sergii Golovatiuk <email address hidden>
Date: Thu Apr 7 19:08:30 2016 +0200

    Create users only on primary controller

    We only need to create the users on the primary controller as all of
    /var/lib/mysql is transferred during SST. By restricting the creation of
    the users on the primary controller, we are reducing the likelihood of a
    split brain that can occur when the MyISAM tables become desynced.

    Change-Id: I732400137dd4d6ede72c18a595a3b2fd295621d3
    Closes-Bug: #1548271
    Signed-off-by: Sergii Golovatiuk <email address hidden>

tags: added: in-stable-mitaka
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

During CI I have found on job[0] similar errors[1]:

<27>Apr 12 23:49:02 node-5 mysqld: 2016-04-12 23:49:02 26083 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (c255661a-0108-11e6-abf2-e6225841644b): 1 (Operation not permitted)
<27>Apr 12 23:49:02 node-5 mysqld: at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.

[0] https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.cic_maintenance_mode/76/testReport/(root)/cic_maintenance_mode_env/
[1] http://paste.openstack.org/show/493929/

Is that related or we should create a new bug?

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Hi Alexandr, it seems like a different issue.
Could you please describe a separate bug for this?

Thank you!

Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

Thanks, Timur. I have created a new bug:
https://bugs.launchpad.net/fuel/+bug/1569990

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

By the way, it looks like we have one more new issue:
https://bugs.launchpad.net/fuel/+bug/1571763

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

We tested the fix and it works fine, the issue is completely fixed.

Verified on MOS #246 (by manual and automated tests)

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

But, of course, we have another two issues for now:
https://bugs.launchpad.net/fuel/+bug/1569990
https://bugs.launchpad.net/fuel/+bug/1571763

these issues were hidden by this bug, but now we found them.

Changed in fuel:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.