Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Bug #1413620 reported by Björn Tillenius
Affects                                    Status    Importance  Assigned to  Milestone
OpenStack Percona Cluster Charm            Triaged   Low         Unassigned
percona-cluster (Juju Charms Collection)   Invalid   Low         Unassigned

Bug Description

I rebooted the bootstrap node of an OpenStack deployment, which acted as the neutron-gateway and also hosted mysql, cinder and nova-cloud-controller in LXCs.

After the node was rebooted, the mysql service (percona-cluster with one mysql instance) reported an error in the config-changed hook:

unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed Traceback (most recent call last):
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 420, in <module>
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed main()
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 414, in main
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed hooks.execute(sys.argv)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/charmhelpers/core/hookenv.py", line 528, in execute
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed self._hooks[hook_name]()
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 151, in config_changed
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed shared_db_changed(r_id, unit)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/config-changed", line 283, in shared_db_changed
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed password = configure_db(hostname, database, username)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/mysql.py", line 208, in configure_db
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed m_helper.connect(password=get_mysql_root_password())
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/mysql.py", line 48, in connect
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed passwd=password)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed return Connection(*args, **kwargs)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 187, in __init__
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed super(Connection, self).__init__(*args, **kwargs2)
unit-mysql-0: 2015-01-22 14:29:08 INFO config-changed _mysql_exceptions.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)")

After running 'juju resolved --retry mysql/0', everything went back to normal.

tags: added: cloud-installer landscape
Revision history for this message
Mario Splivalo (mariosplivalo) wrote :

Hi.

Can you provide us with the mysql logs (the .err files residing in the mysql datadir) too? This looks like a race condition (mysql was not up when juju wanted to run the config-changed hook on it), but we'll need more info to investigate. It would be nice to have the 'juju status' output from before you ran 'juju resolved'; do you perhaps still have it saved? Also, if you could bundle (in a .zip or similar archive) all the log files from the failed unit, that would also help (as all-machines.log is a bit hard to parse).

Have you tried reproducing this issue? If so, does it fail every time?
Do you maybe have a juju deployer configuration file that you can share (assuming that you used juju deployer to deploy your stack)?

There is an updated version of the percona-cluster charm available; it might be wise to upgrade to the current charm version (make sure you have your mysql databases backed up, just in case!).

Changed in percona-cluster (Juju Charms Collection):
status: New → Incomplete
Revision history for this message
David Britton (dpb) wrote : Re: [Bug 1413620] Re: Error after reboot: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

I see something like the following on all nodes of the cluster.

150513 18:23:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150513 18:23:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.x65OEf' --pid-file='/var/lib/mysql/juju-machine-13-lxc-1-recover.pid'
150513 18:23:31 mysqld_safe WSREP: Recovered position b68fd0cd-f8ff-11e4-bf32-8b63f4f6930e:265378
150513 18:23:31 [Note] WSREP: wsrep_start_position var submitted: 'b68fd0cd-f8ff-11e4-bf32-8b63f4f6930e:265378'
150513 18:23:31 [Note] WSREP: Read nil XID from storage engines, skipping position init
150513 18:23:31 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
150513 18:23:31 [Note] WSREP: wsrep_load(): Galera 2.8(r165) by Codership Oy <email address hidden> loaded successfully.
150513 18:23:31 [Note] WSREP: Found saved state: b68fd0cd-f8ff-11e4-bf32-8b63f4f6930e:-1
150513 18:23:31 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
150513 18:23:31 [Note] WSREP: Passing config to GCS: base_host = 172.27.101.175; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
150513 18:23:31 [Note] WSREP: Assign initial position for certification: 265378, protocol version: -1
150513 18:23:31 [Note] WSREP: wsrep_sst_grab()
150513 18:23:31 [Note] WSREP: Start replication
150513 18:23:31 [Note] WSREP: Setting initial position to b68fd0cd-f8ff-11e4-bf32-8b63f4f6930e:265378
150513 18:23:31 [Note] WSREP: protonet asio version 0
150513 18:23:31 [Note] WSREP: backend: asio
150513 18:23:31 [Note] WSREP: GMCast version 0
150513 18:23:31 [Note] WSREP: (28905a6f-f99d-11e4-8794-afa4dadb86b5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
150513 18:23:31 [Note] WSREP: (28905a6f-f99d-11e4-8794-afa4dadb86b5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
150513 18:23:31 [Note] WSREP: EVS version 0
150513 18:23:31 [Note] WSREP: PC version 0
150513 18:23:31 [Note] WSREP: gcomm: connecting to group 'juju_cluster', peer '172.27.101.175:,172.27.101.134:,172.27.101.146:'
150513 18:23:31 [Warning] WSREP: (28905a6f-f99d-11e4-8794-afa4dadb86b5, 'tcp://0.0.0.0:4567') address 'tcp://172.27.101.175:4567' points to own listening address, blacklisting
150513 18:23:31 [Note] WSREP: (28905a6f-f99d-11e4-8794-afa4dadb86b5, 'tcp://0.0.0.0:4567') address 'tcp://172.27.101.175:4567' pointing to uuid 28905a6f-f99d-11e4-8794-afa4dadb86b5 is blacklisted, skipping
150513 18:23:31 [Note] WSREP: declaring 264c0c94-f99d-11e4-8da9-0345cc984ee5 stable
150513 18:23:31 [Warning] WSREP: no nodes coming from prim view, prim not possible
150513 18:23:31 [Note] WSREP: view(view_id(NON_PRIM,264c0c94-f99d-11e4-8da9-0345cc984ee5,2) memb {
        264c0c94-f99d-11e4-8da9-0345cc984ee5,
      ...


Changed in percona-cluster (Juju Charms Collection):
status: Incomplete → Confirmed
Revision history for this message
David Britton (dpb) wrote : Re: Error after reboot: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

Charm version I tested with: cs:trusty/percona-cluster-21

ii percona-xtrabackup 2.1.8-1 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 all Percona Server database common files
ii percona-xtradb-cluster-galera-2.x 165-0ubuntu1 amd64 Synchronous multi-master replication plugin for transactional applications
ii percona-xtradb-cluster-server-5.5 5.5.37-25.10+dfsg-0ubuntu0.14.04.1 amd64 Percona Server database server binaries

David Britton (dpb)
tags: added: kanban-cross-team
David Britton (dpb)
tags: removed: kanban-cross-team
Revision history for this message
James Page (james-page) wrote :

This is a race between config-changed and the mysql instance actually starting and being re-synced. The charm has a helper for determining whether things are up and in sync; we should guard using that, or potentially wrap db ops calls in retry_on_exception for OperationalErrors.
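
A minimal sketch of the retry option, assuming the retry_on_exception decorator from charmhelpers.core.decorators and MySQLdb's OperationalError; the wrapper name configure_db_with_retry and the retry parameters are illustrative, not the charm's actual code:

# Sketch only: guard a DB operation against the "mysqld not up yet" race.
# Assumes charmhelpers.core.decorators.retry_on_exception is available;
# configure_db_with_retry is a hypothetical wrapper, not the charm's real helper.
import MySQLdb

from charmhelpers.core.decorators import retry_on_exception


@retry_on_exception(num_retries=5, base_delay=10,
                    exc_type=MySQLdb.OperationalError)
def configure_db_with_retry(helper, hostname, database, username):
    # Each attempt re-opens the connection; while mysqld is still starting
    # (error 2002, socket missing), the decorator sleeps and retries.
    helper.connect(password=helper.get_mysql_root_password())
    return helper.configure_db(hostname, database, username)

The other option mentioned above, checking the charm's "up and in sync" helper before touching the database, avoids the retry loop entirely but means the hook has to defer its work until the cluster reports ready.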

Changed in percona-cluster (Juju Charms Collection):
importance: Undecided → Low
status: Confirmed → Triaged
James Page (james-page)
Changed in charm-percona-cluster:
importance: Undecided → Low
status: New → Triaged
Changed in percona-cluster (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
Ashley Lai (alai) wrote :

We hit a similar error in our deployment. I've attached the crashdump for investigation. The juju log from mysql/0 shows the following.

2018-01-22 21:32:57 DEBUG db-admin-relation-changed Traceback (most recent call last):
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/db-admin-relation-changed", line 780, in <module>
2018-01-22 21:32:57 DEBUG db-admin-relation-changed main()
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/db-admin-relation-changed", line 770, in main
2018-01-22 21:32:57 DEBUG db-admin-relation-changed hooks.execute(sys.argv)
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-01-22 21:32:57 DEBUG db-admin-relation-changed self._hooks[hook_name]()
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/db-admin-relation-changed", line 417, in db_changed
2018-01-22 21:32:57 DEBUG db-admin-relation-changed password = db_helper.configure_db(addr, db_name, username, admin=admin)
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/charmhelpers/contrib/database/mysql.py", line 401, in configure_db
2018-01-22 21:32:57 DEBUG db-admin-relation-changed self.connect(password=self.get_mysql_root_password())
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/var/lib/juju/agents/unit-mysql-0/charm/hooks/charmhelpers/contrib/database/mysql.py", line 91, in connect
2018-01-22 21:32:57 DEBUG db-admin-relation-changed passwd=password)
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
2018-01-22 21:32:57 DEBUG db-admin-relation-changed return Connection(*args, **kwargs)
2018-01-22 21:32:57 DEBUG db-admin-relation-changed File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 204, in __init__
2018-01-22 21:32:57 DEBUG db-admin-relation-changed super(Connection, self).__init__(*args, **kwargs2)
2018-01-22 21:32:57 DEBUG db-admin-relation-changed _mysql_exceptions.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)")
2018-01-22 21:32:57 ERROR juju.worker.uniter.operation runhook.go:114 hook "db-admin-relation-changed" failed: exit status 1

Ashley Lai (alai)
summary: - Error after reboot: Can't connect to local MySQL server through socket
- '/var/run/mysqld/mysqld.sock' (2)
+ Can't connect to local MySQL server through socket
+ '/var/run/mysqld/mysqld.sock'