'Lost connection to MySQL server during query' after running openstack-upgrade action

Bug #1779061 reported by Xav Paice
Affects                                  Status    Importance  Assigned to  Milestone
OpenStack AODH Charm                     Invalid   Low         Unassigned
OpenStack Cinder Charm                   Invalid   Low         Unassigned
OpenStack Heat Charm                     Invalid   Low         Unassigned
OpenStack Nova Cloud Controller Charm    Invalid   Low         Unassigned

Bug Description

Charms release 18.05, on Xenial, upgrading from Pike to Queens.

We ran the steps in https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html to upgrade.

After the upgrade, the logs were filled with MySQL errors whenever an operation used the database:

2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines [-] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1']
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines connection.scalar(select([1]))
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 877, in scalar
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return self.execute(object, *multiparams, **params).scalar()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 945, in execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return meth(self, multiparams, params)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return connection._execute_clauseelement(self, multiparams, params)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines compiled_sql, distilled_params
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines context)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1398, in _handle_dbapi_exception
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines util.raise_from_cause(newraise, exc_info)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines context)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines cursor.execute(statement, parameters)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 165, in execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines result = self._query(query)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 321, in _query
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines conn.query(q)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 860, in query
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1061, in _read_query_result
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines result.read()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1349, in read
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines first_packet = self.connection._read_packet()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 991, in _read_packet
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines packet_header = self._read_bytes(4)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1037, in _read_bytes
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1']
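
For context, the 'SELECT 1' in the traceback is the liveness probe oslo.db runs before handing out a pooled connection; when the underlying socket has already been dropped (for example by an idle timeout somewhere between the service and the database), the probe fails with error 2013 and the engine reconnects, which is exactly what the "Database connection was found disconnected; reconnecting" message reports. A minimal sketch of that pessimistic pre-ping pattern in plain SQLAlchemy (placeholder connection string; illustrative only, not the exact oslo.db code) looks roughly like this:

from sqlalchemy import create_engine, event, exc, select

# Placeholder DSN; the real services connect through the percona-cluster VIP.
engine = create_engine("mysql+pymysql://user:password@vip/cinder")

@event.listens_for(engine, "engine_connect")
def ping_connection(connection, branch):
    if branch:
        # 'branch' connections are sub-connections of an existing one; skip them.
        return
    try:
        # The same liveness probe seen in the traceback above.
        connection.scalar(select([1]))
    except exc.DBAPIError as err:
        if err.connection_invalidated:
            # The pooled connection was stale (e.g. dropped by an idle timeout);
            # retrying the probe transparently acquires a fresh connection.
            connection.scalar(select([1]))
        else:
            raise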

Revision history for this message
Xav Paice (xavpaice) wrote :

Running the config-changed hook seemed to sort it out.

affects: charms → charm-cinder
Revision history for this message
Xav Paice (xavpaice) wrote :

Correction: running config-changed didn't sort it out; the errors are back.

James Troup (elmo)
tags: added: upgrade
Revision history for this message
James Page (james-page) wrote :

Hmm - "Lost connection to MySQL server during query" across multiple services sounds like an issue in the percona-cluster rather than in the individual services.

Was the PXC cluster in a healthy state prior to the upgrade? It's worth checking for any errors in the mysql error log. It's also possible that the VIP used for access is flipping around if you have a unit which is lagging: the OCF resources in the corosync/pacemaker cluster check the freshness of the underlying PXC daemon on each unit and will move the VIP if it lags.
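
One quick way to probe that from a service unit (placeholder VIP address, credentials and database name; illustrative only, not part of the charm code) is to run the same 'SELECT 1' check through the VIP with pymysql and leave the connection idle between probes, to see whether the link gets silently dropped:

import time
import pymysql  # the same driver shown in the traceback above

VIP = "10.0.0.100"  # placeholder: the percona-cluster VIP
conn = pymysql.connect(host=VIP, user="testuser", password="secret", db="cinder")

for _ in range(10):
    with conn.cursor() as cur:
        # Raises OperationalError 2013 ('Lost connection to MySQL server
        # during query') if the link was silently dropped while idle.
        cur.execute("SELECT 1")
        cur.fetchone()
    time.sleep(300)  # stay idle long enough to hit any server/proxy timeout

conn.close()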

Changed in charm-aodh:
importance: Undecided → Medium
Changed in charm-cinder:
importance: Undecided → Medium
Changed in charm-heat:
importance: Undecided → Medium
Changed in charm-nova-cloud-controller:
importance: Undecided → Medium
Changed in charm-aodh:
status: New → Incomplete
Changed in charm-cinder:
status: New → Opinion
status: Opinion → Incomplete
Changed in charm-heat:
status: New → Incomplete
Changed in charm-nova-cloud-controller:
status: New → Incomplete
Revision history for this message
James Page (james-page) wrote :

Marking 'Incomplete' for the moment, but setting the priority to Medium, as this might be an OpenStack service issue; I don't think percona-cluster units are ever upgraded as part of an openstack upgrade process (which is correct, since the UCA pockets don't host percona-xtradb-cluster packages).

Revision history for this message
Xav Paice (xavpaice) wrote :

MySQL/percona-cluster was healthy prior to the upgrade, and is healthy now.

I have seen this on other clouds too, under the same circumstances.

Changed in charm-nova-cloud-controller:
status: Incomplete → New
Changed in charm-heat:
status: Incomplete → New
Changed in charm-cinder:
status: Incomplete → New
Changed in charm-aodh:
status: Incomplete → New
Revision history for this message
James Page (james-page) wrote :

This might be related to bug 1774544.

Revision history for this message
Xav Paice (xavpaice) wrote :

The only errors I'm seeing in mysql logs following the upgrades are connection timeouts, e.g. (from the unit with the VIP):

2018-07-25 04:35:43 144076 [Warning] WSREP: Failed to report last committed 49642328, -110 (Connection timed out)
2018-07-25 05:24:57 144076 [Warning] WSREP: Failed to report last committed 49660297, -110 (Connection timed out)
2018-07-25 05:29:25 144076 [Warning] WSREP: Failed to report last committed 49662870, -110 (Connection timed out)
2018-07-25 05:29:55 144076 [Warning] WSREP: Failed to report last committed 49663109, -110 (Connection timed out)
2018-07-25 05:30:17 144076 [Warning] WSREP: Failed to report last committed 49663250, -110 (Connection timed out)
2018-07-25 06:24:22 144076 [Warning] WSREP: Failed to report last committed 49695720, -110 (Connection timed out)
2018-07-25 07:51:34 144076 [Warning] WSREP: Failed to report last committed 49749132, -110 (Connection timed out)

The times don't appear to match the times in the application logs at all, though.

I've looked through syslog for Pacemaker changes to the VIP; it's not moving around at all.

I'm happy to go grab a bunch of logs and pastebin them - which ones are of use?

FWIW, this doesn't appear to affect service availability and seems to just be extra noise in the logs.

Revision history for this message
James Page (james-page) wrote :

The 'SELECT 1' call is the connection check used to validate a connection before it's used; pooled connections might be timing out or suchlike, hence these messages.

I would expect the pool tier to just cycle the connection and give out a fresh one; so there may not be a functional impact, but there may be a performance impact.

I took a quick look at one of our bionic/queens deployments and I'm not seeing the same issue; however, it was not upgraded, which may be important for reproducing this bug.
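
For reference, the usual mitigation for pooled connections that outlive a server-side or proxy idle timeout, as described above, is to recycle them before that timeout is hit; oslo.db exposes an equivalent recycle interval through its [database] options, and at the SQLAlchemy level a rough sketch (placeholder DSN and timings, not the charms' actual configuration) would be:

from sqlalchemy import create_engine

# Placeholder DSN and timings; in a charm deployment the real values come
# from each service's oslo.db configuration, not from application code.
engine = create_engine(
    "mysql+pymysql://user:password@vip/cinder",
    pool_recycle=200,   # recycle connections well before MySQL's wait_timeout
                        # or the HAProxy idle timeout can drop them
    pool_size=5,
    max_overflow=10,
)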

Changed in charm-aodh:
importance: Medium → Low
Changed in charm-cinder:
importance: Medium → Low
Changed in charm-heat:
importance: Medium → Low
Changed in charm-nova-cloud-controller:
importance: Medium → Low
Revision history for this message
Krzysztof Pawlowski (krzysztof.pawlowski) wrote :
Revision history for this message
David Ames (thedac) wrote :

I don't believe this is a charm bug. It is the oslo.db issue from Bug #1774544.

Since this bug keeps getting re-opened, I am setting it to Invalid.

If you truly think this is in error, feel free to re-open the bug.

Changed in charm-aodh:
status: New → Invalid
Changed in charm-cinder:
status: New → Invalid
Changed in charm-heat:
status: New → Invalid
Changed in charm-nova-cloud-controller:
status: New → Invalid