'Lost connection to MySQL server during query' after running openstack-upgrade action

Bug #1779061 reported by Xav Paice
Affects                                  Status    Importance  Assigned to  Milestone
OpenStack AODH Charm                     Invalid   Low         Unassigned
OpenStack Cinder Charm                   Invalid   Low         Unassigned
OpenStack Heat Charm                     Invalid   Low         Unassigned
OpenStack Nova Cloud Controller Charm    Invalid   Low         Unassigned

Bug Description

Charms release 18.05, on Xenial, upgrading from Pike to Queens.

We ran the steps in https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html to upgrade.

After the upgrade, the logs were filled with MySQL errors whenever an operation used the database:

2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines [-] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1']
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines connection.scalar(select([1]))
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 877, in scalar
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return self.execute(object, *multiparams, **params).scalar()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 945, in execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return meth(self, multiparams, params)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines return connection._execute_clauseelement(self, multiparams, params)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines compiled_sql, distilled_params
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines context)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1398, in _handle_dbapi_exception
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines util.raise_from_cause(newraise, exc_info)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines context)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines cursor.execute(statement, parameters)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 165, in execute
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines result = self._query(query)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 321, in _query
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines conn.query(q)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 860, in query
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1061, in _read_query_result
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines result.read()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1349, in read
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines first_packet = self.connection._read_packet()
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 991, in _read_packet
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines packet_header = self._read_bytes(4)
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1037, in _read_bytes
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
2018-06-28 05:34:16.223 84669 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1']
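
For context, the 'SELECT 1' in the traceback is the liveness probe oslo.db runs before handing out a pooled connection; when the underlying socket has already been dropped (for example by an idle timeout somewhere between the service and the database), the probe fails with error 2013 and the engine reconnects, which is exactly what the "Database connection was found disconnected; reconnecting" message reports. A minimal sketch of that pessimistic pre-ping pattern in plain SQLAlchemy (placeholder connection string; illustrative only, not the exact oslo.db code) looks roughly like this:

from sqlalchemy import create_engine, event, exc, select

# Placeholder DSN; the real services connect through the percona-cluster VIP.
engine = create_engine("mysql+pymysql://user:password@vip/cinder")

@event.listens_for(engine, "engine_connect")
def ping_connection(connection, branch):
    if branch:
        # 'branch' connections are sub-connections of an existing one; skip them.
        return
    try:
        # The same liveness probe seen in the traceback above.
        connection.scalar(select([1]))
    except exc.DBAPIError as err:
        if err.connection_invalidated:
            # The pooled connection was stale (e.g. dropped by an idle timeout);
            # retrying the probe transparently acquires a fresh connection.
            connection.scalar(select([1]))
        else:
            raise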

Revision history for this message
Xav Paice (xavpaice) wrote :

Running the config-changed hook seemed to sort it out.

affects: charms → charm-cinder
Revision history for this message
Xav Paice (xavpaice) wrote :

Correction: running config-changed didn't sort it out; the errors are back.

James Troup (elmo)
tags: added: upgrade
Revision history for this message
James Page (james-page) wrote :

Hmm - "Lost connection to MySQL server during query" across multiple services sounds like an issue in the percona-cluster rather than in the individual services.

Was the PXC cluster in a healthy state prior to the upgrade? It's worth checking for any errors in the mysql error log. It's also possible that the VIP used for access is flipping around if you have a unit which is lagging: the OCF resources in the corosync/pacemaker cluster check the freshness of the underlying PXC daemon on each unit and will move the VIP if it lags.
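
One quick way to probe that from a service unit (placeholder VIP address, credentials and database name; illustrative only, not part of the charm code) is to run the same 'SELECT 1' check through the VIP with pymysql and leave the connection idle between probes, to see whether the link gets silently dropped:

import time
import pymysql  # the same driver shown in the traceback above

VIP = "10.0.0.100"  # placeholder: the percona-cluster VIP
conn = pymysql.connect(host=VIP, user="testuser", password="secret", db="cinder")

for _ in range(10):
    with conn.cursor() as cur:
        # Raises OperationalError 2013 ('Lost connection to MySQL server
        # during query') if the link was silently dropped while idle.
        cur.execute("SELECT 1")
        cur.fetchone()
    time.sleep(300)  # stay idle long enough to hit any server/proxy timeout

conn.close()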

Changed in charm-aodh:
importance: Undecided → Medium
Changed in charm-cinder:
importance: Undecided → Medium
Changed in charm-heat:
importance: Undecided → Medium
Changed in charm-nova-cloud-controller:
importance: Undecided → Medium
Changed in charm-aodh:
status: New → Incomplete
Changed in charm-cinder:
status: New → Opinion
status: Opinion → Incomplete
Changed in charm-heat:
status: New → Incomplete
Changed in charm-nova-cloud-controller:
status: New → Incomplete
Revision history for this message
James Page (james-page) wrote :

Marking 'Incomplete' for the moment, but setting the priority to Medium, as this might be an OpenStack service issue; I don't think percona-cluster units are ever upgraded as part of an openstack upgrade process (which is correct, since the UCA pockets don't host percona-xtradb-cluster packages).

Revision history for this message
Xav Paice (xavpaice) wrote :

MySQL/percona-cluster was healthy prior to the upgrade, and is healthy now.

I have seen this on other clouds too, under the same circumstances.

Changed in charm-nova-cloud-controller:
status: Incomplete → New
Changed in charm-heat:
status: Incomplete → New
Changed in charm-cinder:
status: Incomplete → New
Changed in charm-aodh:
status: Incomplete → New
Revision history for this message
James Page (james-page) wrote :

This might be related to bug 1774544.

Revision history for this message
Xav Paice (xavpaice) wrote :

The only errors I'm seeing in mysql logs following the upgrades are connection timeouts, e.g. (from the unit with the VIP):

2018-07-25 04:35:43 144076 [Warning] WSREP: Failed to report last committed 49642328, -110 (Connection timed out)
2018-07-25 05:24:57 144076 [Warning] WSREP: Failed to report last committed 49660297, -110 (Connection timed out)
2018-07-25 05:29:25 144076 [Warning] WSREP: Failed to report last committed 49662870, -110 (Connection timed out)
2018-07-25 05:29:55 144076 [Warning] WSREP: Failed to report last committed 49663109, -110 (Connection timed out)
2018-07-25 05:30:17 144076 [Warning] WSREP: Failed to report last committed 49663250, -110 (Connection timed out)
2018-07-25 06:24:22 144076 [Warning] WSREP: Failed to report last committed 49695720, -110 (Connection timed out)
2018-07-25 07:51:34 144076 [Warning] WSREP: Failed to report last committed 49749132, -110 (Connection timed out)

The times don't appear to match the times in the application logs at all, though.

I've looked through syslog for Pacemaker changes to the VIP; it's not moving around at all.

I'm happy to go grab a bunch of logs and pastebin them - which ones are of use?

FWIW, this doesn't appear to affect service availability and seems to just be extra noise in the logs.

Revision history for this message
James Page (james-page) wrote :

The 'SELECT 1' call is the connection check used to validate a connection before it's used; pooled connections might be timing out or suchlike, hence these messages.

I would expect the pool tier to just cycle the connection and give out a fresh one; so there may not be a functional impact, but there may be a performance impact.

I took a quick look at one of our bionic/queens deployments and I'm not seeing the same issue; however, it was not upgraded, which may be important for reproducing this bug.
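
For reference, the usual mitigation for pooled connections that outlive a server-side or proxy idle timeout, as described above, is to recycle them before that timeout is hit; oslo.db exposes an equivalent recycle interval through its [database] options, and at the SQLAlchemy level a rough sketch (placeholder DSN and timings, not the charms' actual configuration) would be:

from sqlalchemy import create_engine

# Placeholder DSN and timings; in a charm deployment the real values come
# from each service's oslo.db configuration, not from application code.
engine = create_engine(
    "mysql+pymysql://user:password@vip/cinder",
    pool_recycle=200,   # recycle connections well before MySQL's wait_timeout
                        # or the HAProxy idle timeout can drop them
    pool_size=5,
    max_overflow=10,
)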

Changed in charm-aodh:
importance: Medium → Low
Changed in charm-cinder:
importance: Medium → Low
Changed in charm-heat:
importance: Medium → Low
Changed in charm-nova-cloud-controller:
importance: Medium → Low
Revision history for this message
Krzysztof Pawlowski (krzysztof.pawlowski) wrote :
Revision history for this message
David Ames (thedac) wrote :

I don't believe this is a charm bug. It is the oslo.db issue from Bug #1774544.

Since this bug keeps getting re-opened, I am setting it to Invalid.

If you truly think this is in error, feel free to re-open the bug.

Changed in charm-aodh:
status: New → Invalid
Changed in charm-cinder:
status: New → Invalid
Changed in charm-heat:
status: New → Invalid
Changed in charm-nova-cloud-controller:
status: New → Invalid