Haproxy fails to kick out broken backend

Bug #1765739 reported by Peter Sabaini
Affects: OpenStack Nova Cloud Controller Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Due to an issue with a MySQL connection, the HA-clustered nova-cloud-controller (n-c-c) backends on one unit were failing; see the traceback below.

However, haproxy still relayed requests to that backend, resulting in intermittent faults for the client.

Haproxy should have detected the failing backends and taken them out of the backend group.

I guess this is just a consequence of running haproxy in tcp mode instead of http mode: the backends were still accepting connections, so they still looked healthy to haproxy.
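To illustrate the gap described above, here is a minimal Python sketch (not the charm's or haproxy's actual code) comparing a connection-level check with a request-level check against a backend that accepts connections but fails every request. `BrokenBackend`, `tcp_check`, `http_check` and `demo` are hypothetical names, and treating 2xx/3xx as healthy is an assumption made for the example.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BrokenBackend(BaseHTTPRequestHandler):
    """Hypothetical backend: accepts connections but fails every request."""

    def do_GET(self):
        self.send_response(500)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def tcp_check(host, port, timeout=2.0):
    """Layer-4 check: healthy as soon as the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/", timeout=2.0):
    """Layer-7 check: healthy only if the request returns a 2xx/3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def demo():
    """Run both checks against the broken backend on an ephemeral port."""
    server = HTTPServer(("127.0.0.1", 0), BrokenBackend)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return tcp_check("127.0.0.1", port), http_check("127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    tcp_ok, http_ok = demo()
    print("tcp check healthy:", tcp_ok)
    print("http check healthy:", http_ok)
```

The connection-level check reports the backend as healthy while the request-level check does not, which matches the behaviour seen here: the load balancer keeps sending traffic to a unit that can no longer serve it.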

Traceback from fault, ftr.

2018-04-20 09:06:33.951 492656 ERROR oslo_service.service with ctxt_mgr.reader.using(context):
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return self.gen.next()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 759, in _transaction_scope
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service allow_async=self._allow_async) as resource:
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return self.gen.next()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 491, in _session
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service bind=self.connection, mode=self.mode)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 272, in _create_session
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self._start()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 338, in _start
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service engine_args, maker_args)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 362, in _setup_for_connection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service sql_connection=sql_connection, **engine_kwargs)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 152, in create_engine
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service test_conn = _test_connection(engine, max_retries, retry_interval)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 326, in _test_connection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return engine.connect()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2018, in connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return self._connection_cls(self, **kwargs)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 72, in __init__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service if connection is not None else engine.raw_connection()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2104, in raw_connection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self.pool.unique_connection, _connection)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service e, dialect, self)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1401, in _handle_dbapi_exception_noconnection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service util.raise_from_cause(newraise, exc_info)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service reraise(type(exception), exception, tb=exc_tb)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return fn()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 318, in unique_connection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return _ConnectionFairy._checkout(self)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 713, in _checkout
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service fairy = _ConnectionRecord.checkout(pool)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 480, in checkout
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service rec = pool._do_get()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1060, in _do_get
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self._dec_overflow()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service compat.reraise(exc_type, exc_value, exc_tb)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1057, in _do_get
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return self._create_connection()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 323, in _create_connection
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return _ConnectionRecord(self)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 449, in __init__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self.connection = self.__connect()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service connection = self.__pool._invoke_creator(self)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return dialect.connect(*cargs, **cparams)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return self.dbapi.connect(*cargs, **cparams)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/__init__.py", line 88, in Connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service return Connection(*args, **kwargs)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 679, in __init__
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self.connect()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 890, in connect
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service self._get_server_information()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1190, in _get_server_information
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service packet = self._read_packet()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 966, in _read_packet
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service packet.check_error()
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 394, in check_error
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service err.raise_mysql_exception(self._data)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service _check_mysql_exception(errinfo)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 115, in _check_mysql_exception
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service raise InternalError(errno, errorvalue)
2018-04-20 09:06:33.951 492656 ERROR oslo_service.service DBError: (pymysql.err.InternalError) (1129, u"Host '10-x-x-x.maas' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")

James Page (james-page) wrote :

haproxy just does simple tcp or http checks as configured in the charms - if a backend app service still accepts connections but is otherwise non-functional, this will not be detected.

I agree that implementing some sort of healthcheck is a good idea but that relies on the underlying service being able to expose that.
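As a rough sketch of what "exposing a healthcheck" could look like on the service side, the following WSGI app returns 200 only while a database probe succeeds and 503 otherwise. This is an illustration under assumptions, not something nova or the charm provides: `make_healthcheck_app`, `probe`, the `db_ping` callable and the `/healthcheck` path are all hypothetical.

```python
def make_healthcheck_app(db_ping):
    """Build a WSGI app that reports 200 only while db_ping() succeeds.

    db_ping is a hypothetical callable supplied by the service; it is
    expected to raise an exception when the database cannot be reached.
    """

    def app(environ, start_response):
        try:
            db_ping()
        except Exception as exc:
            body = ("database check failed: %s\n" % exc).encode("utf-8")
            status = "503 Service Unavailable"
        else:
            body = b"OK\n"
            status = "200 OK"
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def probe(app):
    """Call the WSGI app once and return the status line it produced."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/healthcheck"}
    b"".join(app(environ, start_response))
    return captured["status"]


def blocked_db():
    """Simulate the failure from the traceback (host blocked by MySQL)."""
    raise RuntimeError("Host is blocked because of many connection errors")


if __name__ == "__main__":
    print(probe(make_healthcheck_app(lambda: None)))
    print(probe(make_healthcheck_app(blocked_db)))
```

An http-mode check pointed at such an endpoint would see the 503 and could take the unit out of rotation, but only if the service ships an endpoint like this and the load balancer is configured to query it.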

James Page (james-page) wrote :

ftr we use tcp mode to support SSL termination at the service endpoint rather than at the load balancer. Moving to http/https might be OK but we'd need to validate installation of the appropriate CA cert on the local install.

Changed in charm-nova-cloud-controller:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: cpe-onsite
Dorina Timbur (dorina-t) wrote :

Hi, would it be possible to raise the priority above "wishlist"?
We recently had an incident in a live customer cloud where losing a single compute node that hosted one of the keystone units brought the service down completely.

Andrea Ieri (aieri) wrote :

I think bug 1880610 would resolve this one, especially for mysql since haproxy offers a backend-specific check for it.
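For context on why a protocol-aware check helps with MySQL, here is a small Python sketch of the idea (not haproxy's implementation): it inspects the first packet a MySQL server sends after the TCP connection is accepted and tells a normal handshake apart from an error packet such as the 1129 "host is blocked" error in the traceback above. The function name `classify_greeting` and the sample packets are made up for the example; the packet layout (3-byte little-endian length, 1-byte sequence id, `0xFF` marker followed by a 2-byte little-endian error code) follows the MySQL client/server protocol.

```python
def classify_greeting(data):
    """Classify the first packet received from a MySQL server.

    Returns ("error", code) for an ERR packet, ("handshake", version) for
    a normal greeting, or ("invalid", None) if the data is too short.
    """
    if len(data) < 5:
        return ("invalid", None)
    length = int.from_bytes(data[0:3], "little")  # payload length
    payload = data[4:4 + length]                  # skip the sequence id byte
    if not payload:
        return ("invalid", None)
    if payload[0] == 0xFF:
        if len(payload) < 3:
            return ("invalid", None)
        return ("error", int.from_bytes(payload[1:3], "little"))
    return ("handshake", payload[0])


def make_packet(payload, sequence_id=0):
    """Wrap a payload in the 4-byte MySQL packet header (test helper)."""
    return len(payload).to_bytes(3, "little") + bytes([sequence_id]) + payload


# Sample packets built by hand for illustration only.
BLOCKED = make_packet(b"\xff" + (1129).to_bytes(2, "little") + b"Host is blocked")
GREETING = make_packet(b"\x0a" + b"5.7.21\x00")

if __name__ == "__main__":
    print(classify_greeting(BLOCKED))
    print(classify_greeting(GREETING))
```

A plain tcp check treats both cases as healthy because the connection is accepted either way; only a check that reads the greeting can tell them apart.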
