Improper close connection to database leading to mysql/mariadb block connection.

Bug #1831009 reported by puthi on 2019-05-30
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Undecided
Unassigned

Bug Description

Version
=======
Neutron-server: openstack-neutron-13.0.2-1.el7.noarch
Nova: openstack-nova-*-18.2.0-1.el7.noarch
Mariadb: mariadb-server-10.1.20-2.el7.x86_64

Openstack setup:
================
HAproxy => 3 Controllers (nova,neutron,keystone) => Mariadb

Openstack config
================
connection_recycle_time = 3600 (default)
the rest of database connection are default.

Mariadb config
==============
interactive_timeout: 28800
wait_timeout: 28800

Error
=====
As my Openstack cluster grow, more and more people start using the cluster, i start seeing this error everyday now

2019-05-30 10:42:15.695 44938 ERROR oslo_db.sqlalchemy.exc_filters [req-b6fd59b9-8378-49df-bbf6-de9f9b741490 - - - - -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1129, u"Host 'xx.xx.xx.xx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'") (Background on this error at: http://sqlalche.me/e/2j85): InternalError: (1129, u"Host 'xx.xx.xx.xx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")

This is not necessary happens to only neutron but all of the components of openstack. And when i turn log_warnings=4 in Mariadb, I can see in the log of Mariadb as below:

2019-05-27 10:22:04 140078484511488 [Warning] Aborted connection 70834104 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:05 140084673243904 [Warning] Aborted connection 70834111 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:07 140078500485888 [Warning] Aborted connection 70834211 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:07 140078490655488 [Warning] Aborted connection 70834157 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:09 140078698322688 [Warning] Aborted connection 70834327 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:12 140078715833088 [Warning] Aborted connection 70894166 to db: 'unconnected' user: 'neutron' host: 'controller2' (CLOSE_CONNECTION)
2019-05-27 10:22:13 140078737951488 [Warning] Aborted connection 70834380 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:17 140078641797888 [Warning] Aborted connection 70834382 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:21 140078581893888 [Warning] Aborted connection 70834436 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:22 140078724434688 [Warning] Aborted connection 70834469 to db: 'nova' user: 'nova' host: 'controller3' (CLOSE_CONNECTION)
2019-05-27 10:22:28 140078715833088 [Warning] Aborted connection 70894174 to db: 'unconnected' user: 'unauthenticated' host: 'controller2' (CLOSE_CONNECTION)
2019-05-27 10:22:29 140078715833088 [Warning] Aborted connection 70894177 to db: 'neutron' user: 'neutron' host: 'controller2' (CLOSE_CONNECTION)
...
2019-05-30 7:35:28 140078596025088 [Warning] Aborted connection 72547571 to db: 'nova' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 7:46:54 140078541036288 [Warning] Aborted connection 72552087 to db: 'nova_cell0' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 7:46:57 140078799182592 [Warning] Aborted connection 72552086 to db: 'nova' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 7:47:02 140078738565888 [Warning] Aborted connection 72534613 to db: 'nova_cell0' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 8:31:11 140078638418688 [Warning] Aborted connection 72419897 to db: 'nova' user: 'nova' host: 'controller3' (Got timeout reading communication packets)
2019-05-30 8:36:22 140078791195392 [Warning] Aborted connection 72421900 to db: 'nova' user: 'nova' host: 'controller2' (Got timeout reading communication packets)
2019-05-30 8:46:23 140078624594688 [Warning] Aborted connection 72577413 to db: 'nova_cell0' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 8:46:26 140078716447488 [Warning] Aborted connection 72577414 to db: 'nova' user: 'nova' host: 'controller1' (Got an error reading communication packets)
2019-05-30 10:45:23 140078661151488 [Warning] Aborted connection 72675103 to db: 'neutron' user: 'neutron' host: 'controller3' (Got an error reading communication packets)
2019-05-30 10:45:23 140078672517888 [Warning] Aborted connection 72675137 to db: 'neutron' user: 'neutron' host: 'controller3' (Got an error reading communication packets)
2019-05-30 10:45:23 140078768155392 [Warning] Aborted connection 72674638 to db: 'neutron' user: 'neutron' host: 'controller3' (Got an error reading communication packets)
2019-05-30 10:45:23 140078647327488 [Warning] Aborted connection 72674581 to db: 'neutron' user: 'neutron' host: 'controller3' (Got an error reading communication packets)

I also notice that every times i restart any services (nova/neutron/keystone) i can see that warning "Got an error reading communication packets" appear in Mariadb log. And i think when the connection is properly closed, the Mariadb log will give "CLOSE_CONNECTION" instead of that error.
So, i wonder if this is a bug in Openstack Rocky version as i don't see this behavior in other version.

Tags: db Edit Tag help
puthi (puthi) on 2019-05-30
description: updated
puthi (puthi) on 2019-05-30
description: updated
Slawek Kaplonski (slaweq) wrote :

I think that some DB expert should take a look at this but shouldn't this be maybe reported agains oslo.db if it happens also in different OpenStack projects as well? IMHO oslo.db is common piece used by all projects to connect to db so maybe there is some bug.

puthi (puthi) wrote :

Thanks
I wasn't sure where to report too but as neutron seem to have a lot of this problem that is why i started here. I'll try to move to oslo.db as you suggested.

Akihiro Motoki (amotoki) on 2019-06-04
tags: added: db
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers