MySQL connection limit should be increased

Bug #1475433 reported by Aleksandr Shaposhnikov
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Fix Released
High
Denis Egorenko
6.1.x
Invalid
High
MOS Maintenance
7.0.x
Fix Released
High
Denis Egorenko

Bug Description

MOS version: 6.1 Release, Ubuntu+HA

Basically on 200 nodes lab we got in situation when under the stable number of objects in cloud (7k VM, 7k Users,7k Tenants, 7k Networks, 7k Subnetwork) we go out of mysql connections limit.
Let me describe briefly.
We have Galera cluster and haproxy in front of it. Just to escape putting SQL inserts and updates to the different nodes of mysql-galera cluster we have active/backup/backup config in haproxy settings for galera so effectively we have only one mysql server serving the cloud. But it looks like number of connections in config (4096) allowed isn't enough to server 200 nodes cluster.
Here is example of keystone and other service reaction on that:
root@node-663:~# keystone tenant-list
An unexpected error prevented the server from fulfilling your request: (OperationalError) (1040, 'Too many connections') None None (original cause: OperationalError: (OperationalError) (1040, 'Too many connections') None None) u'SELECT 1' [] (Disable debug mode to suppress these details.) (HTTP 500)
root@node-663:~# keystone tenant-list
Authorization Failed: An unexpected error prevented the server from fulfilling your request: (OperationalError) (1040, 'Too many connections') None None (Disable debug mode to suppress these details.) (HTTP 500)

Here is netstat stats to the haproxy across three controllers:
[root@fuel ~]# for i in 57 58 59;do ssh 10.20.2.$i "netstat -an |grep -c 3306";done
Warning: Permanently added '10.20.2.57' (RSA) to the list of known hosts.
1642
Warning: Permanently added '10.20.2.58' (RSA) to the list of known hosts.
1776
Warning: Permanently added '10.20.2.59' (RSA) to the list of known hosts.
1629

So we could see that we just get to the limit.

Fixes(two of them):
1. Change number of connections on all of the nodes keeping in mind that only one will service requests(still).
2. Allow using of whole galera cluster so connections would be distributed across all the controllers/servers with mysql-galera.

Second fix should be tested on high load and also with failure testing because previously galera didn't behave stable enough and only because of that we switched to the active/backup/backup config. At the Openstack summit I had conversation with galera representative and they said that all the problems which we probably had was fixed. Anyway, we should test 3xACTIVE config before proceed with this fix to upstream.

Quick workaround:

1. Change /etc/mysql/conf.d/wsrep.cnf in section [mysqld] max_connections from 4096 to 16384. Number should be relevant to the hardware used. Beware. Don't use very high numbers on slow hardware. That could lead in very aggressive CPU usage by MySQL-galera.
2. Make online change of max_connections using shell command:
mysql -e "SET GLOBAL max_connections = 16384;"

Changed in mos:
importance: Undecided → High
tags: added: scale
Changed in mos:
status: New → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

AFAIK, we have the following connections limit

grep files /proc/$(pidof mysqld)/limits
Max open files 102400 102400 files

which if far from 4096, as you descrived, Alexandr.
This requires elaboration.

Do you also think, mysql cluster should just be scaled horizontally (cells) instead of vertically (connection limits) ?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

I'm marking this issue as incomplete unless more details provided regarding real numbers of mysqld connections from an affected environment. Logs are not obligatory here, but always wellcome

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

But you are right, there is max_connections=4096 exist in /etc/mysql/conf.d/wsrep.cnf which is probably an issue.

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Do we need to use some formula or just 16384?

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

s/just/just set

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

When ten Openstack components require 16k connection then it's something definitely wrong. I guess there are some openstack components that don't use connection pools. This case requires additional investigation.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Sergii, they all use connection pools. The problem is that connection pooling is per process (== fork) and we tend to have a lot of forks for API services (nova-api, neutron-server, etc) and a few other ones (e.g. nova-conductor). IIRC, it's a fork per a CPU core for each such service.

So we could either increase the maximum number of connections in MySQL or allocate fewer connections per each fork.

Revision history for this message
Denis Egorenko (degorenko) wrote :
Revision history for this message
Denis Egorenko (degorenko) wrote :
tags: added: fuel-to-mos
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

All related patches are merged

Revision history for this message
Sergey Vilgelm (sergey.vilgelm) wrote :

Verified on build #224 and #242 with difference environments. The max connections are 2K and 6K for 2G and 8G memory.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.