"Too many connections" on all nodes with only few established connections

Bug #1673793 reported by Ville Ojamo
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Percona-XtraDB-Cluster-56-5.6.35-26.20.1.el6.x86_64 on CentOS 6

All cluster nodes show the "Too many connections" error, while each of them has only relatively few network connections.

max_connections = 500
max_user_connections = 150

The thread pool is used.
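
For reference, the relevant my.cnf settings look roughly like this (thread_handling = pool-of-threads is the Percona thread pool switch; the extra_port lines are commented-out assumptions for illustration, not settings from these nodes):
"
[mysqld]
max_connections      = 500
max_user_connections = 150
thread_handling      = pool-of-threads
# not set here, but would have allowed emergency logins:
# extra_port            = 33306
# extra_max_connections = 1
"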

Nodes each had between 70 and 150 TCP connections in the ESTABLISHED state when the "Too many connections" errors were happening.

Nothing else is running on these servers, so there are only two types of connection through the socket: the clustercheck xinetd service, which is limited to 120 instances in xinetd.conf; and collectd, which uses a single connection to monitor the servers.
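
The clustercheck service definition is the stock one, along these lines (everything except instances = 120 is the usual default and only illustrative here):
"
service mysqlchk
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    port           = 9200
    wait           = no
    user           = nobody
    server         = /usr/bin/clustercheck
    instances      = 120
    log_on_failure += USERID
}
"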

This should leave between 200 and 300 free connections: 500 max_connections, minus at most 121 socket connections (120 clustercheck + 1 collectd), minus the 70-150 network connections; and with only 2 users on the socket, the global max_user_connections of 150 caps any single account anyway. But even the "root" user is unable to log in and gets the "Too many connections" error.

After two out of the three nodes were shut down, the third node started allowing logins from root, and nothing special was happening there.

I understand that this bug report is "pretty thin", to put it mildly. Next time I will know to use "--protocol=TCP" along with "mysql -P extra_port" and try to get some extra information...
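
What I mean is something like the following, assuming extra_port had been set (the port number 33306 is just an example, matching nothing on these servers):
"
# the extra port accepts connections even when normal ones are exhausted
mysql --protocol=TCP -h 127.0.0.1 -P 33306 -u root -p \
      -e "SHOW FULL PROCESSLIST; SHOW GLOBAL STATUS LIKE 'Threads_connected';"
"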

Revision history for this message
Ville Ojamo (ville-ojamo) wrote :

The "-P" option to mysql client could automatically imply "--protocol=TCP", by the way, or give an error if it is using socket. Since it makes no sense at all to give -P option when connecting through socket.

In fact I feel like I should open another bug about this behavior.
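
To illustrate the current behavior (a hypothetical session, arbitrary port):
"
$ mysql -u root -p -P 3307
# silently connects through the Unix socket and ignores -P, because
# the default host "localhost" means the socket, not TCP
$ mysql -u root -p -h 127.0.0.1 -P 3307
# actually connects over TCP to port 3307
"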

Revision history for this message
Ville Ojamo (ville-ojamo) wrote :

Some details from the MySQL log file are below.

The problem seemed to have started here:
"
*** Priority TRANSACTION:
TRANSACTION 11635895831, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
MySQL thread id 1, OS thread handle 0x7f5947fbe700, query id 60485263 System lock

*** Victim TRANSACTION:
TRANSACTION 11635889733, ACTIVE 53 sec
, undo log entries 4
MySQL thread id 5230403, OS thread handle 0x7f594430f700, query id 60474823 118.67.200.14 appuser
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35711 page no 72267 n bits 208 index `PRIMARY` of table `appdb`.`user` trx id 11635889733 lock_mode X locks rec but not gap
2017-03-17 19:39:28 3176 [Note] WSREP: cluster conflict due to high priority abort for threads:
2017-03-17 19:39:28 3176 [Note] WSREP: Winning thread:
   THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 4701144942
   SQL: (null)
2017-03-17 19:39:28 3176 [Note] WSREP: Victim thread:
   THD: 5230403, mode: local, state: executing, conflict: no conflict, seqno: -1
   SQL: (null)
WSREP: BF lock wait long
"

The timing of the event above coincides almost exactly (I cannot say for sure, but it is very close, within seconds rather than minutes) with the start of an incremental backup with xtrabackup; changed page tracking is used. The backup completed successfully.

After this event, the log shows several "INNODB MONITOR OUTPUT" dumps in rapid succession (at 16-second intervals) with "WSREP: BF lock wait long" lines in between; the number of BF lock wait lines increases by one between monitor outputs, and eventually "Too many connections" errors appear.
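
If this happens again, I will try to capture something like the following through the extra port (standard commands, nothing here is specific to this incident):
"
-- how close we are to the connection limit
SHOW GLOBAL STATUS LIKE 'Threads_connected';
-- whether Galera flow control or the receive queue is backed up
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
-- what all the stuck connections are doing
SELECT COMMAND, STATE, COUNT(*)
FROM information_schema.PROCESSLIST
GROUP BY COMMAND, STATE;
"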

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1962
