Percona 5.7 randomly crash after upgrading from 5.6

Bug #1578253 reported by romain-pw
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
New
Undecided
Unassigned

Bug Description

Hello,

On Debian 8, after upgrading from Percona 5.6 to 5.7, the database crash randomly after some time and start flooding error logs with the message "[ERROR] Error in accept: Bad file descriptor" until it completely fill the disk.

Here is the full error log :

2016-05-04T08:35:06.313297Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.11-4-log) starting as process 14046 ...
2016-05-04T08:35:06.328826Z 0 [Note] InnoDB: PUNCH HOLE support available
2016-05-04T08:35:06.328880Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-05-04T08:35:06.328887Z 0 [Note] InnoDB: Uses event mutexes
2016-05-04T08:35:06.328893Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2016-05-04T08:35:06.328898Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-05-04T08:35:06.328902Z 0 [Note] InnoDB: Using Linux native AIO
2016-05-04T08:35:06.329662Z 0 [Note] InnoDB: Number of pools: 1
2016-05-04T08:35:06.329812Z 0 [Note] InnoDB: Not using CPU crc32 instructions
2016-05-04T08:35:06.339802Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2016-05-04T08:35:06.348418Z 0 [Note] InnoDB: Completed initialization of buffer pool
2016-05-04T08:35:06.351685Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2016-05-04T08:35:06.371604Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2016-05-04T08:35:06.372968Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2016-05-04T08:35:06.396337Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2016-05-04T08:35:06.520305Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2016-05-04T08:35:06.520381Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2016-05-04T08:35:06.589968Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2016-05-04T08:35:06.590806Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2016-05-04T08:35:06.590818Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2016-05-04T08:35:06.592041Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.11-4 started; log sequence number 22679190289
2016-05-04T08:35:06.592577Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2016-05-04T08:35:06.592882Z 0 [Note] Plugin 'FEDERATED' is disabled.
2016-05-04T08:35:06.756371Z 0 [Note] InnoDB: Buffer pool(s) load completed at 160504 10:35:06
2016-05-04T08:35:12.960807Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2016-05-04T08:35:12.960840Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2016-05-04T08:35:12.969917Z 0 [Warning] CA certificate ca.pem is self signed.
2016-05-04T08:35:12.969997Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2016-05-04T08:35:12.970886Z 0 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
2016-05-04T08:35:12.970910Z 0 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2016-05-04T08:35:12.970925Z 0 [Note] Server socket created on IP: '0.0.0.0'.
2016-05-04T08:35:13.038507Z 0 [Note] Event Scheduler: Loaded 0 events
2016-05-04T08:35:13.047408Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.11-4-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release '4', Revision '5c940e1'
2016-05-04T11:45:13.696841Z 0 [ERROR] Error in accept: Bad file descriptor
2016-05-04T11:45:13.696978Z 0 [ERROR] Error in accept: Bad file descriptor
2016-05-04T11:45:13.697059Z 0 [ERROR] Error in accept: Bad file descriptor
[...]

In my /etc/security/limits.conf I have the following :
mysql hard nofile 102400
mysql soft nofile 102400

And I have also set LimitNOFILE=102400 in my /lib/systemd/system/mysql.service file.

You can also find in attachment my MySQL configuration file.

Revision history for this message
romain-pw (romain-pw) wrote :
romain-pw (romain-pw)
description: updated
Revision history for this message
Nikita (dwins) wrote :

Same here, after upgrade from 5.6 to 5.7, about 1-1.5 running:

2016-11-03T06:30:14.614519Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614592Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614665Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614738Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614793Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614841Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614890Z 0 [ERROR] Error in accept: Bad file descriptor
2016-11-03T06:30:14.614948Z 0 [ERROR] Error in accept: Bad file descriptor
[...]
2016-11-03T06:30:14.861525Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.861607Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.861688Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.861769Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.861850Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.861931Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.862014Z 0 [ERROR] Error in accept: Socket operation on non-socket
2016-11-03T06:30:14.862098Z 0 [ERROR] Error in accept: Socket operation on non-socket

Revision history for this message
Alex Liffick (m0nk) wrote :

I haven't seen any activity on this and have been watching for several months. We moved our web applications from servers running Ubuntu 12.04 and Percona 5.5 to servers running Ubuntu 12.04 and Percona 5.7 in October and had periodically run into both the errors below:

[ERROR] Error in accept: Bad file descriptor
[ERROR] Error in accept: Socket operation on non-socket

After ensuring we don't allow any socket-based operations, that error ceased, but we still receive the bad file descriptor error every so often at midnight exactly. We have about 20 different servers with the same configuration and it's completely random which ones are hit with the issue but it's always at midnight exactly. The my.cnf on each has the following setting:

open_files_limit = 65535

I also have tried the following per a thread on Stack Overflow:

Create /etc/systemd/system/mysql.service.d/10-ulimit.conf with settings:

[Service]
LimitNOFILE=1000000

There's no way we're hitting the 65535 file descriptor limit, so I'm completely unsure what to do next. A big problem is that checking the status of the mysql service shows it's active, so we have a script running out of cron every 5 minutes that checks connectivity and restarts the server if it can't connect.

Revision history for this message
Alex Liffick (m0nk) wrote :

I still haven't seen any updates to this issue; we've moved all of our systems to Ubuntu 16.04 with Percona 5.7 and still have the issue. It mostly happens around midnight for some reason, though it seems random as to which out of our 15 servers it will occur. The settings in my.cnf are the same for open_files_limit and I've also created the ulimit.conf on these servers. Please do something with this bug.

Revision history for this message
Jervin R (revin) wrote :

Alex,

Can you check if https://bugs.mysql.com/bug.php?id=84708 applies to you?

If not, do you have any performance graphs from the servers that recently had these events - i.e. Connections graph, threads created, threads running, cached, new connections, etc. I wonder if this is something related to workload.

Revision history for this message
Alex Liffick (m0nk) wrote :

Thanks for the reply, and that's actually very interesting and a point I will look into. We were getting hammered with brute force attempts against SSH from China and Russia, so we blocked entire IP ranges in hosts.deny. The issue hits fairly regularly exactly at midnight though I'll look to see if TCP wrappers are possibly related.

I would say that it could also be workload related; several of the servers that it happens on are pretty heavily used, though that's not always the case as one we just built but aren't even using was hit with it overnight last night.... which is what prompted my newest comment. This server has a current 24 active threads, which are all innodb related. There have been 131 connections since it was spun up last week. If there's anything else that might help, please let me know and I'll get that info.

Thanks.

Revision history for this message
Jervin R (revin) wrote :

Alex,

Good information - the reason I am asking for graphs is to correlate the events with the actual server behavior. I have no definite answer to why it is happening, we need to start somewhere for a reproduceble test case :)

Another idea, if your MySQL instances are running on public IP addresses - would it be possible to run on private network and configure mysql to listen only on those IP addresses via bind-address? This might help if you are being hammered on public IP space. Hopefully this is possible.

Revision history for this message
Alex Liffick (m0nk) wrote :

I changed some systems' TCP Wrappers to only specific daemons so that mysqld isn't included and so far so good. It's hard to say whether 4 days equates to success, but at least one of the systems I've implemented it on was seeing the issue happen every other day... and it's been running stable. I'll post back once I've gotten all systems moved over and we have at least 1 week of non-failure.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3426

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.