error 4 checking MySQL

Bug #569665 reported by Peter Zaitsev
This bug affects 3 people
Affects: mysql-mmm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I had a system down with MMM today with the following in the logs:

2010/04/25 05:57:05 WARN Check 'rep_threads' on 'X' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.4.254:3306, user = mmm_monitor)! Can't connect to MySQL server on '10.0.4.254' (4)
2010/04/25 05:57:07 WARN Check 'rep_backlog' on 'x' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.4.254:3306, user = mmm_monitor)! Can't connect to MySQL server on '10.0.4.254' (4)
2010/04/25 05:57:12 INFO Check 'rep_backlog' on 'X' is ok!
2010/04/25 05:57:20 ERROR Check 'rep_threads' on 'X' has failed for 10 seconds! Message: ERROR: Replication is broken
2010/04/25 05:57:37 WARN Check 'rep_threads' on 'y' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.4.248:3306, user = mmm_monitor)! Can't connect to MySQL server on '10.0.4.248' (4)
2010/04/25 05:57:39 WARN Check 'rep_backlog' on 'Y' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.4.248:3306, user = mmm_monitor)! Can't connect to MySQL server on '10.0.4.248' (4)
2010/04/25 05:57:51 INFO Check 'rep_backlog' on 'Y' is ok!
2010/04/25 05:57:55 ERROR Check 'rep_threads' on 'Y' has failed for 13 seconds! Message: ERROR: Replication is broken
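
For readers decoding the log lines above: the trailing "(4)" in the connect error is, as far as I know, the operating-system errno from the failed connect attempt, and on Linux 4 is EINTR (interrupted system call). The sketch below only decodes that number; the helper name `describe_connect_errno` is made up for illustration, and reading the number as an OS errno is an assumption rather than something the log itself states.

```python
import errno
import os
import re

# Matches the trailing "(N)" in a client message such as
# "Can't connect to MySQL server on '10.0.4.254' (4)".
_ERRNO_SUFFIX = re.compile(r"\((\d+)\)\s*$")


def describe_connect_errno(message):
    """Return (number, symbolic name, description) for the trailing code, or None.

    Assumption: the number in parentheses is an OS-level errno.
    """
    match = _ERRNO_SUFFIX.search(message)
    if match is None:
        return None
    number = int(match.group(1))
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


line = "Can't connect to MySQL server on '10.0.4.254' (4)"
print(describe_connect_errno(line))
```

If that reading is right, the connect call was interrupted rather than refused, which would be consistent with the servers being reachable by other means at the same moment.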

There are two issues here. First, both MySQL servers are actually reachable.
Second, when it discovers that both nodes are down, it freaks out and puts them both offline, removing all roles, which is not very helpful.
I would prefer that MMM not remove the roles until it has somewhere to put them.
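
The behavior requested above can be sketched as a small decision function. This is not MMM's code (MMM is written in Perl and its internals are not shown in this report); `plan_role_moves` and its arguments are hypothetical, and only the state names ONLINE and HARD_OFFLINE are taken from MMM's log output.

```python
def plan_role_moves(roles, host_states):
    """Decide which roles to move, leaving a role in place if nobody can take it.

    roles:       dict mapping role name -> host currently holding it
    host_states: dict mapping host name -> state string, e.g. "ONLINE"
    Returns a dict mapping role name -> new host, only for roles that move.
    """
    online_hosts = [h for h, s in sorted(host_states.items()) if s == "ONLINE"]
    moves = {}
    for role, holder in roles.items():
        if host_states.get(holder) == "ONLINE":
            continue  # current holder is healthy, nothing to do
        if online_hosts:
            moves[role] = online_hosts[0]  # a destination exists, so move
        # Otherwise there is nowhere to put the role: keep it where it is
        # instead of orphaning it.
    return moves


# Both hosts look down: the writer role stays put.
print(plan_role_moves({"writer": "db05"},
                      {"db05": "HARD_OFFLINE", "db06": "HARD_OFFLINE"}))
# One host is still up: the writer role moves there.
print(plan_role_moves({"writer": "db05"},
                      {"db05": "HARD_OFFLINE", "db06": "ONLINE"}))
```

The point of the sketch is only the ordering: a destination is checked for before a role is taken away, so a false "everything is down" reading does not strip the writer role.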

Kristoffer (k-i) wrote :

Hello,

It seems many have this problem.
Is there any indication why this happens, or if somebody has time to fix it?

My situation:
Debian squeeze
MySQL 5.1
MMM v. 2.2.1
Master/Master setup with one writer, no readers.

We experienced the problem a couple of times, where MMM can't connect to the MySQL server even though Nagios can (from the same server).

MMM log:

2011/07/12 01:53:34 WARN Check 'rep_threads' on 'db05' is in unknown state! Message: UNKNOWN: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:34 WARN Check 'rep_backlog' on 'db05' is in unknown state! Message: UNKNOWN: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:36 WARN Check 'rep_threads' on 'db06' is in unknown state! Message: UNKNOWN: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:36 WARN Check 'rep_backlog' on 'db06' is in unknown state! Message: UNKNOWN: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:52 ERROR Check 'mysql' on 'db05' has failed for 18 seconds! Message: ERROR: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:53 FATAL State of host 'db05' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2011/07/12 01:53:53 INFO Removing all roles from host 'db05':
2011/07/12 01:53:53 INFO Removed role 'writer(xx.xx.xx.xx)' from host 'db05'
2011/07/12 01:53:53 INFO Orphaned role 'writer(xx.xx.xx.xx)' has been assigned to 'db06'
2011/07/12 01:53:54 ERROR Check 'mysql' on 'db06' has failed for 18 seconds! Message: ERROR: Connect error (host = xx.xx.xx.xx:3306, user = mmm_monitor)! Can't connect to MySQL server on 'xx.xx.xx.xx' (4)
2011/07/12 01:53:57 FATAL State of host 'db06' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2011/07/12 01:53:57 INFO Removing all roles from host 'db06':
2011/07/12 01:53:57 INFO Removed role 'writer(xx.xx.xx.xx)' from host 'db06'
2011/07/12 02:02:32 WARN Check 'mysql' on 'db05' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:02:32 WARN Check 'mysql' on 'db06' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:03:14 WARN Check 'mysql' on 'db05' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:03:31 WARN Check 'mysql' on 'db05' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:03:53 WARN Check 'mysql' on 'db05' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:04:02 WARN Check 'mysql' on 'db06' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:04:07 WARN Check 'mysql' on 'db05' is in unknown state! Message: UNKNOWN: Too many connections! Too many connections
2011/07/12 02:07:30 WARN Check 'mysql' on 'db05' is in unknown state! Mes...

Kenny Gryp (gryp) wrote :

Kristoffer, this is not an MMM bug. It looks like authentication issues, most likely related to DNS issues during authentication.

Kristoffer (k-i) wrote :

Hi Kenny,

Thanks for your response.

I agree it looks like authentication issues, but all three servers (master/master/monitor) have the same hosts file with all hosts, and the MMM configuration files are set up with IP addresses, not hostnames.

Wouldn't this rule out auth issues?

Kenny Gryp (gryp) wrote :

MMM connects to the database by IP, but in order to authenticate, the MySQL server does a reverse DNS lookup of the client IP that is connecting.
You can disable that behavior if you set 'skip-name-resolve' _and_ specify only IPs (with or without wildcards) as the authentication host.

Anyway, this is a bit of a guess with the little information that I have.
Feel free to contact Percona or use other means if you want this to be resolved.
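
A rough way to check the second half of that advice is to list the host part of each account and flag any that are hostnames rather than IP patterns. The sketch below works on plain strings; the function name `works_without_dns` and the sample host values are hypothetical, and the pattern rules are a simplification of how MySQL matches account hosts.

```python
import ipaddress
import re

# IPv4 literals and wildcard patterns such as "10.0.4.%", optionally with a
# netmask suffix such as "/255.255.255.0".
_IPV4_PATTERN = re.compile(r"^[0-9.%_]+(/[0-9.]+)?$")


def works_without_dns(host):
    """Return True if this account host value can match without a name lookup."""
    if host in ("%", "localhost"):
        return True
    if _IPV4_PATTERN.match(host):
        return True
    try:
        ipaddress.ip_address(host)  # accepts IPv6 literals such as "::1"
        return True
    except ValueError:
        return False


# Hypothetical host values for the mmm_monitor account.
for host in ["10.0.4.%", "10.0.4.0/255.255.255.0", "monitor.example.com"]:
    print(host, works_without_dns(host))
```

Any value flagged False would need to be replaced by an IP-based one before relying on 'skip-name-resolve'.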

Kristoffer (k-i) wrote :

I see your point. Having read http://hackmysql.com/dns, everything fits our problem.

However, when I try to test it:

1) block access to the default gateway in the firewall on the db servers and the monitor server
2) verify that DNS lookups time out
3) restart mysql

Everything still works? If you have any hints how I can verify the problem I would appreciate it! :-)
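
One way to look at the lookup directly, rather than inferring it from whether MySQL still accepts connections, is to time a reverse lookup of the monitor's IP from each db server. This is a minimal sketch: `time_reverse_lookup` is a made-up helper, and it assumes that Python's `socket.gethostbyaddr` goes through the same system resolver (hosts file included) that the MySQL server would use, which may not hold on every setup.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Reverse-resolve ip and return (hostname or None, elapsed seconds).

    The resolver argument exists only so the function can be exercised
    without network access.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        hostname = None
    return hostname, time.monotonic() - start
```

Run on a db server with the monitor server's address as the argument, an immediate answer would suggest the name comes from the hosts file and DNS is never consulted, while a delay of several seconds would point at the timeout described above.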

I will add the skip-name-resolve to my.cnf.

Thanks for your help.

jack ren (linuxgood1230) wrote :

Hello all,

I have skip-name-resolve set, but it still happens all the time.
I have the same problem, but my 'rep_threads' and 'rep_backlog' checks do not produce errors,
just WARN.

Every 5 seconds it becomes OK again, but it is a problem like yours.

Is it solved?
