mysql-mmm-monitor declares all nodes dead after long time running
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
mysql-mmm | New | Undecided | Unassigned | | |
Bug Description
Once in a while we find that the monitoring daemon declares all nodes dead and removes all shared IPs. As a result, none of the virtual IPs are reachable anymore.
Restarting the mysql-mmm-
The monitor log shows the following output:
2012/08/08 07:12:30 FATAL State of host 'turnus' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2012/08/08 07:12:33 FATAL State of host 'nox' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2012/08/08 07:13:45 FATAL State of host 'nox' changed from HARD_OFFLINE to AWAITING_RECOVERY
2012/08/08 07:13:48 FATAL State of host 'turnus' changed from HARD_OFFLINE to AWAITING_RECOVERY
2012/08/08 07:14:45 FATAL State of host 'nox' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(60 seconds). It was in state AWAITING_RECOVERY for 60 seconds
2012/08/08 07:14:48 FATAL State of host 'turnus' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(60 seconds). It was in state AWAITING_RECOVERY for 60 seconds
2012/09/13 07:24:22 FATAL State of host 'turnus' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2012/09/13 07:24:25 FATAL State of host 'nox' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2012/09/13 08:02:37 FATAL State of host 'turnus' changed from HARD_OFFLINE to AWAITING_RECOVERY
2012/09/13 08:02:37 FATAL State of host 'nox' changed from HARD_OFFLINE to AWAITING_RECOVERY
2012/09/13 08:03:37 FATAL State of host 'turnus' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(60 seconds). It was in state AWAITING_RECOVERY for 60 seconds
2012/09/13 08:03:37 FATAL State of host 'nox' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(60 seconds). It was in state AWAITING_RECOVERY for 60 seconds
Log of MySQL node Nox:
2012/09/13 07:24:22 INFO We have some new roles added or old rules deleted!
2012/09/13 07:24:22 INFO Added: reader(
2012/09/13 07:24:25 INFO We have some new roles added or old rules deleted!
2012/09/13 07:24:25 INFO Deleted: reader(
2012/09/13 08:03:37 INFO We have some new roles added or old rules deleted!
2012/09/13 08:03:37 INFO Added: reader(
Log of MySQL node Turnus:
2012/08/08 07:12:30 INFO We have some new roles added or old rules deleted!
2012/08/08 07:12:30 INFO Deleted: reader(
2012/08/08 07:14:48 INFO We have some new roles added or old rules deleted!
2012/08/08 07:14:48 INFO Added: reader(
2012/09/13 07:24:22 INFO We have some new roles added or old rules deleted!
2012/09/13 07:24:22 INFO Deleted: reader(
2012/09/13 08:03:37 INFO We have some new roles added or old rules deleted!
2012/09/13 08:03:37 INFO Added: reader(
The incident on 2012/09/13 started at 07:24 and I resolved it by restarting the monitoring daemon at 08:02.
At that point the daemon had been running for approximately 70 days and 12 hours.
We are running the following versions:
mysql-mmm-
mysql-mmm-
mysql-mmm-
I have the same problem with the following versions installed on the monitor host:
mysql-mmm-common 2.2.1-1
mysql-mmm-monitor 2.2.1-1
and the following versions installed on the database hosts:
libdbd-mysql-perl 4.016-1
core-5.1 5.1.63-0+squeeze1
libmysqlclient16 5.1.63-0+squeeze1
mysql-client-5.1 5.1.63-0+squeeze1
mysql-common 5.1.63-0+squeeze1
mysql-mmm-agent 2.2.1-1
mysql-mmm-common 2.2.1-1
mysql-server 5.1.63-0+squeeze1
mysql-server-5.1 5.1.63-0+squeeze1
mysql-server-