MMM2 Angel potentially fills filesystem on infinite error condition

Bug #473446 reported by BJ Dierkes on 2009-11-03
This bug affects 2 people
Affects: mysql-mmm
Status: Fix Released
Importance: Low
Assigned to: Pascal Hofmann

Bug Description

System: RHEL 5.3 64bit, MMM 2.0.10

When attempting to run mmmd_agent I get the following error:

2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting

The problem is that Angel restarts the child in a tight loop, regardless of the error, with no delay between attempts. When the error is permanent, as it is here, Angel will eventually fill the filesystem with logs and emails to root. In the two snapshots below, taken three minutes apart, the mail spool grows from 1.9M to 235M.

Tue Nov 3 16:01:10 CST 2009
----------------------------------------------------------------------
36K /var/log/mysql-mmm/
1.9M /var/spool/clientmqueue/

2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!

Tue Nov 3 16:04:11 CST 2009
----------------------------------------------------------------------
1.4M /var/log/mysql-mmm/
235M /var/spool/clientmqueue/

2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:11 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:11 FATAL Listener: Can't create socket!
2009/11/03 16:04:11 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:11 FATAL Listener: Can't create socket!

A simple fix would be to have Angel sleep on each loop iteration; even 1 second would slow down the effect of a permanent error state. What I think would be better is failure/retry/give-up error handling: if the child has failed with the same issue (exit code?) on each of the last 100 tries, with no successful run in between, there is no point in continuing to run. Obviously, MMM is something that you want running no matter what, but with a bit of logic I think it could safely retry a set number of times and then give up.
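
The retry/give-up idea above could be sketched roughly as follows. This is only an illustration in Python, not MMM's actual Angel code (which is Perl) and not the patch attached to this bug. The function name `supervise`, its parameters, and the `healthy_after` rule for what counts as a "successful run" are all assumptions made for the example; the log wording simply imitates the messages quoted in this report.

```python
import subprocess
import time


def supervise(cmd, max_failures=100, delay=1.0, healthy_after=60.0,
              run=subprocess.call, sleep=time.sleep,
              clock=time.monotonic, log=print):
    """Restart cmd until it exits cleanly or fails max_failures times in a row.

    All names here are hypothetical. run, sleep, clock and log are
    injectable so the loop can be exercised without starting a process.
    """
    failures = 0
    while True:
        started = clock()
        code = run(cmd)
        if code == 0:
            # Clean shutdown: nothing to restart.
            return 0
        if clock() - started >= healthy_after:
            # Assumption: a child that stayed up this long counts as a
            # successful run, so the failure streak starts over.
            failures = 0
        failures += 1
        if failures >= max_failures:
            log(f"Child exited with exitcode {code} and has failed "
                f"{failures} times consecutively, not restarting")
            return code
        log(f"Child exited with exitcode {code}, "
            f"restarting after {delay:g} second sleep")
        # The pause bounds how fast logs and mail can pile up.
        sleep(delay)
```

The defaults (1 second, 100 tries) follow the numbers suggested above. Even without the give-up branch, the sleep alone caps the restart rate at one attempt per `delay` seconds, which is what keeps a permanent error from flooding the log directory and the mail spool.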

Changed in mysql-mmm:
assignee: nobody → Pascal Hofmann (pascalhofmann)
importance: Undecided → Low
status: New → Confirmed
BJ Dierkes (derks) wrote :

I am attaching a suggested patch that sleeps 10 seconds after each failure and shuts Angel down after 10 consecutive failed attempts.

BJ Dierkes (derks) wrote :

As a side note, you can easily reproduce this issue by setting the IP address for db1 ("this" server) to an address that it cannot bind to. This reproduces my original issue, where the agent can't create a socket.

BJ Dierkes (derks) wrote :

Results with patch:

2009/11/10 19:06:06 FATAL Listener: Can't create socket!
2009/11/10 19:06:06 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:16 FATAL Listener: Can't create socket!
2009/11/10 19:06:16 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:26 FATAL Listener: Can't create socket!
2009/11/10 19:06:26 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:36 FATAL Listener: Can't create socket!
2009/11/10 19:06:36 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:46 FATAL Listener: Can't create socket!
2009/11/10 19:06:46 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:56 FATAL Listener: Can't create socket!
2009/11/10 19:06:56 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:06 FATAL Listener: Can't create socket!
2009/11/10 19:07:06 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:16 FATAL Listener: Can't create socket!
2009/11/10 19:07:16 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:26 FATAL Listener: Can't create socket!
2009/11/10 19:07:26 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:36 FATAL Listener: Can't create socket!
2009/11/10 19:07:36 FATAL Child exited with exitcode 99 and has failed more than 10 times consecutively, not restarting

Pascal Hofmann (pascalhofmann) wrote :

Thanks for your patch!

Changed in mysql-mmm:
status: Confirmed → Fix Committed
Changed in mysql-mmm:
status: Fix Committed → Fix Released