MMM2 Angel potentially fills filesystem on infinite error condition

Bug #473446 reported by BJ Dierkes
This bug affects 2 people
Affects: mysql-mmm
Status: Fix Released
Importance: Low
Assigned to: Pascal Hofmann

Bug Description

System: RHEL 5.3 64bit, MMM 2.0.10

When attempting to run mmmd_agent I get the following error:

2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting

The problem is that Angel restarts the child in a tight loop, regardless of the error, with no pause between attempts. When the error is permanent, as it is here, Angel will eventually fill the filesystem with logs and emails to root.

Tue Nov 3 16:01:10 CST 2009
----------------------------------------------------------------------
36K /var/log/mysql-mmm/
1.9M /var/spool/clientmqueue/

2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!
2009/11/03 16:01:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:01:10 FATAL Listener: Can't create socket!

Tue Nov 3 16:04:11 CST 2009
----------------------------------------------------------------------
1.4M /var/log/mysql-mmm/
235M /var/spool/clientmqueue/

2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:10 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:10 FATAL Listener: Can't create socket!
2009/11/03 16:04:11 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:11 FATAL Listener: Can't create socket!
2009/11/03 16:04:11 FATAL Child exited with exitcode 99, restarting
2009/11/03 16:04:11 FATAL Listener: Can't create socket!

A simple suggestion would be to have Angel sleep on each loop iteration; even a 1 second sleep would slow down the effect of an infinite error state. What I think would be better is failure/retry/give-up error handling: if I've failed with the same issue (exit code?) without a successful run over the last 100 tries, there is no point in continuing to run. Obviously, MMM is something that you want running no matter what, but with a bit of logic I think it could safely retry a set number of times and eventually give up.
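
To make the sleep/retry/give-up idea concrete, here is a minimal sketch in Python. It is not Angel's actual code (MMM is written in Perl). The function name `supervise`, its parameters, the `min_uptime` reset rule, and the choice to treat exit code 0 as a clean shutdown are all assumptions made for illustration.

```python
import subprocess
import time


def supervise(cmd, max_failures=10, delay=10.0, min_uptime=60.0,
              sleep=time.sleep, log=print):
    """Restart cmd after each failure, pausing between attempts.

    Gives up after max_failures consecutive failed runs and returns the
    last exit code. A run that lasted at least min_uptime seconds counts
    as a successful run and resets the failure counter (assumed rule).
    """
    failures = 0
    while True:
        started = time.monotonic()
        code = subprocess.call(cmd)
        if code == 0:
            # Assumption: exit code 0 means an intentional, clean shutdown.
            return 0
        if time.monotonic() - started >= min_uptime:
            # The child ran long enough to count as healthy; start over.
            failures = 0
        failures += 1
        if failures >= max_failures:
            log(f"Child exited with exitcode {code} and has failed "
                f"{failures} times consecutively, not restarting")
            return code
        log(f"Child exited with exitcode {code}, "
            f"restarting after {delay:g} second sleep")
        sleep(delay)
```

With the defaults, a child that always fails is started 10 times with nine 10-second pauses in between, so the log grows by one pair of lines every 10 seconds instead of several pairs per second. `sleep` and `log` are parameters only so the loop can be exercised without real waiting.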

Changed in mysql-mmm:
assignee: nobody → Pascal Hofmann (pascalhofmann)
importance: Undecided → Low
status: New → Confirmed
BJ Dierkes (derks) wrote :

I am adding a suggested patch that sleeps 10 seconds on failure and shuts down Angel after 10 consecutive failed attempts.

BJ Dierkes (derks) wrote :

As a side note, you can easily reproduce this issue by setting the IP address for db1 ("this" server) to an address that it cannot bind to. This reproduces my original issue, where Agent can't create a socket.
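
To check up front whether a given address can be bound on a host, a small standalone probe along these lines may help. It is a sketch in Python using only the standard `socket` module and is not part of MMM. The helper name `try_bind` is hypothetical.

```python
import errno
import socket


def try_bind(host, port=0):
    """Try to bind a TCP socket to (host, port).

    Returns None on success, otherwise the symbolic errno name (or the
    error text if the code has no symbolic name). Port 0 lets the OS
    pick any free port, so only the address itself is being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc))
    return None
```

For example, `try_bind("127.0.0.1")` should return `None`, while an address that is not configured on any local interface, such as one from the 192.0.2.0/24 documentation range, would normally be rejected with an error such as `EADDRNOTAVAIL` on Linux.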

BJ Dierkes (derks) wrote :

Results with patch:

2009/11/10 19:06:06 FATAL Listener: Can't create socket!
2009/11/10 19:06:06 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:16 FATAL Listener: Can't create socket!
2009/11/10 19:06:16 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:26 FATAL Listener: Can't create socket!
2009/11/10 19:06:26 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:36 FATAL Listener: Can't create socket!
2009/11/10 19:06:36 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:46 FATAL Listener: Can't create socket!
2009/11/10 19:06:46 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:06:56 FATAL Listener: Can't create socket!
2009/11/10 19:06:56 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:06 FATAL Listener: Can't create socket!
2009/11/10 19:07:06 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:16 FATAL Listener: Can't create socket!
2009/11/10 19:07:16 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:26 FATAL Listener: Can't create socket!
2009/11/10 19:07:26 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/10 19:07:36 FATAL Listener: Can't create socket!
2009/11/10 19:07:36 FATAL Child exited with exitcode 99 and has failed more than 10 times consecutively, not restarting

Pascal Hofmann (pascalhofmann) wrote :

Thanks for your patch!

Changed in mysql-mmm:
status: Confirmed → Fix Committed
Changed in mysql-mmm:
status: Fix Committed → Fix Released