I did some digging in the application, looking specifically at the table that was causing the cluster to crash. That table had no primary key, so I added a generic auto-increment ID column to it. I also moved the select/loop/delete routine into a separate process run by a cron job on a single server, instead of running it from the web code on any of the three web servers. Hopefully that will prevent the cluster from crashing.
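For anyone trying the same thing, here is a rough sketch of what the standalone select/loop/delete job could look like. It is not the actual code from my application: it uses Python with SQLite as a stand-in for the real database, and the table name `pending_jobs`, the `payload` column, and the `handle` callback are all hypothetical names made up for illustration. The point is only that each row is deleted by its new auto-increment primary key.

```python
import sqlite3


def create_table(conn):
    # Hypothetical table; the auto-increment id is the newly added primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )


def process_and_purge(conn, handle, batch_size=100):
    """Select a batch of rows, handle each one, then delete it by primary key."""
    rows = conn.execute(
        "SELECT id, payload FROM pending_jobs ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        handle(payload)
        # Deleting by id touches exactly one row instead of matching on data columns.
        conn.execute("DELETE FROM pending_jobs WHERE id = ?", (row_id,))
    # Commit once per batch; if handle() raises, this batch's deletes are not committed.
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    conn.executemany(
        "INSERT INTO pending_jobs (payload) VALUES (?)",
        [("a",), ("b",), ("c",)],
    )
    conn.commit()
    processed = process_and_purge(conn, handle=print)
    print("rows processed:", processed)
```

Because the script is scheduled from cron on one server only, there is a single worker selecting and deleting rows, rather than three web servers potentially doing it at the same time.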