mmm tries to kill_host when replication delays in PASSIVE mode

Bug #645460 reported by Devananda van der Veen on 2010-09-22
This bug affects 1 person

Bug Description

In lib/Monitor/ there are 5 locations which execute this code snippet:

            if (!$self->send_agent_status($host)) {
               ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
               $self->_kill_host($host, $checks->ping($host));

However, send_agent_status() starts out with this check:

   # Never send anything to agents if we are in PASSIVE mode
   # Never send anything to agents if we have no network connection
   return if ($self->is_passive || !$main::have_net);
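
To see why this early return matters to the callers, here is a minimal sketch that reproduces the interaction in isolation. FakeMonitor and handle_offline are hypothetical stand-ins written for this illustration, not actual mmm code; only the early-return line and the shape of the caller's check are taken from the snippets above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the monitor; not the real mmm code.
package FakeMonitor;

sub new {
    my ($class, %args) = @_;
    return bless { passive => $args{passive}, killed => [] }, $class;
}

sub is_passive { return $_[0]->{passive} }

# Mirrors the early return quoted above. A bare "return" yields a false
# value (undef in scalar context), so the caller cannot tell "skipped"
# apart from "delivery failed".
sub send_agent_status {
    my ($self, $host) = @_;
    return if ($self->is_passive || !$main::have_net);
    return 1;    # pretend the agent acknowledged the status
}

# Records the host instead of actually killing anything.
sub _kill_host {
    my ($self, $host) = @_;
    push @{ $self->{killed} }, $host;
}

# Mirrors the caller pattern found in lib/Monitor/.
sub handle_offline {
    my ($self, $host) = @_;
    if (!$self->send_agent_status($host)) {
        $self->_kill_host($host);
    }
}

package main;

our $have_net = 1;

my $monitor = FakeMonitor->new(passive => 1);
$monitor->handle_offline('db2');
print "killed: @{ $monitor->{killed} }\n";    # prints "killed: db2"
```

Even though the network is up and no send was ever attempted, the passive monitor ends up on the kill path.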

The result of this combination is that, whenever the monitor is in PASSIVE mode, send_agent_status() returns a false value without sending anything, and the callers treat that as a delivery failure. Under a variety of benign circumstances (such as slave replication falling more than max_backlog seconds behind), the monitor then calls the _kill_host() routine and, if configured, performs STONITH unnecessarily. If the kill_host option is not configured, the result is the following frequent and confusing message in the error log file:

     "Could not kill host '%s' - there may be some duplicate ips now! (There's no binary configured for killing hosts."

Proposed solution:

Each of those 5 locations that check if (!$self->send_agent_status($host)) should also check $self->is_passive and $main::have_net, and should not call _kill_host() when the status was never sent in the first place.
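
One way the proposed guard could look is sketched below. The helper name should_kill_after_failed_send is hypothetical and not part of mmm; it only encodes the two conditions named above, and this is one possible reading of the proposal rather than a patch that was applied.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of mmm: decides whether a false return
# from send_agent_status() should escalate to _kill_host().
sub should_kill_after_failed_send {
    my ($is_passive, $have_net) = @_;
    return 0 if $is_passive;    # nothing was sent, so nothing failed
    return 0 if !$have_net;     # the send was skipped, not refused
    return 1;                   # a real delivery failure
}

# At each call site the guard would sit next to the existing check, roughly:
#   if (!$self->send_agent_status($host)
#       && should_kill_after_failed_send($self->is_passive, $main::have_net)) {
#       ... existing ERROR and _kill_host() calls ...
#   }

for my $case ([1, 1], [0, 0], [0, 1]) {
    my ($passive, $net) = @$case;
    printf "passive=%d have_net=%d kill=%d\n",
        $passive, $net, should_kill_after_failed_send($passive, $net);
}
```

Keeping the decision in a single helper would avoid repeating the same two conditions at all 5 call sites, although inlining them as the report suggests works equally well.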

(version: current launchpad trunk, 2.2.1)

Changed in mysql-mmm:
importance: Undecided → High
status: New → Confirmed
Changed in mysql-mmm:
milestone: none → 2.2.2
David Beveridge (dage) on 2018-05-05
Changed in mysql-mmm:
milestone: 2.2.2 → none