mmm tries to kill_host when replication delays in PASSIVE mode

Bug #645460 reported by aeva black
This bug affects 1 person
Affects      Status     Importance  Assigned to  Milestone
mysql-mmm    Confirmed  High        Unassigned

Bug Description

In lib/Monitor/Monitor.pm there are 5 locations which execute this code snippet:

            if (!$self->send_agent_status($host)) {
                ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
                $self->_kill_host($host, $checks->ping($host));
            }

However, send_agent_status() starts out with this check:

   # Never send anything to agents if we are in PASSIVE mode
   # Never send anything to agents if we have no network connection
   return if ($self->is_passive || !$main::have_net);

Because send_agent_status() returns a false value whenever the monitor is in PASSIVE mode or has no network connection, each of these call sites treats a skipped send as a failed send. As a result, under a variety of benign circumstances (such as slave replication falling more than max_backlog seconds behind), a monitor in PASSIVE mode calls the _kill_host() routine and, if configured, performs a STONITH unnecessarily. If the kill_host option is not configured, the result is the following frequent and confusing message in the error log file:

     "Could not kill host '%s' - there may be some duplicate ips now! (There's no binary configured for killing hosts."

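The interaction described above can be reproduced outside of mmm. The following is only a minimal sketch, not the real Monitor.pm: MockMonitor, notify_offline() and the killed list are hypothetical names invented for illustration, and _kill_host() merely records the host instead of fencing it. It assumes the call site and the early return behave exactly as quoted in this report.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the monitor object; NOT the real mmm class.
package MockMonitor;

sub new {
    my ($class, %args) = @_;
    return bless { passive => $args{passive}, killed => [] }, $class;
}

sub is_passive { return $_[0]->{passive} }

# Mirrors the early return quoted in the report: a bare `return`
# yields a false value to the caller.
sub send_agent_status {
    my ($self, $host) = @_;
    return if ($self->is_passive || !$main::have_net);
    return 1;    # assumption: the send itself succeeded
}

# Records the host instead of actually killing anything.
sub _kill_host {
    my ($self, $host) = @_;
    push @{ $self->{killed} }, $host;
}

# Hypothetical wrapper around the call-site pattern from the report.
sub notify_offline {
    my ($self, $host) = @_;
    if (!$self->send_agent_status($host)) {
        $self->_kill_host($host);
    }
}

package main;

our $have_net = 1;    # network is fine; only PASSIVE mode is in play

my $monitor = MockMonitor->new(passive => 1);
$monitor->notify_offline('db2');
print "killed: @{ $monitor->{killed} }\n";    # prints "killed: db2"
```

In this sketch nothing was ever sent, yet the host ends up on the kill list, which matches the behaviour reported for PASSIVE mode.
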
Proposed solution:

Each of those 5 locations that test if (!$self->send_agent_status($host)) should also check $self->is_passive and $main::have_net, so that _kill_host() is reached only when a send was actually attempted and failed.
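
One way the proposed guard might look is sketched below. This is an illustration under assumptions, not a patch against trunk: GuardedMonitor, can_contact_agents(), notify_offline() and the send_ok flag are hypothetical names, and _kill_host() only records the host. The only parts taken from the report are the is_passive and $main::have_net checks and the shape of the call site.

```perl
use strict;
use warnings;

# Hypothetical mock; the real monitor class is not reproduced here.
package GuardedMonitor;

sub new {
    my ($class, %args) = @_;
    my $self = {
        passive => $args{passive},
        send_ok => $args{send_ok},    # simulated result of a real send
        killed  => [],
    };
    return bless $self, $class;
}

sub is_passive { return $_[0]->{passive} }

# Same early return as quoted in the report.
sub send_agent_status {
    my ($self, $host) = @_;
    return if ($self->is_passive || !$main::have_net);
    return $self->{send_ok};
}

sub _kill_host {
    my ($self, $host) = @_;
    push @{ $self->{killed} }, $host;
}

# Hypothetical helper: true only when a send would really be attempted.
sub can_contact_agents {
    my ($self) = @_;
    return !$self->is_passive && $main::have_net;
}

# Guarded call site: a skipped send no longer counts as a failure.
sub notify_offline {
    my ($self, $host) = @_;
    if ($self->can_contact_agents && !$self->send_agent_status($host)) {
        $self->_kill_host($host);
    }
}

package main;

our $have_net = 1;

my $passive = GuardedMonitor->new(passive => 1, send_ok => 1);
$passive->notify_offline('db2');

my $active = GuardedMonitor->new(passive => 0, send_ok => 0);
$active->notify_offline('db2');

printf "passive kills: %d, active kills: %d\n",
    scalar @{ $passive->{killed} }, scalar @{ $active->{killed} };
```

With the guard in place, the PASSIVE monitor in this sketch leaves the host alone, while a monitor whose send was attempted and failed still reaches _kill_host().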

(version: current launchpad trunk, 2.2.1)

Changed in mysql-mmm:
importance: Undecided → High
status: New → Confirmed
Changed in mysql-mmm:
milestone: none → 2.2.2
David Beveridge (dage)
Changed in mysql-mmm:
milestone: 2.2.2 → none