mysqld_safe doesn't restart mysqld after a crash

Bug #1204380 reported by Miguel Angel Nieto
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

mysqld_safe doesn't restart mysqld after a crash.

130722 01:46:59 mysqld_safe Number of processes running now: 0
130722 01:46:59 mysqld_safe WSREP: not restarting wsrep node automatically

Version: '5.5.30-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), wsrep_23.7.4.r3843

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Miguel, this is my code, and we consider it not a bug but a protection against potential chaos. Generally a node is part of a cluster and should not be manipulated without considering the whole situation.

Revision history for this message
Jay Janssen (jay-janssen) wrote :

Alex, I can appreciate that, though there is a certain slickness to an inconsistent node restarting and fixing itself with an SST. Could we at least make 'max_wsrep_restarts' configurable in the [mysqld_safe] section of the my.cnf?

  if [ -n "$wsrep_restart" ]
  then
    if [ $wsrep_restart -le $max_wsrep_restarts ]
    then
      wsrep_restart=`expr $wsrep_restart + 1`
      log_notice "WSREP: sleeping 15 seconds before restart"
      sleep 15
    else
      log_notice "WSREP: not restarting wsrep node automatically"
      break
    fi
  fi

Revision history for this message
Jay Janssen (jay-janssen) wrote :

To clarify, I agree defaulting this to 0 is logical.
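
As an illustration of the request above, here is a minimal sketch of how such a setting could be read while keeping 0 as the default. It assumes that options from the [mysqld_safe] section reach the script as `--name=value` arguments; the option name `--max-wsrep-restarts` and the helper `parse_restart_option` are hypothetical and are not part of the shipped mysqld_safe.

```shell
#!/bin/sh
# Sketch only: the option name and the helper below are hypothetical.

# Default agreed on in this thread: no automatic restarts.
max_wsrep_restarts=0

# Scan the given arguments for --max-wsrep-restarts=N and store N
# if it is a non-negative integer; anything else is ignored.
parse_restart_option() {
  for arg in "$@"
  do
    case "$arg" in
      --max-wsrep-restarts=*)
        val=${arg#--max-wsrep-restarts=}
        case "$val" in
          ''|*[!0-9]*)
            echo "ignoring invalid max-wsrep-restarts value: $val" >&2
            ;;
          *)
            max_wsrep_restarts=$val
            ;;
        esac
        ;;
    esac
  done
}
```

Under these assumptions, `parse_restart_option --max-wsrep-restarts=2` would set the limit to 2, while a missing or malformed value leaves it at 0.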

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Jay, that's trivial to do, and I would actually have defaulted it to 1. However, in its current form this variable has the lifetime of the mysqld_safe process and counts the total number of restarts over the whole life of that process. That is, at some point the remaining restart allowance reaches 0, and then on the next crash two months later mysqld_safe will not restart mysqld any more. This is not what people expect, so I hardcoded it to 0 for the time being, so that users don't come to rely on automatic restarts.

To solve this we must reset the counter after every successful cluster join. However, I don't know how to achieve that yet: what would be the criterion for a "successful cluster join"?
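
The counter-lifetime problem described above can be sketched as follows. This is a simplified illustration, not the mysqld_safe code: the helpers `may_restart` and `reset_restart_counter` are hypothetical, the comparison is simplified relative to the snippet quoted earlier, and the criterion for calling the reset (the open question in this comment) is deliberately left out.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the limit of 1 is an
# assumed value for illustration.

max_wsrep_restarts=1   # restarts allowed between successful joins
wsrep_restart=0        # restarts used so far; lives as long as the script

# Returns 0 and consumes one restart if the limit is not yet reached,
# returns 1 otherwise.
may_restart() {
  if [ "$wsrep_restart" -lt "$max_wsrep_restarts" ]
  then
    wsrep_restart=$((wsrep_restart + 1))
    return 0
  fi
  return 1
}

# To be called once the node is judged to have rejoined the cluster.
# How to detect that is not settled in this thread, so no detection
# logic is shown here.
reset_restart_counter() {
  wsrep_restart=0
}
```

Without a call to `reset_restart_counter`, the first crash consumes the only allowed restart and every later crash is refused, however far apart they are. With a reset after each successful join, each crash gets a fresh allowance.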

Revision history for this message
Nilnandan Joshi (nilnandan-joshi) wrote :

As per Miguel, Alex is right. This is cluster software, so things shouldn't just restart automatically. Going to set this to Invalid.

Changed in percona-xtradb-cluster:
status: New → Invalid
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1401
