The qrunner-master lock file causes issues when running clustered

Bug #1082308 reported by Axis Communications
Affects: GNU Mailman
Status: Fix Released
Importance: Low
Assigned to: Mark Sapiro
Milestone: 2.1.16rc1

Bug Description

Hi

It is possible to run Mailman in a failover or load-balancing cluster, see:
http://wiki.list.org/pages/viewpage.action?pageId=4030621

When running a cluster, it is crucial to use:
* a shared directory for archive data
* a shared directory for locks
* separate directories for each qrunner

This can be implemented by setting the directories in mm_cfg.py, for example like this (where <host> is the host name of the node):
VAR_PREFIX = '<shared dir>'
LIST_DATA_DIR = os.path.join(VAR_PREFIX, 'lists')
LOCK_DIR = os.path.join(VAR_PREFIX, 'locks')
DATA_DIR = os.path.join(VAR_PREFIX, 'data')
SPAM_DIR = os.path.join(VAR_PREFIX, 'spam')
LOG_DIR = os.path.join(VAR_PREFIX, 'logs-<host>')
PUBLIC_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'public')
PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private')
# For qfiles and logs, <dir>-<host> is used to avoid conflicts
QUEUE_DIR = os.path.join(VAR_PREFIX, 'qfiles-<host>')
INQUEUE_DIR = os.path.join(QUEUE_DIR, 'in')
OUTQUEUE_DIR = os.path.join(QUEUE_DIR, 'out')
CMDQUEUE_DIR = os.path.join(QUEUE_DIR, 'commands')
BOUNCEQUEUE_DIR = os.path.join(QUEUE_DIR, 'bounces')
NEWSQUEUE_DIR = os.path.join(QUEUE_DIR, 'news')
ARCHQUEUE_DIR = os.path.join(QUEUE_DIR, 'archive')
SHUNTQUEUE_DIR = os.path.join(QUEUE_DIR, 'shunt')
VIRGINQUEUE_DIR = os.path.join(QUEUE_DIR, 'virgin')
BADQUEUE_DIR = os.path.join(QUEUE_DIR, 'bad')
RETRYQUEUE_DIR = os.path.join(QUEUE_DIR, 'retry')
MAILDIR_DIR = os.path.join(QUEUE_DIR, 'maildir')
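
As a rough sketch of how the <host> placeholder above could be filled in automatically, the per-node directory names can be derived from the host name at import time. The helper name per_host_dirs and the use of the short host name are assumptions made for illustration; they are not part of Mailman.

```python
import os
import socket


def per_host_dirs(var_prefix, host=None):
    """Return the per-node log and queue directories under a shared prefix.

    Hypothetical helper, not part of Mailman. If no host is given, the
    short host name of the current machine is used.
    """
    if host is None:
        # Keep only the part before the first dot, e.g. 'node1' from
        # 'node1.example.com'.
        host = socket.gethostname().split('.')[0]
    return {
        'LOG_DIR': os.path.join(var_prefix, 'logs-' + host),
        'QUEUE_DIR': os.path.join(var_prefix, 'qfiles-' + host),
    }


# Example with an explicit host name and a placeholder shared directory.
dirs = per_host_dirs('/shared', 'node1')
LOG_DIR = dirs['LOG_DIR']
QUEUE_DIR = dirs['QUEUE_DIR']
```

With something like this, the same mm_cfg.py could be deployed unchanged on every node, since only the host name differs between them.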

Unfortunately, the master-qrunner lock causes problems with this setup. mailmanctl -s starts even if a master-qrunner lock file already exists (provided that no mailmanctl is running on the host), which makes it possible to get the service up and running on more than one host. Once a day, however, mailmanctl checks the lock. If it does not hold the lock, it shuts down. In a cluster, at least one of the nodes will not hold the lock, so the service will be shut down on that node.

To solve this, I propose that the LOCKFILE name in mailmanctl become configurable, so instead of having:
LOCKFILE = os.path.join(mm_cfg.LOCK_DIR, 'master-qrunner')
Have:
LOCKFILE = os.path.join(mm_cfg.LOCK_DIR, mm_cfg.QRUNNER_LOCK_FILE)

Then add QRUNNER_LOCK_FILE = 'master-qrunner' to Defaults.py.

This would make it easy to have individual qrunner master lock files for each node in a cluster.
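
For illustration, a per-node override in mm_cfg.py might look like the sketch below. It assumes the proposed QRUNNER_LOCK_FILE setting exists under that name, which is only what this report suggests. The helper qrunner_lock_name, the 'master-qrunner.<host>' naming scheme and the LOCK_DIR value are made up for the example.

```python
import os
import socket


def qrunner_lock_name(host=None):
    """Build a per-node master lock file name (hypothetical helper)."""
    if host is None:
        host = socket.gethostname()
    return 'master-qrunner.' + host


# Placeholder for the shared lock directory from the configuration above.
LOCK_DIR = os.path.join('/shared', 'locks')

# Proposed setting: each node gets its own master lock file name.
QRUNNER_LOCK_FILE = qrunner_lock_name()

# What mailmanctl would then compute under the proposal.
LOCKFILE = os.path.join(LOCK_DIR, QRUNNER_LOCK_FILE)
```

Because the lock directory stays shared, the list locks would still be common to all nodes. Only the master qrunner lock would differ per node.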

Mark Sapiro (msapiro)
Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Low
milestone: none → 2.1.16
status: New → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
milestone: 2.1.16 → 2.1.16rc1
status: Fix Committed → Fix Released