mailman fails to start with stale pidfile and reports (in logs) the wrong pidfile to remove

Bug #908800 reported by Tom Haddon on 2011-12-26
This bug affects 3 people
Affects: Launchpad itself
Importance: High
Assigned to: Unassigned

Bug Description

Forster recently went AWOL and needed stabbing. When it came back up, the mailman process wasn't running. I tried manually starting it but was getting an error message very similar to https://pastebin.canonical.com/56096/ which is referenced by David in https://wiki.canonical.com/IncidentReports/2011-11-21-LP-More-Mailman-delays. I tried to start it a few times, each time removing /srv/lists.launchpad.net/var/mailman/data/master-qrunner.pid as the logs instructed, but it was still failing.

Then I noticed there was another pidfile - /srv/lists.launchpad.net/var/production-mailman-launchpad.pid. Once this was removed, the mailman process started fine. So, it seems the initscript is giving us misleading information and silently fails if the second pidfile is stale.
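For illustration only, here is a minimal Python sketch of a check that looks at both pidfiles mentioned above and reports whichever ones are stale. The two paths come from this report; the helper names (pid_is_running, find_stale_pidfiles) are hypothetical, and it is an assumption that these are the only two pidfiles the initscript consults. This is not the actual initscript logic.

```python
import errno
import os

# Both paths are taken from this report. Whether the initscript consults
# exactly these two files is an assumption.
PIDFILES = [
    "/srv/lists.launchpad.net/var/mailman/data/master-qrunner.pid",
    "/srv/lists.launchpad.net/var/production-mailman-launchpad.pid",
]


def pid_is_running(pid):
    """Return True if a process with this pid currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def find_stale_pidfiles(paths):
    """Return the pidfiles that exist but do not name a live process."""
    stale = []
    for path in paths:
        try:
            with open(path) as f:
                pid = int(f.read().strip())
        except FileNotFoundError:
            continue  # no pidfile here, nothing to clean up
        except ValueError:
            stale.append(path)  # unparseable contents count as stale
            continue
        if not pid_is_running(pid):
            stale.append(path)
    return stale


if __name__ == "__main__":
    for path in find_stale_pidfiles(PIDFILES):
        print("stale pidfile:", path)
```

Note that a pid recycled by an unrelated process after a reboot would make this check treat the pidfile as live, so it is a diagnostic aid rather than a complete fix.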

Tom Haddon (mthaddon) on 2011-12-26
tags: added: canonical-losa-lp
Changed in launchpad:
importance: Undecided → High
Curtis Hovey (sinzui) wrote :

Didn't we see this happen several years ago when mailman shared an LPCONFIG with other production environments? Have the configs changed recently so that mailman's config also thinks it is the config for other environments?

tags: added: mailman
Changed in launchpad:
status: New → Triaged
Robert Collins (lifeless) wrote :

This would likely be fixed by detangling mailman and using a stock install + our runner

Sean Sosik-Hamor (sciri) wrote :

Ran into this again today when forster rebooted.

Chris Jones (cmsj) wrote :

These sorts of issues can be trivially solved by storing pidfiles in volatile mounts - such as /var/run/
