Barry and I talked about how to tackle this. The following are notes on what we hashed out. All the good stuff is from Barry.
Barry is factoring out a library from Mailman 3 to help with this now. It will determine whether a given email.message.Message is a bounce message, and if so, it will return the email addresses involved and whether the bounce is temporary or permanent.
It is expected to work within a scenario like the following. Launchpad sends VERP-formatted emails. We have a custom LMTP server that gets bounces and (using Barry's library) figures out what's going on. It sends that information to Launchpad, which records the bounce information. A cronscript processes the Launchpad bounce information, and among other things, calculates how the bounces should affect the given user email addresses. A change to the email page allows people to re-enable disabled email addresses, possibly among other changes.
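As a rough illustration of what "VERP-formatted" means in the scenario above, the sketch below encodes the recipient into the envelope sender and recovers it from a bounce. The "mailbox+user=domain" layout, the function names and the example domain are assumptions for illustration only; the real format is whatever gets agreed with IS.

```python
def verp_encode(bounce_mailbox, bounce_domain, recipient):
    """Build a VERP envelope sender that embeds the recipient address."""
    local, _, domain = recipient.rpartition("@")
    if not local or not domain:
        raise ValueError("not an email address: %r" % recipient)
    # Hypothetical layout: <mailbox>+<user>=<domain>@<bounce domain>
    return "%s+%s=%s@%s" % (bounce_mailbox, local, domain, bounce_domain)


def verp_decode(envelope_sender, bounce_mailbox):
    """Recover the original recipient from a VERP address, or return None."""
    local, _, _domain = envelope_sender.rpartition("@")
    prefix = bounce_mailbox + "+"
    if not local.startswith(prefix):
        return None
    encoded = local[len(prefix):]
    # Domains cannot contain "=", so split on the last one.
    user, sep, host = encoded.rpartition("=")
    if not sep or not user or not host:
        return None
    return "%s@%s" % (user, host)


sender = verp_encode("bounces", "launchpad.example", "alice@example.com")
recipient = verp_decode(sender, "bounces")
```

Because the bounced address travels in the envelope sender, the LMTP server can identify it even when the bounce body is not in a recognisable format.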
Coordination with IS:
- verify VERP format
- agree on feasibility of running a custom LMTP server in production, and of running more if we need to scale; these need to be able to contact a given port
- as above for staging and qastaging
- discover current rate of bounce emails sent to Launchpad
- install the custom LMTP server
- install Exim rules to send all Launchpad bounce messages (including those with VERP addresses) to the LMTP server(s)
Details on other components and changes:
* Create an LMTP server (the stdlib's smtpd module provides an SMTP server that could be adapted for this). It accepts messages and passes them to Barry's library (see above). For each message that the library identifies as a bounce, it contacts Launchpad over an agreed protocol, port and format (e.g., XML-RPC) to deliver information about it. This information may include the email addresses that bounced, whether the bounce is temporary or permanent, and the bounce message itself.
* When Launchpad receives a call from the LMTP service, it records the message id, the date, the bounced email addresses, and whether the bounce is permanent or temporary. It may also record a context (e.g., the source of the mail that caused the bounce, such as a bug notification) and the message body, so that user queries can be answered; Barry said that Mailman did not store the body but people wanted it.
* A cron job (or similar) runs to do the following tasks:
- Process bounce events, converting them into bounce scores for each address. A bounce score increases by one for every day on which the address gets one or more permanent bounces.
- Decide whether an email address should be disabled (e.g., score > 5). If the disabled address is the user's preferred address, choose another address to be preferred, if there is one.
- Send probes to disabled addresses once a week for N (4?) weeks, informing the user that the address is disabled and pointing them to the web email page to re-enable it.
- Set the bounce score to zero for email addresses that have not increased their score for N (7?) days and that are not disabled.
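The scoring rules in the cron tasks above can be sketched roughly as follows. This is only an illustration: the dict-based state, the function name and the constants are assumptions, and the threshold and reset window simply reuse the tentative values from the notes ("score > 5", "N (7?) days").

```python
from datetime import date

DISABLE_THRESHOLD = 5   # assumption: the "score > 5" example
RESET_AFTER_DAYS = 7    # assumption: the tentative "N (7?)" value


def update_address(state, permanent_bounce_days, today):
    """Apply permanent-bounce days to one address's state and return it.

    state is a dict with the keys 'score', 'last_bounce_day' and 'disabled'.
    permanent_bounce_days is an iterable of dates that saw a permanent bounce.
    """
    # Several bounces on one day count once; days already counted are skipped.
    for day in sorted(set(permanent_bounce_days)):
        if state["last_bounce_day"] is None or day > state["last_bounce_day"]:
            state["score"] += 1
            state["last_bounce_day"] = day
    if state["score"] > DISABLE_THRESHOLD:
        state["disabled"] = True
    # Forgive old bounces, but only for addresses that are still enabled.
    stale = (state["last_bounce_day"] is not None and
             (today - state["last_bounce_day"]).days >= RESET_AFTER_DAYS)
    if stale and not state["disabled"]:
        state["score"] = 0
    return state


example = {"score": 0, "last_bounce_day": None, "disabled": False}
bounce_days = [date(2024, 1, d) for d in (1, 1, 2, 3, 4, 5, 6)]
update_address(example, bounce_days, today=date(2024, 1, 6))
```

Here the two bounces on 1 January count as one day, so six distinct bounce days push the score past the threshold and the address is disabled.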
* Change the email web page so that users and admins can deactivate and reactivate email addresses.
* Change how Launchpad sends emails in two ways.
- First, they should include VERP information in the envelope sender (e.g., "from_addr = '<email address hidden>'; smtp.sendmail(from_addr, to_addrs, msg)" on an smtplib.SMTP connection).
- Second, if the SMTP server returns a failure reply code, record this in the same bounce database that the Launchpad handler uses for bounce messages from the LMTP server. Barry said that the 450 code and 5xx codes can be regarded as permanent failures, and the other 4xx codes as temporary failures (see http://tools.ietf.org/html/rfc5321#section-4.2).
* Database changes:
- New tables for bounce information (message id, date, maybe message, is_permanent, context?) and for the bounced email addresses (message id, address).
- Email addresses get a bounce score, a last bounce message id (so we can get the date from the bounce information), a last probe date, and a disabled_status. The status is one of: not disabled, disabled for bounces, disabled by user, disabled by admin.
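To make the record shapes and the SMTP reply-code rule above more concrete, here is a minimal sketch. The class, field and function names are hypothetical, not a schema proposal. The classification follows the rule as recorded in these notes (450 and 5xx permanent, other 4xx temporary); RFC 5321 itself treats every 4yz reply as transient, so the 450 case is a policy choice rather than something the RFC dictates.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class AddressStatus(Enum):
    """The four disabled_status values listed in the notes."""
    NOT_DISABLED = "not disabled"
    DISABLED_FOR_BOUNCES = "disabled for bounces"
    DISABLED_BY_USER = "disabled by user"
    DISABLED_BY_ADMIN = "disabled by admin"


@dataclass
class BounceRecord:
    """One bounce event plus the addresses it concerns (hypothetical shape)."""
    message_id: str
    date: datetime
    is_permanent: bool
    addresses: List[str] = field(default_factory=list)
    message: Optional[str] = None   # "maybe message" in the notes
    context: Optional[str] = None   # e.g. "bug notification"


def is_permanent_reply(code):
    """Classify an SMTP failure reply code using the rule from the notes."""
    if not 400 <= code <= 599:
        raise ValueError("not an SMTP failure reply code: %d" % code)
    return code == 450 or code >= 500


def record_smtp_failure(message_id, address, code, when, context=None):
    """Turn a failed send into the same record type used for LMTP bounces."""
    return BounceRecord(message_id=message_id, date=when,
                        is_permanent=is_permanent_reply(code),
                        addresses=[address], context=context)
```

Using one record type for both sources means the cron job does not need to know whether a bounce arrived through the LMTP server or was detected at send time.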