remove 552 exception for SMTP perm/temp

Bug #558128 reported by donnc
Affects: GNU Mailman
Importance: Low
Assigned to: Mark Sapiro

Bug Description

Sendmail 8.12 uses 552 for "Message size exceeds
fixed maximum message size", and that seems
reasonably consistent with RFC 2821. Since it is not
appropriate to retry on this error, 552 should be
permanent.

donnc (donnc-users) wrote :

The file d was added: SMTPDirect.py patch

Robert Mathews (rob-launchpad-net) wrote :

I experienced a problem with Mailman 2.1.24 as a result of this.

A list received several large messages in one day that made the size of a digest exceed 90 MB. When Mailman sent the digest, it got back this SMTP error:

 552 5.3.4 Message size exceeds fixed limit

And logged this to logs/smtp-failure:

 SMTP session failure: 552, 5.3.4 Message size exceeds fixed limit, msgid: <...

Unfortunately, the SMTPDirect.py code mentioned in this bug report special-cases "552" errors to treat them like "452" errors, resulting in Mailman trying to redeliver this message in an infinite loop until I removed it from the queue.

The special-casing is apparently to work around a documentation error in RFC 821 that was corrected in RFC 2821: The fear was that MTA authors had followed the documentation error and were returning a 552 instead of a 452 for the case of "too many recipients in this session".
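The special-casing described above can be sketched as a small classifier. This is a minimal illustration, not Mailman's actual SMTPDirect.py code: it assumes refused recipients arrive as a mapping of address to (code, message), the shape Python's `smtplib.SMTPRecipientsRefused.recipients` provides, and the names `classify_refusals` and `treat_552_as_temporary` are hypothetical.

```python
from typing import Dict, List, Tuple

# recipient address -> (SMTP reply code, server message), the same shape as
# smtplib.SMTPRecipientsRefused.recipients
Refusals = Dict[str, Tuple[int, bytes]]


def classify_refusals(
    refused: Refusals, treat_552_as_temporary: bool = True
) -> Tuple[List[str], List[str]]:
    """Split refused recipients into (permanent, temporary) failures.

    Hypothetical sketch: 5xx replies are permanent and everything else is
    retried, except that 552 is optionally treated like 452 ("too many
    recipients"), mirroring the RFC 821 workaround.
    """
    permanent: List[str] = []
    temporary: List[str] = []
    for recip, (code, _msg) in refused.items():
        if code >= 500 and not (treat_552_as_temporary and code == 552):
            permanent.append(recip)
        else:
            temporary.append(recip)
    return permanent, temporary


refused = {
    "a@example.com": (550, b"5.1.1 User unknown"),
    "b@example.com": (552, b"5.3.4 Message size exceeds fixed limit"),
    "c@example.com": (451, b"4.3.0 Try again later"),
}
print(classify_refusals(refused))
# (['a@example.com'], ['b@example.com', 'c@example.com'])
print(classify_refusals(refused, treat_552_as_temporary=False))
# (['a@example.com', 'b@example.com'], ['c@example.com'])
```

Removing the workaround corresponds roughly to the `treat_552_as_temporary=False` case, where an oversize-message 552 is no longer queued for retry.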

I would expect that few-to-no mail servers in active use have such a problem anymore; the original bug here requesting the removal of the workaround code is 13 years old. At this point, it's very likely that the workaround code merely makes Mailman wrongly retry legitimate 552 "Message size exceeds fixed limit" deliveries over and over.

The obsolete code should probably be removed. A trivial patch against Mailman 2.1.25 is attached.

Mark Sapiro (msapiro) wrote :

I will consider "fixing" this for Mailman 2.1.26, but I note that the retry loop is not "infinite". It is one retry every DELIVERY_RETRY_WAIT (default one hour) for DELIVERY_RETRY_PERIOD (default 5 days) before giving up and returning failure. Granted, this is perhaps 719 unnecessary retries, but is far from "infinite".
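To make the bound concrete, here is a rough count of retries. It assumes exactly one retry per wait interval for the whole retry period and ignores time spent in delivery itself. The unit helpers mirror the seconds-returning `minutes()`, `hours()` and `days()` functions in Mailman's `Defaults.py`; `retry_count` is a hypothetical name.

```python
# Unit helpers in the style of Mailman's Defaults.py: each returns seconds.
def minutes(m: int) -> int:
    return m * 60


def hours(h: int) -> int:
    return h * 60 * 60


def days(d: int) -> int:
    return d * 60 * 60 * 24


def retry_count(period: int, wait: int) -> int:
    """Approximate number of retries after the first failed attempt,
    assuming one retry every `wait` seconds until `period` has elapsed."""
    return period // wait


print(retry_count(days(5), hours(1)))     # 120
print(retry_count(days(5), minutes(15)))  # 480
```

With the one-hour/five-day defaults quoted above this gives about 120 retries; with the fixed 15-minute RetryRunner sleep described further down this thread it gives about 480.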

Also note the problem of an oversize digest can be entirely avoided by setting digest_size_threshhold to a reasonable maximum.

Finally, consider what happens if this is considered a "hard bounce". It means every innocent digest member whose ISP rejects the oversize digest with a 552 will record a bounce for that digest. In extreme cases, it could result in users' delivery being disabled and ultimately in their being unsubscribed through no fault of their own.
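A minimal sketch of how repeated digest rejections could add up if 552 were scored as a hard bounce. The rules here (at most one point per day, a reset when the previous bounce is stale, a disable threshold) are a simplified, hypothetical model loosely inspired by Mailman's list bounce settings; the constants and names are assumptions, not Mailman's code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed values for illustration only.
SCORE_THRESHOLD = 5.0              # disable delivery at or above this score
STALE_AFTER = timedelta(days=7)    # reset the score if the last bounce is older


@dataclass
class BounceInfo:
    score: float = 0.0
    last_bounce: Optional[date] = None
    disabled: bool = False


def record_hard_bounce(info: BounceInfo, day: date) -> None:
    """Add one hard bounce, counting at most one bounce per day."""
    if info.last_bounce == day:
        return  # a bounce was already scored today
    if info.last_bounce is not None and day - info.last_bounce > STALE_AFTER:
        info.score = 0.0  # previous bounce info is stale; start over
    info.score += 1.0
    info.last_bounce = day
    if info.score >= SCORE_THRESHOLD:
        info.disabled = True


# A member whose ISP rejects five consecutive daily digests with 552.
member = BounceInfo()
start = date(2017, 11, 1)
for n in range(5):
    record_hard_bounce(member, start + timedelta(days=n))
print(member.score, member.disabled)  # 5.0 True
```

Under these assumed rules, less than a week of oversize digests is enough to disable an otherwise healthy subscriber.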

A similar problem could exist with overly large individual messages, and this can also be easily avoided by setting max_message_size to a reasonable value and not approving excessively large posts.
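Both limits are per-list attributes, so one way to set them is with an input file for Mailman 2.1's `bin/config_list`, applied with something like `bin/config_list -i limits.py LISTNAME`. This is a sketch: the file name `limits.py` and the numbers are illustrative assumptions, not recommendations.

```python
# Hypothetical input file for bin/config_list (limits.py).
# config_list reads this Python file and applies each variable it defines
# as a list attribute. Both values are in kilobytes; 0 would mean
# "no limit", which removes the protection against oversize mail.

digest_size_threshhold = 300   # the attribute name really is spelled this way
max_message_size = 1024
```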

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Low
status: New → Triaged
Robert Mathews (rob-launchpad-net) wrote :

>It is one retry every DELIVERY_RETRY_WAIT (default one hour) for DELIVERY_RETRY_PERIOD
>(default 5 days) before giving up and returning failure. Granted, this is perhaps 719
>unnecessary retries, but is far from "infinite".

Thanks for looking at this!

It's definitely possible I'm just being an idiot here, but DELIVERY_RETRY_WAIT doesn't seem to be working properly. Mine was retrying nonstop, and I had tens of thousands of delivery attempts before I stopped it.

A grep of the code indicates DELIVERY_RETRY_WAIT is defined in "Mailman/Defaults.py.in" but not used anywhere. "Mailman/Queue/OutgoingRunner.py" has some code that is trying to check the "deliver_after" time of a message, but that never gets set anywhere I can see.

Is the code perhaps supposed to set a message's "deliver_after" to "time + DELIVERY_RETRY_WAIT" after a failure?
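A minimal sketch of that idea, under stated assumptions: the function names are hypothetical, `msgdata` stands in for the per-message metadata dictionary a queue runner carries, and only the `deliver_after` key comes from the OutgoingRunner code mentioned above.

```python
import time
from typing import Any, Dict, Optional

# Assumed to match the one-hour default mentioned in this thread (seconds).
DELIVERY_RETRY_WAIT = 60 * 60


def schedule_retry(msgdata: Dict[str, Any], now: Optional[float] = None) -> None:
    """Hypothetical: after a temporary failure, hold the message until
    now + DELIVERY_RETRY_WAIT by stamping a 'deliver_after' time on it."""
    if now is None:
        now = time.time()
    msgdata["deliver_after"] = now + DELIVERY_RETRY_WAIT


def ready_to_deliver(msgdata: Dict[str, Any], now: Optional[float] = None) -> bool:
    """The kind of check a runner could make before retrying: skip any
    message whose 'deliver_after' time has not yet arrived."""
    if now is None:
        now = time.time()
    return now >= msgdata.get("deliver_after", 0)


msgdata: Dict[str, Any] = {}
schedule_retry(msgdata, now=1000.0)
print(ready_to_deliver(msgdata, now=1000.0 + 60))    # False: only a minute later
print(ready_to_deliver(msgdata, now=1000.0 + 3600))  # True: the wait has elapsed
```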

But you're right that it would have eventually stopped by itself after five days, now that I look more closely at the code. My apologies for the hyperbolic description.

>Also note the problem of an oversize digest can be entirely avoided by setting digest_size_threshhold to a reasonable maximum.

I agree. It *was* set to a reasonable default for the list in question, and then the list owner changed it to 0 for some reason. Grrr. I may have to patch our copy to prevent this. Perhaps there could be an mm_cfg.py option that allows site administrators to limit the maximum size that can be set, and prevent setting it to 0?
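A sketch of what such a site-wide cap might look like. `MAX_DIGEST_SIZE_THRESHHOLD` and `validate_digest_threshold` are hypothetical names for illustration only; Mailman has no such setting.

```python
# Hypothetical site-wide cap (KB) that an administrator might set in
# mm_cfg.py. This is NOT an existing Mailman setting.
MAX_DIGEST_SIZE_THRESHHOLD = 1024


def validate_digest_threshold(requested_kb: int,
                              site_max_kb: int = MAX_DIGEST_SIZE_THRESHHOLD) -> int:
    """Return the value a list owner is allowed to store.

    0 ("no maximum") and anything above the site cap are replaced by the
    cap; negative values are rejected.
    """
    if requested_kb < 0:
        raise ValueError("digest size threshold cannot be negative")
    if requested_kb == 0 or requested_kb > site_max_kb:
        return site_max_kb
    return requested_kb


print(validate_digest_threshold(300))   # 300
print(validate_digest_threshold(0))     # 1024
print(validate_digest_threshold(5000))  # 1024
```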

>Finally, consider what happens if this is considered a "hard bounce". It means every
>innocent digest member whose ISP rejects the oversize digest with a 552 will record a bounce
>for that digest. In extreme cases, it could result in users' delivery being disabled
>and ultimately in their being unsubscribed through no fault of their own.

Yes, that's a reasonable concern. Of course, the same problem can happen for many other reasons too (digests rejected for spam, etc.). The only mitigation I can think of is a hard sitewide maximum digest size, set to a reasonable value, which would minimize this risk.

However: while searching my logs for how many messages generate a 552 error, I came across this unfortunately common horror:

"host smtp.secureserver.net[68.178.213.203] said: 552 5.2.0 Uczf1w00J3GnVuQ01 - Uczf1w00J3GnVuQ01czf95 This message has been rejected due to content judged to be spam by the internet community IB212"

This is an awful abuse of the SMTP standard; it means that removing the magic "treat 552 as 452" code would trigger bounces in cases like this. So perhaps this bears more thinking about.

Or perhaps the solution is simpler: When DELIVERY_RETRY_PERIOD is reached due to repeated 552 errors treated as 4xx errors, it looks like Mailman currently discards the message without incrementing the bounce count. Perhaps the code should maintain that special "don't increment the bounce score" treatment of 552 errors, but not retry delivery of the message?
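The proposal amounts to a third outcome alongside "bounce" and "retry". A minimal sketch with hypothetical names, not Mailman's code:

```python
from enum import Enum


class Outcome(Enum):
    BOUNCE = "permanent failure: record a bounce"
    RETRY = "temporary failure: queue for retry"
    DROP_NO_BOUNCE = "give up now without scoring a bounce"


def outcome_for(code: int) -> Outcome:
    """Hypothetical policy: keep 552 out of bounce scoring, as today,
    but stop retrying it."""
    if code == 552:
        return Outcome.DROP_NO_BOUNCE
    if code >= 500:
        return Outcome.BOUNCE
    return Outcome.RETRY


for code in (550, 552, 452):
    print(code, outcome_for(code).name)
# 550 BOUNCE
# 552 DROP_NO_BOUNCE
# 452 RETRY
```

One consequence worth noting: under this policy, a server that still used 552 for "too many recipients" would have those recipients dropped rather than retried.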

Or perhaps this is all irrelevant and the real problem is merely that I noticed high CPU usage on a server because DELIVERY_RETRY_WAIT is broken, and fixing that would make me and others neither notice nor care about the 552 thing.

Robert Mathews (rob-launchpad-net) wrote :

Just to add a little more: I checked the code and it appears that DELIVERY_RETRY_WAIT was added in 2.1.2, but then removed in 2.1.3 in favor of the RetryRunner having a 15 minute fixed delay:

 SLEEPTIME = mm_cfg.minutes(15)

Unfortunately it looks like the DELIVERY_RETRY_WAIT constant was not removed then even though it stopped being used in the code. Perhaps this was intended to be:

 SLEEPTIME = mm_cfg.minutes(mm_cfg.DELIVERY_RETRY_WAIT)

Looking more closely at my high CPU usage that triggered all this, it turns out that because we use VERP_PERSONALIZED_DELIVERIES and the list has hundreds of digest members, it was consuming almost that entire 15 minutes trying to send them all before starting again. :-(

Still seems like DELIVERY_RETRY_WAIT should be fixed or removed, though. If it worked I'd increase it to several hours in my situation.

Mark Sapiro (msapiro) wrote :

You are correct about DELIVERY_RETRY_WAIT being ignored. It could be implemented by changing line 33 of Mailman/Queue/RetryRunner.py from

    SLEEPTIME = mm_cfg.minutes(15)

to

    SLEEPTIME = mm_cfg.DELIVERY_RETRY_WAIT

however that doesn't explain why in your case the retries were continuous as they should be only once in 15 minutes.
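Why the second form drops the `minutes()` wrapper: a quick unit check, assuming the helpers return seconds as in `Defaults.py` and that DELIVERY_RETRY_WAIT is defined as `hours(1)`, matching the one-hour default mentioned earlier.

```python
def minutes(m: int) -> int:
    return m * 60          # seconds in m minutes


def hours(h: int) -> int:
    return h * 60 * 60     # seconds in h hours


DELIVERY_RETRY_WAIT = hours(1)   # assumed default: one hour, already in seconds

# Wrapping it in minutes() converts an already-converted value a second time.
double_converted = minutes(DELIVERY_RETRY_WAIT)
print(DELIVERY_RETRY_WAIT)            # 3600 (seconds, i.e. 1 hour)
print(double_converted)               # 216000 (seconds)
print(double_converted / hours(1))    # 60.0 (hours)
```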

> Is the code perhaps supposed to set a message's "deliver_after" to "time + DELIVERY_RETRY_WAIT" after a failure?

That would be another (and better) way to implement DELIVERY_RETRY_WAIT and may be the intent of deliver_after as it currently isn't set anywhere.

It would also avoid the possibility of too frequent retrying if somehow the time.sleep call is interrupted.

I have opened https://bugs.launchpad.net/mailman/+bug/1729472 for the issue of DELIVERY_RETRY_WAIT being ignored and I will fix that. I'm marking this "won't fix" as I don't intend to change the treatment of 552, at least for now.

Changed in mailman:
status: Triaged → Won't Fix
Robert Mathews (rob-launchpad-net) wrote :

>that doesn't explain why in your case the retries were continuous as they should be only once
>in 15 minutes.

I'm pretty sure it was because we use VERP_PERSONALIZED_DELIVERIES and the list has hundreds of digest members. Delivering the hundreds of separate copies to different recipients one after another took most of the 15 minutes, and each one logged a separate error to logs/smtp-failure. In the log it looks like it's going constantly, but now I see that there are sometimes gaps of a couple of minutes.

Anyway, thanks for opening the other bug.

Mark Sapiro (msapiro) wrote :

> I'm pretty sure it was because we use VERP_PERSONALIZED_DELIVERIES and the list has hundreds of digest members.

Right, I missed that in https://bugs.launchpad.net/mailman/+bug/558128/comments/5 when I wrote https://bugs.launchpad.net/mailman/+bug/558128/comments/6
