qmgr process loads the system when using rate_* in custom transports

Bug #339823 reported by Santiago Romero
Affects           Status        Importance  Assigned to  Milestone
postfix (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

 Last month I had a "load average" issue on a postfix mail server (it only runs the postfix service). Suddenly, the load average started to rise and the qmgr process appeared at the top of "top", taking 20-30% of CPU.

top - 18:19:54 up 7 days, 2:03, 2 users, load average: 4.94, 3.96, 4.02
Tasks: 144 total, 6 running, 138 sleeping, 0 stopped, 0 zombie
Cpu(s): 48.3%us, 50.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 1035280k total, 999964k used, 35316k free, 149072k buffers
Swap: 750696k total, 88k used, 750608k free, 599308k cached

  PID USER     PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
23665 postfix  20  0 5880 2628 1792 S 20.3  0.3 68:11.18 qmgr
23662 root     20  0 5392 1732 1400 R  6.0  0.2 20:49.46 master

Network traffic was low and we had the normal throughput of emails.

The queue held only 73 emails when the problem happened (just like now, all of them deferred).

Doing "postfix stop" / "postfix start" solved the problem.

I reported the bug to the Postfix Users mailing list, and Postfix's author (Wietse Venema) found that it was a bug and posted a patch to the mailing list.

 Some snippets from the list:

---------------------------------------------------------------
VICTOR DUCHOVNI:
Please wait for an updated patch, we believe we have identified the
cause and reproduced the symptoms (in that order). I have a candidate
patch, but I expect Wietse will send an updated more polished version
in the not too distant future.

The issue found applies only to "rate-limited" transports, if you are
not using such transports, you don't need the patch. The patch ensures
that work done at the completion of a delivery with a "normal" transport
is correctly split between "before suspend" and "after resume".

The original 2.5.x code is correct for "oqmgr", but not for "qmgr"
(aka "nqmgr"), which requires additional internal state adjustments
when destinations are blocked and unblocked.

---------------------------------------------------------------
WIETSE VENEMA:

To apply this patch, cd into the Postfix-2.5.* top-level source
directory and execute:

$ patch < thismessage

We were able to reproduce the scheduler looping problem, and it
does not recur with the patched version.

 Wietse

---------------------------------------------------------------
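 For anyone checking whether they are affected: a "rate-limited" transport in the sense used in the snippets above is, as far as I understand it, one with a nonzero destination rate delay (the rate_* settings in the title). A minimal sketch of such a setup follows. The transport name "slow", the 1s delay, the concurrency limit and the transport map path are all hypothetical and are not taken from the affected server.

```
# /etc/postfix/main.cf (illustrative only; "slow" is a made-up transport name)
# Per-transport overrides follow the pattern <transport>_destination_rate_delay.
slow_destination_rate_delay = 1s
slow_destination_concurrency_limit = 1
transport_maps = hash:/etc/postfix/transport

# /etc/postfix/master.cf (illustrative only)
# A custom transport that reuses the smtp delivery agent.
slow      unix  -       -       n       -       -       smtp
```

 If no transport in main.cf has a rate delay configured (default_destination_rate_delay is 0s by default), the issue described here should not apply.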

 I applied the patch and the problem hasn't happened again, but I need the patch to be integrated into Ubuntu's postfix deb packages so that I can still benefit from future security upgrades.

 The patch was submitted at:

Date: Thu, 5 Mar 2009 17:41:51 -0500 (EST)

 Thanks a lot.

Santiago Romero (sromero) wrote :

 I'm attaching the patch posted by Wietse Venema at the mailing list.

 I don't know why, but I got 5 rejects when applying the patch. The replacement code was OK, but some lines were 2 or 3 lines below the line numbers given in the patch :-?

 I corrected the .rej manually and the patch is working. Let me know if you need me to "extract" my "custom" patch as difference from my current sources and the standard postfix ubuntu sources.

Scott Kitterman (kitterman) wrote :

Confirmed based on the upstream patch. I think we should get this into Jaunty and SRU for Hardy/Intrepid. My impression based on the discussion on the upstream ML is that this doesn't bite a lot of people, but when it bites, it bites hard.

Changed in postfix:
importance: Undecided → Medium
milestone: none → ubuntu-9.04-beta
status: New → Triaged
LaMont Jones (lamont) wrote :

Fixed in 2.5.7 upstream

Changed in postfix (Ubuntu):
status: Triaged → Fix Committed
Scott Kitterman (kitterman) wrote :

Fixed in the most recent release.

Changed in postfix (Ubuntu):
status: Fix Committed → Fix Released