Duplicate messages in 2.0beta6

Bug #265353 reported by Deejster
2
Affects Status Importance Assigned to Milestone
GNU Mailman
Fix Released
High
Barry Warsaw

Bug Description

I'm having problems with some users causing duplicate messages under
2.0beta6 running on Sparc/Solaris 2.7. One particular user has been
approved by the administrator, so it may be related to the problem in
#111176. If so, it hasn't really been fixed.

I haven't thoroughly examined how mailman works, so these may be off base,
but I have three thoughts about what the issue might be.

1) Being written from the web interface causes some errors in certain
situations that cause this glitch (e.g. message can't be rewritten or db
can't be updated?).

2) Some other glitch in qrunner keeping track of particular messages and
their status.

3) Some bizarrity in the header from particular mailers. (in this case OE
5.5.) or some effect of having an email address of <email address hidden> and
having the mail come through a us-based gateway (userbox.ggg.za.isp.com).

Or it could be something else....

Below are some log samples and header info pertaining to a particular
message that was being duplicated. I've done some sanitizing to protect
the innocent (or guilty), but otherwise they are direct cut and paste. I'm
sorry this is so long, but I wanted to provide as much information as I
could.

Post held and approved (from logs/vette). Seems to be fine.

  Oct 15 14:32:04 2000 (21387) list post from <email address hidden> held: Post by
non-member to a members-only list
  Oct 15 15:28:38 2000 (22489) approved held message enqueued:
heldmsg-list-233.txt

Here's the directory listing of the queued file

  -rw-rw-r-- 1 www mailman 226 Oct 15 15:28
9bca74e3a154f8b1bbbe14f59e98937be8e23660.db
  -rw-rw-r-- 1 www mailman 3833 Oct 15 15:28
9bca74e3a154f8b1bbbe14f59e98937be8e23660.msg

Here's the header for the message (as written into ~/qfiles)
after it had been bounced a few times (note the headers hadn't been
rewritten to include list info as one of the later bounces had. This is
why I wondered about option #1 above.)

 From <email address hidden> Sun Oct 15 14:08:26 2000
 Received: from host.ggg.hhh.isp.com (gw.isp.co.za [196.25.198.96])
        by host.ddd.net (8.9.3/8.9.3) with ESMTP id OAA20895
        for <email address hidden>; Sun, 15 Oct 2000 14:08:22 -0600 (MDT)
 Received: from userbox (userbox.ggg.za.isp.com [192.168.1.21])
        by host.ggg.za.isp.com (8.9.3/8.8.7) with SMTP id WAA20111
        for <email address hidden>; Sun, 15 Oct 2000 22:15:00 +0200
 Message-ID: <004701c036e4$3cff94c0$1501a8c0@userbox>
 From: "Some User" <email address hidden>
 To: <email address hidden>
 Subject: Reapeat message list problem.
 Date: Sun, 15 Oct 2000 22:12:24 +0200
 MIME-Version: 1.0
 Content-Type: text/plain;
        charset="iso-8859-1"
 Content-Transfer-Encoding: 7bit
 X-Priority: 3
 X-MSMail-Priority: Normal
 X-Mailer: Microsoft Outlook Express 5.50.4133.2400
 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400

This is a diff between the above header and one that had been created and
written out later. Note the only difference is the addition of the list
info. Should this have been written over the first file or should the
first file have been removed when this was written to prevent bounces?
(excuse my ignorance if this is a silly suggestion)

  dorothy 205: diff *msg
  11,12d10
  < Subject: Reapeat message list problem.
  < Date: Sun, 15 Oct 2000 22:12:24 +0200
  20a19,31
  > Subject: [list] Reapeat message list problem.
  > Sender: <email address hidden>
  > Errors-To: <email address hidden>
  > X-BeenThere: <email address hidden>
  > X-Mailman-Version: 2.0beta6
  > Precedence: bulk
  > List-Help: <mailto:<email address hidden>?subject=help>
  > List-Post: <mailto:<email address hidden>>
  > List-Subscribe: <http://lists.ddd.net/mailman/listinfo/list>,
<mailto:<email address hidden>?subject=subscribe>
  > List-Id: Mailing List for General Issues and Updates
<list.jjj.org>
  > List-Unsubscribe:
<http://lists.ddd.net/mailman/listinfo/list>, <mailto:list-r
<email address hidden>?subject=unsubscribe>
  > List-Archive: http://lists.ddd.net/lists/list/
  > Date: Sun, 15 Oct 2000 22:12:24 +0200
  94a106,108
  > _______________________________________________
  > To control your list membership:

Here's some relevant log entries, as well. Note that I didn't include all
logs as they were somewhat redundant.
Also note that I'm only running qrunner at 30 minute intervals to mitigate
the duplicate message problem.

First, logs/post

  Oct 15 15:37:03 2000 (22555) post to list from <email address hidden>, size=3065,
1 failures
  Oct 15 16:06:43 2000 (23162) post to list from <email address hidden>, size=3065,
1 failures
  Oct 15 16:36:32 2000 (23752) post to list from <email address hidden>, size=3065,
1 failures

From logs/smtp (note that it's not a huge list)

Oct 15 15:37:03 2000 (22555) smtp for 415 recips, completed in 127.695
seconds
Oct 15 16:06:43 2000 (23162) smtp for 415 recips, completed in 185.517
seconds
Oct 15 16:36:32 2000 (23752) smtp for 415 recips, completed in 180.491
seconds

From logs/smtp-failure (note that these are the most frequent type of
failure that we see)

  Oct 15 15:37:03 2000 (22555) 450 <email address hidden> (<email address hidden>... Can
not check MX records for recipient host aaa.com)
  Oct 15 16:06:43 2000 (23162) 450 <email address hidden> (<email address hidden>... Can
not check MX records for recipient host aaa.com)
  Oct 15 16:36:32 2000 (23752) 450 <email address hidden> (<email address hidden>... Can
not check MX records for recipient host aaa.com)

[http://sourceforge.net/tracker/index.php?func=detail&aid=217015&group_id=103&atid=100103]

Revision history for this message
Scoggins (scoggins) wrote :

I too am seeing this problem. Only I'm using beta5. I'm currently upgrading
to Beta6 to see if this stops the problem. I've so far had the mailloop
occure on a moderated list and an open list. I'm running mailman on a
FreeBSD 3.5-STABLE box.

Revision history for this message
Barry Warsaw (barry) wrote :

I need to see any relevant entries in logs/error, including full
tracebacks. I also need to see output from running bin/dumpdb on any
qfiles/*.db files which don't seem to get de-queued.

Revision history for this message
Deejster (deejster) wrote :

dbdump of message that did not get properly dequeued

dorothy 366: ../bin/dumpdb 9b*db
{ 'approved': 1,
    'filebase': '9c6ddb821c6097f990517afd93de276c653ae989',
    'listname': 'jdom-interest',
    'rejection-notice': 'Non-members are not allowed to post messages to
this list.',
    'tolist': 1,
    'version': 2}

dbdump of message that resulted from smtp-failure during processing of
above message.

dorothy 367: ../bin/dumpdb 9c*db
{ 'approved': 1,
    'deliver_until': 972097814.534,
    'filebase': '9c6ddb821c6097f990517afd93de276c653ae989',
    'last_recip_count': 1,
    'listname': 'jjj',
    'original_sender': '<email address hidden>',
    'pipeline': ['SMTPDirect', 'AfterDelivery', 'Acknowledge'],
    'recips': [ '<email address hidden>'],
    'rejection-notice': 'Non-members are not allowed to post messages to
this list.',
    'tolist': 1,
    'version': 2}

There were no entries in logs/error. Is there a way to force a
traceback?

Revision history for this message
Barry Warsaw (barry) wrote :

I'm moving this back to unreproducible because I still haven't been able to
observe the dups behavior in any of the tests I've crafted. :(

Revision history for this message
Deejster (deejster) wrote :

Can you provide any guidance on more information that I can provide to help
you track this down? [Yesteday I had a message that bounced for 6 hours,
then cleaned itself out.]
Are there ways to increase the level of logging so that we might have more
information when it happens again?

Here are some observations that I've made. They may already be obvious to
you, but I'll put them down just in case...

A) Normal message
  1) message.db and message.msg files put in ~/qfiles
  2) qrunner runs, delivers to all recipients
  3) message.db and message.msg deleted

B) Normal message with delivery errors
  1) message.db and message.msg files put in ~/qfiles
  2) qrunner runs, receives an SMTP error for some recips
  3) message.db and message.msg files rewritten with
     updated information
  4) message stays in queue, but delivery attempts are only
     to those addresses that failed. Attempts continue
     until the message is delivered or times out.

C) Message that bounces
  1) message.db and message.msg files put in ~/qfiles
  2) qrunner runs, receives an SMTP error for some recips
     (in all cases this has been "Can not check MX records
     for host somedomain.dom", but that's true for most of
     the errors in B.2 above, too, so the error type is
     probably a red herring.)
  3) updated information written to NEWmessage.db and
     NEWmessage.msg file. Original message.db and
     message.msg files left in queue unmodified.
  4) If a run ever occurs where there are no smtp failures,
     the message will then dequeue. I don't know if these
     messages will time out, because I've never left them
     queued for that long.

I guess my question out of this would be whether there's some (hopefully
known) error processing condition(s) that causes new .db and .msg files to
be written, rather than overwriting the existing files? And, is there maybe
one of these conditions where the keepqueued flag got set?

BTW - I dug out the original SMTP failure from the mail logs
I'm guessing it's not much help, but here it is:

Oct 15 15:36:52 mailhost sendmail[22628]: PAA22628: ruleset=check_rcpt,
arg1=<email address hidden>, relay=localhost [127.0.0.1], reject=450
<email address hidden>... Can not check MX records for recipient host aaa.com

Revision history for this message
Deejster (deejster) wrote :

Is there any help in the fact that the FILEBASE in dbdump of
9bca74e3a154f8b1bbbe14f59e98937be8e23660.db
 (the file that did not get dequeued below)
is not the filebase for itself but for the other file in the queue?

Since it appears that the file was not modified after being enqueued via
administrator approval, is there maybe a bug somewhere in the enqueueing
routines that wrote the wrong filebase?

Revision history for this message
Deejster (deejster) wrote :

More followup on the filebase vs. filename in approved messages.

This seems to be very reproduceable as follows:

1) A message comes in to the system for <list> as
filebase.msg/filebase.db from a non-auto-approved poster.

2) filebase.msg gets moved to data/heldmsg-<list>-##.txt, and
filebase.db gets absorbed into lists/<list>/request.db

3) when approved via the admindb CGI interface, filebase.db gets extracted
from request.db and enqueued as NEWfilebase.db (without updating the
contents), and data/heldmsg-<list>-##.txt gets enqueued as
NEWfilebase.msg

My current best guess it when qrunner processes NEWfilebase*, and
encounters an smtp-failure, they are re-queued as filebase.msg/.db and the
NEWfilebase.msg/.db are overlooked in the dequeueing.

There's likely to be two solutions that'd work. Either qrunner could
check filebase vs. root_filename for consistency and keep track of things
when there are inconsistencies, or the admindb.cgi program should use the
filebase value from lists/<list>/request.db when enqueueing the files
(or updated the NEWfilebase.db file to have filebase=NEWfilebase). (or
maybe both should be done for thoroughness?)

Revision history for this message
Barry Warsaw (barry) wrote :

You latest followup provided the recipe I needed to reproduce the bug, so
I'm re-opening this bug. I believe I have a fix (posted to
mailman-developers and mailman-users).

Revision history for this message
Barry Warsaw (barry) wrote :

Fixed in ListAdmin.py 1.47 (2.0rc2) by zapping the `filebase' attribute in
the reconsitituted msgdata.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.