On Thu, Dec 08, 2005 at 10:33:54PM +0100, Florian Weimer wrote:
> * Lionel Elie Mamane:
>
> > On Thu, Dec 08, 2005 at 09:30:52PM +0100, Wouter Verhelst wrote:
> >
> >> The fact that my primary MX is only available through IPv6, and that
> >> this is the case for other people who're having problems too might
> >> then be a better chance at being the problem.
> >
> > My primary MX is IPv6-only, too. I haven't detected a problem yet :)
>
> Do you receive lots of mail from master.debian.org, and would you
> notice the bounces? Mail from Debian mailing lists comes directly from
> murphy.debian.org, which does not seem to have the problem.
>
> You also have one IPv4-only MX, which might be enough to prevent the
> Exim bug[1] from occurring.
>
> [1] I'm not sure if it's Exim's fault, it's only a hunch.
I'm quite sure it's an Exim bug, but I haven't quite nailed it down yet. The
bug has been confirmed both on master.d.o and on one mail server I maintain.
Interestingly, it doesn't seem to be IPv6-related (or maybe there are two bugs).
The situation on my mail server was that the primary MX had been unavailable
for a long time and was well past its cutoff time, but the secondary MX worked
fine. Then, for some reason, all the mail queued for the domain in question
suddenly got bounced for having been unreachable for an extended time, past
the retry time. Obviously, that's bogus, as the secondary MX wasn't past its
cutoff yet.
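To make the expected behaviour concrete, here is a minimal Python model of the decision. It is an illustration, not Exim's code (Exim's real retry logic keeps separate host and address retry records and is more involved): each MX host carries its own failure state, and a message should only bounce with "retry timeout exceeded" when every host for the domain is past its cutoff. The `Host` class, `should_bounce`, the 10-day outage and the 4-day cutoff are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical per-host retry state for one MX (not Exim's data structure)."""
    name: str
    first_failure: Optional[datetime]  # None: the host is not currently failing


def past_cutoff(host: Host, now: datetime, cutoff: timedelta) -> bool:
    """True if the host has been failing for longer than `cutoff`."""
    return host.first_failure is not None and now - host.first_failure > cutoff


def should_bounce(hosts: List[Host], now: datetime, cutoff: timedelta) -> bool:
    """Expected rule: bounce only when every MX host is past its cutoff."""
    return all(past_cutoff(h, now, cutoff) for h in hosts)


# Scenario from the report; the 10-day outage and 4-day cutoff are made-up values.
now = datetime(2005, 11, 30, 18, 35, 41)
cutoff = timedelta(days=4)
primary = Host("shrek.vanschaik.tk", first_failure=now - timedelta(days=10))
secondary = Host("mailrelay.direct-adsl.nl", first_failure=None)

print(should_bounce([primary, secondary], now, cutoff))  # False: secondary is fine
print(should_bounce([primary], now, cutoff))             # True: only host is past cutoff
```

In the reported incident Exim behaved as if the first call had returned True.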
I've meant to look into the code for this, but haven't got around to it yet.
If someone wants to do so, please do: I strongly suspect that the Exim in Sarge
has a serious bug in there somewhere, since it's also showing up with these
IPv6 and IPv4 multihomed MXes.
I think this is a serious bug, as it can cause mail to get lost (mail gets
bounced for no good reason at all in some very common situations, such as
IPv6 vs IPv4 multihomed MXes).
Log snippets:
# Primary (long time unreachable) MX is shrek.vanschaik.tk, secondary
# reachable MX is mailrelay.direct-adsl.nl
First failure:
2005-11-30 18:35:41 1EhVnA-0002GK-L1 shrek.vanschaik.tk [81.207.193.3]:
No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 == <email address hidden>
<email address hidden> R=dnslookup_relay_to_domains T=remote_smtp defer
(113): No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 ** <email address hidden>
<email address hidden>: retry timeout exceeded
Second failure:
2005-11-30 18:36:43 1EhVrp-0001pB-Jw ** <email address hidden>
<email address hidden> R=dnslookup_relay_to_domains T=remote_smtp:
retry time not reached for any host after a long failure period
Obviously, the secondary MX was fine, so the "retry timeout exceeded" bounce,
and especially the second failure, should not have happened.
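When reading these snippets it helps to know the two-character flags Exim writes after the message ID: `==` marks a deferred delivery (temporary problem), `**` a failed delivery that bounces, and `=>` a successful delivery. Below is a minimal Python sketch that classifies such lines. The regular expression is an assumption fitted to the lines quoted here, not a complete parser for Exim's log format, and `classify` is a hypothetical helper.

```python
import re

# Subset of Exim main-log delivery flags.
FLAGS = {
    "<=": "arrival",
    "=>": "delivered",
    "->": "additional address delivered",
    "**": "failed (bounced)",
    "==": "deferred",
}

# Timestamp, message ID shaped like 1EhVnA-0002GK-L1, then a flag and a space.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) "
    r"(?P<flag><=|=>|->|\*\*|==) "
)


def classify(line: str):
    """Return (timestamp, message id, meaning), or None if the line has no flag."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m["ts"], m["msgid"], FLAGS[m["flag"]]


lines = [
    "2005-11-30 18:35:41 1EhVnA-0002GK-L1 shrek.vanschaik.tk [81.207.193.3]:",
    "2005-11-30 18:35:41 1EhVnA-0002GK-L1 == <email address hidden>",
    "2005-11-30 18:35:41 1EhVnA-0002GK-L1 ** <email address hidden>",
]
for line in lines:
    print(classify(line))
```

Run over the first failure, this shows the same message being deferred and then bounced within the same second.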
--Jeroen
--
Jeroen van Wolffelaar
<email address hidden> (also for Jabber & MSN; ICQ: 33944357) http://Jeroen.A-Eskwadraat.nl
Message-ID: <email address hidden>
Date: Fri, 9 Dec 2005 01:12:56 +0100
From: Jeroen van Wolffelaar <email address hidden>
To: <email address hidden>
Subject: Possible exim retry bug (Re: master mail problems -- help needed)
Package: exim4-daemon-heavy
Version: 4.50-8
Severity: serious
Log snippets:
# Primary (long time unreachable) MX is shrek.vanschaik.tk, secondary
# reachable MX is mailrelay.direct-adsl.nl
Last successful delivery:
2005-11-30 17:49:53 1EhV6R-0000uq-Qg shrek.vanschaik.tk [81.207.193.3]:
Connection timed out
2005-11-30 17:50:02 1EhV6R-0000uq-Qg => <email address hidden>
<email address hidden> R=dnslookup_relay_to_domains T=remote_smtp
H=mailrelay.direct-adsl.nl [195.121.6.56] C="250 2.0.0 jAUGo2wm025502
Message accepted for delivery" QT=3m19s
First failure:
2005-11-30 18:35:41 1EhVnA-0002GK-L1 shrek.vanschaik.tk [81.207.193.3]:
No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 == <email address hidden>
<email address hidden> R=dnslookup_relay_to_domains T=remote_smtp defer
(113): No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 ** <email address hidden>
<email address hidden>: retry timeout exceeded
Second failure:
2005-11-30 18:36:43 1EhVrp-0001pB-Jw ** <email address hidden>
<email address hidden> R=dnslookup_relay_to_domains T=remote_smtp:
retry time not reached for any host after a long failure period
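One data point in these logs backs up the claim that the secondary MX was not past its cutoff: mailrelay.direct-adsl.nl accepted a message at 17:50:02, and the "retry timeout exceeded" bounce came at 18:35:41 the same day. The short Python calculation below computes that gap from the logged timestamps; the four-day cutoff it is compared against is an assumed example value, not necessarily what this server was configured with.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the log snippets.
last_success = datetime.strptime("2005-11-30 17:50:02", FMT)  # => via mailrelay.direct-adsl.nl
bounce = datetime.strptime("2005-11-30 18:35:41", FMT)        # ** retry timeout exceeded

gap = bounce - last_success
assumed_cutoff = timedelta(days=4)  # hypothetical cutoff, for illustration only

print(gap)                   # 0:45:39
print(gap < assumed_cutoff)  # True
```

A host that delivered successfully 45 minutes earlier cannot plausibly have been failing for longer than any realistic retry cutoff.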