dns timeout didn't result in tempfail

Bug #1914889 reported by Nick Tait
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
opendkim (Ubuntu)
New
Undecided
Unassigned

Bug Description

My mail logs show that OpenDKIM experienced a DNS timeout when validating the DKIM signature on an inbound email, but then accepted the message. As a result the DMARC policy was applied and the message was quarantined.

The expected behaviour in the event of a DNS timeout is that it would use the On-DNSError setting, which defaults to tempfail, and would cause the sending MTA to retry at a later time, and when this happened it is expected that the result of the earlier DNS query would be immediately available because it would have been cached by the DNS server, and so the signature would have been successfully verified and therefore would have passed DMARC check (and the email would have been accepted).

Here is what my mail logs showed:

Jan 28 11:27:33 mx postfix/postscreen[19584]: CONNECT from [168.100.1.4]:14353 to [192.168.20.197]:25
Jan 28 11:27:33 mx postfix/postscreen[19584]: PASS OLD [168.100.1.4]:14353
Jan 28 11:27:33 mx postfix/smtpd[19585]: connect from russian-caravan.cloud9.net[168.100.1.4]
Jan 28 11:27:34 mx postfix/smtpd[19585]: Anonymous TLS connection established from russian-caravan.cloud9.net[168.100.1.4]: TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)
Jan 28 11:27:35 mx policyd-spf[19610]: prepend Received-SPF: None (mailfrom) identity=mailfrom; client-ip=168.100.1.4; helo=russian-caravan.cloud9.net; <email address hidden>; receiver=<UNKNOWN>
Jan 28 11:27:35 mx policyd-spf[19615]: prepend Received-SPF: None (mailfrom) identity=mailfrom; client-ip=168.100.1.4; helo=russian-caravan.cloud9.net; <email address hidden>; receiver=<UNKNOWN>
Jan 28 11:27:35 mx postfix/smtpd[19585]: 5481E5E10A6: client=russian-caravan.cloud9.net[168.100.1.4]
Jan 28 11:27:35 mx postfix/cleanup[19627]: 5481E5E10A6: message-id=<email address hidden>
Jan 28 11:27:40 mx opendkim[1731]: 5481E5E10A6: key retrieval failed (s=202101-e055eb0c, d=patpro.net): '202101-e055eb0c._domainkey.patpro.net' query timed out
Jan 28 11:27:40 mx opendmarc[1534]: implicit authentication service: mx.tait.net.nz
Jan 28 11:27:40 mx opendmarc[1534]: 5481E5E10A6: SPF(mailfrom): <email address hidden> none
Jan 28 11:27:44 mx opendmarc[1534]: 5481E5E10A6: patpro.net fail
Jan 28 11:27:44 mx postfix/cleanup[19627]: 5481E5E10A6: milter-hold: END-OF-MESSAGE from russian-caravan.cloud9.net[168.100.1.4]: milter triggers HOLD action; from=<email address hidden> to=<email address hidden> proto=ESMTP helo=<russian-caravan.cloud9.net>
Jan 28 11:27:44 mx postfix/smtpd[19585]: disconnect from russian-caravan.cloud9.net[168.100.1.4] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7

Here is the content of opendkim.conf (with comments and blank lines removed):

Syslog yes
UMask 007
Domain tait.net.nz
KeyFile /etc/dkimkeys/20201017.private
Selector 20201017
InternalHosts 127.0.0.0/8,::1,192.168.20.192/28
Canonicalization relaxed/simple
AlwaysAddARHeader yes
Socket local:/var/run/opendkim/opendkim.sock
PidFile /var/run/opendkim/opendkim.pid
OversignHeaders From
TrustAnchorFile /usr/share/dns/root.key
UserID opendkim

You can see there is no DNSTimeout setting, meaning it uses the default value of 5 seconds. Also there are no On-... options, so these are also using default settings. My expectation is that a DNS timeout would use On-DNSError default value, which is documented as tempfail. However it would appear that it is using some other setting which defaults to accept?

Version info:

$ lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04

$ apt-cache policy opendkim
opendkim:
  Installed: 2.11.0~alpha-11build1
  Candidate: 2.11.0~alpha-11build1
  Version table:
 *** 2.11.0~alpha-11build1 500
        500 http://nz.archive.ubuntu.com/ubuntu bionic/universe i386 Packages
        100 /var/lib/dpkg/status

As far as I can tell this is a one-off occurrence, and not something I can easily recreate. But hopefully there is enough information here to determine what happened?

Thanks,
Nick.

Nick Tait (nick.t)
description: updated
Revision history for this message
David Bürgin (glts) wrote :

I agree with your analysis. I don’t know how to reproduce this, it is
probably best to report the bug to the upstream project.

Revision history for this message
Nick Tait (nick.t) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.