Segfaults on verify callout, in _gnutls_trust_list_get_issuer

Bug #1974214 reported by Malcolm Scott
Affects          Status        Importance  Assigned to            Milestone
exim             Fix Released  Unknown
exim4 (Debian)   Fix Released  Unknown
exim4 (Ubuntu)   Fix Released  High        Sergio Durigan Junior
Jammy            Fix Released  High        Sergio Durigan Junior
Kinetic          Fix Released  High        Sergio Durigan Junior

Bug Description

[ Impact ]

When the user sends a message to someone, if the server responsible for receiving this message defers it, and if there are other possible servers (i.e., other servers listed as secondary MX) to try, exim4 will segfault while trying to connect to the second server.

[ Test Plan ]

The test case for this bug is a bit involved. It makes use of the upstream reporter's mail server, which has been configured to defer emails when they come through the primary MX, but accept when they come through the secondary MX. This means that you will need access to port 25 (unfortunately canonistack seems to block it).

$ lxc launch ubuntu-daily:jammy exim4-bug1974214
$ lxc shell exim4-bug1974214
# apt update && apt full-upgrade
# apt install -y exim4
# dpkg-reconfigure exim4-config
... In the "Mail Server configuration" screen, select "internet site; mail is sent and received directly using SMTP". Leave everything else untouched.
# cat > /etc/netplan/99-disable-ipv6.yaml << _EOF_
network:
  ethernets:
    eth0:
      link-local: [ ipv4 ]
_EOF_
# netplan apply
# reboot
$ lxc shell exim4-bug1974214
# cat > 1.msg << _EOF_
Subject: test

this is a test
_EOF_
# exim4 -odq -f <email address hidden> <email address hidden> < 1.msg
# exim4 -bp
 0m 321 1nxC3o-0000Ax-AS <email address hidden>
          <email address hidden>

... You will have to grab the message ID, which is 1nxC3o-0000Ax-AS in this case. You have to use this ID in the following command.

# exim4 -d+all -q 1nxC3o-0000Ax-AS 2>&1 | tee /tmp/exim.debug
...
# grep SEGV /tmp/exim.debug

You should be able to see exim4 signalling the segmentation fault that occurred while attempting to connect to the second server.

[ Where problems could occur ]

The patches, albeit well contained and relatively simple, touch code that deals with TLS and security. There is always the risk of introducing an unwanted vulnerability or normal regression here. If that happens, the very first thing we need to do is revert the patches and work with upstream to develop a fix.

In the unlikely case that we encounter regressions, they are probably going to affect TLS connections when sending/receiving messages. Email servers nowadays generally offer encrypted connections (via TLS or STARTTLS), and some still offer plaintext as well. If there is a problem with TLS and exim4 is configured to fall back to plaintext, things will still work, assuming that the other end also talks plaintext. Otherwise, we might see reports of undelivered emails.

Finally, the fix is composed of two patches. The first one prevents exim4 from discarding the cached credentials when the transport connection with the primary MX closes, and the second resets headers before trying to connect to the secondary MX.
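
The idea behind the first patch can be pictured as an ownership rule. What follows is a toy model under stated assumptions, not Exim's code: creds_t, transport_cache_t, conn_t and all function names are hypothetical stand-ins for the GnuTLS credentials object and for the state kept per transport and per connection. It only illustrates that a connection which borrowed preloaded credentials must not free them on close, so that a second connection in the same process still finds them valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TLS credentials object. */
typedef struct {
    int ca_count;
} creds_t;

/* Hypothetical per-transport cache, filled once at startup. */
typedef struct {
    creds_t *preloaded;   /* NULL when nothing was preloaded */
} transport_cache_t;

/* Hypothetical per-connection state. */
typedef struct {
    creds_t *creds;
    bool     owns_creds;  /* true only if allocated for this connection */
} conn_t;

creds_t *creds_new(int ca_count)
{
    creds_t *c = malloc(sizeof *c);
    if (c) c->ca_count = ca_count;
    return c;
}

/* Open: borrow the preloaded credentials if present, else allocate. */
bool conn_open(conn_t *conn, transport_cache_t *cache)
{
    assert(conn != NULL && cache != NULL);
    if (cache->preloaded) {
        conn->creds = cache->preloaded;
        conn->owns_creds = false;
    } else {
        conn->creds = creds_new(0);
        conn->owns_creds = true;
    }
    return conn->creds != NULL;
}

/* Close: free only what this connection owns; never the shared cache. */
void conn_close(conn_t *conn)
{
    assert(conn != NULL);
    if (conn->owns_creds) free(conn->creds);
    conn->creds = NULL;
    conn->owns_creds = false;
}
```

In this model, closing the connection to the primary MX leaves cache.preloaded untouched, so opening a connection to the secondary MX reuses the same valid object instead of a freed one.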

[ Original Description ]

We are experiencing segfaults in exim since upgrading from impish (4.94.2-7ubuntu2 with libgnutls30 3.7.1-5ubuntu1) to jammy (4.95-4ubuntu2 with libgnutls30 3.7.3-4ubuntu1), in _gnutls_trust_list_get_issuer, seemingly in the sender/recipient verify callout during message submission.

Typically the initial attempt to submit a message crashes an exim child thread, but the same message is accepted when the sender retries.

gdb backtrace:

Thread 2.1 "exim4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe2f844d080 (LWP 29278)]
0x00007fe2f8f3eb2b in _gnutls_trust_list_get_issuer (flags=<optimised out>, issuer=<optimised out>, cert=<optimised out>, list=<optimised out>) at x509/../../../lib/x509/verify-high.c:1026
1026 x509/../../../lib/x509/verify-high.c: No such file or directory.
(gdb) bt
#0 0x00007fe2f8f3eb2b in _gnutls_trust_list_get_issuer (flags=<optimised out>, issuer=<optimised out>, cert=<optimised out>,
    list=<optimised out>) at x509/../../../lib/x509/verify-high.c:1026
#1 gnutls_x509_trust_list_get_issuer (list=list@entry=0x55ef6bd9c260, cert=0x55ef6bd9be20, issuer=issuer@entry=0x7ffc82dba510,
    flags=flags@entry=16) at x509/../../../lib/x509/verify-high.c:1129
#2 0x00007fe2f8f3f679 in gnutls_x509_trust_list_verify_crt2 (list=0x55ef6bd9c260, cert_list=0x7ffc82dba5c0,
    cert_list_size=<optimised out>, data=<optimised out>, elements=<optimised out>, flags=33554432, voutput=0x7ffc82dba888, func=0x0)
    at x509/../../../lib/x509/verify-high.c:1522
#3 0x00007fe2f8ed7516 in _gnutls_x509_cert_verify_peers (status=0x7ffc82dba888, elements=0, data=0x0, session=0x55ef6c0c1150)
    at ../../lib/cert-session.c:597
#4 gnutls_certificate_verify_peers (session=0x55ef6c0c1150, data=data@entry=0x0, elements=elements@entry=0,
    status=status@entry=0x7ffc82dba888) at ../../lib/cert-session.c:776
#5 0x00007fe2f8ed8000 in gnutls_certificate_verify_peers2 (session=<optimised out>, status=status@entry=0x7ffc82dba888)
    at ../../lib/cert-session.c:653
#6 0x000055ef6b7698ef in verify_certificate (state=<optimised out>, errstr=0x7ffc82dbaa20)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/tls-gnu.c:2519
#7 0x000055ef6b7a5d7b in tls_client_start.constprop.0 (cctx=cctx@entry=0x55ef6be0e688, conn_args=conn_args@entry=0x55ef6bdfe5f8,
    tlsp=0x55ef6b7f59c0 <tls_out>, errstr=errstr@entry=0x7ffc82dbaa20, cookie=<optimised out>)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/tls-gnu.c:3593
#8 0x000055ef6b78b0ef in smtp_setup_conn (sx=0x55ef6bdfe5e8, suppress_tls=<optimised out>) at transports/smtp.c:2673
#9 0x000055ef6b776350 in do_callout (pm_mailfrom=<optimised out>, se_mailfrom=<optimised out>, options=<optimised out>,
    callout_connect=<optimised out>, callout_overall=<optimised out>, callout=<optimised out>, tf=0x7ffc82dbbc10,
    host_list=<optimised out>, addr=0x7ffc82dbbdd0)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/verify.c:677
#10 verify_address (vaddr=<optimised out>, fp=<optimised out>, options=<optimised out>, callout=<optimised out>,
    callout_overall=<optimised out>, callout_connect=<optimised out>, se_mailfrom=<optimised out>, pm_mailfrom=<optimised out>,
    routed=<optimised out>) at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/verify.c:1947
#11 0x000055ef6b6f1660 in acl_verify (where=where@entry=0, addr=addr@entry=0x7ffc82dbc5e0,
    arg=0x55ef6babc2b8 "recipient/defer_ok/callout=30s,defer_ok,use_postmaster", user_msgptr=user_msgptr@entry=0x7ffc82dbca50,
    log_msgptr=log_msgptr@entry=0x7ffc82dbca58, basic_errno=basic_errno@entry=0x7ffc82dbc38c)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/acl.c:2168
#12 0x000055ef6b6f479e in acl_check_condition (level=<optimised out>, basic_errno=0x7ffc82dbc38c, log_msgptr=<optimised out>,
    user_msgptr=<optimised out>, epp=<synthetic pointer>, addr=<optimised out>, where=<optimised out>, cb=0x55ef6babc298,
    verb=<optimised out>) at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/acl.c:3838
#13 acl_check_internal (where=where@entry=0, addr=addr@entry=0x7ffc82dbc5e0, s=s@entry=0x55ef6bab9990 "acl_check_rcpt",
    user_msgptr=user_msgptr@entry=0x7ffc82dbca50, log_msgptr=log_msgptr@entry=0x7ffc82dbca58)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/acl.c:4225
#14 0x000055ef6b6f7b9e in acl_check (where=0, recipient=<optimised out>, s=0x55ef6bab9990 "acl_check_rcpt",
    user_msgptr=0x7ffc82dbca50, log_msgptr=0x7ffc82dbca58)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/acl.c:4539
#15 0x000055ef6b75c2fd in smtp_setup_msg () at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/smtp_in.c:5283
#16 0x000055ef6b6e5cda in handle_smtp_call (accepted=0x7ffc82dbceb0, accept_socket=<optimised out>,
    listen_socket_count=<optimised out>, listen_sockets=<optimised out>)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/daemon.c:551
#17 daemon_go () at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/daemon.c:2594
#18 main (argc=<optimised out>, cargv=<optimised out>)
    at /build/exim4-sMcKLv/exim4-4.95/b-exim4-daemon-light/build-Linux-x86_64/exim.c:4947


Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Created attachment 1414
exim4-daemon-light 4.96~RC0-1

Versions: 4.94 good, 4.9[56] bad.
OS: Debian testing x86_64
TLS: gnutls 3.7.4-2, haven't tried OpenSSL

This doesn't happen in an immediate delivery attempt or in "exim -M", but in a queue runner: if the first remote server responds with a deferral, exim crashes at some point while talking to the next server. It can happen in tls_close() (after the message was successfully delivered, if the remote party allows, or after getting a deferral): smtp_deliver -> tls_close -> gnutls_certificate_free_credentials -> gnutls_x509_trust_list_deinit,

or it can be: smtp_deliver -> smtp_setup_conn -> tls_client_start -> verify_certificate -> gnutls_certificate_verify_peers2 -> gnutls_certificate_verify_peers -> _gnutls_x509_cert_verify_peers -> gnutls_x509_trust_list_verify_crt2 -> gnutls_x509_trust_list_get_issuer -> _gnutls_trust_list_get_issuer

And on one occasion it crashed in arc_sign() (on an exim thus built)

But long story short, it seems like exim would consistently crash, SIGFPE or SIGSEGV, during a subsequent delivery attempt after a deferral response.

The real life circumstances are a gmail account over quota or a forwarded message graylisted by gmail or such. But I am reproducing this by simply configuring my own exim servers to defer.

Attached is a log excerpt and backtrace from Debian's exim4-daemon-light 4.96~RC0-1

In , Gedalya-b (gedalya-b) wrote :

Created attachment 1415
backtrace for crash during ARC signing

Adding a backtrace for crash in arc_sign()

In , Gedalya-b (gedalya-b) wrote :

The deferrals are either in response to RCPT TO (gmail over quota) or post DATA (suspicious content)

In , Jgh146exb (jgh146exb) wrote :

This needs following up; we can't trust that bt

> warning: core file may not match specified executable file.

In , Gedalya-b (gedalya-b) wrote :

If I go out of my way to invoke "/usr/sbin/exim4 -q" when causing the crash, that message is not displayed.
On Debian, /usr/sbin/exim -> exim4. One naturally uses "exim -q", and gdb gives that message, every single time. The rest of the bt seems unaffected.

In , Gedalya-b (gedalya-b) wrote :

Either way, I think this should be very easy to reproduce. I've tried several boxes and several custom builds, with or without ARC, DMARC, MySQL and so on, on exim 4.95 and 4.96; the only things in common were that I was using Debian's packaging and stuck with gnutls.

In , Jgh146exb (jgh146exb) wrote :

I wonder if your build is failing to null-fill not-specifically-initialized
file-scope statics?

#0 0x00005620ff359503 in arc_sign (signspec=<optimized out>, sigheaders=0x5621001fc580, errstr=errstr@entry=0x7ffd081d0980) at ./b-exim4-daemon-custom/build-Linux-x86_64/arc.c:1663

   1660 if ((rheaders = arc_sign_scan_headers(&arc_sign_ctx, sigheaders)))
   1661 {
   1662 hdr_rlist ** rp;
   1663 for (rp = &headers_rlist; *rp; ) rp = &(*rp)->prev;
   1664 *rp = rheaders;
   1665 }

What do "p *rp" and "p headers_rlist" say for that core?
(On that theory, an "=NULL" at line 93 would help. But only for the arcsigning
case).
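
To make the quoted loop easier to follow, here is a minimal sketch with hypothetical names (hdr_node, list_head, list_append, list_reset, list_len), not the arc.c code itself. The walk follows prev links from a file-scope static head until it reaches a NULL link, then stores the new node there. Standard C already zero-initialises a file-scope static that has no explicit initialiser, so the walk is only safe as long as the head never keeps a pointer into memory that has since been released; clearing the head before each new pass is one way to guarantee that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; only the prev link matters for the walk. */
typedef struct hdr_node {
    struct hdr_node *prev;
    const char      *text;
} hdr_node;

/* File-scope static: starts out NULL even without "= NULL". */
static hdr_node *list_head;

/* Append at the far end, using the same pointer-to-pointer walk
   as the loop quoted above. */
void list_append(hdr_node *n)
{
    hdr_node **rp;
    assert(n != NULL);
    n->prev = NULL;
    for (rp = &list_head; *rp; ) rp = &(*rp)->prev;
    *rp = n;
}

/* Forget everything from an earlier pass before starting a new one. */
void list_reset(void)
{
    list_head = NULL;
}

/* Count the nodes reachable from the head. */
size_t list_len(void)
{
    size_t count = 0;
    for (hdr_node *p = list_head; p; p = p->prev) count++;
    return count;
}
```

If list_reset() were skipped and the nodes of the first pass had been released in the meantime, the walk in list_append() would follow a dangling pointer, which would be consistent with an rp value that looks like text rather than an address.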

In , Gedalya-b (gedalya-b) wrote :

I didn't keep that exact same core, but I have a new one that seems equivalent.

#0 0x0000559e51e99993 in arc_sign (signspec=<optimized out>, sigheaders=0x559e529bb580, errstr=errstr@entry=0x7ffcc0624e20) at ./b-exim4-daemon-custom/build-Linux-x86_64/arc.c:1663
        rp = 0x672e6c69616d3d48
        identity = 0x559e529bb8a8 "****.com"
        selector = 0x559e529bb8c8 "arc-20220506"
        privkey = 0x559e525ad2f8 "-----BEGIN PRIVATE KEY-----\n"...
        opts = 0x559e529bb92a ""
        s = <optimized out>
        options = <optimized out>
        sep = 58
        headers = <optimized out>
        rheaders = 0x559e529bb950
        ar = {data = <optimized out>, len = <optimized out>}
        instance = <optimized out>
        g = 0x0
        b = <optimized out>
        __FUNCTION__ = "arc_sign"
        ret_sigheaders = <optimized out>

(gdb) p *rp
Cannot access memory at address 0x672e6c69616d3d48
(gdb) p headers_rlist
$1 = (hdr_rlist *) 0x559e52b48d78

Does this help?
Please talk to me as you would to a little child, as you find necessary :-)

In , Jgh146exb (jgh146exb) wrote :

See if you can repro with that NULL-init.

In , Gedalya-b (gedalya-b) wrote :

--- exim4-4.96~RC0.orig/src/arc.c
+++ exim4-4.96~RC0/src/arc.c
@@ -90,7 +90,7 @@ typedef struct arc_ctx {

 static time_t now;
 static time_t expire;
-static hdr_rlist * headers_rlist;
+static hdr_rlist * headers_rlist = NULL;
 static arc_ctx arc_sign_ctx = { NULL };
 static arc_ctx arc_verify_ctx = { NULL };

Core was generated by `/usr/sbin/exim4 -q'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005599b240c993 in arc_sign (signspec=<optimized out>, sigheaders=0x5599b2fea580, errstr=errstr@entry=0x7ffeafba7af0) at ./b-exim4-daemon-custom/build-Linux-x86_64/arc.c:1663
1663 ./b-exim4-daemon-custom/build-Linux-x86_64/arc.c: No such file or directory.
(gdb) set pagination off
(gdb) bt full
#0 0x00005599b240c993 in arc_sign (signspec=<optimized out>, sigheaders=0x5599b2fea580, errstr=errstr@entry=0x7ffeafba7af0) at ./b-exim4-daemon-custom/build-Linux-x86_64/arc.c:1663
        rp = 0x672e6c69616d3d48
        identity = 0x5599b2fea8a8 "****.com"
        selector = 0x5599b2fea8c8 "arc-20220506"
        privkey = 0x5599b2bdc2f8 "-----BEGIN PRIVATE KEY-----\n"...
        opts = 0x5599b2fea92a ""
        s = <optimized out>
        options = <optimized out>
        sep = 58
        headers = <optimized out>
        rheaders = 0x5599b2fea950
        ar = {data = <optimized out>, len = <optimized out>}
        instance = <optimized out>
        g = 0x0
        b = <optimized out>
        __FUNCTION__ = "arc_sign"
        ret_sigheaders = <optimized out>

(gdb) p *rp
Cannot access memory at address 0x672e6c69616d3d48
(gdb) p headers_rlist
$1 = (hdr_rlist *) 0x5599b3177d78

In , Gedalya-b (gedalya-b) wrote :

Some more details on the ARC issue.

If I queue the message with "-oMa 25.25.25.25 -odq", the queue runner crashes as follows which seems to be the "normal" and "correct" way to crash, as would real messages coming from the Internet.

#0 0x00007f56f3ac1af9 in gnutls_x509_trust_list_deinit (list=0x55876d7fe0b0, all=1) at ../../../lib/x509/verify-high.c:213
        i = <optimized out>
        j = 0
#1 0x00007f56f3a300cb in gnutls_certificate_free_credentials (sc=0x55876d7ff2b0) at ../../lib/cert-cred.c:403
No locals.
#2 0x000055876bf9c1c7 in tls_close (ct_ctx=0x55876d5701a0, do_shutdown=do_shutdown@entry=2) at ./b-exim4-daemon-custom/build-Linux-x86_64/tls-gnu.c:3777
        state = 0x55876d5701a0
        tlsp = 0x55876c02e3e0 <tls_out>
        __FUNCTION__ = "tls_close"
#3 0x000055876bfc9c79 in smtp_deliver (addrlist=addrlist@entry=0x55876d55f988, host=host@entry=0x55876dac5838, host_af=host_af@entry=2, defport=<optimized out>, interface=<optimized out>, tblock=tblock@entry=0x55876d56f4f8, message_defer=<optimized out>, suppress_tls=<optimized out>) at ./b-exim4-daemon-custom/build-Linux-x86_64/transports/smtp.c:4850
        n = <optimized out>
        ob = <optimized out>
        yield = <optimized out>
        save_errno = 1812207714
        rc = <optimized out>
        message = 0x0
        new_message_id = "\360+sJ\376\177\000\000\000\000\000\000\000\000\000\000@"
        sx = 0x55876da3a130
        __FUNCTION__ = "smtp_deliver"
        pass_message = 0
        dane_held = <optimized out>
        tcw_done = 1
        tcw = 0
        SEND_MESSAGE = <optimized out>

The ARC crash happens because I omit -oMa 25.25.25.25 and the message is thus locally submitted. Per the configuration, DKIM signing occurs (which is not what would normally happen). The message injected (locally or with -oMa) does not have any Authentication-Results header.

2022-05-12 16:49:21 1npC09-0007sc-1D <= ***@***.com U=root P=local S=4573 id=***
2022-05-12 16:49:46 1npC09-0007sc-1D ARC: no Authentication-Results header for signing
2022-05-12 16:49:46 1npC09-0007sc-1D H=mail.gedalya.net [******]: SMTP error from remote mail server after pipelined end of data: 451 Temporary local problem - please try later
2022-05-12 16:49:46 1npC09-0007sc-1D H=mx2.gedalya.net [******] Network is unreachable
2022-05-12 16:49:46 1npC09-0007sc-1D SIGSEGV (fault address: (nil))
2022-05-12 16:49:46 1npC09-0007sc-1D SIGSEGV (null pointer indirection)
2022-05-12 16:49:46 1npC09-0007sc-1D SIGSEGV (30343 delivering 1npC09-0007sc-1D to mx2.gedalya.net [****] (<email address hidden>)
2022-05-12 16:49:46 1npC09-0007sc-1D Delivery status for <email address hidden>: got 0 of 7 bytes (pipeheader) from transport process 30343 for transport smtp
2022-05-12 16:49:46 1npC09-0007sc-1D == <email address hidden> R=dnslookup T=remote_smtp defer (-1): smtp transport process returned non-zero status 0x008b: terminated by signal 11

Running exim -q -d :

[attempt first server]
...
ARC: requesting bodyhash
DKIM: new bodyhash sha256/simple/-1
dkim signing direct-mode
...
GnuTLS<3>: ASSERT: ../../../lib/nettle/mpi.c[wrap_nettle_mpi_print]:60
DKIM [***.com] b computed: xx....xx
ARC: sign for ****.com
LOG: MAIN
  ARC: no A...


In , Jgh146exb (jgh146exb) wrote :

Please say exactly what the commandline and the message headers submitted
were, for the non-oMa case.

In , Gedalya-b (gedalya-b) wrote :

# exim -odq -f <email address hidden> <email address hidden> < 2.msg

# cat 2.msg
Subject: test

this is a test

# exim -q 1npxYE-0005R6-18 1npxYE-0005R6-18
2022-05-14 19:35:58 1npxYE-0005R6-18 SIGSEGV (fault address: 0x402a)
2022-05-14 19:35:58 1npxYE-0005R6-18 SEGV_MAPERR
2022-05-14 19:35:58 1npxYE-0005R6-18 SIGSEGV (maybe attempt to write to immutable memory)
2022-05-14 19:35:58 1npxYE-0005R6-18 SIGSEGV (20909 delivering 1npxYE-0005R6-18 to mx2.gedalya.net [***] (<email address hidden>)
)
2022-05-14 19:35:58 1npxYE-0005R6-18 Delivery status for <email address hidden>: got 0 of 7 bytes (pipeheader) from transport process 20909 for transport smtp

In , Jgh146exb (jgh146exb) wrote :

Works for me.

Testsuite script:

exim -odq -f <email address hidden> <email address hidden>
Subject: test

this is a test

****
exim -d+all -q $msg1
****

Debug output section:

21:04:16 20777 will pipeline QUIT
21:04:16 20777 dkim signing direct-mode
21:04:16 20777 DKIM >> Body data for hash, canonicalized >>>>>>>>>>>>>>>>>>>>>>>>>>>>
21:04:16 20777 this{SP}is{SP}a{SP}test{CR}{LF}
21:04:16 20777 DKIM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
21:04:16 20777 DKIM: finish bodyhash sha256/simple/-1 len 16
21:04:16 20777 DKIM: no signatures
21:04:16 20777 DKIM: no signatures to use
21:04:16 20777 ARC: sign for test.ex
21:04:16 20777 LOG: MAIN
21:04:16 20777 ARC: no Authentication-Results header for signing
21:04:16 20777 SMTP+> BDAT 332 LAST
21:04:16 20777 cmd buf flush 86 bytes (more expected)
21:04:16 20777 cannot use sendfile for body: spoolfile not wireformat
21:04:16 20777 writing data block fd=8 size=332 timeout=300 (more expected)
21:04:16 20777 SMTP+> QUIT
21:04:16 20777 cmd buf flush 6 bytes (more expected)
21:04:16 20777 SMTP(shutdown)>>
21:04:16 20777 sync_responses expect mail
21:04:16 20777 read response data: size=114
21:04:16 20777 SMTP<< 250 OK
21:04:16 20777 sync_responses expect rcpt for <email address hidden>
21:04:16 20777 SMTP<< 250 Accepted
21:04:16 20777 SMTP<< 250- 332 byte chunk, total 332
21:04:16 20777 250 OK id=1npxzs-0005P8-27
21:04:16 20777 S:journalling <email address hidden>
21:04:16 20777 ok=1 send_quit=0 send_rset=0 continue_more=0 yield=0 first_address is NULL
21:04:16 20777 SMTP<< 221 test.ex closing connection
21:04:16 20777 SMTP(close)>>
21:04:16 20777 cmdlog: '220:EHLO:250-:MAIL|:RCPT|:BDAT:QUIT:250:250:250-:221'
21:04:16 20777 set_process_info: 20777 delivering 1npxzr-0005Oz-2L: just tried 127.0.0.1 [127.0.0.1]:1225 for <email address hidden>: result OK
21:04:16 20777 Leaving tsmtp transport
21:04:16 20777 set_process_info: 20777 delivering 1npxzr-0005Oz-2L (just run tsmtp for <email address hidden> in subprocess)

In , Gedalya-b (gedalya-b) wrote :

It worked for you first of all in the sense that the remote party did not defer, which makes the test irrelevant.

The remote party which accepted your message does not seem to be my server, nor was "<email address hidden>" the *exact* sender address which my ACL is configured to defer, despite your request to be exact, but that has been added now.

In either case I really did mean it when I said this bug is triggered by the first remote server responding with a deferral and you're now more than welcome to test against my own servers by sending a message from <email address hidden> to <email address hidden>,

$ dig +short gedalya.net mx
10 mail.gedalya.net. <-- will defer
20 mx2.gedalya.net. <-- will accept, but the sending queue runner should crash
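
The trigger condition described here can be written down as a small model. This is a sketch only: mx_host, result_t and deliver_in_order are hypothetical names, and the replies are canned values rather than real SMTP exchanges. It shows why the test needs a deferring primary: only then does one process make a second connection attempt, which is where the crash is reported.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RESULT_OK, RESULT_DEFER } result_t;

/* Hypothetical description of one MX host. */
typedef struct {
    const char *name;
    int         preference;  /* lower value is tried first */
    result_t    reply;       /* canned reply for this sketch */
} mx_host;

/* Try hosts in the given order (assumed sorted by ascending preference).
   Returns the index of the first host that accepts, or -1 if all defer.
   *attempts receives the number of connections made by this one process. */
int deliver_in_order(const mx_host *hosts, size_t n, size_t *attempts)
{
    assert(hosts != NULL && attempts != NULL);
    *attempts = 0;
    for (size_t i = 0; i < n; i++) {
        (*attempts)++;
        if (hosts[i].reply == RESULT_OK)
            return (int)i;
    }
    return -1;
}
```

With the two hosts listed above, the first entry defers and the second accepts, so the model makes two attempts; a primary that accepts straight away yields a single attempt and never reaches the second connection.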

In , Jgh146exb (jgh146exb) wrote :

Still can't duplicate locally.

The first conn goes
cmdlog: '220:EHLO:250-:STARTTLS:220:EHLO:250-:MAIL|:RCPT|:BDAT:QUIT:250:451:503-:221'

The second:
22:48:10 25726 will pipeline QUIT
22:48:10 25726 dkim signing direct-mode
22:48:10 25726 DKIM >> Body data for hash, canonicalized >>>>>>>>>>>>>>>>>>>>>>>>>>>>
22:48:10 25726 this{SP}is{SP}a{SP}test{CR}{LF}
22:48:10 25726 DKIM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
22:48:10 25726 DKIM: finish bodyhash sha256/simple/-1 len 16
22:48:10 25726 DKIM: no signatures
22:48:10 25726 DKIM: no signatures to use
22:48:10 25726 ARC: sign for test.ex
22:48:10 25726 LOG: MAIN
22:48:10 25726 ARC: no Authentication-Results header for signing
22:48:10 25726 SMTP+> BDAT 332 LAST
22:48:10 25726 cmd buf flush 86 bytes (more expected)
22:48:10 25726 gnutls_record_cork(session=0x5590af3a31a0)
22:48:10 25726 tls_write(0x5590af989bb8, 86, more)
22:48:10 25726 gnutls_record_send(session=0x5590af3a31a0, buffer=0x5590af989bb8, left=86)
22:48:10 25726 outbytes=86
22:48:10 25726 cannot use sendfile for body: spoolfile not wireformat
22:48:10 25726 writing data block fd=6 size=332 timeout=300 (more expected)
22:48:10 25726 tls_write(0x5590af38a048, 332, more)
22:48:10 25726 gnutls_record_send(session=0x5590af3a31a0, buffer=0x5590af38a048, left=332)
22:48:10 25726 outbytes=332
22:48:10 25726 SMTP>> QUIT
22:48:10 25726 cmd buf flush 6 bytes
22:48:10 25726 tls_write(0x5590af989bb8, 6)
22:48:10 25726 gnutls_record_send(session=0x5590af3a31a0, buffer=0x5590af989bb8, left=6)
22:48:10 25726 outbytes=6
22:48:10 25726 gnutls_record_uncork(session=0x5590af3a31a0)
22:48:10 25726 GnuTLS<2>: FIPS140-2 context is not set
22:48:10 25726 sync_responses expect mail
22:48:10 25726 Calling gnutls_record_recv(session=0x5590af3a31a0, buffer=0x5590af988bb8, len=4096)
22:48:10 25726 GnuTLS<2>: FIPS140-2 context is not set
22:48:10 25726 read response data: size=114
22:48:10 25726 SMTP<< 250 OK
22:48:10 25726 sync_responses expect rcpt for <email address hidden>
22:48:10 25726 SMTP<< 250 Accepted
22:48:10 25726 SMTP<< 250- 332 byte chunk, total 332
22:48:10 25726 250 OK id=1npzcQ-0006h2-32
22:48:10 25726 S:journalling <email address hidden>
22:48:10 25726 ok=1 send_quit=0 send_rset=0 continue_more=0 yield=0 first_address is NULL
22:48:10 25726 SMTP<< 221 test.ex closing connection
22:48:10 25726 Calling gnutls_record_recv(session=0x5590af3a31a0, buffer=0x5590af988bb8, len=4096)
22:48:10 25726 GnuTLS<2>: FIPS140-2 context is not set
22:48:10 25726 GnuTLS<3>: ASSERT: record.c[_gnutls_recv_in_buffers]:1589
22:48:10 25726 Got TLS_EOF
22:48:10 25726 tls_close(): shutting down TLS (with response-wait)
22:48:10 25726 tls_write((nil), 0)
22:48:10 25726 GnuTLS<3>: ASSERT: buffers.c[_gnutls_io_write_flush]:696
22:48:10 25726 GnuTLS<2>: FIPS140-2 context is not set
22:48:10 25726 SMTP(close)>>
22:48:10 25726 cmdlog: '220:EHLO:250-:STARTTLS:220:EHLO:250-:MAIL|:RCPT|:BDAT:QUIT:250:250:250-:221'
22:48:10 25726 set_process_info: 25726 delivering 1npzcP-0006gi-0q: just tried 127.0.0.1 [127.0.0.1]:1225 for <email address hidden>: result OK
22:48:10 25726 Leaving gsmtp transport

In , Eximusers-i (eximusers-i) wrote :

Managed to reproduce this with very vanilla exim in a Debian sid chroot:
----------------------
ametzler@argenau:/tmp/EXIM-from-source/exim-4.96-RC0$ grep -E -v '^#|^[[:space:]]*$' Local/Makefile
BIN_DIRECTORY=/usr/exim/bin
CONFIGURE_FILE=/usr/exim/configure
EXIM_USER=mail
SPOOL_DIRECTORY=/var/spool/exim
USE_GNUTLS=yes
USE_GNUTLS_PC=gnutls gnutls-dane
ROUTER_ACCEPT=yes
ROUTER_DNSLOOKUP=yes
ROUTER_IPLITERAL=yes
ROUTER_MANUALROUTE=yes
ROUTER_QUERYPROGRAM=yes
ROUTER_REDIRECT=yes
TRANSPORT_APPENDFILE=yes
TRANSPORT_AUTOREPLY=yes
TRANSPORT_PIPE=yes
TRANSPORT_SMTP=yes
LOOKUP_DBM=yes
LOOKUP_LSEARCH=yes
LOOKUP_DNSDB=yes
PCRE2_CONFIG=yes
SUPPORT_DANE=yes
DISABLE_MAL_AVE=yes
DISABLE_MAL_KAV=yes
DISABLE_MAL_MKS=yes
FIXED_NEVER_USERS=root
AUTH_CRAM_MD5=yes
HEADERS_CHARSET="ISO-8859-1"
SYSLOG_LOG_PID=yes
EXICYCLOG_MAX=10
COMPRESS_COMMAND=/usr/bin/gzip
COMPRESS_SUFFIX=gz
ZCAT_COMMAND=/usr/bin/zcat
SYSTEM_ALIASES_FILE=/etc/aliases
EXIM_TMPDIR="/tmp"
----------------------

/usr/exim/configure is unmodified.

(eximtest)root@argenau:/# /usr/exim/bin/exim -odq -f <email address hidden> <email address hidden> < /tmp/2.msg
(eximtest)root@argenau:/# /usr/exim/bin/exim -bp
 0m 312 1nqCu1-0003d6-07 <email address hidden>
          <email address hidden>
(eximtest)root@argenau:/# /usr/exim/bin/exim -d+all -q 1nqCu1-0003d6-07 2>&1 | tee /tmp/exim.debug
[... - will attach]
(eximtest)root@argenau:/# /usr/exim/bin/exim -bp
 1m 312 1nqCu1-0003d6-07 <email address hidden> *** frozen ***
        D <email address hidden>

(eximtest)root@argenau:/# cat /var/spool/exim/log/paniclog
2022-05-15 12:00:14 1nqCu1-0003d6-07 SIGSEGV (fault address: 0x1)
2022-05-15 12:00:14 1nqCu1-0003d6-07 SEGV_MAPERR
2022-05-15 12:00:14 1nqCu1-0003d6-07 SIGSEGV (null pointer indirection)
2022-05-15 12:00:14 1nqCu1-0003d6-07 SIGSEGV (13972 delivering 1nqCu1-0003d6-07 to mx2.gedalya.net [104.131.53.251] (<email address hidden>)
)
2022-05-15 12:00:14 1nqCu1-0003d6-07 Delivery status for <email address hidden>: got 0 of 7 bytes (pipeheader) from transport process 13972 for transport smtp

In , Eximusers-i (eximusers-i) wrote :

Created attachment 1416
(unpatched) exim debug output

In , Eximusers-i (eximusers-i) wrote :

(In reply to Andreas Metzler from comment #16)
> Managed to reproduce this with very vanilla exim in a Debian sid chroot:

Version 4.96-RC0, built/installed with

env CFLAGS='-D_LARGEFILE_SOURCE -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2' LFLAGS='-Wl,-z,relro -Wl,-z,now' LDFLAGS='-Wl,-z,relro -Wl,-z,now' make FULLECHO=''
env CFLAGS='-D_LARGEFILE_SOURCE -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2' LFLAGS='-Wl,-z,relro -Wl,-z,now' LDFLAGS='-Wl,-z,relro -Wl,-z,now' make FULLECHO='' install

cu Andreas

In , Jgh146exb (jgh146exb) wrote :

(In reply to Andreas Metzler from comment #17)
> Created attachment 1416 [details]
> (unpatched) exim debug output

Thanks Andreas. For this variant, the message is properly transferred
(accepted by the destination on the second MX tried) and then we segv
after the peer has indicated a TLS close.

It'd be useful to peek at a core stack to see if the crash was actually in the
GnuTLS library on some subsequent call into it. The debug trace:

 12:00:14 13972 Calling gnutls_record_recv(session=0x558e5bd826b0, buffer=0x558e5bdb60e8, len=4096)
 12:00:14 13972 GnuTLS<3>: ASSERT: ../../lib/record.c[_gnutls_recv_in_buffers]:1589
 12:00:14 13972 Got TLS_EOF (that read returned empty)

 12:00:14 13972 tls_close(): shutting down TLS (with response-wait)
 12:00:14 13972 tls_write((nil), 0) (zero bytes to write,
                                                should not call into lib here)

 12:00:14 13972 GnuTLS<3>: ASSERT: ../../lib/buffers.c[_gnutls_io_write_flush]:696 (unclear how we got here)

 12:00:14 13972 LOG: MAIN PANIC
 12:00:14 13972 SIGSEGV (fault address: 0x1)

is concerning wrt. that _gnutls_io_write_flush location, but not
definitive as to the location triggering the segv.

In , Eximusers-i (eximusers-i) wrote :

Hello,

given that I had some time but no thought/smartness to spare I ran git bisect which found
-----
6a9cf7f890226aa085842cd3d94b13e78ea31637 is the first bad commit
commit 6a9cf7f890226aa085842cd3d94b13e78ea31637
Date: Sat Oct 3 20:59:15 2020 +0100

    TLS: preload configuration items
-----

The nice thing about Gedalya's test case is that it does not require exim to be suid root. Out of laziness I used the throwaway chroot with exim-user=mail, but I think an unprivileged setup (installing to ~/eximtest with exim-user=ametzler) would also work.

cu Andreas

In , Gedalya-b (gedalya-b) wrote :

Did pretty much the same as Andreas, built in sid chroot using the same make command from git (4.96-RC1).

Unmodified runtime config, same EDITME as Andreas. No ARC or DKIM signing yet.

It crashed in tls_client_start > verify_certificate etc.

Will attach gedalya.vanilla.1.exim.bt and gedalya.vanilla.1.exim.debug

Is there any further testing I could do that would be helpful?

In , Gedalya-b (gedalya-b) wrote :

Created attachment 1417
gedalya.vanilla.1.exim.bt

In , Gedalya-b (gedalya-b) wrote :

Created attachment 1418
gedalya.vanilla.1.exim.debug

In , Gedalya-b (gedalya-b) wrote :

Created attachment 1419
gedalya.vanilla.2.debug_and_bt

Disabling verification lets the message deliver and the crash occurs in tls_close > gnutls_certificate_free_credentials > gnutls_x509_trust_list_deinit

In , Jgh146exb (jgh146exb) wrote :

This one is different to Andreas'; the crash is during the verify stage of
TLS establishment. The stacktrace is:

 _gnutls_trust_list_get_issuer
 gnutls_x509_trust_list_get_issuer
 gnutls_x509_trust_list_verify_crt2
 _gnutls_x509_cert_verify_peers
 gnutls_certificate_verify_peers ^^^
 gnutls_certificate_verify_peers2 ^^^ GnuTLS library
 verify_certificate vvv Exim
 tls_client_start vvv
 smtp_setup_conn

From looking at the GnuTLS source I'm not able to guess what state it's
missing. It's unfortunate that it follows a null pointer rather than
checking and returning an error from the gnutls_certificate_verify_peers2 API
call; I'd call that a bug in GnuTLS.

It's interesting that we had a good TLS conn for the first MX tried, in the
same process. Presumably that leaves GnuTLS in some awkward state. If the
preload support Andreas identified is also relevant to this variant then
the "client CA bundle" is suspect. We're relying on the bundle loaded
during the parent Exim startup (either daemon or cmdline-send), rather than
(as before that commit) loading it afresh for every TLS connection.

A workaround would be to introduce a '$' into the transport
tls_verify_certificates option. Appending "${expand:}" to the existing
value would suffice; this is just to make exim think the option value might
vary and so must not be cached.

I'd suggest raising a bug against GnuTLS for this.
Testing with a range of different GnuTLS versions might also be useful.

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Created attachment 1420
gedalya.vanilla.3.debug_and_bt (no preloading)

remote_smtp:
  driver = smtp
  tls_try_verify_hosts = :
  tls_verify_hosts = :
  tls_verify_cert_hostnames = :
  hosts_try_dane = :
  hosts_request_ocsp = :
  tls_verify_certificates = ${if bool {0} {} {system}}
.ifdef _HAVE_TLS_RESUME
  tls_resumption_hosts = *
.endif

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

${if bool {0} {} {}} just yields the same crash in gnutls_x509_trust_list_deinit

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Set ${if bool {0} {} {}} for both remote_smtp and smarthost_smtp:

GnuTLS global init required
TLS: basic cred init, client
TLS: not preloading client certs, for transport 'remote_smtp'
TLS: not preloading CA bundle, for transport 'remote_smtp'
TLS: basic cred init, client
TLS: not preloading client certs, for transport 'smarthost_smtp'
TLS: not preloading CA bundle, for transport 'smarthost_smtp'

First conn:
TLS: tls_verify_certificates expanded empty, ignoring
TLS: server certificate verification not required
second conn:
TLS: tls_verify_certificates expanded empty, ignoring
TLS: server certificate verification not required

Crashes in gnutls_x509_trust_list_deinit

Revision history for this message
In , Jgh146exb (jgh146exb) wrote :

About line 3781 in src/tls-gnu.c there is a call to
gnutls_certificate_free_credentials().

Please test with that commented out. I think that is freeing the shared
CA-bundle, which we then try to re-use in the second connection within
the same process. That would account for the with-preload crash, and
perhaps for the without-preload also.
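
The suspected failure mode is a use-after-free of shared state: credentials cached once per process get freed when the first connection closes, and the second connection then reuses the stale pointer. The sketch below models that ownership rule in plain C. It does not use GnuTLS or Exim code; `creds_t`, `transport_t`, `conn_t` and all function names are hypothetical, chosen only for illustration, and the real change in tls-gnu.c may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TLS credentials object holding a CA bundle. */
typedef struct {
    int n_cas;
} creds_t;

/* Hypothetical per-transport state: credentials preloaded once per process. */
typedef struct {
    creds_t *cached_creds;      /* shared by every connection of this transport */
} transport_t;

/* Hypothetical per-connection state. */
typedef struct {
    creds_t *creds;
    bool creds_are_cached;      /* true: borrowed from the transport, not owned */
} conn_t;

static creds_t *creds_load(void)
{
    creds_t *c = malloc(sizeof *c);
    assert(c != NULL);
    c->n_cas = 100;             /* pretend we parsed a CA bundle */
    return c;
}

void transport_preload(transport_t *t)
{
    t->cached_creds = creds_load();
}

void conn_open(conn_t *c, transport_t *t)
{
    if (t->cached_creds != NULL) {
        c->creds = t->cached_creds;     /* borrow the shared copy */
        c->creds_are_cached = true;
    } else {
        c->creds = creds_load();        /* private copy, owned by the connection */
        c->creds_are_cached = false;
    }
}

void conn_close(conn_t *c)
{
    /* Free only what this connection owns. Freeing a borrowed, cached
     * object here would leave the transport with a dangling pointer that
     * the next connection in the same process would then dereference. */
    if (!c->creds_are_cached)
        free(c->creds);
    c->creds = NULL;
}
```

With this rule, opening and closing a first connection leaves `cached_creds` valid for a second connection in the same process, which is the property the test with the free call commented out is meant to check.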

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

That fixes all cases I've tested in the last 24 hours.
I'll confirm later with DKIM/ARC.

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

So, the ARC thing is different.

Line 3781 is commented out.

remote_smtp:
  driver = smtp
.ifdef _HAVE_TLS_RESUME
  tls_resumption_hosts = *
.endif
  arc_sign = gedalya.net : rsa2 : /usr/exim/rsa2.key : timestamps
  dkim_domain = gedalya.net
  dkim_selector = rsa1
  dkim_private_key = /usr/exim/rsa1.key
  dkim_canon = relaxed
  dkim_sign_headers = From:Sender:Reply-To:Subject:Date:Message-ID:To:Cc:MIME-Version:Content-Type:Content-Transfer-Encoding:Content-ID:Content-Description:=Resent-Date:=Resent-From:=Resent-Sender:=Resent-To:=Resent-Cc:=Resent-Message-ID:=In-Reply-To:=References:=List-Id:=List-Help:=List-Unsubscribe:=List-Subscribe:=List-Post:=List-Owner:=List-Archive

It crashes if all of the following conditions are met:

- TLS is used (no hosts_avoid_tls = *)
- First connection deferred
- DKIM signing is done
- ARC signing is done

Local/Makefile:

BIN_DIRECTORY=/usr/exim/bin
CONFIGURE_FILE=/usr/exim/configure
EXIM_USER=mail
SPOOL_DIRECTORY=/var/spool/exim
USE_GNUTLS=yes
USE_GNUTLS_PC=gnutls gnutls-dane
#USE_OPENSSL=yes
#USE_OPENSSL_PC=openssl
ROUTER_ACCEPT=yes
ROUTER_DNSLOOKUP=yes
ROUTER_IPLITERAL=yes
ROUTER_MANUALROUTE=yes
ROUTER_QUERYPROGRAM=yes
ROUTER_REDIRECT=yes
TRANSPORT_APPENDFILE=yes
TRANSPORT_AUTOREPLY=yes
TRANSPORT_PIPE=yes
TRANSPORT_SMTP=yes
LOOKUP_DBM=yes
LOOKUP_LSEARCH=yes
LOOKUP_DNSDB=yes
PCRE2_CONFIG=yes
SUPPORT_DANE=yes
DISABLE_MAL_AVE=yes
DISABLE_MAL_KAV=yes
DISABLE_MAL_MKS=yes
EXPERIMENTAL_ARC=yes
FIXED_NEVER_USERS=root
AUTH_CRAM_MD5=yes
HEADERS_CHARSET="ISO-8859-1"
SYSLOG_LOG_PID=yes
EXICYCLOG_MAX=10
COMPRESS_COMMAND=/usr/bin/gzip
COMPRESS_SUFFIX=gz
ZCAT_COMMAND=/usr/bin/zcat
SUPPORT_SPF=yes
LDFLAGS += -lspf2
SYSTEM_ALIASES_FILE=/etc/aliases
EXIM_TMPDIR="/tmp"

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Created attachment 1421
gedalya.vanilla.ARC.1.debug_and_bt

debug output and backtrace for ARC crash

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Created attachment 1422
gedalya.vanilla.ARC.2.debug_and_bt

ARC bug reproduced pretty much the same way when built with OpenSSL 3.0.3

Revision history for this message
In , Jgh146exb (jgh146exb) wrote :

I still can't duplicate the ARC-case segv. However, I did identify a lack
of re-initialization that might be relevant. Please add, at about
"src/arc.c" line 1532 :-

headers_rlist = NULL;

(I've managed a testcase for the non-ARC case; it doesn't consistently segv
on my platform but does, before the fix, consistently have identifiably bad
behavior).
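
To illustrate why a missing re-initialization of this kind can matter, here is a minimal sketch. It assumes the header list is a module-level pointer that lives outside the signing context, so zeroing the context alone does not reset it. The names `sign_ctx`, `hdr_list`, `sign_init`, `sign_feed` and `list_len` are hypothetical and are not Exim's; only the pattern is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header list node. */
typedef struct hdr_node {
    const char *text;
    struct hdr_node *next;
} hdr_node;

/* Hypothetical signing context; note the list head is NOT part of it. */
typedef struct {
    int n_headers;
} sign_ctx;

static sign_ctx ctx;
static hdr_node *hdr_list = NULL;   /* module-level state, separate from ctx */

void sign_init(void)
{
    memset(&ctx, 0, sizeof ctx);
    /* Without the next line, a second delivery attempt in the same process
     * would start with the list built during the first attempt. If those
     * nodes have been released in the meantime, walking the list is
     * undefined behaviour. */
    hdr_list = NULL;
}

/* Prepend a caller-provided node to the list. */
void sign_feed(hdr_node *n, const char *text)
{
    assert(n != NULL);
    n->text = text;
    n->next = hdr_list;
    hdr_list = n;
    ctx.n_headers++;
}

size_t list_len(void)
{
    size_t len = 0;
    for (const hdr_node *p = hdr_list; p != NULL; p = p->next)
        len++;
    return len;
}
```

Calling `sign_init()` before each attempt then guarantees that the list seen by the second attempt contains only headers fed during that attempt.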

Revision history for this message
Malcolm Scott (malcscott) wrote :

A similar/identical issue appears to have been discussed briefly on the Exim list: https://lists.exim.org/lurker/message/20211008.224037.c1fee944.gl.html

They suggest that it may be a GnuTLS bug.

Revision history for this message
Tobias Heider (tobhe) wrote :

Thanks for taking the time to report this issue and helping to make Ubuntu better.

It looks like this crash might be related to the upstream bug at https://gitlab.com/gnutls/gnutls/-/issues/1277

Changed in gnutls28 (Ubuntu):
status: New → Confirmed
Tobias Heider (tobhe)
Changed in exim4 (Ubuntu):
status: New → Confirmed
Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

I guess you mean in function arc_sign_init(). Added; it fixes the issue.

summary: - Segfaults on sender verify callout, in _gnutls_trust_list_get_issuer
+ Segfaults on verify callout, in _gnutls_trust_list_get_issuer
description: updated
Revision history for this message
Malcolm Scott (malcscott) wrote :

@tobhe Thanks for looking into this. However that upstream bug was apparently fixed in GnuTLS 3.7.4; I just tried libgnutls30 3.7.4-2ubuntu1 from kinetic and I still see these crashes in exim4.

Revision history for this message
Tobias Heider (tobhe) wrote :

You are right, 3.7.3-4 from jammy already contains the fix. I suspect that the fix itself might be the cause of your segfault, since it was the last change in this part of the code and this seems to be a regression introduced in jammy.

It looks like this bug hasn't been reported upstream yet, so we should probably take the discussion there.

Revision history for this message
In , Git-p (git-p) wrote :

Git commit: https://git.exim.org/exim.git/commitdiff/8c74b00980bc7e3e479e8dfcd7c0008b2ac3f543

commit 8c74b00980bc7e3e479e8dfcd7c0008b2ac3f543
Author: Jeremy Harris <email address hidden>
AuthorDate: Thu May 19 14:23:02 2022 +0100
Commit: Jeremy Harris <email address hidden>
CommitDate: Thu May 19 14:23:02 2022 +0100

    gnutls: do not free the cached creds on transport connection close. bug 2886
----
 doc/doc-txt/ChangeLog | 4 +++
 src/src/tls-gnu.c | 8 ++---
 test/confs/2011 | 72 +++++++++++++++++++++++++++++++++++++++++++
 test/log/2011 | 13 ++++++++
 test/rejectlog/2011 | 3 ++
 test/scripts/2000-GnuTLS/2011 | 20 ++++++++++++
 6 files changed, 115 insertions(+), 5 deletions(-)

Revision history for this message
In , Git-p (git-p) wrote :

Git commit: https://git.exim.org/exim.git/commitdiff/5a8015582376ff3cc0c0d034d9237008b10d2164

commit 5a8015582376ff3cc0c0d034d9237008b10d2164
Author: Jeremy Harris <email address hidden>
AuthorDate: Thu May 19 14:24:48 2022 +0100
Commit: Jeremy Harris <email address hidden>
CommitDate: Thu May 19 14:24:48 2022 +0100

    ARC: reset headers before signing for secondary MX. Bug 2886
---
 src/src/arc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/src/arc.c b/src/src/arc.c
index 4b6256e..86688f6 100644
--- a/src/src/arc.c
+++ b/src/src/arc.c
@@ -1527,6 +1527,7 @@ void
 arc_sign_init(void)
 {
 memset(&arc_sign_ctx, 0, sizeof(arc_sign_ctx));
+headers_rlist = NULL;
 }

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

Tested both issues with the latest commits, building with gnutls, working fine.

Revision history for this message
In , Jgh146exb (jgh146exb) wrote :

Thanks for the confirm; closing.

Revision history for this message
In , Eximusers-i (eximusers-i) wrote :

Also works for me. Thank you.

Revision history for this message
Tobias Heider (tobhe) wrote :

I have forwarded this bug to upstream at https://gitlab.com/gnutls/gnutls/-/issues/1374

Revision history for this message
Tobias Heider (tobhe) wrote :

It looks like this is indeed an exim issue that was fixed in a recent update. The exim bug report can be found at: https://bugs.exim.org/show_bug.cgi?id=2886

Changed in exim4 (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
no longer affects: gnutls28 (Ubuntu)
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

The corresponding commit that supposedly fixes this issue is:

https://git.exim.org/exim.git/commitdiff/5a8015582376ff3cc0c0d034d9237008b10d2164

I haven't tried to reproduce the problem, but if it's affecting Jammy then we will need to SRU the fix, which means that we need a reproducer first.

I took the liberty of backporting the upstream patch and preparing a PPA build with it. Malcolm, would you be able to test the following PPA and let us know if it fixes the problem?

https://launchpad.net/~sergiodj/+archive/ubuntu/exim4-bug1974214

Thanks.

Changed in exim4 (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Malcolm Scott (malcscott) wrote :

Hi Sergio, unfortunately your patched package does not fix the problem -- we still see segfaults.

From the upstream discussion, I think we may need this commit as well:

https://git.exim.org/exim.git/commitdiff/8c74b00980bc7e3e479e8dfcd7c0008b2ac3f543

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 1974214] Re: Segfaults on verify callout, in _gnutls_trust_list_get_issuer

On Thursday, May 26 2022, Malcolm Scott wrote:

> Hi Sergio, unfortunately your patched package does not fix the problem
> -- we still see segfaults.
>
> From the upstream discussion, I think we may need this commit as well:
>
> https://git.exim.org/exim.git/commitdiff/8c74b00980bc7e3e479e8dfcd7c0008b2ac3f543

Ah, you're right, I missed this other commit. I've backported it and
uploaded the new package to the PPA. Unfortunately LP is experiencing
some problems so it may take a while for the build to complete.

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Revision history for this message
Malcolm Scott (malcscott) wrote :

We've been running Sergio's exim 4.95-4ubuntu3~ppa2 for 27 hours so far with no segfaults (previously it was segfaulting every few minutes) -- looks like the bug is fixed with those patches; thanks!

Revision history for this message
Raf (4263004-noduck) wrote :

Previously exim would get SIGFPE on each mail delivery attempt. The PPA version has been installed for almost 2 days and no more crashes. Thanks!

Changed in exim4 (Ubuntu Kinetic):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in exim4 (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
importance: Medium → High
Changed in exim4 (Ubuntu Kinetic):
importance: Medium → High
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thank you for testing the package from the PPA and providing feedback, Malcolm and Raf.

I took some time today to try and come up with a reproducer for the bug, since this is one of the requirements for a successful SRU to Jammy. So far I haven't had any success, but I'll try to continue working on this tomorrow. I found some possible reproducers in the upstream bug, but none of them worked out of the box for me.

Meanwhile, if any of you already have a simple reproducer that'd be great.

Thanks.

Revision history for this message
In , Sergio Durigan Junior (sergiodj) wrote :

Hi folks,

I'm working on backporting the patches that fix this issue into Ubuntu/Debian, but I'm having trouble reproducing the bug locally.

I'm following the steps outlined on comment #16 (assuming that Gedalya's server is still configured to defer the special email automatically), but I don't see the segfault.

I'd like to know if you were able to reproduce this issue using Debian's/Ubuntu's exim4 binary. Any suggestion is helpful.

Thanks.

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

(In reply to Sergio Durigan Junior from comment #41)
> Hi folks,
>
> I'm working on backporting the patches that fix this issue into
> Ubuntu/Debian, but I'm having trouble reproducing the bug locally.
>
> I'm following the steps outlined on comment #16 (assuming that Gedalya's
> server is still configured to defer the special email automatically), but I
> don't see the segfault.
>

My server is still doing that.

> I'd like to know if you were able to reproduce this issue using
> Debian's/Ubuntu's exim4 binary. Any suggestion is helpful.
>

Debian's exim4-daemon-light 4.95 has shown this behavior. 4.95-6 is already patched.

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

(In reply to Sergio Durigan Junior from comment #41)
> I'm following the steps outlined on comment #16 (assuming that Gedalya's
> server is still configured to defer the special email automatically), but I
> don't see the segfault.

I'm not seeing your messages hitting my primary MX server (mail.gedalya.net) which would be deferring. I do see two messages which hit mx2.gedalya.net, apparently without going to the primary server first.

exim would crash on a second attempt if it gets deferred on the first attempt.

Wild guess, are you running this in an IPv6-only container? mail.gedalya.net is IPv4-only, sorry :-)

Revision history for this message
In , Sergio Durigan Junior (sergiodj) wrote :

(In reply to Gedalya from comment #43)
> (In reply to Sergio Durigan Junior from comment #41)
> > I'm following the steps outlined on comment #16 (assuming that Gedalya's
> > server is still configured to defer the special email automatically), but I
> > don't see the segfault.
>
> I'm not seeing your messages hitting my primary MX server (mail.gedalya.net)
> which would be deferring. I do see two messages which hit mx2.gedalya.net,
> apparently without going to the primary server first.
>
> exim would crash on a second attempt if it gets deferred on the first
> attempt.
>
> Wild guess, are you running this in an IPv6-only container? mail.gedalya.net
> is IPv4-only, sorry :-)

Thanks for the help.

I'm testing this using an Ubuntu Jammy container, which has exim4 4.95-4ubuntu2 and is not patched. There's in fact a downstream bug related to this problem; this is why I'm trying to come up with a reproducer.

Thanks for also confirming that your server is still deferring emails. My container did have IPv6 enabled, so I completely disabled it just in case. Unfortunately, I'm still unable to reproduce the problem.

Here's what I'm doing:

- Launch container, disable IPv6 and also add "disable_ipv6" to exim4's config file.

- Run "dpkg-reconfigure exim4-config" and make sure to configure the package as an "internet site; mail is sent and received directly using SMTP". Other than that, everything is left as is.

- Run:

# exim4 -odq -f <email address hidden> <email address hidden> < 1.msg
# exim4 -bp
 0m 333 1nxCYB-0000At-E1 <email address hidden>
          <email address hidden>
# exim4 -d+all -q 1nxCYB-0000At-E1 2>&1 | tee /tmp/exim.debug

I get the following output:

https://dpaste.com//AVEDX2WT4

It seems strange that the second connection (to mx2) didn't work either. I don't see a segmentation fault anywhere, although the email isn't being sent and keeps showing in "exim4 -bp".

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

(In reply to Sergio Durigan Junior from comment #44)
>
> Thanks for also confirming that your server is still deferring emails. My
> container did have IPv6 enabled, so I completely disabled it just in case.

You don't need to disable IPv6 as much as you need to enable IPv4.

> # exim4 -d+all -q 1nxCYB-0000At-E1 2>&1 | tee /tmp/exim.debug
>
> I get the following output:
>
> https://dpaste.com//AVEDX2WT4
>
> It seems strange that the second connection (to mx2) didn't work either.

It's not working for the same reason mail.gedalya.net isn't working. It works for you via IPv6 but you can't reach my servers on IPv4.

> I don't see a segmentation fault anywhere, although the email isn't being sent
> and keeps showing in "exim4 -bp".

You need to successfully connect to the first server and receive a deferral (4xx SMTP code).

Figure out why you can't reach my servers out of this container. Ordinary network troubleshooting.

Note that some "cloud" providers block outbound TCP connections to destination port 25, some block only for IPv6 but not for IPv4, maybe you have a reverse case? But whatever, just do network troubleshooting, or set up a local reproducer.
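
For anyone building a local reproducer, the condition that matters is that the first server answers with a transient (4xx) reply rather than a success or a permanent failure. A minimal sketch of that classification follows; the enum and the function name `smtp_classify` are hypothetical and not taken from Exim.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification of an SMTP reply line by its first digit. */
typedef enum {
    SMTP_POSITIVE,   /* 2xx and 3xx: accepted, or continue with the dialogue */
    SMTP_DEFER,      /* 4xx: transient failure, the client should retry */
    SMTP_FAIL,       /* 5xx: permanent failure */
    SMTP_INVALID     /* line does not start with a three-digit reply code */
} smtp_class;

smtp_class smtp_classify(const char *line)
{
    assert(line != NULL);

    /* A reply starts with exactly three digits. The loop stops at the
     * first non-digit, including the terminating NUL, so it never reads
     * past the end of a short string. */
    for (size_t i = 0; i < 3; i++)
        if (!isdigit((unsigned char)line[i]))
            return SMTP_INVALID;

    switch (line[0]) {
    case '2':
    case '3':
        return SMTP_POSITIVE;
    case '4':
        return SMTP_DEFER;
    case '5':
        return SMTP_FAIL;
    default:
        return SMTP_INVALID;
    }
}
```

A test server for this bug would need to produce a reply for which `smtp_classify` returns `SMTP_DEFER` on the primary MX, and accept the message on the secondary MX.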

Revision history for this message
In , Sergio Durigan Junior (sergiodj) wrote :

(In reply to Gedalya from comment #45)
> (In reply to Sergio Durigan Junior from comment #44)
> >
> > Thanks for also confirming that your server is still deferring emails. My
> > container did have IPv6 enabled, so I completely disabled it just in case.
>
> You don't need to disable IPv6 as much as you need to enable IPv4.

IPv4 was never disabled; it works normally.

> > # exim4 -d+all -q 1nxCYB-0000At-E1 2>&1 | tee /tmp/exim.debug
> >
> > I get the following output:
> >
> > https://dpaste.com//AVEDX2WT4
> >
> > It seems strange that the second connection (to mx2) didn't work either.
>
> It's not working for the same reason mail.gedalya.net isn't working. It
> works for you via IPv6 but you can't reach my servers on IPv4.
>
> > I don't see a segmentation fault anywhere, although the email isn't being sent
> > and keeps showing in "exim4 -bp".
>
> You need to successfully connect to the first server and receive a deferral
> (4xx SMTP code).
>
> Figure out why you can't reach my servers out of this container. Ordinary
> network troubleshooting.
>
> Note that some "cloud" providers block outbound TCP connections to
> destination port 25, some block only for IPv6 but not for IPv4, maybe you
> have a reverse case? But whatever, just do network troubleshooting, or set
> up a local reproducer.

I tried something simpler here:

$ telnet mail.gedalya.net 25

It doesn't connect. I can connect to my personal email server via port 25, though, so I'm thinking that maybe my IP has been blocked on your side? Anyway, I was able to get ahold of another server (with another IP) and finally reproduced the bug.

Are you OK with me writing a test case for https://bugs.launchpad.net/ubuntu/+source/exim4/+bug/1974214 using your server? Only a member of the Ubuntu SRU team will eventually check it, and the test case can be deleted afterwards if you'd like.

Thanks.

Revision history for this message
In , Gedalya-b (gedalya-b) wrote :

(In reply to Sergio Durigan Junior from comment #46)
>
> I tried something simpler here:
>
> $ telnet mail.gedalya.net 25
>
> It doesn't connect. I can connect to my personal email server via port 25,
> though, so I'm thinking that maybe my IP has been blocked on your side?

Not in any way that I can tell. Fail2ban chains are empty.

> Anyway, I was able to get ahold of another server (with another IP) and
> finally reproduced the bug.
>
> Are you OK with me writing a test case for
> https://bugs.launchpad.net/ubuntu/+source/exim4/+bug/1974214 using your
> server? Only a member of the Ubuntu SRU team will eventually check it, and
> the test case can be deleted afterwards if you'd like.

That's OK.

description: updated
tags: added: server-todo
Changed in exim4 (Ubuntu Jammy):
status: Triaged → In Progress
Changed in exim4 (Ubuntu Kinetic):
status: Triaged → In Progress
Changed in exim:
status: Unknown → Fix Released
Changed in exim4 (Debian):
status: Unknown → Fix Released
description: updated
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package exim4 - 4.95-4ubuntu3

---------------
exim4 (4.95-4ubuntu3) kinetic; urgency=medium

  * d/p/lp1974214-segfault-smtp-delivery-0{1,2}.patch: Fix segfault when
    there's an SMTP delivery attempt following a deferral. (LP: #1974214)

 -- Sergio Durigan Junior <email address hidden> Fri, 03 Jun 2022 17:37:10 -0400

Changed in exim4 (Ubuntu Kinetic):
status: In Progress → Fix Released
Revision history for this message
In , Jgh146exb (jgh146exb) wrote :

The fix commit included a testcase.

Revision history for this message
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Malcolm, or anyone else affected,

Accepted exim4 into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/exim4/4.95-4ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in exim4 (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Revision history for this message
Malcolm Scott (malcscott) wrote :

I've been testing version 4.95-4ubuntu2.1 for the past several days, with no crashes (4.95-4ubuntu2 used to crash every few minutes on our configuration). Thanks!

tags: added: verification-done-jammy
removed: verification-needed-jammy
tags: added: verification-done
removed: verification-needed
tags: removed: server-todo
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package exim4 - 4.95-4ubuntu2.1

---------------
exim4 (4.95-4ubuntu2.1) jammy; urgency=medium

  * d/p/lp1974214-segfault-smtp-delivery-0{1,2}.patch: Fix segfault when
    there's an SMTP delivery attempt following a deferral. (LP: #1974214)

 -- Sergio Durigan Junior <email address hidden> Fri, 03 Jun 2022 17:51:15 -0400

Changed in exim4 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for exim4 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
