Comment 2 for bug 1983605

Bryce Harrington (bryce) wrote :

Looking at the test history, the issue has been cropping up irregularly. We've had a string of failures until now, but before that there was a string of successes with the same exim4 version.

The specific line in the autopkgtest that is failing is this:

    swaks -s localhost -tlso -q ehlo

Running this manually in an amd64 LXC container, I get:

$ swaks -s localhost -tlso -q ehlo
=== Trying localhost:25...
=== Connected to localhost.
<- 220 triage-kinetic.lxd ESMTP Exim 4.96 Ubuntu Mon, 26 Sep 2022 19:23:28 +0000
 -> EHLO triage-kinetic.lxd
<- 250-triage-kinetic.lxd Hello localhost [127.0.0.1]
<- 250-SIZE 52428800
<- 250-8BITMIME
<- 250-PIPELINING
<- 250-PIPECONNECT
<- 250-CHUNKING
<- 250-STARTTLS
<- 250-PRDR
<- 250 HELP
 -> STARTTLS
<- 220 TLS go ahead
=== TLS started with cipher TLSv1.3:TLS_AES_256_GCM_SHA384:256
=== TLS no local certificate set
=== TLS peer DN="/C=UK/O=Exim Developers/CN=triage-kinetic.lxd"
 ~> EHLO triage-kinetic.lxd
<~ 250-triage-kinetic.lxd Hello localhost [127.0.0.1]
<~ 250-SIZE 52428800
<~ 250-8BITMIME
<~ 250-PIPELINING
<~ 250-PIPECONNECT
<~ 250-CHUNKING
<~ 250-PRDR
<~ 250 HELP
 ~> QUIT
<~ 221 triage-kinetic.lxd closing connection
=== Connection closed with remote host.

So the difference in the failing case is just the already identified error message:

*** TLS startup failed (connect(): error:0A000438:SSL routines::tlsv1 alert internal error)

Perhaps the test should be using TLS 1.2? Googling turns up a lot of hits; a couple look pertinent enough to evaluate further:
  * https://serverfault.com/questions/954489/ssl-routinesssl3-read-bytestlsv1-alert-decrypt-error-on-mutual-authendication
    (Suggests avoiding TLSv1, as it is deprecated)
  * https://serverfault.com/questions/1098419/updated-exim4-now-not-using-tls-due-to-validation-failure
    (Suggests configuring exim4 to disable remote verification)

However, I'm a bit skeptical, since those sound like things that would either always pass or always fail, yet we see the test case sometimes work and sometimes not (and only on this architecture?). I would expect something more like a race condition, e.g. trying to use exim4 while it or one of its dependencies is still starting up, or two things trying to access the same port.

I wonder if the autopkgtest could be improved by adding some basic checks prior to running this command that verify the network conditions, validate the TLS config, and/or check the exim4 service with no encryption.
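
A rough sketch of what such pre-checks might look like, in case it helps the discussion. This is not taken from the existing autopkgtest: the helper names (retry, advertises_starttls) and the retry counts are made up for illustration, and the STARTTLS pattern is based only on the swaks output pasted above.

```shell
#!/bin/sh
# Hypothetical pre-check helpers; the names and retry counts are
# illustrative assumptions, not part of the real exim4 autopkgtest.

# Run a command up to N times, one second apart, until it succeeds.
# Intended to cover the case where exim4 is still starting up.
# Note: plain POSIX sh has no "local", so "attempts" is a global variable.
retry() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Read a plaintext swaks transcript on stdin and succeed only if the
# server advertised STARTTLS in its EHLO response.
advertises_starttls() {
    grep -Eq '^<- +250[- ]STARTTLS'
}
```

The test could then call something like "retry 30 swaks -s localhost -q ehlo" to confirm the service answers without encryption, pipe one more plaintext run through advertises_starttls, and only after both pass run the existing "swaks -s localhost -tlso -q ehlo" line. If the plaintext check is what fails intermittently, that would point at startup timing rather than the TLS setup.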