bind 9.16.1-ubuntu on ubuntu 20.04 randomly exits with segfault signal

Bug #1954854 reported by Michael Hafen
This bug affects 2 people
Affects: bind9 (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

$ lsb_release -rd
Description: Ubuntu 20.04.3 LTS
Release: 20.04

$ apt-cache policy bind9
bind9:
  Installed: 1:9.16.1-0ubuntu2.9
  Candidate: 1:9.16.1-0ubuntu2.9
  Version table:
 *** 1:9.16.1-0ubuntu2.9 500
        500 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu focal-security/main amd64 Packages

From syslog:

Dec 14 16:19:10 dc-dns-02 kernel: [519037.010692] isc-worker0000[12829]: segfault at 8 ip 00007f2e512d6166 sp 00007f2e4dfe4530 error 4 in libisc.so.1601.0.0[7f2e512b4000+46000]
Dec 14 16:19:10 dc-dns-02 kernel: [519037.010706] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89
Dec 14 16:19:18 dc-dns-02 systemd[1]: named.service: Main process exited, code=killed, status=11/SEGV
Dec 14 16:19:18 dc-dns-02 systemd[1]: named.service: Failed with result 'signal'.

Subtracting the library's load address from the faulting instruction pointer gives:

00007f2e512d6166 - 7f2e512b4000 = 22166

Resolving that offset in libisc.so.1601.0.0 with addr2line gives:

$ addr2line -e /usr/lib/x86_64-linux-gnu/libisc.so.1601.0.0 -fCi 0x22166
isc_lfsr_init
??:?
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-08-27 (131 days ago)
InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
Package: bind9 1:9.16.1-0ubuntu2.9
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-5.4.0-91-generic root=UUID=061f07c5-843f-48ab-96f2-ec20430a184c ro maybe-ubiquity
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-91.102-generic 5.4.151
RelatedPackageVersions:
 bind9utils N/A
 apparmor 2.13.3-7ubuntu5.1
Tags: focal uec-images
Uname: Linux 5.4.0-91-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
mtime.conffile..etc.bind.named.conf: 2021-08-27T17:42:25.507370
mtime.conffile..etc.bind.named.conf.local: 2021-10-11T14:43:14.718188
mtime.conffile..etc.bind.named.conf.options: 2021-12-08T16:20:28.962570
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-08-27 (131 days ago)
InstallationMedia: Ubuntu-Server 20.04.2 LTS "Focal Fossa" - Release amd64 (20210201.2)
Package: bind9 1:9.16.1-0ubuntu2.9
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-5.4.0-89-generic root=UUID=c21aa69f-82c2-4c8c-9416-fa1544fd48e6 ro maybe-ubiquity
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-89.100-generic 5.4.143
RelatedPackageVersions:
 bind9utils N/A
 apparmor 2.13.3-7ubuntu5.1
Tags: focal uec-images
Uname: Linux 5.4.0-89-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
mtime.conffile..etc.bind.named.conf: 2012-08-14T14:33:49
mtime.conffile..etc.bind.named.conf.local: 2021-10-11T14:39:40.855578
mtime.conffile..etc.bind.named.conf.options: 2021-09-14T11:14:45.635696

Revision history for this message
Paride Legovini (paride) wrote :

Thank you for taking the time to report this bug. While a segmentation fault definitely shouldn't happen, there isn't really enough information here for a developer to confirm this is a bug in Ubuntu and not some other kind of system corruption, especially given that there are no other reports of similar crashes, at least none that I'm aware of.

Useful information would be:

* Did this start happening after upgrading to 1:9.16.1-0ubuntu2.9? Does downgrading to 1:9.16.1-0ubuntu2.8 (or to the version you were using before) stop the crashes? This is very important for us to know.

* Did you identify some minimal steps that are able to trigger the crash?

* Does the crash always happen at the same address?

I'm marking this bug report as Incomplete for the moment.

Changed in bind9 (Ubuntu):
status: New → Incomplete
Michael Hafen (michael-hafen) wrote:

This started happening when I upgraded the server from Ubuntu 18.04 to 20.04. Is there a convenient way to downgrade to 9.10.3?

As for triggering the crash, I'm not sure. As I said, it seems to happen randomly. I suspect it's related to the amount of load on the service.
Here's a grep from my syslog which shows when the crashes happen:

/var/log/syslog.1:Dec 15 07:29:37 dc-dns-02 named[15512]: REFUSED unexpected RCODE resolving 'q-seeqcview.com/A/IN': 205.251.193.49#53
/var/log/syslog.1:Dec 15 07:29:37 dc-dns-02 kernel: [573662.780659] isc-worker0000[15523]: segfault at 8 ip 00007f84a45bf166 sp 00007f84a12cd530 error 4 in libisc.so.1601.0.0[7f84a459d000+46000]
/var/log/syslog.1:Dec 15 07:29:37 dc-dns-02 kernel: [573662.780675] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89

/var/log/syslog.2.gz:Dec 14 16:19:09 dc-dns-02 named[12817]: network unreachable resolving 'q-seeqcview.com/A/IN': 2600:9000:5302:fc00::1#53
/var/log/syslog.2.gz:Dec 14 16:19:10 dc-dns-02 kernel: [519037.010692] isc-worker0000[12829]: segfault at 8 ip 00007f2e512d6166 sp 00007f2e4dfe4530 error 4 in libisc.so.1601.0.0[7f2e512b4000+46000]
/var/log/syslog.2.gz:Dec 14 16:19:10 dc-dns-02 kernel: [519037.010706] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89

/var/log/syslog.3.gz:Dec 13 14:26:10 dc-dns-02 kernel: [425857.862410] show_signal_msg: 20 callbacks suppressed
/var/log/syslog.3.gz:Dec 13 14:26:10 dc-dns-02 kernel: [425857.862416] isc-worker0000[2530]: segfault at 8 ip 00007f305aef1166 sp 00007f3057bff530 error 4 in libisc.so.1601.0.0[7f305aecf000+46000]
/var/log/syslog.3.gz:Dec 13 14:26:10 dc-dns-02 kernel: [425857.862429] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89

The crash (at least for these three instances) is happening at the same address.
00007f84a45bf166 - 7f84a459d000 = 22166
and
00007f305aef1166 - 7f305aecf000 = 22166

Christian Ehrhardt  (paelzer) wrote :

Thanks for the info, Michael.
Without steps to reproduce, I see two paths from here.

## #1 trying different versions

As Paride asked, it might be worth checking different versions.
There is no super-convenient way to downgrade bind9, though.

You can add Bionic's archive (or that of any other release in between: Cosmic, Disco, Eoan) to apt like this:
root@f:~# cp /etc/apt/sources.list /etc/apt/sources.list.d/bionic.list
root@f:~# vim /etc/apt/sources.list.d/bionic.list
# in it, replace every "focal" with "bionic"
root@f:~# apt update
root@f:~# apt-cache policy bind9
bind9:
  Installed: (none)
  Candidate: 1:9.16.1-0ubuntu2.9
  Version table:
     1:9.16.1-0ubuntu2.9 500
        500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
     1:9.16.1-0ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
     1:9.11.3+dfsg-1ubuntu1.16 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     1:9.11.3+dfsg-1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages
# A downgrade to Bionic's 9.11.3 would then be:
root@f:~# v=1:9.11.3+dfsg-1ubuntu1.16; apt install bind9=$v bind9utils=$v

# Warning, this might mess up your dependencies and installed packages, I'd recommend doing this on a test system or with a full backup to restore after testing.

## #2 Full debugging

As you report that your offsets seem to be reliable, you could try to attach a full crash dump of bind9 if it generates one.

So if there is /var/crash/...bind9...crash
then try running
  $ sudo apport-collect 1954854

That should attach all logs and version information.

Then (if not auto-attached) add the crash file as well.
We could then try to get a full gdb backtrace, maybe that indicates an obvious issue or an error known by upstream.

Michael Hafen (michael-hafen) wrote : Dependencies.txt

apport information

tags: added: apport-collected focal uec-images
description: updated
Michael Hafen (michael-hafen) wrote : KernLog.txt

apport information

Michael Hafen (michael-hafen) wrote : ProcCpuinfoMinimal.txt

apport information

Michael Hafen (michael-hafen) wrote : SyslogBind9.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.local.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.options.txt

apport information

Michael Hafen (michael-hafen) wrote :
Michael Hafen (michael-hafen) wrote :
description: updated
Michael Hafen (michael-hafen) wrote : Dependencies.txt

apport information

Michael Hafen (michael-hafen) wrote : KernLog.txt

apport information

Michael Hafen (michael-hafen) wrote : ProcCpuinfoMinimal.txt

apport information

Michael Hafen (michael-hafen) wrote : SyslogBind9.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.local.txt

apport information

Michael Hafen (michael-hafen) wrote : modified.conffile..etc.bind.named.conf.options.txt

apport information

Utkarsh Gupta (utkarsh)
tags: added: server-triage-discuss
Bryce Harrington (bryce) wrote :

From the 114 crash file:

(gdb) bt
#0 isc__nm_tcp_send (handle=0x7eff7522dbb0, region=0x7eff7d39a9b8, cb=0x7eff887675a0 <tcpdnssend_cb>,
    cbarg=0x7eff7d39a9a8) at tcp.c:852
#1 0x00007eff88a2e707 in client_sendpkg (client=client@entry=0x7eff754c31b0, buffer=<optimized out>,
    buffer=<optimized out>) at client.c:331
#2 0x00007eff88a2ffe9 in ns_client_send (client=client@entry=0x7eff754c31b0) at client.c:592
#3 0x00007eff88a3e9b0 in query_send (client=0x7eff754c31b0) at query.c:552
#4 0x00007eff88a469a7 in ns_query_done (qctx=qctx@entry=0x7eff85476850) at query.c:10914
#5 0x00007eff88a4dde6 in query_respond (qctx=0x7eff85476850) at query.c:7407
#6 query_prepresponse (qctx=qctx@entry=0x7eff85476850) at query.c:9906
#7 0x00007eff88a49936 in query_gotanswer (qctx=qctx@entry=0x7eff85476850, res=res@entry=0) at query.c:6823
#8 0x00007eff88a4f4c6 in query_resume (qctx=0x7eff85476850) at query.c:6121
#9 fetch_callback (task=<optimized out>, event=<optimized out>) at query.c:5703
#10 0x00007eff88770fa1 in dispatch (threadid=<optimized out>, manager=<optimized out>) at task.c:1152
#11 run (queuep=<optimized out>) at task.c:1344
#12 0x00007eff88239609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007eff8815a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) list tcp.c:852
file: "src/unix/tcp.c", line number: 852, symbol: "???"
847 src/unix/tcp.c: No such file or directory.
file: "tcp.c", line number: 852, symbol: "???"
847 void *cbarg) {
848 isc_nmsocket_t *sock = handle->sock;
849 isc__netievent_tcpsend_t *ievent = NULL;
850 isc__nm_uvreq_t *uvreq = NULL;
851
852 REQUIRE(sock->type == isc_nm_tcpsocket);
853
854 uvreq = isc__nm_uvreq_get(sock->mgr, sock);
855 uvreq->uvbuf.base = (char *)region->base;
856 uvreq->uvbuf.len = region->length;
(gdb)

(gdb) print sock
$1 = (isc_nmsocket_t *) 0x0
(gdb) print sock->mgr
Cannot access memory at address 0x10

Sergio Durigan Junior (sergiodj) wrote :

Hi,

With Bryce's initial analysis in mind, I started digging to see what upstream has done to fix this problem. I could not find upstream bug reports similar to this one, but I did notice that they're being more careful when accessing some variables in the code mentioned above (things under lib/isc/netmgr/). For example, the following commit:

commit 634bdfb16d8f91ba411f43d0e871ff45cebe125e
Author: Ondřej Surý <email address hidden>
AuthorDate: Thu Nov 12 10:32:18 2020 +0100
Commit: Ondřej Surý <email address hidden>
CommitDate: Tue Dec 1 16:47:07 2020 +0100

    Refactor netmgr and add more unit tests

did a huge refactor of the code, including tightening the guards when accessing the "isc_nmsocket_t" structure:

 void
 isc__nm_tcp_send(isc_nmhandle_t *handle, isc_region_t *region, isc_nm_cb_t cb,
                 void *cbarg) {
+ REQUIRE(VALID_NMHANDLE(handle));
+ REQUIRE(VALID_NMSOCK(handle->sock));

Unfortunately, the aforementioned commit is too large to be safely backported to the Focal bind9 package (and even if it could be cleanly backported, I'd still be very reluctant to SRU it).

I decided to take a simpler approach and see how things go. I grepped for places that are accessing the "isc_nmsocket_t" structure and added guards to verify that it's valid. You can see the patch here: https://paste.ubuntu.com/p/xBW3J33SXZ/. I then prepared a bind9 package with this diff and uploaded it to a PPA:

https://launchpad.net/~sergiodj/+archive/ubuntu/bind9-1954854-segfault/+packages

@Michael, could you please give this a try and see if the bug still manifests? This would be a good first step before we can decide whether it makes sense to SRU this or not.

Thanks.

Michael Hafen (michael-hafen) wrote :

I have just finished adding that PPA and updating bind from there. I'll give it a month, since it seems it can be that long between crashes, and let you know the results, if not sooner.

Paride Legovini (paride)
tags: removed: server-triage-discuss
Michael Hafen (michael-hafen) wrote:

Bind crashed again, this time the log is much different.

Jan 28 22:04:44 dc-dns-02 named[647]: network unreachable resolving 'q-seeqcview.com/A/IN': 2600:9000:5302:fc00::1#53
Jan 28 22:04:47 dc-dns-02 named[647]: tcp.c:855: REQUIRE((__builtin_expect(!!((sock) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(sock))->magic == ((('N') << 24 | ('M') << 16 | ('S') << 8 | ('K')))), 1))) failed, back trace
Jan 28 22:04:47 dc-dns-02 named[647]: #0 0x55a892a78e43 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #1 0x7f07eb0f4ac0 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #2 0x7f07eb11131a in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #3 0x7f07eb3d9707 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #4 0x7f07eb3dafe9 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #5 0x7f07eb3e99b0 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #6 0x7f07eb3f19a7 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #7 0x7f07eb3f8de6 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #8 0x7f07eb3f4936 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #9 0x7f07eb3f6a3e in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #10 0x7f07eb3f7048 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #11 0x7f07eb3f1a24 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #12 0x7f07eb3f4e96 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #13 0x7f07eb3fa4c6 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #14 0x7f07eb11c161 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #15 0x7f07eabe4609 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: #16 0x7f07eab05293 in ??
Jan 28 22:04:47 dc-dns-02 named[647]: exiting (due to assertion failure)
Jan 28 22:04:58 dc-dns-02 systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Jan 28 22:04:58 dc-dns-02 systemd[1]: named.service: Failed with result 'signal'.

Michael Hafen (michael-hafen) wrote :

And the other server:

Jan 29 07:15:54 dc-dns-01 named[648]: client @0x7f67b057e930 10.126.63.77#51569: view internal: update '63.126.10.IN-ADDR.ARPA/IN' denied
Jan 29 07:15:55 dc-dns-01 named[648]: tcp.c:855: REQUIRE((__builtin_expect(!!((sock) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(sock))->magic == ((('N') << 24 | ('M') << 16 | ('S') << 8 | ('K')))), 1))) failed, back trace
Jan 29 07:15:55 dc-dns-01 named[648]: #0 0x55fb77598e43 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #1 0x7f67c0d7bac0 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #2 0x7f67c0d9831a in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #3 0x7f67c1060707 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #4 0x7f67c1061fe9 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #5 0x7f67c10709b0 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #6 0x7f67c10789a7 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #7 0x7f67c107fde6 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #8 0x7f67c107b936 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #9 0x7f67c10814c6 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #10 0x7f67c0da3161 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #11 0x7f67c086b609 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: #12 0x7f67c078c293 in ??
Jan 29 07:15:55 dc-dns-01 named[648]: exiting (due to assertion failure)
Jan 29 07:16:00 dc-dns-01 systemd[1]: named.service: Main process exited, code=killed, status=6/ABRT
Jan 29 07:16:00 dc-dns-01 systemd[1]: named.service: Failed with result 'signal'.

Paride Legovini (paride)
tags: added: server-triage-discuss
Paride Legovini (paride)
tags: removed: server-triage-discuss
Paride Legovini (paride) wrote :

Hello Michael, thanks for reporting back. Honestly, I'm not sure how to proceed with debugging this. I'll go back to my first comment and suggest trying a newer version of bind9 to check whether the issue is gone in newer upstream releases.

I'm preparing a PPA with Jammy's bind9 (1:9.16.15-1ubuntu3) built for Focal (link coming soon).
Could you try installing and monitoring it? If it *does* crash, I think we should follow up upstream and report the bug there.

OTOH if it does *not* crash, then we can consider bisecting and finding out which new version fixed the issue, and then dig even deeper and try finding out which commit or commits make the difference. That won't be an easy task, especially given that the reproducibility of this crash is low. I don't think we're going to do this unless an easier way to trigger the crash is found.

Paride Legovini (paride) wrote :

Hello, here is the PPA:

https://launchpad.net/~paride/+archive/ubuntu/lp1954854-bind9

The package built fine on Focal without modifications.

Michael Hafen (michael-hafen) wrote :

Just finished adding that PPA and upgrading bind9 to 9.16.15 from there on all my servers. Removed the other PPA too.
Give it another month to see what happens now.

Michael Hafen (michael-hafen) wrote :

It's been a month now, and no crashes on any of my three servers with this version of bind. Looks like this version is good (for me).
If you want to bisect, I'll keep helping with it.

Sergio Durigan Junior (sergiodj) wrote :

Thanks for the feedback, Michael.

First of all, let me say that we appreciate the help you've been providing here. Given that we are not able to reproduce the bug here, we depend on your reports in order to make progress and take decisions regarding it.

Unfortunately, I think it will be hard for us to keep working on this issue as is. Bisecting it may indeed be a good next step, but since it can take a long time (1 month) for the bug to manifest to you, we may be looking at spending many months trying to pinpoint exactly what the issue is and when it was introduced.

I will add this bug to my long term TODO list and try to investigate it a bit more when I have the time (which may never happen), because I vaguely remember thinking about other approaches to try and fix it using the commit I mentioned in my previous comment (#21).

Meanwhile, as I said above, it seems that unfortunately we won't be able to dedicate much more time investigating this bug. I will lower its priority to reflect that, but will also change its status to Confirmed (not Triaged because we weren't able to reproduce it).

If you happen to find more information about the bug, or (even better) if you find ways to reproduce it, don't hesitate to let us know.

Changed in bind9 (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Low
Michael Hafen (michael-hafen) wrote :

I was looking to upgrade to 9.18 from ISC's repo, since we're interested in DoT/DoH here, but I'll hold off on that until after 22.04.1, just in case you get some time to dig into this again.

Sergio Durigan Junior (sergiodj) wrote :

Hi there,

Sorry for the delay in getting back to this.

Bug #1997375 has been recently filed and looks like a dupe of this one. I took some time to investigate the issue a bit more and found another possible fix. I backported it and uploaded the package to the following PPA:

https://launchpad.net/~sergiodj/+archive/ubuntu/bind9-bug1997375/+packages

Michael, are you still experiencing the crashes? If yes, would you be able to give the PPA above a try and let us know what happens?

Thanks in advance!

Michael Hafen (michael-hafen) wrote :

I haven't had a crash since switching to Paride's PPA for this bug; that was back in February.
I'll give yours a try though, to see what happens.
I have my three servers switched to it now.

Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 1954854] Re: bind 9.16.1-ubuntu on ubuntu 20.04 randomly exits with segfault signal

On Monday, November 28 2022, Michael Hafen wrote:

> I haven't had a crash since switching to Paride's PPA for this bug; that was back in February.
> I'll give yours a try though, to see what happens.
> I have my three servers switched to it now.

Thanks a lot, Michael.

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Michael Hafen (michael-hafen) wrote :

I haven't had a crash since switching to https://launchpad.net/~sergiodj/+archive/ubuntu/bind9-bug1997375/+packages for Bind, so this looks like it's also a good solution.

I'll be upgrading my servers to Ubuntu 22.04 soon, so I won't be able to provide much more help on this.

Thanks to everyone for working on this for us.

Michael Hafen (michael-hafen) wrote :

Seems I spoke too soon; there was a crash on Jan 27th:

Jan 27 08:15:25 dc-dns-02 kernel: [6290627.150120] isc-worker0000[222965]: segfault at 8 ip 00007f444065a166 sp 00007f443d3635f0 error 4 in libisc.so.1601.0.0[7f4440638000+46000]
Jan 27 08:15:25 dc-dns-02 kernel: [6290627.150144] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89
Jan 27 08:15:55 dc-dns-02 systemd[1]: named.service: Main process exited, code=killed, status=11/SEGV
Jan 27 08:15:55 dc-dns-02 systemd[1]: named.service: Failed with result 'signal'.

Sergio Durigan Junior (sergiodj) wrote :

On Tuesday, January 31 2023, Michael Hafen wrote:

> Seems I spoke too soon, there was a crash Jan 27th:
>
> Jan 27 08:15:25 dc-dns-02 kernel: [6290627.150120] isc-worker0000[222965]: segfault at 8 ip 00007f444065a166 sp 00007f443d3635f0 error 4 in libisc.so.1601.0.0[7f4440638000+46000]
> Jan 27 08:15:25 dc-dns-02 kernel: [6290627.150144] Code: 00 00 48 8d 3d ab b4 02 00 e8 66 39 fe ff 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 4c 8b 67 10 <41> 83 7c 24 08 02 0f 85 be 00 00 00 49 89 fd 49 8b 7c 24 10 48 89
> Jan 27 08:15:55 dc-dns-02 systemd[1]: named.service: Main process exited, code=killed, status=11/SEGV
> Jan 27 08:15:55 dc-dns-02 systemd[1]: named.service: Failed with result 'signal'.

Hi Michael,

Did this happen when using the package provided by me?

Thanks,

--
Sergio

Michael Hafen (michael-hafen) wrote :

I'm surprised to find that I wasn't running your packages. It took a long time to crash on the release packages, but it crashed again soon after I restarted it.

I'm running your packages now, so I'll give it another month to see if it crashes.

Sergio Durigan Junior (sergiodj) wrote :

On Thursday, February 02 2023, Michael Hafen wrote:

> I'm surprised to find that I wasn't running your packages. It took a
> long time to crash on the release packages, but it crashed again soon
> after I restarted it.
>
> I'm running your packages now, so I'll give it another month to see if
> it crashes.

Ah, phew :-). Thanks for the clarification.

Cheers,

--
Sergio

Michael Hafen (michael-hafen) wrote :

I am, out of necessity, upgrading my bind servers to Ubuntu 22.04 today.
I've had no problems with Sergio's packages (other than apt trying to replace them).
Unfortunately I won't be able to help more than this.
Thanks for all you have done here. I appreciate the help keeping Bind in Ubuntu 20.04 running well.

Sergio Durigan Junior (sergiodj) wrote :

First of all, thank you very much Michael for doing the informal verification.

Unfortunately, in order to proceed, and because we weren't able to reproduce this problem ourselves, we would still need your help to formally verify a future upload to bring the fix to the Ubuntu archive. Without your feedback, the package would not migrate from -proposed to -updates.

But I don't think all is lost! These last weeks we have been working on updating the bind9 package on Focal to the latest 9.16.x upstream release, which seems to contain the fix that I backported. If all goes well, this new release should land in the following weeks, and will finally address this bug (and others).

Thanks again for sticking with us through this slow and sometimes painful process of investigating, backporting and testing fixes.
