systemd-resolved fails resolution via TCP

Bug #1966775 reported by Chanhun Jeong
This bug affects 1 person
Affects: systemd (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

This happened on one of our production instances.
It suddenly started failing to resolve hostnames.
Later I found that it happens whenever the DNS response message is larger than 512 bytes.
I have never been able to reproduce it on other instances built from the same instance template.

With nslookup:
```bash
$ nslookup pod51041.outlook.com
;; Truncated, retrying in TCP mode.
;; Connection to 127.0.0.53#53(127.0.0.53) for pod51041.outlook.com failed: timed out.
```

And with dig, using another hostname:
```bash
$ time dig +noanswer +noedns toomany.ddstreet.org
;; Truncated, retrying in TCP mode.
;; Connection to 127.0.0.53#53(127.0.0.53) for toomany.ddstreet.org failed: timed out.

real 0m7.070s
user 0m0.004s
sys 0m0.004s
```
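
To separate the two steps that nslookup and dig perform above, the following is a minimal Python sketch, not a definitive tool. It sends a plain UDP query without EDNS, so the reply is limited to 512 bytes, and retries over TCP only if the reply has the TC (truncated) bit set. The stub address 127.0.0.53:53 is taken from the output above. The function names `build_query`, `is_truncated`, `frame_for_tcp` and `probe` are illustrative assumptions.

```python
import socket
import struct

STUB = ("127.0.0.53", 53)  # systemd-resolved stub listener, as seen in the output above


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query with RD set and no EDNS OPT record."""
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0)  # flags: RD=1; one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, 1)  # qtype (default A), class IN


def is_truncated(reply: bytes) -> bool:
    """Return True if the TC bit (0x0200) is set in the DNS header flags."""
    return len(reply) >= 4 and bool(struct.unpack("!H", reply[2:4])[0] & 0x0200)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte length (RFC 1035, 4.2.2)."""
    return struct.pack("!H", len(message)) + message


def probe(name: str, timeout: float = 5.0) -> str:
    """Query the stub over UDP, then over TCP if the UDP reply was truncated."""
    query = build_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
            udp.settimeout(timeout)
            udp.sendto(query, STUB)
            reply, _ = udp.recvfrom(4096)
    except OSError as exc:
        return f"UDP query failed: {exc}"
    if not is_truncated(reply):
        return "UDP reply was not truncated; no TCP retry needed"
    try:
        with socket.create_connection(STUB, timeout=timeout) as tcp:
            tcp.sendall(frame_for_tcp(query))
            first = tcp.recv(2)  # start of the 2-byte length prefix of the reply
    except OSError as exc:
        return f"TCP retry failed: {exc}"
    return "TCP retry answered" if first else "TCP connection closed without a reply"
```

Calling `probe("pod51041.outlook.com")` on the affected instance and on a healthy one from the same template should show whether only the TCP step differs between them.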

System versions:
```bash
$ systemctl --version
systemd 245 (245.4-4ubuntu3.15)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ uname -r
5.8.0-1041-aws
```

Output of resolvectl status:
```bash
$ resolvectl status
Global
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 2 (ens5)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 172.29.0.2
         DNS Servers: 172.29.0.2
          DNS Domain: ap-south-1.compute.internal
```

strace of the systemd-resolved process:
```bash
$ strace -fp $(pidof systemd-resolved)
epoll_pwait(4, [{EPOLLIN, {u32=3491475680, u64=187650612637920}}], 14, -1, NULL, 8) = 1
recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 38
recvmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(52259), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128->16, msg_iov=[{iov_base="\344g\1 \0\1\0\0\0\0\0\0\10pod51041\7outlook\3com\0\0\1\0\1", iov_len=3928}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 38
newfstatat(AT_FDCWD, "/etc/hosts", {st_mode=S_IFREG|0644, st_size=262, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
getrandom("\x62\xe8", 2, GRND_NONBLOCK) = 2
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
sendmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(52259), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\344g\203\200\0\1\0\35\0\0\0\0\10pod51041\7outlook\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\0\36\0\0044`zB\300\f\0\1\0\1\0\0\0\36\0\0044`\243\"\300\f\0\1\0\1\0\0\0\36\0\0044`+\242\300\f\0\1\0\1\0\0\0\36\0\4(a\226\362\300\f\0\1\0\1\0\0\0\36\0\0044`%\"\300\f\0\1\0\1\0\0\0\36\0\0044`Ob\300\f\0\1\0\1\0\0\0\36\0\0044`y\242\300\f\0\1\0\1\0\0\0\36\0\0044`y\362\300\f\0\1\0\1\0\0\0\36\0\0044`\2432\300\f\0\1\0\1\0\0\0\36\0\0044`q\202\300\f\0\1\0\1\0\0\0\36\0\0044`y\222\300\f\0\1\0\1\0\0\0\36\0\0044`z\22\300\f\0\1\0\1\0\0\0\36\0\0044`y\342\300\f\0\1\0\1\0\0\0\36\0\0044`wb\300\f\0\1\0\1\0\0\0\36\0\0044`yb\300\f\0\1\0\1\0\0\0\36\0\0044`y\"\300\f\0\1\0\1\0\0\0\36\0\0044`y\22\300\f\0\1\0\1\0\0\0\36\0\0044`q\322\300\f\0\1\0\1\0\0\0\36\0\0044`q\342\300\f\0\1\0\1\0\0\0\36\0\0044`q\222\300\f\0\1\0\1\0\0\0\36\0\0044`y2\300\f\0\1\0\1\0\0\0\36\0\0044`y\2\300\f\0\1\0\1\0\0\0\36\0\0044`q\262\300\f\0\1\0\1\0\0\0\36\0\0044`z2\300\f\0\1\0\1\0\0\0\36\0\0044`wR\300\f\0\1\0\1\0\0\0\36\0\0044`y\202\300\f\0\1\0\1\0\0\0\36\0\0044`z\"\300\f\0\1\0\1\0\0\0\36\0\0044`q\302\300\f\0\1\0\1\0\0\0\36\0\0044`\3\262", iov_len=502}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("0.0.0.0")}}], msg_controllen=28, msg_flags=0}, 0) = 502
gettid() = 607
epoll_pwait(4, [{EPOLLIN, {u32=3491475680, u64=187650612637920}}], 14, -1, NULL, 8) = 1
recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 53
recvmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(37976), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128->16, msg_iov=[{iov_base="\335H\1\0\0\1\0\0\0\0\0\0\nmonitoring\nap-south-1\tamazonaws\3com\0\0\34\0\1", iov_len=3928}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 53
newfstatat(AT_FDCWD, "/etc/hosts", {st_mode=S_IFREG|0644, st_size=262, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
getrandom("\xaf\x1e", 2, GRND_NONBLOCK) = 2
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 15
setsockopt(15, SOL_IP, IP_UNICAST_IF, [33554432], 4) = 0
setsockopt(15, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(15, SOL_IP, IP_PKTINFO, [1], 4) = 0
connect(15, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.29.0.2")}, 16) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=3491583952, u64=187650612746192}}) = 0
write(15, "\257\36\1\0\0\1\0\0\0\0\0\1\nmonitoring\nap-south-1\tamazonaws\3com\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 64) = 64
gettid() = 607
timerfd_settime(11, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=105529, tv_nsec=683183000}}, NULL) = 0
epoll_pwait(4, [{EPOLLIN, {u32=3491475680, u64=187650612637920}}], 17, -1, NULL, 8) = 1
recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 53
recvmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(54540), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128->16, msg_iov=[{iov_base="\2504\1\0\0\1\0\0\0\0\0\0\nmonitoring\nap-south-1\tamazonaws\3com\0\0\1\0\1", iov_len=3928}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 53
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
getrandom("\xc8\xc3", 2, GRND_NONBLOCK) = 2
newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/resolv.conf", {st_mode=S_IFREG|0644, st_size=622, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/resolve/stub-resolv.conf", {st_mode=S_IFREG|0644, st_size=752, ...}, 0) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 16
setsockopt(16, SOL_IP, IP_UNICAST_IF, [33554432], 4) = 0
setsockopt(16, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(16, SOL_IP, IP_PKTINFO, [1], 4) = 0
connect(16, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.29.0.2")}, 16) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 16, {EPOLLIN, {u32=3491849328, u64=187650613011568}}) = 0
write(16, "\310\303\1\0\0\1\0\0\0\0\0\1\nmonitoring\nap-south-1\tamazonaws\3com\0\0\1\0\1\0\0)\2\0\0\0\0\0\0\0", 64) = 64
gettid() = 607
epoll_pwait(4, [{EPOLLIN, {u32=3491583952, u64=187650612746192}}], 20, -1, NULL, 8) = 1
recvfrom(15, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 146
recvmsg(15, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.29.0.2")}, msg_namelen=128->16, msg_iov=[{iov_base="\257\36\201\200\0\1\0\0\0\1\0\1\nmonitoring\nap-south-1\tamazonaws\3com\0\0\34\0\1\300\f\0\6\0\1\0\0\0\1\0F\7ns-1164\tawsdns-17\3org\0\21awsdns-hostmaster\6amazon\300,\0\0\0\1\0\0\34 \0\0\3\204\0\22u\0\0\0\0<\0\0)\20\0\0\0\0\0\0\0", iov_len=3928}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("ens5"), ipi_spec_dst=inet_addr("172.29.40.107"), ipi_addr=inet_addr("172.29.40.107")}}], msg_controllen=32, msg_flags=0}, 0) = 146
epoll_ctl(4, EPOLL_CTL_DEL, 15, NULL) = 0
close(15) = 0
sendmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(37976), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\335H\201\200\0\1\0\0\0\0\0\0\nmonitoring\nap-south-1\tamazonaws\3com\0\0\34\0\1", iov_len=53}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("0.0.0.0")}}], msg_controllen=28, msg_flags=0}, 0) = 53
gettid() = 607
epoll_pwait(4, [{EPOLLIN, {u32=3491849328, u64=187650613011568}}], 17, -1, NULL, 8) = 1
recvfrom(16, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 80
recvmsg(16, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.29.0.2")}, msg_namelen=128->16, msg_iov=[{iov_base="\310\303\201\200\0\1\0\1\0\0\0\1\nmonitoring\nap-south-1\tamazonaws\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\0\27\0\0044_P\272\0\0)\20\0\0\0\0\0\0\0", iov_len=3928}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("ens5"), ipi_spec_dst=inet_addr("172.29.40.107"), ipi_addr=inet_addr("172.29.40.107")}}], msg_controllen=32, msg_flags=0}, 0) = 80
epoll_ctl(4, EPOLL_CTL_DEL, 16, NULL) = 0
close(16) = 0
sendmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(54540), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\2504\201\200\0\1\0\1\0\0\0\0\nmonitoring\nap-south-1\tamazonaws\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\0\27\0\0044_P\272", iov_len=69}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("0.0.0.0")}}], msg_controllen=28, msg_flags=0}, 0) = 69
gettid() = 607
timerfd_settime(11, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}, NULL) = 0
epoll_pwait(4, ^Cstrace: Process 607 detached
```

In the normal case, systemd-resolved creates a UDP socket first and a TCP socket afterwards.
In this case it never creates a TCP socket. It only retries over UDP, even though the client sent its query over TCP.
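
As a reading aid for the trace, the sketch below decodes the first 12 bytes of the 502-byte reply that systemd-resolved sent back to the client for pod51041.outlook.com. The bytes are copied from the `sendmsg` call in the strace output. The helper name `parse_dns_header` is illustrative. It suggests that the stub did set the TC bit and returned 29 answer records, which is consistent with the client then retrying over TCP.

```python
import struct

# First 12 bytes of the UDP reply for pod51041.outlook.com in the trace above
# (shown there in strace's escapes as "\344g\203\200\0\1\0\35\0\0\0\0").
HEADER = bytes([0o344, ord("g"), 0o203, 0o200, 0, 1, 0, 0o35, 0, 0, 0, 0])


def parse_dns_header(data: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", data[:12])
    return {
        "id": txid,
        "qr": bool(flags & 0x8000),  # 1 = response
        "tc": bool(flags & 0x0200),  # 1 = truncated, client should retry over TCP
        "rd": bool(flags & 0x0100),
        "ra": bool(flags & 0x0080),
        "rcode": flags & 0x000F,
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


if __name__ == "__main__":
    print(parse_dns_header(HEADER))
```

The 38-byte query at the start of the trace carries no additional records (its last header field is zero), so it has no EDNS OPT record and the stub keeps the UDP reply within the 512-byte limit.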

Nick Rosbrook (enr0n) wrote :

I cannot reproduce this on Focal:

root@focal:~# nslookup pod51041.outlook.com
;; Truncated, retrying in TCP mode.
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: pod51041.outlook.com
Address: 52.96.122.34
Name: pod51041.outlook.com
Address: 52.96.163.50
[... snip ...]

root@focal:~# time dig +noanswer _noedns toomany.ddstreet.org

; <<>> DiG 9.16.1-Ubuntu <<>> +noanswer _noedns toomany.ddstreet.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 61517
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_noedns. IN A

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Fri Jun 23 15:07:15 UTC 2023
;; MSG SIZE rcvd: 36

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60109
;; flags: qr rd ra; QUERY: 1, ANSWER: 40, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;toomany.ddstreet.org. IN A

;; Query time: 176 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Fri Jun 23 15:07:16 UTC 2023
;; MSG SIZE rcvd: 689

real 0m0.204s
user 0m0.011s
sys 0m0.019s
root@focal:~# resolvectl status
Global
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 8 (eth0)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 10.136.78.1
         DNS Servers: 10.136.78.1
          DNS Domain: lxd

Are you still seeing this issue?

Changed in systemd (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired