When resolving DNS names with getaddrinfo(), I have seen it hang for 5 seconds, then retry and succeed. The issue is that glibc issues both an A and an AAAA query on the same socket, and in some circumstances they can be sent with the same DNS transaction ID as well.
I verified this with a packet capture: the A and AAAA queries for a name were sent with the same DNS transaction ID and both got responses, then nothing happened for five seconds, and then the same DNS queries were sent again. On the glibc side, I confirmed by interrupting it with gdb that it was blocked waiting for a DNS response, even though the packet capture shows the responses had well and truly arrived. I've attached a packet capture and a backtrace of the glibc hang.
I believe this is the same issue reported in these places:
* In RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1904153
* Also RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1903880
* Upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=26600
The environment I noticed this bug in was:
* Docker for Mac on an arm64 M1 MacBook
* Docker for Mac Linux kernel version is 5.10.76-linuxkit
* Linux is also arm64, not emulated
* Container with the buggy DNS environment is Ubuntu Bionic (also arm64, not emulated)
* glibc 2.27-3ubuntu1.4
However, one of the Red Hat reporters noticed this issue on m6-series EC2 instances in AWS.
A patch has been provided upstream for this issue: https://sourceware.org/pipermail/libc-alpha/2020-September/117547.html
I applied the upstream patch to glibc 2.27-3ubuntu1.4 and rebuilt the package, and the problem went away. I've attached the exact patch I applied, since I had to work through some conflicts.
So, I think that patch just needs to be backported to Bionic and (I think) Focal as well. Is that reasonable?
Thanks!