getaddrinfo crashes on negative answers after Truncated retry

Bug #1945072 reported by Christopher K Brown
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
glibc (Ubuntu)  New     Undecided   Unassigned

Bug Description

Programs that call getaddrinfo segfault further down the stack, inside the call, under limited circumstances.

We have narrowed this crash down to a specific situation. Some of these details may not be relevant to the problem, but they are included in case they are:

- getaddrinfo makes two requests, A and AAAA, using UDP
- At least one of the replies to the two requests is truncated
- getaddrinfo makes two additional requests, A and AAAA, using TCP
- At least one of the replies to the two TCP requests has an empty answer section
  - For example, the response comes back with a REFUSED or SERVFAIL
  - Even a NOERROR with an empty answer section will cause this crash

Sample gai call:

  struct addrinfo* results;
  int ec = getaddrinfo(host.c_str(), "", nullptr, &results);
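
For anyone who wants to try this outside our application, here is a minimal C sketch of the same kind of call. It is an illustration only, not our production code: the helper name count_addresses is hypothetical, and it passes NULL for the service where the call above passes an empty string. It cannot trigger the crash on its own, because the DNS replies described above are still required.

```c
#include <netdb.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve HOST with no hints, as in the report, and
   return the number of addrinfo entries, or -1 if getaddrinfo fails.  */
int
count_addresses (const char *host)
{
  struct addrinfo *results = NULL;

  /* Assumption: NULL service instead of the "" used in the report.  */
  int ec = getaddrinfo (host, NULL, NULL, &results);
  if (ec != 0)
    {
      fprintf (stderr, "getaddrinfo (%s): %s\n", host, gai_strerror (ec));
      return -1;
    }

  int count = 0;
  for (const struct addrinfo *ai = results; ai != NULL; ai = ai->ai_next)
    ++count;

  /* The result list is owned by the caller and must be released.  */
  freeaddrinfo (results);
  return count;
}
```

Calling count_addresses with the affected host name from a small main function should be enough to exercise the resolver path, provided the name server returns the replies listed above.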

This is difficult to reproduce unless you control the replies yourself, since the downstream server must give a negative answer over TCP after answering over UDP. (This can happen; the downstream server can decide not to answer for any number of policy reasons.)

Using Ubuntu 18.04.5. The glibc version is 2.27.

Florian Weimer (fweimer) wrote :

Would you be able to share a packet capture (run tcpdump -s 0 -w CAPTURE-FILE, and then attach the resulting CAPTURE-FILE)? Thanks.

As far as I can tell, “getent ahosts large-a.t.enyo.de” results in DNS activity that matches the preconditions in your description, but I don't observe a crash (even with Ubuntu 18.04 LTS and its 2.27-based glibc).

Christopher K Brown (ckb42) wrote :

While doing the work requested for the report, the investigation took a different turn. The general sequence is still correct, with the following caveat:

- our DNS server added a CNAME record to the truncated UDP reply, bringing the size to over 512 bytes

The response is well formed, just too big. Since getaddrinfo never asks with EDNS, this is not correct and we must fix it - but getaddrinfo shouldn't dump core on it. The crash happens downstream of this, when the TCP replies come in, but only sometimes. I suspect a buffer overrun of some sort.

If you don't see it through inspection, I will add a pcap. Might be possible to replay it or something. The query which gives such a large truncated reply is

dig www.iiflstatements.com +ignore +notcp +noedns +qr

which comes in at 502 bytes. We add a CNAME as the first record in the answer section, which brings it to 636 bytes. We will try to do a little more work to see if we can make it repeatable.

Recap:

- getaddrinfo makes two requests, A and AAAA, using UDP and no EDNS
- At least one of the replies to the two requests is truncated
  - The truncated reply has a size greater than 512 bytes, otherwise error-free
- getaddrinfo makes two additional requests, A and AAAA, using TCP
- At least one of the replies to the two TCP requests has an empty answer section
  - For example, the response comes back with a REFUSED or SERVFAIL
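
To make it easier to check a capture against the recap above, here is a rough C sketch that pulls the relevant fields out of a raw DNS message: the TC bit, the RCODE, the answer count, and whether the message exceeds 512 bytes. The struct and function names are hypothetical, and the field offsets follow the standard 12-byte DNS header layout rather than anything specific to glibc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the header fields that matter for this bug.  */
struct reply_summary
{
  bool valid;            /* A complete 12-byte header was present.  */
  bool truncated;        /* TC bit is set.  */
  bool oversized;        /* Message is larger than 512 bytes.  */
  unsigned int rcode;    /* 0 = NOERROR, 2 = SERVFAIL, 5 = REFUSED.  */
  unsigned int ancount;  /* Number of records in the answer section.  */
};

/* Summarize the DNS message MSG of LEN bytes (without the two-byte TCP
   length prefix).  */
struct reply_summary
summarize_reply (const unsigned char *msg, size_t len)
{
  struct reply_summary s = { 0 };
  if (msg == NULL || len < 12)
    return s;

  /* Bytes 2-3 hold the flags, bytes 6-7 hold ANCOUNT (big endian).  */
  unsigned int flags = ((unsigned int) msg[2] << 8) | msg[3];

  s.valid = true;
  s.truncated = (flags & 0x0200) != 0;
  s.oversized = len > 512;
  s.rcode = flags & 0x000f;
  s.ancount = ((unsigned int) msg[6] << 8) | msg[7];
  return s;
}
```

In terms of the recap, the suspicious sequence is a UDP reply with truncated and oversized both set, followed by a TCP reply with ancount equal to 0.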

Florian Weimer (fweimer) wrote :

Thanks for the update. A packet capture could still be useful, and also a backtrace (with symbols) from the crash.

We have an upstream test case which is supposed to cover various packet sizes, and I am going to try to adapt it to cover this. The code itself is pretty much impenetrable. We have been slowly cleaning it up, but there is still a lot left to do.

Florian Weimer (fweimer) wrote :

I tried to replicate it with this patch to the test suite:

diff --git a/resolv/tst-bug18665-tcp.c b/resolv/tst-bug18665-tcp.c
index 9b1ff0fbd8..e8e0d12bb7 100644
--- a/resolv/tst-bug18665-tcp.c
+++ b/resolv/tst-bug18665-tcp.c
@@ -47,6 +47,41 @@ response (const struct resolv_response_context *ctx,
       struct resolv_response_flags flags = {.tc = true};
       resolv_response_init (b, flags);
       resolv_response_add_question (b, qname, qclass, qtype);
+
+      if (qtype == T_A)
+        {
+          resolv_response_section (b, ns_s_an);
+          resolv_response_open_record (b, qname, qclass, T_CNAME, 600);
+          const char *alias = "somewhat.longish.cname.example";
+          resolv_response_add_name (b, alias);
+          resolv_response_close_record (b);
+
+          for (int i = 0; i < 35; ++i)
+            {
+              resolv_response_open_record (b, alias, qclass, T_A, 600);
+              const char ipv4[4] = {10, 255, 255, i};
+              resolv_response_add_data (b, ipv4, sizeof (ipv4));
+              resolv_response_close_record (b);
+            }
+        }
+      else
+        {
+          resolv_response_section (b, ns_s_ns);
+          resolv_response_open_record (b, qname, qclass, T_SOA, 600);
+          resolv_response_add_name (b, "ns1.example");
+          resolv_response_add_name (b, "hostmaster.example");
+          const uint32_t values[5] =
+            {
+              htonl (2021092901),
+              htonl (600),
+              htonl (600),
+              htonl (360000),
+              htonl (600),
+            };
+          resolv_response_add_data (b, values, sizeof (values));
+          resolv_response_close_record (b);
+        }
+
       return;
     }

But I do not get a crash with current master. I think we really need those packet captures, sorry.
