Comment 4 for bug 1409157

Niall Donegan (ndonegan) wrote :

The root cause of the five-second delay on DNS queries has been traced to the interaction between getaddrinfo and either vrouter or the vdns. My money is on something in the BIND patches in vdns.

By default, getaddrinfo sends both the A and the AAAA query down a single UDP socket, but only the A query gets a response. getaddrinfo waits five seconds for the AAAA reply, then retries both queries, this time with a socket each, and those complete quickly. I have verified this with tcpdumps on an affected VM.
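
For anyone wanting to check a VM without running tcpdump, a rough timing sketch in Python follows. It assumes the interpreter's socket.getaddrinfo goes through the glibc resolver, and the helper name time_lookup and the hostname example.com are placeholders, not anything from Contrail. A result close to five seconds would be consistent with the behaviour described above, but it is only a hint, not proof.

```python
import socket
import time


def time_lookup(host, port=80, resolver=socket.getaddrinfo):
    """Time one address lookup and return (elapsed_seconds, results).

    AF_UNSPEC asks for both IPv4 and IPv6 addresses, which is what makes
    the resolver issue the A and AAAA queries together.
    `resolver` is injectable so the helper can be exercised offline.
    """
    start = time.monotonic()
    results = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return time.monotonic() - start, results


if __name__ == "__main__":
    # "example.com" is a placeholder; use a name served by the affected DNS.
    elapsed, _ = time_lookup("example.com")
    print(f"lookup took {elapsed:.2f}s")
```

Running it once before and once after applying the resolver option gives a quick comparison.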

There is a fix for this which can be put in resolv.conf:

single-request-reopen (since glibc 2.9)
                     The resolver uses the same socket for the A and AAAA
                     requests. Some hardware mistakenly sends back only one
                     reply. When that happens the client system will sit
                     and wait for the second reply. Turning this option on
                     changes this behavior so that if two requests from the
                     same port are not handled correctly it will close the
                     socket and open a new one before sending the second
                     request.
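
In /etc/resolv.conf itself, resolver options go on an "options" line, so the entry would look like this (any nameserver and search lines already in the file stay as they are):

options single-request-reopen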

For EL6 hosts, this can be "fixed" by putting the following line in /etc/sysconfig/network:

RES_OPTIONS=single-request-reopen

While the above does sort the problem on the client side, there's still something funky happening in Contrail that shouldn't be.