VM does not get proper domainname and delay in dns queries

Bug #1409157 reported by Alfred Shen
This bug affects 2 people
Affects            Status   Importance  Assigned to        Milestone
Juniper Openstack  Invalid  Medium      Hari Prasad Killi
OpenContrail       Invalid  Medium      Hari Prasad Killi

Bug Description

On Contrail v1.21, a VM based on CentOS 6.5 does not get the proper domain name, and users reported delays of more than 5 seconds when using Contrail as the DNS server. A tcpdump on the VM's tap device showed that the domain name was set correctly in the DHCP reply.

# tcpdump -i tapa5fc4c05-ca -vvvv port bootpc and bootps
 0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 02:a5:fc:4c:05:ca (oui Unknown), length 310, xid 0x79e67d22, Flags [none] (0x0000)
 Client-Ethernet-Address 02:a5:fc:4c:05:ca (oui Unknown)
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Request
   Requested-IP Option 50, length 4: 10.192.156.250
   Parameter-Request Option 55, length 13:
     Subnet-Mask, BR, Time-Zone, Classless-Static-Route
     Domain-Name, Domain-Name-Server, Hostname, YD
     YS, NTP, MTU, Option 119
     Default-Gateway
   Vendor-Class Option 60, length 43: "anaconda-Linux 2.6.32-431.el6.x86_64 x86_64"
   END Option 255, length 0
21:42:21.803349 IP (tos 0x0, ttl 16, id 0, offset 0, flags [none], proto UDP (17), length 337)
    10.192.156.1.bootps > 10.192.156.250.bootpc: [no cksum] BOOTP/DHCP, Reply, length 309, xid 0x79e67d22, Flags [none] (0x0000)
 Your-IP 10.192.156.250
 Server-IP 10.192.156.1
 Client-Ethernet-Address 02:a5:fc:4c:05:ca (oui Unknown)
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: ACK
   Server-ID Option 54, length 4: 10.192.156.1
   Lease-Time Option 51, length 4: 4294967295
   Subnet-Mask Option 1, length 4: 255.255.255.0
   BR Option 28, length 4: 10.192.156.255
   Time-Server Option 4, length 4: 10.192.156.1
   Domain-Name Option 15, length 10: "dev.pdx.wd"
   Default-Gateway Option 3, length 4: 10.192.156.1
   Hostname Option 12, length 9: "net156-03"
   Domain-Name-Server Option 6, length 4: 10.192.156.1
   END Option 255, length 0
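
To check a capture like the one above without reading every option by eye, the value of a single DHCP option can be pulled out of tcpdump's verbose text. This is only a sketch: dhcp_option_value is a hypothetical helper, not part of tcpdump or Contrail, and it assumes the "Name Option N, length M: value" line layout seen in the capture.

```shell
# Hypothetical helper: print the value of one named DHCP option from
# `tcpdump -v` text supplied on stdin.
dhcp_option_value() {
    name="$1"
    # Matches lines such as:  Domain-Name Option 15, length 10: "dev.pdx.wd"
    # The space after the name keeps Domain-Name from matching Domain-Name-Server.
    sed -n "s/^[[:space:]]*${name} Option [0-9]*, length [0-9]*: //p" | tr -d '"'
}
```

For example, with the tcpdump text saved to a file (capture.txt is a made-up name), `dhcp_option_value Domain-Name < capture.txt` would print the domain name offered in the ACK.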

From the VM side:

[root@net156-03 ~]# domainname
(none)
[root@net156-03 ~]# uname -a
Linux net156-03 2.6.32-358.123.2.openstack.el6.x86_64 #1 SMP Thu Sep 26 17:14:58 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@net156-03 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=localhost.localdomain

As for the DNS query delay, there is a significant delay (>5 s) when using Contrail as the DNS server. I wonder whether it has anything to do with allowing IPv6 AAAA queries.

Tags: vdns wpc
Aniket Daptari (adaptari) wrote :

Hi Alfred, Hari from the vRouter team has asked for the following information so that he can analyze this further.

Is /etc/resolv.conf getting updated on your VM with the "search" and "nameserver" values obtained from DHCP?

Updates to resolv.conf, the hostname, etc. are done by /sbin/dhclient-script. Is it possible to capture the execution output of this script so that we can check what is happening? We can add the following lines to the script and then run dhclient on the interface:
exec &> /tmp/out
set -x
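
A minimal sketch of what those two lines do, for anyone who wants to try the technique outside dhclient-script first. trace_to_file and the log path are hypothetical names. The sketch uses the portable form `> file 2>&1`, which has the same effect as `&> file` in bash.

```shell
# Hypothetical wrapper: run a command with its stdout, stderr and an
# execution trace all sent to a log file.
trace_to_file() {
    log="$1"
    shift
    (
        exec > "$log" 2>&1   # everything this subshell prints goes to the log
        set -x               # echo each command, prefixed with '+', before running it
        "$@"
    )
}
```

Running `trace_to_file /tmp/out echo hello` should leave both the traced command line and its output in /tmp/out. The subshell keeps the redirection and tracing from leaking into the calling shell.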

I tested in a standard CentOS 6.5 VM: once dhclient had run on the interface, "hostname" and "domainname -d" showed the expected values (reflecting the data received from DHCP), and /etc/resolv.conf was updated in the VM.

tags: added: wpc
Changed in juniperopenstack:
assignee: nobody → Hari Prasad Killi (haripk)
Changed in opencontrail:
assignee: nobody → Hari Prasad Killi (haripk)
Alfred Shen (alfredcs) wrote :

After associating the virtual network with both the newly created IPAM and the default one, VMs receive the domain name correctly. Associating it with either IPAM alone did not seem to work. Is this by design, or is it a bug?

[root@stack01 ~]# nn net-show net01
+-------------------------+-----------------------------------------------------------------------------------------------------------------+
| Field                   | Value                                                                                                           |
+-------------------------+-----------------------------------------------------------------------------------------------------------------+
| admin_state_up          | true                                                                                                            |
| contrail:fq_name        | default-domain                                                                                                  |
|                         | contrail                                                                                                        |
|                         | net01                                                                                                           |
| contrail:instance_count | 0                                                                                                               |
| contrail:subnet_ipam    | {"subnet_cidr": "172.16.1.0/24", "ipam_fq_name": ["default-domain", "contrail", "IPAM-1"]}                      |
|                         | {"subnet_cidr": "172.16.1.0/24", "ipam_fq_name": ["default-domain", "default-project", "default-network-ipam"]} |
| id                      | 52de6d0c-453b-4540-b71d-5dc329f1c61d                                                                            |
| name                    | net01                                                                                                           |
| router:external         | False                                                                                                           |
| shared                  | False                                                                                                           |
| status                  | ACTIVE                                                                                                          |
| subnets                 | 238bd781-5479-46be-adc4-cf64c9df305c                                                                            |
|                         | 238bd781-5479-46be-adc4-cf64c9df305c                                                                            |
| tenant_id               | e6735dcc26db4d0497b1d9d07ac7ca2b                                                                                |
+-------------------------+-----------------------------------------------------------------------------------------------------------------+

[root@c10-1 ~]# hostname -f
c10-1.cloud.eng.pdx.wd
[root@c10-1 ~]# domainname -f
c10-1.cloud.eng.pdx.wd

Alfred Shen (alfredcs) wrote :

In addition, the command dhclient -r hung the virtual machine. The VM sent a DHCP Release (bootpc --> bootps), but no returning packet was seen when snooping with tcpdump on the VM's tap device.

[root@c10-1 ~]# dhclient -r

(hung)

[root@comp01-3 ~]# tcpdump -i tap3a515a4f-96 -vvv port bootps or bootpc
tcpdump: WARNING: tap3a515a4f-96: no IPv4 address assigned
tcpdump: listening on tap3a515a4f-96, link-type EN10MB (Ethernet), capture size 65535 bytes
23:26:46.731609 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 328)
    172.16.1.5.bootpc > 172.16.1.1.bootps: [bad udp cksum e763!] BOOTP/DHCP, Request from 02:3a:51:5a:4f:96 (oui Unknown), length 300, xid 0x5d782502, Flags [none] (0x0000)
   Client-IP 172.16.1.5
   Client-Ethernet-Address 02:3a:51:5a:4f:96 (oui Unknown)
   Vendor-rfc1048 Extensions
     Magic Cookie 0x63825363
     DHCP-Message Option 53, length 1: Release
     Server-ID Option 54, length 4: 172.16.1.1
     END Option 255, length 0
     PAD Option 0, length 0, occurs 50

(hung)

Niall Donegan (ndonegan) wrote :

The root cause of the five-second delay in DNS queries has been traced to the interaction between getaddrinfo and either vrouter or the vdns. My money is on something in the BIND patches in vdns.

By default, getaddrinfo sends both an A and an AAAA query over a single UDP socket; however, only the A query gets a response. getaddrinfo waits five seconds for the AAAA response before retrying both queries quickly, each on its own socket. I have verified this with tcpdumps on an affected VM.
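
A rough way to see the delay from inside a VM is to time a lookup that goes through the same resolver path. The sketch below is an assumption-laden helper, not something from this report: elapsed_seconds is a hypothetical name, and it only has whole-second resolution.

```shell
# Hypothetical helper: print how many whole seconds a command took to run.
# The command's own output is discarded.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" > /dev/null 2>&1
    es_end=$(date +%s)
    echo $((es_end - es_start))
}
```

On a glibc system, `elapsed_seconds getent ahosts some.host.name` (the host name is a placeholder) exercises getaddrinfo, so an affected VM would be expected to report about 5 and a healthy one 0 or 1.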

There is a fix for this which can be put in resolv.conf:

single-request-reopen (since glibc 2.9)
                     The resolver uses the same socket for the A and AAAA
                     requests. Some hardware mistakenly sends back only one
                     reply. When that happens the client system will sit
                     and wait for the second reply. Turning this option on
                     changes this behavior so that if two requests from the
                     same port are not handled correctly it will close the
                     socket and open a new one before sending the second
                     request.

For EL6 hosts, this can be "fixed" by putting the following line in /etc/sysconfig/network:

RES_OPTIONS=single-request-reopen
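
On systems without the RES_OPTIONS mechanism, the same option ends up as an "options" line in resolv.conf. Below is a sketch of adding it idempotently to a resolv.conf-style file. ensure_resolver_option is a hypothetical helper, and on hosts where dhclient-script regenerates /etc/resolv.conf a direct edit like this may be overwritten at the next lease renewal.

```shell
# Hypothetical helper: append "options <opt>" to a resolv.conf-style file
# unless an options line already carries that exact option.
ensure_resolver_option() {
    ero_file="$1"
    ero_opt="$2"
    if ! grep -Eq "^options[[:space:]](.*[[:space:]])?${ero_opt}([[:space:]]|\$)" "$ero_file"; then
        echo "options ${ero_opt}" >> "$ero_file"
    fi
}
```

Calling `ensure_resolver_option /etc/resolv.conf single-request-reopen` twice should leave a single options line for it.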

While the above does sort the problem on the client side, there's still something funky happening in Contrail that shouldn't be.

tags: added: vdns
Édouard Thuleau (ethuleau) wrote :

I'm not sure which IPAM DNS mode you use, but I hit a bug with the 'tenant-dns-server' mode where in some cases AAAA requests were dropped [1].

That was fixed in 1.10, 2.0 and master (so 2.1 as well); I'm not sure about 1.20.

[1] https://bugs.launchpad.net/opencontrail/+bug/1387710

Hari Prasad Killi (haripk) wrote :

Hey Hari,

I think we are OK with the workaround for now. Thank you very much for your assistance!

-Alfred

From: Hari Prasad Killi <email address hidden>
Date: Wednesday, January 28, 2015 11:00 AM
To: Alfred Shen <email address hidden>, Aniket Daptari <email address hidden>
Cc: Ashish Ranjan <email address hidden>
Subject: Re: Assign correct domain name to VMs

Hi Alfred,
This is the summary of today's investigation.

Name resolution from the VM worked with the IPAM associated with the DNS server; association with default-network-ipam was not required.

'hostname' on the VM wasn't showing the hostname sent via DHCP. However, /etc/resolv.conf had proper domain name and DNS server address.

When we reset the hostname to 'localhost.localdomain' and ran dhclient again, the hostname was set to the name sent via DHCP. /sbin/dhclient-script is run when dhclient finishes, and it updates resolv.conf, the hostname, etc. with the values received from DHCP.

I experimented further with this script. Attached are two snapshots from its execution log, one where the hostname is updated and one where it is not. The hostname is updated only if its current value is '(none)', 'localhost' or 'localhost.localdomain'; when the value is anything else, it is left unchanged. So, if the VM has one of these default hostname values, the script updates the hostname. We can either change this behavior by modifying the script (in function dhconfig, before make_resolv_conf is invoked, it runs hostname, which leads to the actions seen in the attached snapshots) or make sure that the default values are set in the VM prior to DHCP.
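
The decision described above can be sketched as a small shell function. This is an illustration of the observed behavior, not the actual code in dhclient-script; hostname_needs_update is a hypothetical name.

```shell
# Hypothetical sketch: succeed (exit 0) only when the current hostname is one
# of the default values that the script was observed to overwrite.
hostname_needs_update() {
    case "$1" in
        "(none)"|localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}
```

It would be used along the lines of `if hostname_needs_update "$(hostname)"; then ...; fi`, so a VM image that boots with any other hostname keeps it, which matches what the execution logs showed.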

Please let me know if any further investigation is required.

Regards,
Hari

Alfred Shen (alfredcs) wrote :

Suggested workaround accepted.

Changed in juniperopenstack:
status: New → Fix Committed
status: Fix Committed → Confirmed
Changed in juniperopenstack:
importance: Undecided → Medium
Changed in opencontrail:
importance: Undecided → Medium
Changed in opencontrail:
status: New → Triaged
Changed in opencontrail:
status: Triaged → Invalid
Changed in juniperopenstack:
status: Confirmed → Invalid