Dnsmasq sends AAAA and MX requests for private ip range to public DNS server, that slow down SSH access to target nodes

Bug #1393605 reported by Dmitry Borodaenko
This bug affects 3 people
Affects             Status   Importance  Assigned to         Milestone
Fuel for OpenStack  Invalid  Low         Alexander Arzhanov
7.0.x               Invalid  Low         Alexander Arzhanov

Bug Description

SSH from the Fuel server is very slow (>10 sec) when nodes are addressed by name, which makes script execution take a long time.
SSH to the IP address of a node (control/compute) from Fuel is instant.
Reverse DNS lookup was suspected first, so 'UseDNS no' was set in /etc/ssh/sshd_config, but the problem persists.

Later it was discovered that no entries were present in Fuel's /etc/hosts for the existing nodes. After these node entries were added manually, SSH worked without the >10 sec delay.
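
One way to check whether the delay comes from name resolution rather than from SSH itself is to time the resolver call on its own. This is a minimal sketch, assuming a glibc system where 'getent ahosts' is available (it resolves through getaddrinfo, the same library call ssh uses for host names). The helper name elapsed_seconds is hypothetical and not part of Fuel.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]
# Run a command, discard its output, and print the wall-clock time it
# took in whole seconds. Hypothetical helper, not part of Fuel.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

For example, compare 'elapsed_seconds getent ahosts node-1' with 'elapsed_seconds getent ahosts 10.20.0.3'. If only the lookup by name takes many seconds, the delay is in DNS resolution and not in sshd.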

Was originally misfiled as a blueprint:
https://blueprints.launchpad.net/fuel/+spec/fuel-master-hosts

Discussion from the blueprint whiteboard:

[Dmitry Borodaenko 2014-10-17] I don't think duplicating IP addressing between DNS and /etc/hosts is a good idea; we should identify and resolve your DNS problem instead.

[Fabrizio Soppelsa 2014-10-20] Such an explosion of names in the hosts file doesn't look like the cleanest workaround... I would instead troubleshoot DNS performance and/or evaluate adding a DNS cache mechanism.

Dmitry Borodaenko (angdraug) wrote :

Michael, please provide more information about what was wrong with your DNS setup.

Changed in fuel:
status: New → Incomplete
Michael Kraynov (mkraynov) wrote :

I have a very simple installation. The configuration settings weren't changed from the defaults.

[root@fuel ~]# cat /etc/resolv.conf
search domain.tld
domain domain.tld
nameserver 10.20.0.2

[root@fuel ~]# cat /etc/dnsmasq.upstream
domain domain.tld
search domain.tld
nameserver 10.20.0.1

[root@fuel ~]# dockerctl shell cobbler cat /etc/dnsmasq.upstream
domain domain.tld
search domain.tld
nameserver 8.8.8.8

[root@fuel ~]# dockerctl shell cobbler cat /etc/dnsmasq.conf
# Cobbler generated configuration file for dnsmasq
# Mon Nov 17 11:46:51 2014

read-ethers
log-dhcp
log-queries
log-facility=/var/log/dnsmasq.log
addn-hosts = /var/lib/cobbler/cobbler_hosts
domain=local
dhcp-lease-max=1000
server=/local/
resolv-file=/etc/dnsmasq.upstream
dhcp-match=gpxe,175
interface=eth0

# This is one of the key options. dnsmasq tries to move out servername
# and PXE filename from special fields into DHCP options.
# Some old clients can't understand those DHCP options, so they
# will not be able to boot via PXE without this option enabled.
# For example gPXE will not work while iPXE works fine.
dhcp-no-override

dhcp-option=6,10.20.0.2

dhcp-range=internal,10.20.0.128,10.20.0.254,255.255.255.0
dhcp-option=net:internal,option:router,10.20.0.2
pxe-service=net:#gpxe,x86PC,"Install",pxelinux,10.20.0.2
dhcp-boot=net:internal,pxelinux.0,boothost,10.20.0.2

dhcp-host=net:x86_64,52:54:00:22:7a:02
dhcp-host=net:x86_64,52:54:00:be:22:02
dhcp-host=net:x86_64,52:54:00:dd:c8:02,node-5.domain.tld,10.20.0.7
dhcp-host=net:x86_64,52:54:00:22:7a:04
dhcp-host=net:x86_64,52:54:00:be:22:04
dhcp-host=net:x86_64,52:54:00:dd:c8:04,node-4.domain.tld,10.20.0.6
dhcp-host=net:x86_64,52:54:00:22:7a:03
dhcp-host=net:x86_64,52:54:00:be:22:03
dhcp-host=net:x86_64,52:54:00:dd:c8:03,node-1.domain.tld,10.20.0.3
dhcp-host=net:x86_64,52:54:00:22:7a:01
dhcp-host=net:x86_64,52:54:00:be:22:01
dhcp-host=net:x86_64,52:54:00:dd:c8:01,node-3.domain.tld,10.20.0.5
dhcp-host=net:x86_64,52:54:00:22:7a:05
dhcp-host=net:x86_64,52:54:00:be:22:05
dhcp-host=net:x86_64,52:54:00:dd:c8:05,node-2.domain.tld,10.20.0.4

debug2: channel 0: window 998976 sent adjust 49600

[root@fuel ~]# dockerctl shell cobbler cat /etc/resolv.conf
search domain.tld
domain domain.tld
nameserver 10.20.0.2

[root@fuel ~]# time ssh node-1 "date"
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
Tue Nov 18 05:22:57 UTC 2014

real 0m15.110s
user 0m0.012s
sys 0m0.005s
[root@fuel ~]# time ssh 10.20.0.3 "date"
Warning: Permanently added '10.20.0.3' (RSA) to the list of known hosts.
Tue Nov 18 05:23:02 UTC 2014

real 0m0.086s
user 0m0.013s
sys 0m0.003s

When I set up a connection to the Internet, the issue was resolved.

[root@fuel ~]# ip addr add 172.16.0.250/24 dev eth1
[root@fuel ~]# ip link set up dev eth1
[root@fuel ~]# route add default gw 172.16.0.1
[root@fuel ~]# route del default gw 10.20.0.1
[root@fuel ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.20.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
169.254.0.0 0.0.0.0 255.255.0.0...


Alexander Bozhenko (alexbozhenko) wrote :

The reason for the slow response is that dnsmasq tries to forward the AAAA and MX queries to the configured upstream server 8.8.8.8, which it cannot reach:

[root@fuel ~]# date;time host node-12;date
Thu Dec 18 16:29:10 UTC 2014
node-12.domain.tld has address 10.20.0.3
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached

real 0m20.007s
user 0m0.005s
sys 0m0.001s
Thu Dec 18 16:29:30 UTC 2014

At the same time I looked into /var/log/dnsmasq.log inside the cobbler container:
Dec 18 16:29:10 dnsmasq[1254]: query[A] node-12.domain.tld from 172.17.42.1
Dec 18 16:29:10 dnsmasq[1254]: /var/lib/cobbler/cobbler_hosts node-12.domain.tld is 10.20.0.3
Dec 18 16:29:10 dnsmasq[1254]: query[AAAA] node-12.domain.tld from 172.17.42.1
Dec 18 16:29:10 dnsmasq[1254]: forwarded node-12.domain.tld to 8.8.8.8
Dec 18 16:29:15 dnsmasq[1254]: query[AAAA] node-12.domain.tld from 172.17.42.1
Dec 18 16:29:15 dnsmasq[1254]: forwarded node-12.domain.tld to 8.8.8.8
Dec 18 16:29:20 dnsmasq[1254]: query[MX] node-12.domain.tld from 172.17.42.1
Dec 18 16:29:20 dnsmasq[1254]: forwarded node-12.domain.tld to 8.8.8.8
Dec 18 16:29:25 dnsmasq[1254]: query[MX] node-12.domain.tld from 172.17.42.1
Dec 18 16:29:25 dnsmasq[1254]: forwarded node-12.domain.tld to 8.8.8.8
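
The pattern in this log (a query for a local name followed by 'forwarded ... to 8.8.8.8') can be pulled out of a longer log mechanically. This is a minimal sketch, assuming the log line layout shown above. The helper name forwarded_local is hypothetical.

```shell
#!/bin/sh
# forwarded_local DOMAIN < dnsmasq.log
# Count queries for names under DOMAIN that dnsmasq forwarded upstream,
# grouped by record type and upstream server. Hypothetical helper.
forwarded_local() {
    awk -v domain="$1" '
        # "query[AAAA] name from client": remember the last type asked per name
        $5 ~ /^query\[/ {
            qtype = $5
            sub(/^query\[/, "", qtype)
            sub(/\]$/, "", qtype)
            last[$6] = qtype
        }
        # "forwarded name to upstream": the query left the local server
        $5 == "forwarded" {
            name = $6
            suffix = "." domain
            if (length(name) > length(suffix) &&
                substr(name, length(name) - length(suffix) + 1) == suffix) {
                count[last[name] " " name " -> " $8]++
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

Fed the ten log lines above with 'forwarded_local domain.tld', it prints '2 AAAA node-12.domain.tld -> 8.8.8.8' and '2 MX node-12.domain.tld -> 8.8.8.8'. The A query does not appear because it was answered from /var/lib/cobbler/cobbler_hosts.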

Adding this line to /etc/dnsmasq.conf inside the cobbler container helps:
local=/domain.tld/

[root@fuel ~]# date;time host node-12;date
Thu Dec 18 16:51:26 UTC 2014
node-12.domain.tld has address 10.20.0.3

real 0m0.005s
user 0m0.003s
sys 0m0.000s
Thu Dec 18 16:51:26 UTC 2014

And in the logs:
[root@a1540110b032 log]# tail -f dnsmasq.log
Dec 18 16:51:13 dnsmasq-dhcp[1654]: DHCP, IP range 10.20.0.128 -- 10.20.0.254, lease time 1h
Dec 18 16:51:13 dnsmasq[1654]: using local addresses only for domain domain.tld
Dec 18 16:51:13 dnsmasq[1654]: using local addresses only for domain local
Dec 18 16:51:13 dnsmasq[1654]: reading /etc/dnsmasq.upstream
Dec 18 16:51:13 dnsmasq[1654]: using nameserver 8.8.8.8#53
Dec 18 16:51:13 dnsmasq[1654]: using local addresses only for domain domain.tld
Dec 18 16:51:13 dnsmasq[1654]: using local addresses only for domain local
Dec 18 16:51:13 dnsmasq[1654]: read /etc/hosts - 7 addresses
Dec 18 16:51:13 dnsmasq[1654]: read /var/lib/cobbler/cobbler_hosts - 2 addresses
Dec 18 16:51:13 dnsmasq-dhcp[1654]: read /etc/ethers - 0 addresses
Dec 18 16:51:26 dnsmasq[1654]: query[A] node-12.domain.tld from 172.17.42.1
Dec 18 16:51:26 dnsmasq[1654]: /var/lib/cobbler/cobbler_hosts node-12.domain.tld is 10.20.0.3
Dec 18 16:51:26 dnsmasq[1654]: query[AAAA] node-12.domain.tld from 172.17.42.1
Dec 18 16:51:26 dnsmasq[1654]: config node-12.domain.tld is NODATA-IPv6
Dec 18 16:51:26 dnsmasq[1654]: query[MX] node-12.domain.tld from 172.17.42.1
Dec 18 16:51:26 dnsmasq[1654]: config node-12.domain.tld is NODATA

This needs confirmation from DNS experts that it is the correct solution.
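
Some background on why the extra line matters: the generated config only contains 'server=/local/', which keeps queries for the domain 'local' from being forwarded, while the nodes are registered under domain.tld. In dnsmasq, 'local=/domain.tld/' is a synonym for 'server=/domain.tld/' with no upstream address, so queries for that domain are answered from local data only. The sketch below checks a config for this kind of mismatch. The helper name unprotected_domains is hypothetical, and it assumes one domain per local=/server= line.

```shell
#!/bin/sh
# unprotected_domains RESOLV_CONF DNSMASQ_CONF
# Print every "search"/"domain" name from RESOLV_CONF for which
# DNSMASQ_CONF has no exact "local=/name/" or "server=/name/" line.
# Hypothetical helper; it ignores the multi-domain form "local=/a/b/".
unprotected_domains() {
    awk '$1 == "search" || $1 == "domain" {
             for (i = 2; i <= NF; i++) print $i
         }' "$1" |
    sort -u |
    while read -r name; do
        grep -Fxq -e "local=/$name/" -e "server=/$name/" "$2" ||
            echo "$name"
    done
}
```

Inside the cobbler container this could be run as 'unprotected_domains /etc/resolv.conf /etc/dnsmasq.conf'. With the files quoted in this report it would print 'domain.tld' before the change and nothing after it. Because the file header says it is generated by Cobbler, a manual edit may be overwritten when the file is regenerated. Where a permanent change belongs is not established in this thread.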

Changed in fuel:
status: Incomplete → Confirmed
Alexander Bozhenko (alexbozhenko) wrote :

A bug with a similar problem:
https://bugs.launchpad.net/mos/+bug/1409661

I think the dnsmasq.conf configuration should be reviewed and changed so that it does not send such name resolution requests to a public server.

This bug is assigned to Michael Kraynov, who is from Support. Please reassign it to the right person.

Dmitry Borodaenko (angdraug) wrote :

Also related to bug #1427940.

Changed in fuel:
assignee: Michael Kraynov (mkraynov) → Fuel Library Team (fuel-library)
summary: - Reverse DNS lookups slow down SSH access to target nodes
+ Dnsmasq send AAAA and MX requests for private ip range to public DNS
+ server, that slow down SSH access to target nodes
summary: - Dnsmasq send AAAA and MX requests for private ip range to public DNS
+ Dnsmasq sends AAAA and MX requests for private ip range to public DNS
server, that slow down SSH access to target nodes
Alexander Nevenchannyy (anevenchannyy) wrote :

Alexander Bozhenko, yes, adding the local domain to dnsmasq.conf looks right to me.

Changed in fuel:
status: Confirmed → Triaged
tags: added: low-hanging-fruit
Bartłomiej Piotrowski (bpiotrowski) wrote :
Bogdan Dobrelya (bogdando) wrote :

there is nothing similar to local=/domain.tld/

Changed in fuel:
status: Triaged → Won't Fix
Alexander Arzhanov (aarzhanov) wrote :

I can't reproduce this bug in either of the following cases:
* Fuel master node connected to internet
* Fuel master node NOT connected to internet

It should be noted that the Fuel master node must always be connected to the Internet (IBP provisioning issues, etc.).

I checked on ISO #98:
api: '1.0'
astute_sha: 34e0493afa22999c4a07d3198ceb945116ab7932
auth_required: true
build_id: 2015-07-27_09-24-22
build_number: '98'
feature_groups:
- mirantis
fuel-agent_sha: 2a65f11c10b0aeb5184247635a19740fc3edde21
fuel-library_sha: 39c3162ee2e2ff6e3af82f703998f95ff4cc2b7a
fuel-ostf_sha: 94a483c8aba639be3b96616c1396ef290dcc00cd
fuelmain_sha: 921918a3bd3d278431f35ad917989e46b0c24100
nailgun_sha: d5c19f6afc66b5efe3c61ecb49025c1002ccbdc6
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 58c411d87a7eaf0fd6892eae2b5cb1eff4190c98
release: '7.0'
release_versions:
  2015.1.0-7.0:
    VERSION:
      api: '1.0'
      astute_sha: 34e0493afa22999c4a07d3198ceb945116ab7932
      build_id: 2015-07-27_09-24-22
      build_number: '98'
      feature_groups:
      - mirantis
      fuel-agent_sha: 2a65f11c10b0aeb5184247635a19740fc3edde21
      fuel-library_sha: 39c3162ee2e2ff6e3af82f703998f95ff4cc2b7a
      fuel-ostf_sha: 94a483c8aba639be3b96616c1396ef290dcc00cd
      fuelmain_sha: 921918a3bd3d278431f35ad917989e46b0c24100
      nailgun_sha: d5c19f6afc66b5efe3c61ecb49025c1002ccbdc6
      openstack_version: 2015.1.0-7.0
      production: docker
      python-fuelclient_sha: 58c411d87a7eaf0fd6892eae2b5cb1eff4190c98
      release: '7.0'

I marked this bug as Invalid.
