NM fails to configure routing correctly when connected to two LANs

Bug #979067 reported by Guillaume Melquiond
This bug affects 3 people
Affects                   Status     Importance  Assigned to  Milestone
dnsmasq (Ubuntu)          Invalid    Undecided   Unassigned
network-manager (Ubuntu)  Confirmed  Medium      Unassigned

Bug Description

My computer happens to be connected to two networks through eth0 and wlan0. NetworkManager connects to both of them automatically and uses DHCP for both of them. Both networks are protected from the outside by firewalls and there are lots of DNS servers (obtained automatically). Since the upgrade to Precise, DNS no longer works without manual intervention.

What happens is that, since bug #903854, dnsmasq no longer receives the option --strict-order. As a consequence, it tries servers in a random order to resolve DNS requests. In particular, it tries to access DNS servers from the wireless network through the eth0 route, which crosses a firewall, and therefore it fails to resolve requests. It takes a lot of failures before all the unreachable servers have been exhausted, which makes for a poor user experience.

With the --strict-order option, dnsmasq tries to access servers that can be accessed by the default route, which succeeds immediately. Note that I am not sure that --strict-order is the proper fix and it may only be papering over a real bug, but at least it does fix the issue.

network-manager: 0.9.4.0-0ubuntu2
dnsmasq-base: 2.59-4

Mathieu Trudel-Lapierre (cyphermox) wrote :

Seems to me like this is possibly not the exact cause of the issue. It's expected that any network reachable via the default route (when no "more specific" route exists) will be attempted via the default route (because the kernel is supposed to always use the most specific route to get to a destination).

Furthermore, dnsmasq is supposed to be trying the dns nameservers in parallel, or at least sufficiently quickly and without blocking that this shouldn't affect network resolution in a bad way.

Could you please attach the contents of /run/nm-dns-dnsmasq.conf as well as the output of 'ip route' when you are reproducing the issue so that we can debug this?

Changed in network-manager (Ubuntu):
status: New → Incomplete
Guillaume Melquiond (guillaume-melquiond) wrote :

See below for the results of various commands. Prefix xx.xx is the Ethernet network while yy.yy is the wireless network. Both of them are in the public IP range (in case it matters). I also added the result of the dig command, without and with specifying a DNS server. Notice that directly querying the xx.xx server works, while querying dnsmasq or a yy.yy server fails in the same way. I haven't put the tests in this message, but as you can guess, if I unplug eth0, both dnsmasq and the yy.yy servers succeed, and if I kill wlan0, both dnsmasq and the xx.xx server succeed.

$ ip route
default via xx.xx.212.190 dev eth0 proto static
yy.yy.72.0/24 dev wlan0 proto kernel scope link src yy.yy.72.155 metric 2
169.254.0.0/16 dev eth0 scope link metric 1000
xx.xx.212.128/25 dev eth0 proto kernel scope link src xx.xx.212.250 metric 1

$ cat /var/run/nm-dns-dnsmasq.conf
server=xx.xx.213.253
server=yy.yy.34.35
server=yy.yy.36.37

$ dig www.google.fr
...
;; AUTHORITY SECTION:
fr. 8222 IN NS d.nic.fr.
...
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)

$ dig www.google.fr @xx.xx.213.253
...
;; ANSWER SECTION:
www.google.fr. 83423 IN CNAME www-cctld.l.google.com.
www-cctld.l.google.com. 114 IN A 173.194.34.24
...
;; AUTHORITY SECTION:
google.com. 169795 IN NS ns2.google.com.
...
;; Query time: 1 msec
;; SERVER: xx.xx.213.253#53(xx.xx.213.253)

$ dig www.google.fr @yy.yy.34.35
...
;; AUTHORITY SECTION:
fr. 2134 IN NS e.ext.nic.fr.
...
;; Query time: 1 msec
;; SERVER: yy.yy.34.35#53(yy.yy.34.35)

Changed in network-manager (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Medium
summary: - Please reenable dnsmasq --strict-order
+ DNS unresponsive/slow when different DNS are provided by wifi and wired
+ connected to different networks
Thomas Hood (jdthood) wrote : Re: DNS unresponsive/slow when different DNS are provided by wifi and wired connected to different networks

The first problem is with your routing. Address yy.yy.34.35 is presumably to be reached via yy.yy.72.0/24 but the kernel does not know this.

Second problem is bug #1003842 again.
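
To illustrate the routing point, here is a minimal Python sketch of longest-prefix route selection applied to the routing table and nameserver list posted above. It is only a model of how the kernel picks the most specific route, not NetworkManager or kernel code. Because the real prefixes were anonymized, 10.1 stands in for xx.xx and 10.2 for yy.yy; those two values, and the names ROUTES, NAMESERVERS and pick_device, are assumptions made for this example.

```python
import ipaddress

# Hypothetical stand-ins for the anonymized prefixes in the report:
# xx.xx -> 10.1 (Ethernet), yy.yy -> 10.2 (wireless).
ROUTES = [
    ("0.0.0.0/0", "eth0"),         # default via xx.xx.212.190 dev eth0
    ("10.2.72.0/24", "wlan0"),     # yy.yy.72.0/24 dev wlan0
    ("169.254.0.0/16", "eth0"),    # link-local route on eth0
    ("10.1.212.128/25", "eth0"),   # xx.xx.212.128/25 dev eth0
]

# Nameservers as listed in nm-dns-dnsmasq.conf, with the same substitution.
NAMESERVERS = ["10.1.213.253", "10.2.34.35", "10.2.36.37"]


def pick_device(dest, routes=ROUTES):
    """Return the device of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, dev))
    # Longest prefix wins; the default route (prefix length 0) always matches.
    return max(matches)[1]


if __name__ == "__main__":
    for server in NAMESERVERS:
        print(server, "->", pick_device(server))
```

In this model all three nameservers are sent out through eth0, because neither yy.yy.34.35 nor yy.yy.36.37 falls inside yy.yy.72.0/24. Only addresses on the wireless subnet itself, such as yy.yy.72.155, would be routed via wlan0.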

Thomas Hood (jdthood) wrote :

Hmm, not exactly #1003842, since you don't have the problem that some nameservers are "screening off" others with NXDOMAIN.

The worst we can say about dnsmasq in this context is that it could behave better in the case where several listed upstream nameservers are unreachable.

Thomas Hood (jdthood)
summary: - DNS unresponsive/slow when different DNS are provided by wifi and wired
- connected to different networks
+ (1) NM fails to configure routing correctly when connect to two LANs;
+ (2) dnsmasq initially slow when some upstream nameservers can't be
+ reached
Thomas Hood (jdthood) wrote :

The second part of this bug report, concerning dnsmasq's poor behavior when some upstream nameservers are unreachable, has also been reported in #991308. Let's continue discussing that problem over there and reserve this bug report (#979067) for the problem that routing is not configured correctly when Guillaume connects to two LANs.

summary: - (1) NM fails to configure routing correctly when connect to two LANs;
- (2) dnsmasq initially slow when some upstream nameservers can't be
- reached
+ NM fails to configure routing correctly when connect to two LANs
Thomas Hood (jdthood)
Changed in dnsmasq (Ubuntu):
status: New → Invalid
Ouroborus (deadchicken) wrote :

I have a similar situation (at least it fits the title). I have two wired NICs, one configured for static (10GbE) and the other configured for DHCP (1GbE). Only the DHCP-configured NIC ever has internet access. It seems that no matter how I arrange things, NetworkManager sets the default route to be the internet-less, static-configured NIC.
