Wired interface gets impossibly high metric 20100

Bug #1814262 reported by Rachel Greenham on 2019-02-01
This bug affects 1 person
Affects: procps (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

Actually, this might be a heisenbug. I've had this issue all morning, ever since network-manager was updated, but just now, *while this bug was being submitted*, it corrected itself.

The machine is a Dell XPS 13 9370 with WiFi and a CalDigit Thunderbolt 3 dock that has an Ethernet port. After the network-manager update I noticed everything was slower than I was used to, and the network icon in gnome-shell showed WiFi, not wired.

Looking at the output of route, or route -n for simplicity, I would see this:

rachel@rainbow:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.254   0.0.0.0         UG    600    0        0 wlp2s0
0.0.0.0         192.168.1.254   0.0.0.0         UG    20100  0        0 enp63s0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 wlp2s0
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp63s0
192.168.1.0     0.0.0.0         255.255.255.0   U     600    0        0 wlp2s0

So the metric on the default route on enp63s0 had 20,000 mysteriously added to it, which makes it extremely low priority. The system was choosing the WiFi connection instead, which isn't great in my office, hence the noticeable slowness.
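To make the effect of the extra 20,000 concrete: when several IPv4 default routes are otherwise equal, Linux uses the one with the lowest metric. The sketch below parses the two route -n tables from this report and picks the preferred default route. It is a simplification (no policy routing, no multipath) and the helper names are made up for illustration.

```python
from typing import NamedTuple


class Route(NamedTuple):
    destination: str
    gateway: str
    genmask: str
    flags: str
    metric: int
    iface: str


def parse_route_n(text: str) -> list[Route]:
    """Parse the table printed by net-tools' route -n into Route records."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric destination;
        # this skips the title line and the column header.
        if len(fields) != 8 or not fields[0][0].isdigit():
            continue
        dest, gateway, genmask, flags, metric, _ref, _use, iface = fields
        routes.append(Route(dest, gateway, genmask, flags, int(metric), iface))
    return routes


def preferred_default(routes: list[Route]) -> Route:
    """Return the default route (0.0.0.0 / 0.0.0.0) with the lowest metric."""
    defaults = [r for r in routes if r.destination == "0.0.0.0" and r.genmask == "0.0.0.0"]
    return min(defaults, key=lambda r: r.metric)


# The two tables from this report (alignment collapsed to single spaces).
BEFORE = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 600 0 0 wlp2s0
0.0.0.0 192.168.1.254 0.0.0.0 UG 20100 0 0 enp63s0
169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlp2s0
192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 enp63s0
192.168.1.0 0.0.0.0 255.255.255.0 U 600 0 0 wlp2s0
"""

AFTER = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 enp63s0
0.0.0.0 192.168.1.254 0.0.0.0 UG 20600 0 0 wlp2s0
169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlp2s0
192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 enp63s0
192.168.1.0 0.0.0.0 255.255.255.0 U 600 0 0 wlp2s0
"""

if __name__ == "__main__":
    for name, table in (("before", BEFORE), ("after", AFTER)):
        best = preferred_default(parse_route_n(table))
        print(name, best.iface, best.metric)
```

With the first table wlp2s0 (metric 600) wins over enp63s0 (20100); with the second, enp63s0 (100) wins over wlp2s0 (20600), which matches what the gnome-shell icon was showing in each case.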

This morning, though, the situation was sticky. It showed no sign of changing whatever I did: restarting network-manager, undocking and redocking, rebooting, etc. I could change the metric manually with ifmetric (and that worked), but that was about it.

I would have reported the bug then, but I had to go out. When I got back I plugged in and initially saw the same thing again (that's where the above snippet was pasted from). But *while* the ubuntu-bug network-manager command was running, I noticed the gnome-shell network icon switch to wired, checked again, and saw:

rachel@rainbow:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.254   0.0.0.0         UG    100    0        0 enp63s0
0.0.0.0         192.168.1.254   0.0.0.0         UG    20600  0        0 wlp2s0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 wlp2s0
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp63s0
192.168.1.0     0.0.0.0         255.255.255.0   U     600    0        0 wlp2s0

So now the wifi connection has 20,000 added to it, which may still be wrong? But I wouldn't otherwise have noticed it because the system is again *behaving* as expected.

This all seemed to happen after the network-manager upgrade (from 1.12.6-0ubuntu4 to 1.15.2-0ubuntu1) this morning. I can't say if these metric+20,000 values were present before then, because I didn't have any cause to go looking at it, it always just worked. Could it be some issue with how the newer network-manager, or one of its associated packages, is figuring out the metrics on new connections? Like it's running some new heuristic to determine which one should really be the preferred? If it's like it was just now, when it fixed itself after a minute or so, that's not really a problem, but if it's like it was this morning when it just seemed to be stuck with the ethernet connection at 20100, it is.
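One reading that fits every number in this report is a fixed penalty: suppose NetworkManager adds 20,000 to the default-route metric of a device whose connectivity check did not come back "full". The base metrics then match the on-link 192.168.1.0/24 routes (100 wired, 600 WiFi), and the nmcli-dev section of the apport data shows enp63s0 as "limited" and wlp2s0 as "full". The sketch encodes that hypothesis only; the penalty constant and the set of penalized states are inferred from this report, not taken from NetworkManager's source.

```python
# Hypothesis sketch: values inferred from this report's numbers,
# NOT taken from NetworkManager's source code.
PENALTY = 20_000
PENALIZED_STATES = {"none", "portal", "limited"}  # assumed set of "failed" states


def default_route_metric(base_metric: int, ip4_connectivity: str) -> int:
    """Base metric, plus PENALTY if the device's connectivity check did not pass."""
    if ip4_connectivity in PENALIZED_STATES:
        return base_metric + PENALTY
    return base_metric


# Base metrics from the on-link 192.168.1.0/24 routes; connectivity states
# from the nmcli-dev output in the apport data.
DEVICES = {
    "enp63s0": (100, "limited"),
    "wlp2s0": (600, "full"),
}

if __name__ == "__main__":
    for iface, (base, state) in DEVICES.items():
        print(iface, state, default_route_metric(base, state))
```

Under this hypothesis the apport snapshot gives 20100 for enp63s0 and 600 for wlp2s0, as in the IpRoute section; swapping the two states gives 100 and 20600, which is exactly the second table.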

ProblemType: Bug
DistroRelease: Ubuntu 19.04
Package: network-manager 1.15.2-0ubuntu1
ProcVersionSignature: Ubuntu 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
ApportVersion: 2.20.10-0ubuntu19
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Fri Feb 1 13:15:06 2019
IfupdownConfig:
 # interfaces(5) file used by ifup(8) and ifdown(8)
 auto lo
 iface lo inet loopback
InstallationDate: Installed on 2018-09-11 (142 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180214)
IpRoute:
 default via 192.168.1.254 dev wlp2s0 proto dhcp metric 600
 default via 192.168.1.254 dev enp63s0 proto dhcp metric 20100
 169.254.0.0/16 dev wlp2s0 scope link metric 1000
 192.168.1.0/24 dev enp63s0 proto kernel scope link src 192.168.1.106 metric 100
 192.168.1.0/24 dev wlp2s0 proto kernel scope link src 192.168.1.101 metric 600
NetworkManager.state:
 [main]
 NetworkingEnabled=true
 WirelessEnabled=true
 WWANEnabled=true
RfKill:
 1: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: network-manager
UpgradeStatus: Upgraded to disco on 2019-01-13 (18 days ago)
nmcli-dev:
 DEVICE   TYPE      STATE      IP4-CONNECTIVITY  IP6-CONNECTIVITY  DBUS-PATH                                  CONNECTION      CON-UUID                              CON-PATH
 wlp2s0   wifi      connected  full              limited           /org/freedesktop/NetworkManager/Devices/2  Strange Noises  ae544444-3c3d-4714-8bf5-7e56bb7249c6  /org/freedesktop/NetworkManager/ActiveConnection/2
 enp63s0  ethernet  connected  limited           limited           /org/freedesktop/NetworkManager/Devices/4  enp63s0         7d394ed8-72c6-4081-9dac-18e320acdafd  /org/freedesktop/NetworkManager/ActiveConnection/3
 lo       loopback  unmanaged  unknown           unknown           /org/freedesktop/NetworkManager/Devices/1  --              --                                    --
nmcli-nm:
 RUNNING  VERSION  STATE      STARTUP  CONNECTIVITY  NETWORKING  WIFI-HW  WIFI     WWAN-HW  WWAN
 running  1.15.2   connected  started  full          enabled     enabled  enabled  enabled  enabled

Sebastien Bacher (seb128) wrote:

Thank you for your bug report. Is there any chance you could also report the issue upstream at https://gitlab.freedesktop.org/NetworkManager/NetworkManager? We do keep up with updates for that component, but we don't know the code as well as they do, and they might have a better idea of what's wrong.

Noted. I see there's no report of anything similar already, and if it was the sort of problem it looked like to me, people would be screaming blue murder about it, so I think I'll wait and see if it recurs or becomes an ongoing problem, rather than a one-off. Maybe my LAN was having a bad hair day...

Sebastien Bacher (seb128) wrote:

OK, it could also be a bug in the new 1.15 series, which is new and still unstable, so it's unlikely to be used anywhere in 'production' and has few users at the moment.

As it recurred again today and showed no signs of correcting itself like it did on Friday, I went ahead and reported it upstream, here: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/116

Reporting back on this:

The opinion there seems to be that the problem is down to the net.ipv4.conf.*.rp_filter sysctl values being set to 1 instead of the kernel default of 0. This is done in the procps package, and I'm guessing it's done as protection against IP spoofing. The kernel documentation page I was pointed to says:

 Current recommended practice in RFC3704 is to enable strict mode
 to prevent IP spoofing from DDos attacks. If using asymmetric routing
 or other complicated routing, then loose mode is recommended.

 The max value from conf/{all,interface}/rp_filter is used
 when doing source validation on the {interface}.

 Default value is 0. Note that some distributions enable it
 in startup scripts.

Presumably Ubuntu enables it by default (I can see it does, in a file in the procps package), and Red Hat, where the NetworkManager maintainers seem to sit, does not.
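The rule in the quoted kernel documentation (the larger of conf/all/rp_filter and conf/<interface>/rp_filter is used) means that a single all=1 makes every interface at least strict, whatever its own value. A small sketch for checking what a Linux machine actually applies; it assumes the standard /proc/sys/net/ipv4/conf layout, skips conf/default because that is a template rather than a live per-interface setting, and uses the 0/1/2 = off/strict/loose naming from the kernel's ip-sysctl documentation. The function names are illustrative.

```python
from pathlib import Path

# Mode names for net.ipv4.conf.*.rp_filter, as documented in the kernel's ip-sysctl docs.
MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """The kernel validates with the larger of conf/all and conf/<iface>."""
    return max(all_value, iface_value)


def effective_modes(all_value: int, per_iface: dict[str, int]) -> dict[str, str]:
    """Map each interface to the name of the rp_filter mode actually applied to it."""
    result = {}
    for iface, value in per_iface.items():
        eff = effective_rp_filter(all_value, value)
        result[iface] = MODES.get(eff, f"unknown ({eff})")
    return result


def read_rp_filter(conf_dir: Path = Path("/proc/sys/net/ipv4/conf")) -> tuple[int, dict[str, int]]:
    """Read conf/all plus every per-interface value (Linux only)."""
    all_value = int((conf_dir / "all" / "rp_filter").read_text())
    per_iface = {
        entry.name: int((entry / "rp_filter").read_text())
        for entry in sorted(conf_dir.iterdir())
        if entry.name not in ("all", "default")
    }
    return all_value, per_iface
```

On a Linux machine, effective_modes(*read_rp_filter()) returns one entry per interface; with all=1, as Ubuntu's procps file set it before the fix, expect every interface to report "strict" unless its own value is 2.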

This is going to have to be argued out between the procps and network-manager maintainers, I guess. You can have IP spoofing protection or you can have connectivity checking: choose one, or argue about who should fix it. :-) Personally, at least for now, my solution is to remove the connectivity-check package, which was presumably pulled in by something else, and keep the procps defaults.

A follow-up comment on the upstream bug pointed to a systemd commit suggesting the rp_filter default should now be 2 rather than 1: https://github.com/systemd/systemd/commit/230450d4e4f1f5fc9fa4295ed9185eea5b6ea16e
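Why strict mode (1) and per-device connectivity checking might collide, while loose mode (2) does not, can be modelled with a toy routing table. This is a simplified sketch, not the kernel's actual source-validation code. It assumes that NetworkManager's check traffic for each device leaves and returns on that device, and it uses a hypothetical check-server address from the RFC 5737 documentation range. With the wired default route preferred, a reply arriving on wlp2s0 fails strict validation because the best route back to the server points at enp63s0, whereas loose mode only requires the source to be reachable via some interface.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    metric: int
    iface: str


def route(cidr: str, metric: int, iface: str) -> Route:
    return Route(ipaddress.ip_network(cidr), metric, iface)


def best_route(routes: list[Route], addr: ipaddress.IPv4Address) -> Optional[Route]:
    """Longest matching prefix wins; among equal prefixes, the lowest metric wins."""
    matches = [r for r in routes if addr in r.network]
    if not matches:
        return None
    return min(matches, key=lambda r: (-r.network.prefixlen, r.metric))


def rp_filter_accepts(routes: list[Route], src: ipaddress.IPv4Address,
                      in_iface: str, mode: int) -> bool:
    """Simplified reverse-path check for a packet from src arriving on in_iface."""
    if mode == 0:  # no source validation
        return True
    reverse = best_route(routes, src)
    if reverse is None:  # source not reachable via any interface
        return False
    if mode == 1:  # strict: the best path back must use the arrival interface
        return reverse.iface == in_iface
    if mode == 2:  # loose: reachable via any interface is enough
        return True
    raise ValueError(f"unsupported rp_filter mode {mode}")


# Routing table before any penalty: both default routes present, wired preferred.
ROUTES = [
    route("0.0.0.0/0", 100, "enp63s0"),
    route("0.0.0.0/0", 600, "wlp2s0"),
    route("169.254.0.0/16", 1000, "wlp2s0"),
    route("192.168.1.0/24", 100, "enp63s0"),
    route("192.168.1.0/24", 600, "wlp2s0"),
]

# Hypothetical connectivity-check server address (RFC 5737 documentation range).
CHECK_SERVER = ipaddress.ip_address("203.0.113.10")

if __name__ == "__main__":
    for mode in (1, 2):
        for iface in ("enp63s0", "wlp2s0"):
            ok = rp_filter_accepts(ROUTES, CHECK_SERVER, iface, mode)
            print(f"rp_filter={mode} reply on {iface}: {'accepted' if ok else 'dropped'}")
```

Under these assumptions the WiFi check reply is dropped only in mode 1, which would leave wlp2s0 looking "limited" and penalized, as in the second routing table in the bug description.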

I think at this point I need to just let you guys talk amongst yourselves. :-) For me, my fix for now is to uninstall the connectivity-check package, which disables the functionality. I'm not going to mess about changing procps defaults.

Sebastien Bacher (seb128) wrote:

Thanks for the investigation work. I emailed the Ubuntu devel list about changing the default; let's see how the discussion goes:
https://lists.ubuntu.com/archives/ubuntu-devel/2019-February/040588.html

Changed in network-manager (Ubuntu):
importance: Undecided → High
status: New → In Progress
affects: network-manager (Ubuntu) → procps (Ubuntu)
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package procps - 2:3.3.15-2ubuntu2

---------------
procps (2:3.3.15-2ubuntu2) disco; urgency=medium

  * 10-network-security.conf: change the rp_filter default from 1 to 2,
    the strict mode isn't compatible with the n-m handling of
    captive portals (lp: #1814262)

 -- Sebastien Bacher <email address hidden> Thu, 07 Feb 2019 23:46:43 +0100

Changed in procps (Ubuntu):
status: In Progress → Fix Released