ifup service of network device stay active after driver stop

Bug #1672144 reported by Talat Batheesh
This bug affects 2 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  Undecided   Unassigned
  Xenial         Fix Released  Undecided   Tim Gardner
  Yakkety        Fix Released  Undecided   Tim Gardner
  Zesty          Fix Released  Undecided   Unassigned

Bug Description

The systemd ifup service of a network device stays active after the device's driver module is unloaded, even though unloading calls close port (ndo_stop).
When we load the NIC driver again, it tries to start the ifup service for its NICs, but because the service is already active the start fails, and the interface does not come up with its static configuration.
Below is a simple reproduction with a Mellanox ConnectX-4 device (driver name mlx5_core).

We also see this issue on Azure with an Ubuntu 17.04 guest on Hyper-V: the VF fails to start after SR-IOV is re-enabled on the VM's vNIC.

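For now, we have a workaround: add a udev rule that forces ifup when the interface is added (the single quotes stop the shell from expanding $env{INTERFACE} before udev sees it):
 echo 'DRIVERS=="*mlx*", SUBSYSTEM=="net", ACTION=="add",RUN+="/sbin/ifup --force $env{INTERFACE}"' > /lib/udev/rules.d/100-up.rules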
Example:
#:/lib/udev/rules.d# cat 100-up.rules
DRIVERS=="*mlx*", SUBSYSTEM=="net", ACTION=="add",RUN+="/sbin/ifup --force $env{INTERFACE}"

******************************
* More info and reproduction *
******************************
# ifdown ens1f0
RTNETLINK answers: Cannot assign requested address
# ifup ens1f0
# ifconfig ens1f0
ens1f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 123.12.23.1 netmask 255.255.0.0 broadcast 123.12.255.255
        inet6 fe80::268a:7ff:fea1:fbdc prefixlen 64 scopeid 0x20<link>
        ether 24:8a:07:a1:fb:dc txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 17 bytes 1392 (1.3 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

# ethtool -i ens1f0 |grep driv
driver: mlx5_core

# systemctl status ifup@ens1f0.service
* ifup@ens1f0.service - ifup for ens1f0
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: active (exited) since Sun 2017-03-12 09:40:04 IST; 2h 26min ago
 Main PID: 1608 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ifup@ens1f0.service

Mar 12 09:40:04 qa-h-vrt-039 systemd[1]: Started ifup for ens1f0.
Mar 12 09:40:04 qa-h-vrt-039 sh[1608]: ifup: interface ens1f0 already configured
root@qa-h-vrt-039:/tmp# modprobe -rv mlx5_ib
rmmod mlx5_ib
rmmod mlx5_core

# modprobe -rv mlx5_core

# ifconfig -a |grep ens1f0

# lsmod |grep mlx5

# systemctl status ifup@ens1f0.service
* ifup@ens1f0.service - ifup for ens1f0
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: active (exited) since Sun 2017-03-12 09:40:04 IST; 2h 27min ago
 Main PID: 1608 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ifup@ens1f0.service

Mar 12 09:40:04 qa-h-vrt-039 systemd[1]: Started ifup for ens1f0.
Mar 12 09:40:04 qa-h-vrt-039 sh[1608]: ifup: interface ens1f0 already configured

# modprobe mlx5_core

# ifconfig ens1f0
ens1f0: flags=4098<BROADCAST,MULTICAST> mtu 1500
        ether 24:8a:07:a1:fb:dc txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet dhcp

#ens1f0
auto ens1f0
iface ens1f0 inet static
address 123.12.23.1
netmask 255.255.0.0
mtu 1500

***********************************
* Another repro and investigation *
***********************************
Once an interface is created, the system starts a service that is responsible for activating it (essentially, it runs ifup).
So on the first load, everything works.
The problem appears on the second driver reload:
Good flow (on a good setup, 4.9.0-rc5+):
1. The driver is unloaded and the interface’s “ifup” service is stopped:
Feb 23 00:54:09 reg-l-vrt-206-006 kernel: [67777.790189] mlx4_en: enP43508p0s2: Close port called
Feb 23 00:54:09 reg-l-vrt-206-006 kernel: [67777.868484] hv_netvsc a2be13bb-7244-44ff-a31a-dea8d58a79da eth1: VF down: enP43508p0s2
Feb 23 00:54:09 reg-l-vrt-206-006 kernel: [67777.868487] hv_netvsc a2be13bb-7244-44ff-a31a-dea8d58a79da eth1: Data path switched from VF: enP43508p0s2
Feb 23 00:54:09 reg-l-vrt-206-006 kernel: [67777.869575] hv_netvsc a2be13bb-7244-44ff-a31a-dea8d58a79da eth1: VF unregistering: enP43508p0s2
Feb 23 00:54:09 reg-l-vrt-206-006 systemd[1]: Stopping ifup for enP43508p0s2...
Feb 23 00:54:09 reg-l-vrt-206-006 ifdown[47196]: Cannot find device "enP43508p0s2"
Feb 23 00:54:09 reg-l-vrt-206-006 ifdown[47196]: Cannot find device "enP43508p0s2"
Feb 23 00:54:09 reg-l-vrt-206-006 systemd[1]: Stopped ifup for enP43508p0s2.
2. The driver is loaded again, and a new instance of the service is started, which activates the interface:
Feb 23 00:54:15 reg-l-vrt-206-006 kernel: [67783.130286] mlx4_en: a9f4:00:02.0: Port 1: Initializing port
Feb 23 00:54:15 reg-l-vrt-206-006 kernel: [67783.130853] hv_netvsc a2be13bb-7244-44ff-a31a-dea8d58a79da eth1: VF registering: eth2
Feb 23 00:54:15 reg-l-vrt-206-006 kernel: [67783.137848] mlx4_core a9f4:00:02.0 enP43508p0s2: renamed from eth2
Feb 23 00:54:15 reg-l-vrt-206-006 systemd[1]: Started ifup for enP43508p0s2.
Bad flow (on the problematic setup, 4.10.0-Jack):
1. The driver is unloaded, but close port is apparently not called,
and the ifup service is not stopped:
Feb 23 00:54:15 localhost kernel: [54441.503703] hv_netvsc cbedec5e-b321-4e5d-81dd-619d805f8f8d eth1: VF unregistering: enP48090p0s2
2. The driver is loaded again:
Feb 23 00:54:23 localhost kernel: [54449.948686] mlx4_en: bbda:00:02.0: Port 1: Initializing port
Feb 23 00:54:23 localhost kernel: [54449.949306] hv_netvsc cbedec5e-b321-4e5d-81dd-619d805f8f8d eth1: VF registering: eth2
Feb 23 00:54:23 localhost kernel: [54449.956623] mlx4_core bbda:00:02.0 enP48090p0s2: renamed from eth2
Feb 23 00:54:23 localhost systemd[1]: Started ifup for enP48090p0s2.


Kai-Heng Feng (kaihengfeng) wrote :

Is "the network device systemd service" the same as "sys-subsystem-net-devices-*.device"?

Talat Batheesh (talat-b87) wrote :

No, they are not the same:

:~# systemctl cat sys-subsystem-net-devices-ens1f0.device
No files found for sys-subsystem-net-devices-ens1f0.device.

:~# ifup ens1f0
ifup: interface ens1f0 already configured

:~# systemctl cat ifup@ens1f0.service
# /lib/systemd/system/ifup@.service
[Unit]
Description=ifup for %I
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target
BindsTo=sys-subsystem-net-devices-%i.device
After=sys-subsystem-net-devices-%i.device
DefaultDependencies=no
IgnoreOnIsolate=yes

[Service]
# avoid stopping on shutdown via stopping system-ifup.slice
Slice=system.slice
ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifup --allow=auto %I; \
    if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi'
ExecStop=/sbin/ifdown %I
RemainAfterExit=true
TimeoutStartSec=5min

Kai-Heng Feng (kaihengfeng) wrote :

Hmm, I failed to do something like `systemctl enable ifup@<interface>.service` because there's no [Install] section.

Do you know which program started this service?

Talat Batheesh (talat-b87) wrote :

ifup for the network interface:
# ifup <interface-name>

Talat Batheesh (talat-b87) wrote :

This bug reproduces with the tg3, mlx4, mlx5 and bonding modules, and I believe it reproduces on every interface that has an ifup service.

Simple reproduction:
1. The interface is up after a reboot or ifup.
2. modprobe -r <module name>
# the network interfaces are closed and removed
3. systemctl status ifup@<NIC-name>.service
# the service is still active although the network interface no longer exists

This issue does not reproduce with kernel 4.9;
we started seeing it when we moved to kernel 4.10.

thanks
Talat

Kai-Heng Feng (kaihengfeng) wrote :

It works on Linux 4.11-rc3.
A kernel bisect shows that this commit fixes the issue:

commit 91864f5852f9996210fad400cf70fb85af091243
Author: Andrey Vagin <email address hidden>
Date: Sun Mar 12 21:36:18 2017 -0700

    net: use net->count to check whether a netns is alive or not

    The previous idea was to check whether a net namespace is in
    net_exit_list or not. It doesn't work, because net->exit_list is used in
    __register_pernet_operations and __unregister_pernet_operations where
    all namespaces are added to a temporary list to make cleanup in a error
    case, so list_empty(&net->exit_list) always returns false.

    Reported-by: Mantas Mikulėnas <email address hidden>
    Fixes: 002d8a1a6c11 ("net: skip genenerating uevents for network namespaces that are exiting")
    Signed-off-by: Andrei Vagin <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

It is also included in the 4.10.5 stable tree, so it should land in the 17.04 kernel soon.
You can try the 4.10.5 mainline kernel here:
  http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.5/

no longer affects: systemd (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1672144

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Talat Batheesh (talat-b87) wrote :

Thank you.
I will give it a try and report back.

Talat Batheesh (talat-b87) wrote :

The issue doesn't reproduce with the 4.10.5 mainline kernel.
Could you please cherry-pick this commit into the Zesty kernel?

Thanks
Talat

Kai-Heng Feng (kaihengfeng) wrote :

It should be included in the next Zesty kernel release.

Tim Gardner (timg-tpi) wrote :

v4.10.5 was included in Ubuntu-4.10.0-15.17

Changed in linux (Ubuntu Zesty):
status: Incomplete → Fix Released
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-52.55

---------------
linux (4.8.0-52.55) yakkety; urgency=low

  * linux: 4.8.0-52.55 -proposed tracker (LP: #1686976)

  * CVE-2017-7477: macsec: avoid heap overflow in skb_to_sgvec (LP: #1685892)
    - macsec: avoid heap overflow in skb_to_sgvec
    - macsec: dynamically allocate space for sglist

  * net/ipv4: original ingress device index set as the loopback interface.
    (LP: #1683982)
    - net: fix incorrect original ingress device index in PKTINFO

  * Touchpad not working correctly after kernel upgrade (LP: #1662589)
    - Input: ALPS - fix V8+ protocol handling (73 03 28)

  * ifup service of network device stay active after driver stop (LP: #1672144)
    - net: use net->count to check whether a netns is alive or not

  * [Hyper-V] mkfs regression in kernel 4.4+ (LP: #1682215)
    - block: relax check on sg gap

  * Potential memory corruption with capi adapters (LP: #1681469)
    - powerpc/mm: Add missing global TLB invalidate if cxl is active

  * [Hyper-V/Azure] Please include Mellanox OFED drivers in Azure kernel and
    image (LP: #1650058)
    - net/mlx4_en: Fix bad WQE issue
    - net/mlx4_core: Fix racy CQ (Completion Queue) free
    - net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT
      transitions
    - net/mlx4_core: Avoid command timeouts during VF driver device shutdown

 -- Stefan Bader <email address hidden> Fri, 28 Apr 2017 12:17:12 +0200

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-78.99

---------------
linux (4.4.0-78.99) xenial; urgency=low

  * linux: 4.4.0-78.99 -proposed tracker (LP: #1686645)

  * Please backport fix to reference leak in cgroup blkio throttle
    (LP: #1683976)
    - block: fix module reference leak on put_disk() call for cgroups throttle

  * UbuntuKVM guest crashed while running I/O stress test with Ubuntu kernel
    4.4.0-47-generic (LP: #1659111)
    - block: Unhash block device inodes on gendisk destruction
    - block: Use pointer to backing_dev_info from request_queue
    - block: Dynamically allocate and refcount backing_dev_info
    - block: Make blk_get_backing_dev_info() safe without open bdev
    - block: Get rid of blk_get_backing_dev_info()
    - block: Move bdev_unhash_inode() after invalidate_partition()
    - block: Unhash also block device inode for the whole device
    - block: Revalidate i_bdev reference in bd_aquire()
    - block: Initialize bd_bdi on inode initialization
    - block: Move bdi_unregister() to del_gendisk()
    - block: Allow bdi re-registration
    - bdi: Fix use-after-free in wb_congested_put()
    - block: Make del_gendisk() safer for disks without queues
    - block: Fix bdi assignment to bdev inode when racing with disk delete
    - bdi: Mark congested->bdi as internal
    - bdi: Make wb->bdi a proper reference
    - bdi: Unify bdi->wb_list handling for root wb_writeback
    - bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
    - bdi: Do not wait for cgwbs release in bdi_unregister()
    - bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
    - block: Fix oops in locked_inode_to_wb_and_lock_list()
    - kobject: Export kobject_get_unless_zero()
    - block: Fix oops scsi_disk_get()

  * Touchpad not working correctly after kernel upgrade (LP: #1662589)
    - Input: ALPS - fix V8+ protocol handling (73 03 28)

  * Xenial update to v4.4.62 stable release (LP: #1683728)
    - drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3
    - drm/i915: Stop using RP_DOWN_EI on Baytrail
    - usb: dwc3: gadget: delay unmap of bounced requests
    - mtd: bcm47xxpart: fix parsing first block after aligned TRX
    - MIPS: Introduce irq_stack
    - MIPS: Stack unwinding while on IRQ stack
    - MIPS: Only change $28 to thread_info if coming from user mode
    - MIPS: Switch to the irq_stack in interrupts
    - MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK
    - MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch
    - crypto: caam - fix RNG deinstantiation error checking
    - Linux 4.4.62

  * ifup service of network device stay active after driver stop (LP: #1672144)
    - net: use net->count to check whether a netns is alive or not

  * [Hyper-V] mkfs regression in kernel 4.4+ (LP: #1682215)
    - block: relax check on sg gap

  * [Feature] KBL: intel_powerclamp driver support (LP: #1591641)
    - thermal/powerclamp: remove cpu whitelist
    - thermal/powerclamp: correct cpu support check
    - thermal/powerclamp: add back module device table

  * sysfs channel reads of lps22hb pressure sensor are stale (LP: #1682103)
    - iio: st_pressure: initialize lps22hb bootime

  * Backlight control does no...


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released