msg_zerocopy.sh in net from ubuntu_kernel_selftests failed

Bug #1812620 reported by Po-Hsu Lin on 2019-01-21
This bug affects 1 person
Affects               Importance   Assigned to
ubuntu-kernel-tests   Undecided    Colin Ian King
linux (Ubuntu)        Medium       Unassigned
  Disco               Undecided    Unassigned
  Eoan                Undecided    Unassigned
  Focal               Medium       Colin Ian King
linux-hwe (Ubuntu)    Undecided    Unassigned
  Bionic              Low          Unassigned

Bug Description

== SRU Justification [ FOCAL ] ==

The msg_zerocopy.sh kernel selftest fails on machines that do not have CPUs 2 and 3 available (CPU numbering starts at 0, so this means machines with fewer than four CPUs), such as 1 CPU cloud instances. The C test program tries to set CPU affinity to CPUs 2 and 3 and bails out if that fails.

== Fix ==

Upstream linux-next commit

commit 16f6458f2478b55e2b628797bc81a4455045c74e
Author: Willem de Bruijn <email address hidden>
Date: Wed Aug 5 04:40:45 2020 -0400

    selftests/net: relax cpu affinity requirement in msg_zerocopy test

With the fix, the test just emits a warning that CPU affinity can't be set instead of terminating with exit(1).
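
As a rough illustration of the relaxed behaviour (a minimal sketch, not the text of the upstream patch), the helper below tries to pin the calling thread and only warns when that fails. The function name pin_to_cpu_relaxed and the wording of the messages are assumptions made for this example. It assumes Linux with glibc.

```c
#define _GNU_SOURCE /* glibc: exposes CPU_SET() and sched_setaffinity() */
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (the name is ours, not from the upstream patch).
 * Try to pin the calling thread to `cpu`. On failure, print a warning
 * and return -1 instead of terminating, so the caller can carry on
 * unpinned. Returns 0 on success.
 */
static int pin_to_cpu_relaxed(int cpu)
{
    cpu_set_t mask;

    /* cpu_set_t only has room for CPU_SETSIZE CPUs. */
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        fprintf(stderr, "cpu %d: out of range, not pinning\n", cpu);
        return -1;
    }

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        fprintf(stderr, "cpu %d: unable to pin (%s), timing may vary\n",
                cpu, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would invoke pin_to_cpu_relaxed(2) and continue regardless of the return value. The strict behaviour described in this report corresponds to calling exit(1) in the failure branch instead of returning.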

== Test case ==

Run the msg_zerocopy.sh test from the kernel net selftest on a 1 CPU system. Without the fix the test fails. With the fix it runs successfully as expected.
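
To predict whether a given machine will hit the failure without the fix, one could check whether CPUs 2 and 3 are in the calling thread's allowed CPU set. The following is a minimal sketch for Linux with glibc. The helper names cpu_is_allowed and unfixed_test_can_pin are hypothetical and not part of the selftest.

```c
#define _GNU_SOURCE /* glibc: exposes CPU_ISSET() and sched_getaffinity() */
#include <sched.h>

/*
 * Hypothetical helper: return 1 if `cpu` is in the calling thread's
 * allowed CPU set, 0 otherwise (including when the set cannot be read).
 */
static int cpu_is_allowed(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;

    CPU_ZERO(&mask);
    /* A pid of 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return 0;

    return CPU_ISSET(cpu, &mask) ? 1 : 0;
}

/*
 * Return 1 if both CPUs that the test tries to pin to (2 and 3,
 * per this report) are usable from the calling thread.
 */
static int unfixed_test_can_pin(void)
{
    return cpu_is_allowed(2) && cpu_is_allowed(3);
}
```

On a 1 CPU or 2 CPU instance unfixed_test_can_pin() should return 0, which matches the instance sizes reported as failing in this bug.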

== Regression Potential ==

The original test pinned the CPUs to get stable benchmarking metrics, whereas in our testing we only use it to check that the operations in the test work. Users who run this test as a benchmark on a 1 CPU system may not notice the warning and may get more jittery timings, where previously the test failed and complained that it was not running on a suitable multi-CPU system. However, the likelihood of a user benchmarking with this test on a single CPU system is small. As it stands, the test will now run and produce potentially jittery results on a 1 CPU system, whereas previously it never ran there.

--------------------

This test exits with status 1:

$ sudo ./msg_zerocopy.sh
ipv4 tcp -t 1
./msg_zerocopy: setaffinity 2
./msg_zerocopy: setaffinity 3
$ echo $?
1

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.18.0-13-generic 4.18.0-13.14
ProcVersionSignature: Ubuntu 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 21 07:41 seq
 crw-rw---- 1 root audio 116, 33 Jan 21 07:41 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.10-0ubuntu13.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Jan 21 07:50:33 2019
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-13-generic root=UUID=2a4b0342-a2dd-4feb-b3e2-9644ca1c4a60 ro console=ttyS0,115200n8
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-13-generic N/A
 linux-backports-modules-4.18.0-13-generic N/A
 linux-firmware 1.175.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-xenial:cvnQEMU:ct1:cvrpc-i440fx-xenial:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-xenial
dmi.sys.vendor: QEMU

Po-Hsu Lin (cypressyew) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Po-Hsu Lin (cypressyew) on 2019-06-24
tags: added: linux-kvm sru-20190603 ubuntu-kernel-selftests
Po-Hsu Lin (cypressyew) on 2019-07-23
summary: - msg_zerocopy in net from ubuntu_kernel_selftests failed on C
+ msg_zerocopy in net from ubuntu_kernel_selftests failed
tags: added: sru-20190701
tags: added: gke
Po-Hsu Lin (cypressyew) wrote :

Saw this on 5.0 D-AWS (at least on c4.large, m4.large)

Sean Feole (sfeole) on 2019-10-03
tags: added: sru-20190902
tags: added: disco
removed: cosmic
tags: added: aws
Sean Feole (sfeole) on 2019-10-09
tags: added: sru-20190930
Sean Feole (sfeole) on 2019-10-09
tags: added: gcp
Po-Hsu Lin (cypressyew) on 2019-10-14
no longer affects: linux-azure (Ubuntu Cosmic)
no longer affects: linux-aws (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Cosmic)
Po-Hsu Lin (cypressyew) wrote :

For this issue on D-Azure 5.0
5.0.0-1023.24

It's not failing with all the instances.

Failed:
  * Standard_B1ms
  * Standard_F2s_v2
  * Standard_E2s_v3
  * Standard_D2s_v3

Passed:
  * Standard_L8s_v2
  * Standard_L4s
  * Standard_GS2
  * Standard_F32s_v2
  * Standard_D16s_v3

Test output on an instance (Standard_B1ms) that failed with this test:
 selftests: net: msg_zerocopy.sh
 ========================================
 ipv4 tcp -t 1
 ./msg_zerocopy: setaffinity 2
 ./msg_zerocopy: setaffinity 3
 not ok 1..20 selftests: net: msg_zerocopy.sh [FAIL]

Test output on an instance (Standard_L8s_v2) that passed with this test:
 selftests: net: msg_zerocopy.sh
 ========================================
 ipv4 tcp -t 1
 tx=57024 (3558 MB) txc=0 zc=n
 rx=28513 (3558 MB)
 ipv4 tcp -z -t 1
 tx=49763 (3105 MB) txc=49763 zc=n
 rx=24883 (3105 MB)
 ok
 ipv6 tcp -t 1
 tx=59548 (3716 MB) txc=0 zc=n
 rx=29775 (3716 MB)
 ipv6 tcp -z -t 1
 tx=49549 (3092 MB) txc=49549 zc=n
 rx=24776 (3092 MB)
 ok
 ipv4 udp -t 1
 tx=51972 (3243 MB) txc=0 zc=n
 rx=51963 (3242 MB)
 ipv4 udp -z -t 1
 tx=41616 (2596 MB) txc=41616 zc=n
 rx=41615 (2596 MB)
 ok
 ipv6 udp -t 1
 tx=53761 (3354 MB) txc=0 zc=n
 rx=53736 (3353 MB)
 ipv6 udp -z -t 1
 tx=41872 (2612 MB) txc=41872 zc=n
 rx=41872 (2612 MB)
 ok
 OK. All tests passed
 ok 1..20 selftests: net: msg_zerocopy.sh [PASS]

Changed in linux-azure (Ubuntu Disco):
status: New → Confirmed
Po-Hsu Lin (cypressyew) wrote :

For D-AWS, it's not failing on all nodes.
Take ARM64 instances for example: this has passed on a1.2xlarge but not on a1.large / a1.medium.

tags: added: sru-20191111
Po-Hsu Lin (cypressyew) wrote :

Found on E-AWS 5.3.0-1009.10-aws

The test passed on the ARM64 a1.2xlarge instance but failed on the other two. This issue can be found on AMD64 instances as well.

tags: added: sru-20191202
Po-Hsu Lin (cypressyew) on 2020-03-16
summary: - msg_zerocopy in net from ubuntu_kernel_selftests failed
+ msg_zerocopy.sh in net from ubuntu_kernel_selftests failed
Po-Hsu Lin (cypressyew) on 2020-04-01
tags: added: sru-20200316
Steve Langasek (vorlon) on 2020-07-02
Changed in linux-aws (Ubuntu Disco):
status: New → Won't Fix
Changed in linux-azure (Ubuntu Disco):
status: Confirmed → Won't Fix
Steve Langasek (vorlon) on 2020-07-02
Changed in linux (Ubuntu Disco):
status: New → Won't Fix
Po-Hsu Lin (cypressyew) on 2020-07-15
tags: added: oracle sru-20200629
Po-Hsu Lin (cypressyew) wrote :

With Oracle 5.4.0-1021
This is failing on VM.Standard2.1 but passed with VM.Standard2.16

tags: added: 5.4
Sean Feole (sfeole) wrote :

Focal 5.4(SRU 2020/06/29)

3367. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # selftests: net: msg_zerocopy.sh
3368. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ipv4 tcp -t 1
3369. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ./msg_zerocopy: setaffinity 2
3370. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ./msg_zerocopy: setaffinity 3
3371. 07/25 02:26:31 DEBUG| utils:0153| [stdout] not ok 21 selftests: net: msg_zerocopy.sh # exit=1

tags: added: 5.3
Sean Feole (sfeole) on 2020-07-27
Changed in ubuntu-kernel-tests:
status: New → Confirmed
Colin Ian King (colin-king) wrote :

Only occurs on systems that cannot set affinity to CPU 2 and CPU 3 because the test tries to do this and fails. Tut tut, bad test.

Fix submitted and re-worked, improved fix sent upstream: https://www.spinics.net/lists/netdev/msg674973.html

Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :
description: updated
Changed in linux (Ubuntu Eoan):
status: New → Won't Fix
Changed in linux-aws (Ubuntu Eoan):
status: New → Won't Fix
Changed in linux-azure (Ubuntu Eoan):
status: New → Won't Fix
Po-Hsu Lin (cypressyew) wrote :

I would like to note that although Eoan and Disco have already reached their end-of-life, we still have 5.3 / 5.0 kernels for clouds.

The next step is to check whether this can be reproduced on those.

no longer affects: linux-aws (Ubuntu)
Changed in linux (Ubuntu Focal):
status: New → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Colin Ian King (colin-king) wrote :

Ran the tests with -proposed kernel on a 1 CPU system, tests now pass:

17:52:02 DEBUG| [stdout] # OK. All tests passed
17:52:02 DEBUG| [stdout] ok 21 selftests: net: msg_zerocopy.sh

tags: added: verification-done-focal
removed: verification-needed-focal
Sean Feole (sfeole) on 2020-08-10
Changed in ubuntu-kernel-tests:
status: Confirmed → Triaged
Po-Hsu Lin (cypressyew) wrote :

The net test was skipped on all of the instances with B-gke-5.3.0-1033.35

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-16.17

---------------
linux (5.8.0-16.17) groovy; urgency=medium

  * groovy/linux: 5.8.0-16.17 -proposed tracker (LP: #1891233)

  * Miscellaneous Ubuntu changes
    - hio -- Update to use bio_{start,end}_io_acct with 5.8+
    - Enable hio driver
    - [Packaging] Temporarily disable building doc package contents

linux (5.8.0-15.16) groovy; urgency=medium

  * groovy/linux: 5.8.0-15.16 -proposed tracker (LP: #1891177)

  * Miscellaneous Ubuntu changes
    - SAUCE: Documentation: import error c_funcptr_sig_re, c_sig_re (sphinx-
      doc/sphinx@0f49e30c)

linux (5.8.0-14.15) groovy; urgency=medium

  * groovy/linux: 5.8.0-14.15 -proposed tracker (LP: #1891085)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Add initial audio support for Lenovo ThinkStation P620 (LP: #1890317)
    - ALSA: usb-audio: Add support for Lenovo ThinkStation P620

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - crypto: hisilicon - update SEC driver module parameter

  * Miscellaneous Ubuntu changes
    - [Config] Re-enable signing for ppc64el

 -- Seth Forshee <email address hidden> Tue, 11 Aug 2020 15:32:58 -0500

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Po-Hsu Lin (cypressyew) on 2020-08-27
no longer affects: linux-azure (Ubuntu)
Po-Hsu Lin (cypressyew) on 2020-08-27
no longer affects: linux-aws (Ubuntu Focal)
no longer affects: linux-aws (Ubuntu Eoan)
no longer affects: linux-aws (Ubuntu Disco)
Po-Hsu Lin (cypressyew) wrote :
no longer affects: linux-azure (Ubuntu Focal)
Changed in ubuntu-kernel-tests:
status: Triaged → Fix Released
Po-Hsu Lin (cypressyew) wrote :

Just noticed that this issue still exists in B/hwe (5.3)

Changed in ubuntu-kernel-tests:
status: Fix Released → Confirmed
Po-Hsu Lin (cypressyew) on 2020-08-31
no longer affects: linux-azure (Ubuntu Disco)
no longer affects: linux-azure (Ubuntu Eoan)
no longer affects: linux (Ubuntu Bionic)
Changed in linux (Ubuntu Focal):
assignee: nobody → Colin Ian King (colin-king)
Changed in linux-hwe (Ubuntu):
status: New → Invalid
tags: added: sru-20200810
Stefan Bader (smb) on 2020-08-31
Changed in linux-hwe (Ubuntu Bionic):
importance: Undecided → Low
status: New → Fix Committed
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Stefan Bader (smb) on 2020-08-31
Changed in linux (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) wrote :

Passed with F-kvm 5.4.0-1022.22

Po-Hsu Lin (cypressyew) on 2020-09-21
Changed in ubuntu-kernel-tests:
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe - 5.3.0-68.63

---------------
linux-hwe (5.3.0-68.63) bionic; urgency=medium

  * CVE-2020-16119
    - SAUCE: dccp: avoid double free of ccid on child socket

  * CVE-2020-16120
    - Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
      directories"
    - ovl: pass correct flags for opening real directory
    - ovl: switch to mounter creds in readdir
    - ovl: verify permissions in ovl_path_open()
    - ovl: call secutiry hook in ovl_real_ioctl()
    - ovl: check permission to open real file

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [Packaging] hwe: Update nvidia driver versions

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [Packaging] hwe: Add build support for nvidia-server drivers

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Fix false-negative return value for rtnetlink.sh in kselftests/net
    (LP: #1890136)
    - selftests: rtnetlink: correct the final return value for the test
    - selftests: rtnetlink: make kci_test_encap() return sub-test result

 -- Thadeu Lima de Souza Cascardo <email address hidden> Mon, 28 Sep 2020 08:30:12 -0300

Changed in linux-hwe (Ubuntu Bionic):
status: Fix Committed → Fix Released