msg_zerocopy.sh in net from ubuntu_kernel_selftests failed

Bug #1812620 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status        Importance  Assigned to     Milestone
ubuntu-kernel-tests   Fix Released  Undecided   Colin Ian King
linux (Ubuntu)        Fix Released  Medium      Unassigned
  Disco               Won't Fix     Undecided   Unassigned
  Eoan                Won't Fix     Undecided   Unassigned
  Focal               Fix Released  Medium      Colin Ian King
linux-hwe (Ubuntu)    Invalid       Undecided   Unassigned
  Bionic              Fix Released  Low         Unassigned

Bug Description

== SRU Justification [ FOCAL ] ==

The msg_zerocopy.sh kernel selftest fails on machines that do not have CPUs 2 and 3, such as 1 CPU cloud instances, because the C test program tries to set CPU affinity to CPUs 2 and 3 and bails out if that fails.

== Fix ==

Upstream linux-next commit

commit 16f6458f2478b55e2b628797bc81a4455045c74e
Author: Willem de Bruijn <email address hidden>
Date: Wed Aug 5 04:40:45 2020 -0400

    selftests/net: relax cpu affinity requirement in msg_zerocopy test

With the fix, the test just emits a warning that CPU affinity can't be set, rather than terminating with exit(1).
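
For illustration only, the warn-and-continue behaviour can be sketched in Python rather than in the selftest's C. This is not the upstream patch: the helper name try_pin_to_cpu and its injectable setaffinity parameter are assumptions made for this example. os.sched_setaffinity is the standard library call, which exists on Linux.

```python
import os
import sys


def try_pin_to_cpu(cpu, setaffinity=None):
    """Try to pin the calling process to one CPU; warn instead of exiting.

    Returns True if the affinity was set, False otherwise.
    `setaffinity` is a hypothetical injection point so the failure path
    can be exercised without a small machine.
    """
    if setaffinity is None:
        # os.sched_setaffinity is only available on some platforms (e.g. Linux).
        setaffinity = getattr(os, "sched_setaffinity", None)
    if setaffinity is None:
        print(f"warning: cannot set affinity to CPU {cpu}: not supported",
              file=sys.stderr)
        return False
    try:
        setaffinity(0, {cpu})  # pid 0 means the calling process
        return True
    except OSError as err:
        # Relaxed behaviour: report the problem and let the caller continue.
        print(f"warning: cannot set affinity to CPU {cpu}: {err}",
              file=sys.stderr)
        return False
```

On a machine without CPU 2, a call such as try_pin_to_cpu(2) would be expected to print a warning and return False, so the caller can carry on with the functional part of the test.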

== Test case ==

Run the msg_zerocopy.sh test from the kernel net selftest on a 1 CPU system. Without the fix the test fails. With the fix it runs successfully as expected.
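
To check in advance whether a machine is expected to reproduce the failure, one can test whether CPUs 2 and 3 are in the set the process is allowed to run on. A minimal sketch, assuming (per the description) that the unfixed test needs exactly those two CPUs; the names REQUIRED_CPUS and missing_cpus are made up for this example.

```python
import os

# CPU numbers the unfixed test tries to pin to, per the bug description.
REQUIRED_CPUS = {2, 3}


def missing_cpus(allowed, required=REQUIRED_CPUS):
    """Return the required CPU numbers that are not in `allowed`, sorted."""
    return sorted(set(required) - set(allowed))


if __name__ == "__main__":
    getaffinity = getattr(os, "sched_getaffinity", None)
    if getaffinity is not None:
        allowed = getaffinity(0)  # CPUs the calling process may run on
    else:
        # Fallback guess for platforms without sched_getaffinity.
        allowed = set(range(os.cpu_count() or 1))
    missing = missing_cpus(allowed)
    if missing:
        print(f"CPUs {missing} unavailable: unfixed test expected to fail")
    else:
        print("CPUs 2 and 3 available: unfixed test expected to run")
```

For a 1 CPU instance the allowed set is {0}, so both CPUs are reported missing. An instance with at least 4 CPUs and no affinity restrictions reports none missing.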

== Regression Potential ==

The original test pinned the CPUs for a benchmarking metric; for our testing we only use it to check that the operations in the test work successfully. Users who run this test as a benchmark on a 1 CPU system may not notice the warning and may get more jittery timings, where previously the test failed and complained that it was not running on a suitable multi-CPU system. However, the likelihood of a user benchmarking with this test on a single CPU system is small. As it stands, the test now runs and produces potentially jittery benchmarks on a 1 CPU system, whereas previously it never ran there.

--------------------

This test will return 1

$ sudo ./msg_zerocopy.sh
ipv4 tcp -t 1
./msg_zerocopy: setaffinity 2
./msg_zerocopy: setaffinity 3
$ echo $?
1

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.18.0-13-generic 4.18.0-13.14
ProcVersionSignature: User Name 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 21 07:41 seq
 crw-rw---- 1 root audio 116, 33 Jan 21 07:41 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.10-0ubuntu13.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Jan 21 07:50:33 2019
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-13-generic root=UUID=2a4b0342-a2dd-4feb-b3e2-9644ca1c4a60 ro console=ttyS0,115200n8
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-13-generic N/A
 linux-backports-modules-4.18.0-13-generic N/A
 linux-firmware 1.175.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-xenial:cvnQEMU:ct1:cvrpc-i440fx-xenial:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-xenial
dmi.sys.vendor: QEMU

Po-Hsu Lin (cypressyew) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Po-Hsu Lin (cypressyew) wrote : Re: msg_zerocopy in net from ubuntu_kernel_selftests failed on C

Found in 4.18 Bionic AWS as well.

Po-Hsu Lin (cypressyew)
tags: added: linux-kvm sru-20190603 ubuntu-kernel-selftests
Po-Hsu Lin (cypressyew)
summary: - msg_zerocopy in net from ubuntu_kernel_selftests failed on C
+ msg_zerocopy in net from ubuntu_kernel_selftests failed
tags: added: sru-20190701
tags: added: gke
Po-Hsu Lin (cypressyew) wrote : Re: msg_zerocopy in net from ubuntu_kernel_selftests failed

Didn't see this on 5.0 B-AWS

Po-Hsu Lin (cypressyew) wrote :

Saw this on 5.0 D-AWS (at least on c4.large, m4.large)

Sean Feole (sfeole)
tags: added: sru-20190902
tags: added: disco
removed: cosmic
tags: added: aws
Sean Feole (sfeole)
tags: added: sru-20190930
Sean Feole (sfeole)
tags: added: gcp
Po-Hsu Lin (cypressyew)
no longer affects: linux-azure (Ubuntu Cosmic)
no longer affects: linux-aws (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Cosmic)
Po-Hsu Lin (cypressyew) wrote :

For this issue on D-Azure 5.0
5.0.0-1023.24

It does not fail on every instance.

Failed:
  * Standard_B1ms
  * Standard_F2s_v2
  * Standard_E2s_v3
  * Standard_D2s_v3

Passed:
  * Standard_L8s_v2
  * Standard_L4s
  * Standard_GS2
  * Standard_F32s_v2
  * Standard_D16s_v3

Test output on an instance (Standard_B1ms) that failed with this test:
 selftests: net: msg_zerocopy.sh
 ========================================
 ipv4 tcp -t 1
 ./msg_zerocopy: setaffinity 2
 ./msg_zerocopy: setaffinity 3
 not ok 1..20 selftests: net: msg_zerocopy.sh [FAIL]

Test output on an instance (Standard_L8s_v2) that passed with this test:
 selftests: net: msg_zerocopy.sh
 ========================================
 ipv4 tcp -t 1
 tx=57024 (3558 MB) txc=0 zc=n
 rx=28513 (3558 MB)
 ipv4 tcp -z -t 1
 tx=49763 (3105 MB) txc=49763 zc=n
 rx=24883 (3105 MB)
 ok
 ipv6 tcp -t 1
 tx=59548 (3716 MB) txc=0 zc=n
 rx=29775 (3716 MB)
 ipv6 tcp -z -t 1
 tx=49549 (3092 MB) txc=49549 zc=n
 rx=24776 (3092 MB)
 ok
 ipv4 udp -t 1
 tx=51972 (3243 MB) txc=0 zc=n
 rx=51963 (3242 MB)
 ipv4 udp -z -t 1
 tx=41616 (2596 MB) txc=41616 zc=n
 rx=41615 (2596 MB)
 ok
 ipv6 udp -t 1
 tx=53761 (3354 MB) txc=0 zc=n
 rx=53736 (3353 MB)
 ipv6 udp -z -t 1
 tx=41872 (2612 MB) txc=41872 zc=n
 rx=41872 (2612 MB)
 ok
 OK. All tests passed
 ok 1..20 selftests: net: msg_zerocopy.sh [PASS]

Changed in linux-azure (Ubuntu Disco):
status: New → Confirmed
Po-Hsu Lin (cypressyew) wrote :

For D-AWS, it's not failing on all nodes.
Taking ARM64 instances as an example, this passed on a1.2xlarge but failed on a1.large and a1.medium.

tags: added: sru-20191111
Po-Hsu Lin (cypressyew) wrote :

Found on E-AWS 5.3.0-1009.10-aws

The test passed on the ARM64 instance a1.2xlarge but failed on the other two. This issue can be found on AMD64 instances as well.

tags: added: sru-20191202
Po-Hsu Lin (cypressyew)
summary: - msg_zerocopy in net from ubuntu_kernel_selftests failed
+ msg_zerocopy.sh in net from ubuntu_kernel_selftests failed
Po-Hsu Lin (cypressyew)
tags: added: sru-20200316
Steve Langasek (vorlon)
Changed in linux-aws (Ubuntu Disco):
status: New → Won't Fix
Changed in linux-azure (Ubuntu Disco):
status: Confirmed → Won't Fix
Steve Langasek (vorlon)
Changed in linux (Ubuntu Disco):
status: New → Won't Fix
Po-Hsu Lin (cypressyew)
tags: added: oracle sru-20200629
Po-Hsu Lin (cypressyew) wrote :

With Oracle 5.4.0-1021, this fails on VM.Standard2.1 but passes on VM.Standard2.16.

tags: added: 5.4
Sean Feole (sfeole) wrote :

Focal 5.4(SRU 2020/06/29)

3367. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # selftests: net: msg_zerocopy.sh
3368. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ipv4 tcp -t 1
3369. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ./msg_zerocopy: setaffinity 2
3370. 07/25 02:26:31 DEBUG| utils:0153| [stdout] # ./msg_zerocopy: setaffinity 3
3371. 07/25 02:26:31 DEBUG| utils:0153| [stdout] not ok 21 selftests: net: msg_zerocopy.sh # exit=1

tags: added: 5.3
Sean Feole (sfeole)
Changed in ubuntu-kernel-tests:
status: New → Confirmed
Colin Ian King (colin-king) wrote :

Only occurs on systems that cannot set affinity to CPU 2 and CPU 3 because the test tries to do this and fails. Tut tut, bad test.

Fix submitted and re-worked, improved fix sent upstream: https://www.spinics.net/lists/netdev/msg674973.html

Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :
description: updated
Changed in linux (Ubuntu Eoan):
status: New → Won't Fix
Changed in linux-aws (Ubuntu Eoan):
status: New → Won't Fix
Changed in linux-azure (Ubuntu Eoan):
status: New → Won't Fix
Po-Hsu Lin (cypressyew) wrote :

I would like to note that although Eoan and Disco have already reached their end-of-life, we still have 5.3 / 5.0 kernels for clouds.

The next step is to check whether this can be reproduced on those.

no longer affects: linux-aws (Ubuntu)
Changed in linux (Ubuntu Focal):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Colin Ian King (colin-king) wrote :

Ran the tests with -proposed kernel on a 1 CPU system, tests now pass:

17:52:02 DEBUG| [stdout] # OK. All tests passed
17:52:02 DEBUG| [stdout] ok 21 selftests: net: msg_zerocopy.sh

tags: added: verification-done-focal
removed: verification-needed-focal
Sean Feole (sfeole)
Changed in ubuntu-kernel-tests:
status: Confirmed → Triaged
Po-Hsu Lin (cypressyew) wrote :

The net test was skipped on all of the instances with B-gke-5.3.0-1033.35

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-16.17

---------------
linux (5.8.0-16.17) groovy; urgency=medium

  * groovy/linux: 5.8.0-16.17 -proposed tracker (LP: #1891233)

  * Miscellaneous Ubuntu changes
    - hio -- Update to use bio_{start,end}_io_acct with 5.8+
    - Enable hio driver
    - [Packaging] Temporarily disable building doc package contents

linux (5.8.0-15.16) groovy; urgency=medium

  * groovy/linux: 5.8.0-15.16 -proposed tracker (LP: #1891177)

  * Miscellaneous Ubuntu changes
    - SAUCE: Documentation: import error c_funcptr_sig_re, c_sig_re (sphinx-
      doc/sphinx@0f49e30c)

linux (5.8.0-14.15) groovy; urgency=medium

  * groovy/linux: 5.8.0-14.15 -proposed tracker (LP: #1891085)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Add initial audio support for Lenovo ThinkStation P620 (LP: #1890317)
    - ALSA: usb-audio: Add support for Lenovo ThinkStation P620

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - crypto: hisilicon - update SEC driver module parameter

  * Miscellaneous Ubuntu changes
    - [Config] Re-enable signing for ppc64el

 -- Seth Forshee <email address hidden> Tue, 11 Aug 2020 15:32:58 -0500

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Po-Hsu Lin (cypressyew)
no longer affects: linux-azure (Ubuntu)
Po-Hsu Lin (cypressyew)
no longer affects: linux-aws (Ubuntu Focal)
no longer affects: linux-aws (Ubuntu Eoan)
no longer affects: linux-aws (Ubuntu Disco)
Po-Hsu Lin (cypressyew) wrote :
no longer affects: linux-azure (Ubuntu Focal)
Changed in ubuntu-kernel-tests:
status: Triaged → Fix Released
Po-Hsu Lin (cypressyew) wrote :

Just noticed that this issue still exists in B/hwe (5.3)

Changed in ubuntu-kernel-tests:
status: Fix Released → Confirmed
Po-Hsu Lin (cypressyew)
no longer affects: linux-azure (Ubuntu Disco)
no longer affects: linux-azure (Ubuntu Eoan)
no longer affects: linux (Ubuntu Bionic)
Changed in linux (Ubuntu Focal):
assignee: nobody → Colin Ian King (colin-king)
Changed in linux-hwe (Ubuntu):
status: New → Invalid
tags: added: sru-20200810
Stefan Bader (smb)
Changed in linux-hwe (Ubuntu Bionic):
importance: Undecided → Low
status: New → Fix Committed
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Stefan Bader (smb)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) wrote :

Passed with F-kvm 5.4.0-1022.22

Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe - 5.3.0-68.63

---------------
linux-hwe (5.3.0-68.63) bionic; urgency=medium

  * CVE-2020-16119
    - SAUCE: dccp: avoid double free of ccid on child socket

  * CVE-2020-16120
    - Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
      directories"
    - ovl: pass correct flags for opening real directory
    - ovl: switch to mounter creds in readdir
    - ovl: verify permissions in ovl_path_open()
    - ovl: call secutiry hook in ovl_real_ioctl()
    - ovl: check permission to open real file

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [Packaging] hwe: Update nvidia driver versions

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [Packaging] hwe: Add build support for nvidia-server drivers

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Fix false-negative return value for rtnetlink.sh in kselftests/net
    (LP: #1890136)
    - selftests: rtnetlink: correct the final return value for the test
    - selftests: rtnetlink: make kci_test_encap() return sub-test result

 -- Thadeu Lima de Souza Cascardo <email address hidden> Mon, 28 Sep 2020 08:30:12 -0300

Changed in linux-hwe (Ubuntu Bionic):
status: Fix Committed → Fix Released