net selftest psock_fanout fails on xenial s390x due to incorrect queue lengths

Bug #1853375 reported by Connor Kuehl on 2019-11-20
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Undecided
Po-Hsu Lin
linux (Ubuntu)
Undecided
Unassigned
Xenial
Medium
Po-Hsu Lin
Bionic
Medium
Po-Hsu Lin

Bug Description

== SRU Justification ==
The psock_fanout test is not very stable, stress tests shows a high
fail rate here:
  * X - 44 failures out of 100, with 4.4 kernel on a KVM node
  * B - 33 failures out of 100, with 4.15 kernel on a KVM node

This is because of the unrelated background traffic which should be
filtered out.

Also, the error message, "ERROR: incorrect queue lengths", is a bit
misleading. You can see this even the test has passed, it's not the
real cause of the failure. This message has been changed to "warning: incorrect queue lengths" later.

== Fix ==
1. 2e7a7217 (selftests/net: cleanup unused parameter in psock_fanout)
2. cc30c93f (selftests/net: ignore background traffic in psock_fanout)

The first patch has already landed in Bionic, and it can be
cherry-picked into Xenial.

The second one needs to be backported to Xenial, but it can be applied
to Bionic with git am -C2.

== Test ==
Patch tested directly on 4.4 / 4.15 KVM nodes.

Stress tested 100 times with:
    i=0
    for idx in $(seq 1 100)
    do
        sudo ./psock_fanout &> /dev/null
        [ $? != 0 ] && i=$((i+1))
    done
    echo $i

All tests have passed without any failure.

== Regression Potential ==
Low, change limited to testing tools, not affecting actual kernel
functionality.

== Original Bug Report ==
s390x: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/s390x/l/linux/20191114_180123_738a0@/log.gz

Here's the relevant excerpt:

17:34:26 DEBUG| [stdout] --------------------
17:34:26 DEBUG| [stdout] running psock_fanout test
17:34:26 DEBUG| [stdout] --------------------
17:34:26 ERROR| [stderr] test: control single socket
17:34:27 ERROR| [stderr] test: control multiple sockets
17:34:27 ERROR| [stderr] test: datapath 0x0
17:34:27 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:27 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: trying alternate ports (4)
17:34:27 ERROR| [stderr] test: datapath 0x0
17:34:27 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:27 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: trying alternate ports (3)
17:34:27 ERROR| [stderr] test: datapath 0x0
17:34:27 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:27 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: trying alternate ports (2)
17:34:27 ERROR| [stderr] test: datapath 0x0
17:34:27 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:27 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:27 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:27 ERROR| [stderr] info: trying alternate ports (1)
17:34:27 ERROR| [stderr] test: datapath 0x0
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: trying alternate ports (0)
17:34:28 ERROR| [stderr] test: datapath 0x0
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] test: datapath 0x1000
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: trying alternate ports (0)
17:34:28 ERROR| [stderr] test: datapath 0x0
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,0, expect=20,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] test: datapath 0x1000
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=20,0, expect=15,5
17:34:28 ERROR| [stderr] ERROR: incorrect queue lengths
17:34:28 ERROR| [stderr] info: count=20,15, expect=20,15
17:34:28 ERROR| [stderr] test: datapath 0x1
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=10,10, expect=10,10
17:34:28 ERROR| [stderr] info: count=18,17, expect=18,17
17:34:28 ERROR| [stderr] test: datapath 0x3
17:34:28 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:28 ERROR| [stderr] info: count=15,5, expect=15,5
17:34:28 ERROR| [stderr] info: count=20,15, expect=20,15
17:34:28 ERROR| [stderr] test: datapath 0x6
17:34:29 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:29 ERROR| [stderr] info: count=5,15, expect=15,5
17:34:29 ERROR| [stderr] info: count=20,15, expect=15,20
17:34:29 ERROR| [stderr] test: datapath 0x7
17:34:29 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:29 ERROR| [stderr] info: count=5,15, expect=15,5
17:34:29 ERROR| [stderr] info: count=20,15, expect=15,20
17:34:29 ERROR| [stderr] test: datapath 0x2
17:34:29 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:29 ERROR| [stderr] info: count=20,0, expect=20,0
17:34:29 ERROR| [stderr] info: count=20,0, expect=20,0
17:34:29 ERROR| [stderr] test: datapath 0x2
17:34:29 ERROR| [stderr] info: count=0,0, expect=0,0
17:34:29 ERROR| [stderr] info: count=0,20, expect=0,20
17:34:29 ERROR| [stderr] info: count=0,20, expect=0,20
17:34:29 DEBUG| [stdout] [FAIL]

Po-Hsu Lin (cypressyew) on 2020-01-10
tags: added: s390x ubuntu-kernel-selftests xenial
Po-Hsu Lin (cypressyew) wrote :

This issue can be found on Bionic s390x/amd64 as well:

Test report on s390x zVM (kernel02):
selftests: run_afpackettests
========================================
--------------------
running psock_fanout test
--------------------
test: control single socket
test: control multiple sockets
test: unique ids
test: datapath 0x0
info: count=0,0, expect=0,0
info: count=20,0, expect=15,5
ERROR: incorrect queue lengths
info: count=20,0, expect=20,5
ERROR: incorrect queue lengths
info: trying alternate ports (4)
test: datapath 0x0
info: count=0,0, expect=0,0
info: count=20,0, expect=15,5
ERROR: incorrect queue lengths
info: count=20,0, expect=20,5
ERROR: incorrect queue lengths
info: trying alternate ports (3)
test: datapath 0x0
info: count=0,0, expect=0,0
info: count=15,5, expect=15,5
info: count=20,5, expect=20,5
test: datapath 0x1000
info: count=0,0, expect=0,0
info: count=15,5, expect=15,5
info: count=20,15, expect=20,15
test: datapath 0x1
info: count=0,0, expect=0,0
info: count=10,10, expect=10,10
info: count=18,17, expect=18,17
test: datapath 0x3
info: count=0,0, expect=0,0
info: count=15,5, expect=15,5
info: count=20,15, expect=20,15
test: datapath 0x6
info: count=0,0, expect=0,0
info: count=5,15, expect=15,5
info: count=20,15, expect=15,20
test: datapath 0x7
info: count=0,0, expect=0,0
info: count=5,15, expect=15,5
info: count=20,15, expect=15,20
test: datapath 0x2
info: count=0,0, expect=0,0
info: count=20,0, expect=20,0
info: count=20,0, expect=20,0
test: datapath 0x2
info: count=0,0, expect=0,0
info: count=0,20, expect=0,20
info: count=0,20, expect=0,20
test: datapath 0x2000
info: count=0,0, expect=0,0
info: count=20,20, expect=20,20
info: count=20,20, expect=20,20

tags: added: bionic

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1853375

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote :

BTW,
for 4.15 kernel on a KVM node, although it's showing "ERROR: incorrect queue lengths" all the time, the test result is not very consistent, sometimes it passes and sometimes it failed (33 failures out for 100 attempts)

The upstream code has also changed this ERROR to warning, we can use that to reduce confusion.

Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew) on 2020-01-10
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
Po-Hsu Lin (cypressyew) wrote :

I can confirm that this issue can be found on Xenial KVM kernel, psock_fanout test fail rate 44 out of 100 tests.

Changed in linux (Ubuntu Xenial):
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: New → In Progress
Po-Hsu Lin (cypressyew) on 2020-01-13
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
assignee: Po-Hsu Lin (cypressyew) → nobody
Po-Hsu Lin (cypressyew) on 2020-01-16
description: updated
Po-Hsu Lin (cypressyew) on 2020-01-16
description: updated
description: updated
Stefan Bader (smb) on 2020-01-16
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Po-Hsu Lin (cypressyew) wrote :

Test passed with Bionic 4.15.0-87.87

Po-Hsu Lin (cypressyew) wrote :

For Xenial, somehow the script is not shipped with executable mode:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1861973

tags: added: verification-done-bionic
removed: verification-needed-bionic
Po-Hsu Lin (cypressyew) wrote :

The executable issue mentioned in comment #8 will be addressed with another patch.

The patch itself is working as expected (if you do a manual chmod it will work), therefore I will mark this as verification done.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (79.8 KiB)

This bug was fixed in the package linux - 4.15.0-88.88

---------------
linux (4.15.0-88.88) bionic; urgency=medium

  * bionic/linux: 4.15.0-88.88 -proposed tracker (LP: #1862824)

  * Segmentation fault (kernel oops) with memory-hotplug in
    ubuntu_kernel_selftests on Bionic kernel (LP: #1862312)
    - Revert "mm/memory_hotplug: fix online/offline_pages called w.o.
      mem_hotplug_lock"
    - mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

linux (4.15.0-87.87) bionic; urgency=medium

  * bionic/linux: 4.15.0-87.87 -proposed tracker (LP: #1861165)

  * Bionic update: upstream stable patchset 2020-01-22 (LP: #1860602)
    - scsi: lpfc: Fix discovery failures when target device connectivity bounces
    - scsi: mpt3sas: Fix clear pending bit in ioctl status
    - scsi: lpfc: Fix locking on mailbox command completion
    - Input: atmel_mxt_ts - disable IRQ across suspend
    - iommu/tegra-smmu: Fix page tables in > 4 GiB memory
    - scsi: target: compare full CHAP_A Algorithm strings
    - scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
    - scsi: csiostor: Don't enable IRQs too early
    - powerpc/pseries: Mark accumulate_stolen_time() as notrace
    - powerpc/pseries: Don't fail hash page table insert for bolted mapping
    - powerpc/tools: Don't quote $objdump in scripts
    - dma-debug: add a schedule point in debug_dma_dump_mappings()
    - clocksource/drivers/asm9260: Add a check for of_clk_get
    - powerpc/security/book3s64: Report L1TF status in sysfs
    - powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
    - ext4: update direct I/O read lock pattern for IOCB_NOWAIT
    - jbd2: Fix statistics for the number of logged blocks
    - scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
    - scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
    - f2fs: fix to update dir's i_pino during cross_rename
    - clk: qcom: Allow constant ratio freq tables for rcg
    - irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
    - irqchip: ingenic: Error out if IRQ domain creation failed
    - fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
    - scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
    - scsi: ufs: fix potential bug which ends in system hang
    - powerpc/pseries/cmm: Implement release() function for sysfs device
    - powerpc/security: Fix wrong message when RFI Flush is disable
    - scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
    - clk: pxa: fix one of the pxa RTC clocks
    - bcache: at least try to shrink 1 node in bch_mca_scan()
    - HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
    - libnvdimm/btt: fix variable 'rc' set but not used
    - HID: Improve Windows Precision Touchpad detection.
    - scsi: pm80xx: Fix for SATA device discovery
    - scsi: ufs: Fix error handing during hibern8 enter
    - scsi: scsi_debug: num_tgts must be >= 0
    - scsi: NCR5380: Add disconnect_mask module parameter
    - scsi: iscsi: Don't send data to unbound connection
    - scsi: target: iscsi: Wait for all commands to finish before freeing a
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (10.7 KiB)

This bug was fixed in the package linux - 4.4.0-174.204

---------------
linux (4.4.0-174.204) xenial; urgency=medium

  * xenial/linux: 4.4.0-174.204 -proposed tracker (LP: #1861122)

  * Xenial update: 4.4.211 upstream stable release (LP: #1860681)
    - hidraw: Return EPOLLOUT from hidraw_poll
    - HID: hidraw: Fix returning EPOLLOUT from hidraw_poll
    - HID: hidraw, uhid: Always report EPOLLOUT
    - cfg80211/mac80211: make ieee80211_send_layer2_update a public function
    - mac80211: Do not send Layer 2 Update frame before authorization
    - media: usb:zr364xx:Fix KASAN:null-ptr-deref Read in zr364xx_vidioc_querycap
    - p54usb: Fix race between disconnect and firmware loading
    - ALSA: line6: Fix write on zero-sized buffer
    - ALSA: line6: Fix memory leak at line6_init_pcm() error path
    - xen: let alloc_xenballooned_pages() fail if not enough memory free
    - wimax: i2400: fix memory leak
    - wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle
    - ext4: fix use-after-free race with debug_want_extra_isize
    - ext4: add more paranoia checking in ext4_expand_extra_isize handling
    - rtc: mt6397: fix alarm register overwrite
    - iommu: Remove device link to group on failure
    - gpio: Fix error message on out-of-range GPIO in lookup table
    - hsr: reset network header when supervision frame is created
    - cifs: Adjust indentation in smb2_open_file
    - RDMA/srpt: Report the SCSI residual to the initiator
    - scsi: enclosure: Fix stale device oops with hot replug
    - scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI
    - platform/x86: asus-wmi: Fix keyboard brightness cannot be set to 0
    - iio: imu: adis16480: assign bias value only if operation succeeded
    - mei: fix modalias documentation
    - clk: samsung: exynos5420: Preserve CPU clocks configuration during
      suspend/resume
    - compat_ioctl: handle SIOCOUTQNSD
    - tty: serial: imx: use the sg count from dma_map_sg
    - tty: serial: pch_uart: correct usage of dma_unmap_sg
    - media: exynos4-is: Fix recursive locking in isp_video_release()
    - spi: atmel: fix handling of cs_change set on non-last xfer
    - rtlwifi: Remove unnecessary NULL check in rtl_regd_init
    - rtc: msm6242: Fix reading of 10-hour digit
    - rseq/selftests: Turn off timeout setting
    - hexagon: work around compiler crash
    - ocfs2: call journal flush to mark journal as empty after journal recovery
      when mount
    - ALSA: seq: Fix racy access for queue timer in proc read
    - Fix built-in early-load Intel microcode alignment
    - block: fix an integer overflow in logical block size
    - USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
    - USB: serial: opticon: fix control-message timeouts
    - USB: serial: suppress driver bind attributes
    - USB: serial: ch341: handle unbound port at reset_resume
    - USB: serial: io_edgeport: add missing active-port sanity check
    - USB: serial: quatech2: handle unbound ports
    - scsi: mptfusion: Fix double fetch bug in ioctl
    - usb: core: hub: Improved device recognition on remote wakeup
    - x86/efistub: Disable paging at mixed mode entry
    - mm/pag...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers