xfrm_policy.sh / pmtu.sh / udpgso_bench.sh from net in ubuntu_kernel_selftests will fail with timeout if running the whole suite

Bug #1856010 reported by Po-Hsu Lin
This bug affects 2 people
Affects                 Status        Importance  Assigned to
ubuntu-kernel-tests     Fix Released  Undecided   Po-Hsu Lin
linux (Ubuntu)          Opinion       Undecided   Po-Hsu Lin
  Bionic                Fix Released  Undecided   Po-Hsu Lin
  Focal                 Opinion       Undecided   Unassigned
  Groovy                Opinion       Undecided   Unassigned
  Hirsute               Opinion       Undecided   Unassigned
  Impish                Opinion       Undecided   Po-Hsu Lin
linux-oem-5.6 (Ubuntu)  Invalid       Undecided   Unassigned
  Bionic                Invalid       Undecided   Unassigned
  Focal                 Won't Fix     Undecided   Po-Hsu Lin
  Groovy                Invalid       Undecided   Unassigned
  Hirsute               Invalid       Undecided   Unassigned
  Impish                Invalid       Undecided   Unassigned

Bug Description

[Impact]
These 3 tests fail with a timeout error when the whole "net" test in
ubuntu_kernel_selftests is run:
  * not ok 12 selftests: net: xfrm_policy.sh # TIMEOUT
  * not ok 16 selftests: net: pmtu.sh # TIMEOUT
  * not ok 19 selftests: net: udpgso_bench.sh # TIMEOUT

They pass when run manually, because invoking a script directly does
not go through the kselftest runner and its default 45-second per-test
timeout. Under the runner, each of them takes longer than 45 seconds.

A quick test shows that these tests take about:
  xfrm_policy.sh - 2m19.690s
  pmtu.sh - 3m6.832s
  udpgso_bench.sh - 0m57.985s
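A minimal sketch of the arithmetic behind this: converting the `time`-style durations listed above to seconds and comparing each one with the 45-second default shows that all three runs overrun it. The helper names `to_seconds` and `exceeds_timeout` are hypothetical and not part of kselftest.

```python
import re

# Default per-test timeout of the kselftest runner, in seconds (from the text above).
DEFAULT_TIMEOUT = 45

# Durations measured above, in the "XmY.ZZZs" format printed by the shell's `time`.
MEASURED = {
    "xfrm_policy.sh": "2m19.690s",
    "pmtu.sh": "3m6.832s",
    "udpgso_bench.sh": "0m57.985s",
}


def to_seconds(duration: str) -> float:
    """Convert a duration such as '2m19.690s' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def exceeds_timeout(duration: str, timeout: float = DEFAULT_TIMEOUT) -> bool:
    """Return True if a run of this length would be killed by the given timeout."""
    return to_seconds(duration) > timeout


if __name__ == "__main__":
    for test, duration in MEASURED.items():
        secs = to_seconds(duration)
        verdict = "TIMEOUT" if exceeds_timeout(duration) else "ok"
        print(f"{test:16} {secs:8.3f}s -> {verdict} under {DEFAULT_TIMEOUT}s")
```

Running it reports all three tests as TIMEOUT under 45 s; calling `exceeds_timeout(d, 300)` instead returns False for all three.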

[Fix]
* b881d089c7c9c7 ("selftests/net: bump timeout to 5 minutes")

Commit 852c8cbf34d3b3 ("selftests/kselftest/runner.sh: Add 45
second timeout per test"), which added the default timeout, has been
present since Bionic.

However, newer releases carry a SAUCE patch ("UBUNTU: SAUCE:
selftests/net -- disable timeout") that disables the timeout for the
net tests. I think we can leave that as-is for the moment, unless some
test ends up hanging too long because of it.

Therefore, only Bionic needs this patch; it can be applied with some
context adjustment.
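For context, my understanding (an assumption based on the upstream runner.sh, not on the patch text quoted here) is that runner.sh resets each test's timeout to the 45-second default and then lets a readable `settings` file in the test directory override it with a `timeout=N` line; the fix would then amount to a `settings` file in the net directory containing `timeout=300`. The Python below is a simplified model of that lookup, not the actual shell code, and `resolve_timeout` is a hypothetical name.

```python
import os
import tempfile

DEFAULT_TIMEOUT = 45  # kselftest default per-test timeout, in seconds


def resolve_timeout(test_dir: str, default: int = DEFAULT_TIMEOUT) -> int:
    """Return the timeout that applies to tests in test_dir.

    Simplified model (an assumption, not runner.sh itself): if the directory
    has a readable 'settings' file, a 'timeout=N' line overrides the default;
    blank lines and lines starting with '#' are ignored.
    """
    timeout = default
    settings = os.path.join(test_dir, "settings")
    if os.access(settings, os.R_OK):
        with open(settings) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                key, _, value = line.partition("=")
                if key == "timeout":
                    timeout = int(value)
    return timeout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as net_dir:
        print(resolve_timeout(net_dir))  # no settings file: falls back to 45
        with open(os.path.join(net_dir, "settings"), "w") as fh:
            fh.write("timeout=300\n")  # assumed content of the net settings file
        print(resolve_timeout(net_dir))  # override applied: 300
```

Because the override is read per directory in this model, only the net tests get the longer budget; other selftest directories keep the 45-second default.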

[Test]
With this patch applied, these tests should have enough time to finish.

[Where problems could occur]
The fix only touches the testing tool and has no impact on actual
kernel functionality. If the 5-minute timeout is still not enough, we
may continue to see this kind of failure in the test report.

[Original Bug Report]
These 3 tests fail with a timeout when running the whole "net" test in ubuntu_kernel_selftests:
  * not ok 12 selftests: net: xfrm_policy.sh # TIMEOUT
  * not ok 16 selftests: net: pmtu.sh # TIMEOUT
  * not ok 19 selftests: net: udpgso_bench.sh # TIMEOUT

However, they pass if you run them manually.

So some other test in net must be causing this.

From the test results, it looks like the tests were executed in the following sequence:
 ok 1 selftests: net: reuseport_bpf
 ok 2 selftests: net: reuseport_bpf_cpu
 ok 3 selftests: net: reuseport_bpf_numa
 ok 4 selftests: net: reuseport_dualstack
 # Success
 ok 5 selftests: net: reuseaddr_conflict
 ok 6 selftests: net: tls
 ok 7 selftests: net: run_netsocktests
 ok 8 selftests: net: run_afpackettests
 ok 9 selftests: net: test_bpf.sh
 ok 10 selftests: net: netdevice.sh
 ok 11 selftests: net: rtnetlink.sh
 not ok 12 selftests: net: xfrm_policy.sh # TIMEOUT
 not ok 13 selftests: net: test_blackhole_dev.sh # exit=1
 ok 14 selftests: net: fib_tests.sh
 ok 15 selftests: net: fib-onlink-tests.sh
 not ok 16 selftests: net: pmtu.sh # TIMEOUT
 ok 17 selftests: net: udpgso.sh
 not ok 18 selftests: net: ip_defrag.sh # exit=255
 not ok 19 selftests: net: udpgso_bench.sh # TIMEOUT
 ok 20 selftests: net: fib_rule_tests.sh
 not ok 21 selftests: net: msg_zerocopy.sh # exit=1
 ok 22 selftests: net: psock_snd.sh
 ok 23 selftests: net: udpgro_bench.sh
 ok 24 selftests: net: udpgro.sh
 ok 25 selftests: net: test_vxlan_under_vrf.sh
 ok 26 selftests: net: reuseport_addr_any.sh
 ok 27 selftests: net: test_vxlan_fdb_changelink.sh
 ok 28 selftests: net: so_txtime.sh
 ok 29 selftests: net: ipv6_flowlabel.sh
 ok 30 selftests: net: tcp_fastopen_backup_key.sh

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-1009-aws 5.3.0-1009.10
ProcVersionSignature: Ubuntu 5.3.0-1009.10-aws 5.3.13
Uname: Linux 5.3.0-1009-aws aarch64
ApportVersion: 2.20.11-0ubuntu8.3
Architecture: arm64
Date: Wed Dec 11 06:42:39 2019
Ec2AMI: ami-047cec24582f6ae0d
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2c
Ec2InstanceType: a1.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
SourcePackage: linux-aws
UpgradeStatus: No upgrade log present (probably fresh install)

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew)
tags: added: sru-20191202
tags: added: 5.3 aws ubuntu-kernel-selftests
Thadeu Lima de Souza Cascardo (cascardo) wrote :

There was a backport of a timeout for 5.3. I am in the process of finding a good timeout value for the net test suite before I send it upstream. Let me share the value I have found so far, which works for xfrm_policy.sh, the test that seems to take the most time. 150 would be fine here.

Sean Feole (sfeole) wrote :

Thanks, cascardo. I'll keep an eye out for the update to autotest-client-tests and can issue a re-run once that fix is in place.

Changed in ubuntu-kernel-tests:
status: New → Triaged
Po-Hsu Lin (cypressyew)
tags: added: sru-20200106
tags: added: gke
Po-Hsu Lin (cypressyew)
tags: added: sru-20200629
Po-Hsu Lin (cypressyew)
tags: added: focal
tags: added: sru-20200831
Po-Hsu Lin (cypressyew)
tags: added: sru-20201109
Po-Hsu Lin (cypressyew) wrote :

On KVM node zeppo, these tests take:

xfrm_policy.sh - 2m15s
pmtu.sh - 2m36s
udpgso_bench.sh - 58s

They fail when the whole suite is run because the kselftest framework's default timeout is 45 seconds per test, so the runner terminates each of them once that limit is reached.

Po-Hsu Lin (cypressyew) wrote :

Two more attempts with 5.12.0-051200rc8-generic on KVM node zeppo:
xfrm_policy.sh - 2m19.690s / 2m11.881s
pmtu.sh - 3m6.832s / 3m4.413s
udpgso_bench.sh - 0m54.480s / 0m57.985s

Therefore I think a 5 min timeout might be a reasonable choice.
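The reasoning behind the 5-minute choice can be checked with a little arithmetic: take the slowest run of each test reported above on zeppo and see how much of a 300-second budget it uses. The numbers are copied from the measurements in the two comments above; the `headroom` function is an illustrative name only.

```python
# Runs measured on KVM node zeppo (from the comments above), in seconds.
RUNS = {
    "xfrm_policy.sh": [135.0, 139.690, 131.881],  # 2m15s, 2m19.690s, 2m11.881s
    "pmtu.sh": [156.0, 186.832, 184.413],  # 2m36s, 3m6.832s, 3m4.413s
    "udpgso_bench.sh": [58.0, 54.480, 57.985],  # 58s, 0m54.480s, 0m57.985s
}

PROPOSED_TIMEOUT = 300  # 5 minutes


def headroom(runs: dict, timeout: float) -> dict:
    """Return, per test, the seconds left over after its slowest observed run."""
    return {test: timeout - max(times) for test, times in runs.items()}


if __name__ == "__main__":
    for test, spare in headroom(RUNS, PROPOSED_TIMEOUT).items():
        worst = max(RUNS[test])
        print(f"{test:16} worst {worst:8.3f}s, spare {spare:8.3f}s "
              f"({worst / PROPOSED_TIMEOUT:.0%} of budget)")
```

On these numbers pmtu.sh is the tightest case, using about 62% of the 300-second budget with roughly 113 s to spare; calling `headroom(RUNS, 150)` gives a negative value for pmtu.sh.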

Po-Hsu Lin (cypressyew) wrote :

On a bare-metal amd64 node "glameow" with 5.10.0-1020-oem:
xfrm_policy.sh - 2m19.858s
pmtu.sh - 2m53.198s
udpgso_bench.sh - 0m57.527s

Po-Hsu Lin (cypressyew) wrote :

https://<email address hidden>/

Changed in ubuntu-kernel-tests:
status: Triaged → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
affects: linux-aws (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Groovy):
status: New → Invalid
Changed in linux (Ubuntu Focal):
status: New → Invalid
Changed in linux (Ubuntu Hirsute):
status: New → Invalid
Changed in linux (Ubuntu Groovy):
status: Invalid → Opinion
Changed in linux (Ubuntu Focal):
status: Invalid → Opinion
Changed in linux (Ubuntu Hirsute):
status: Invalid → Opinion
Changed in linux (Ubuntu Impish):
status: In Progress → Opinion
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote :
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: oem oem-5.6 sru-20210412
Po-Hsu Lin (cypressyew) wrote :

Oops, looks like we need this for OEM-5.6 as well.

Changed in linux-oem-5.6 (Ubuntu Focal):
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Changed in linux-oem-5.6 (Ubuntu Bionic):
status: New → Invalid
Changed in linux-oem-5.6 (Ubuntu Groovy):
status: New → Invalid
Changed in linux-oem-5.6 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-oem-5.6 (Ubuntu Impish):
status: New → Invalid
Po-Hsu Lin (cypressyew) wrote :
AceLan Kao (acelankao)
Changed in linux-oem-5.6 (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Guilherme G. Piccoli (gpiccoli) wrote :

Observed on G/KVM, cycle sru-20210510.

tags: added: groovy kvm linux-kvm sru-20210510
Po-Hsu Lin (cypressyew)
summary: xfrm_policy.sh / pmtu.sh / udpgso_bench.sh from net in
- ubuntu_kernel_selftests will fail if running the whole suite
+ ubuntu_kernel_selftests will fail with timeout if running the whole
+ suite
Po-Hsu Lin (cypressyew) wrote :

For G/KVM,
xfrm_policy.sh is failing with:
  expected ping to .254 to fail (exceptions)
  lp:1900645 xfrm_policy.sh in net from ubuntu_kernel_selftests failed with "expected ping to .254 to fail" on Groovy
pmtu.sh is failing with:
  lp:1887661 pmtu.sh from net in ubuntu_kernel_selftests failed with no error message
udpgso_bench.sh passed.

Therefore I changed the bug title here.

Po-Hsu Lin (cypressyew) wrote :

Didn't see this timeout failure on B-4.15

tags: added: verification-done-bionic
removed: verification-needed-bionic
Guilherme G. Piccoli (gpiccoli) wrote :

Observed on H/KVM, cycle sru-20210510.

tags: added: 5.11 hirsute
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-144.148

---------------
linux (4.15.0-144.148) bionic; urgency=medium

  * bionic/linux: 4.15.0-144.148 -proposed tracker (LP: #1927648)

  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460

  * xfrm_policy.sh / pmtu.sh / udpgso_bench.sh from net in
    ubuntu_kernel_selftests will fail if running the whole suite (LP: #1856010)
    - selftests/net: bump timeout to 5 minutes

  * locking/qrwlock: Fix ordering in queued_write_lock_slowpath() (LP: #1926184)
    - locking/barriers: Introduce smp_cond_load_relaxed() and
      atomic_cond_read_relaxed()
    - locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

  * Bionic update: upstream stable patchset 2021-04-30 (LP: #1926808)
    - net: fec: ptp: avoid register access when ipg clock is disabled
    - powerpc/4xx: Fix build errors from mfdcr()
    - atm: eni: dont release is never initialized
    - atm: lanai: dont run lanai_dev_close if not open
    - Revert "r8152: adjust the settings about MAC clock speed down for RTL8153"
    - ixgbe: Fix memleak in ixgbe_configure_clsu32
    - net: tehuti: fix error return code in bdx_probe()
    - sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count
    - gpiolib: acpi: Add missing IRQF_ONESHOT
    - nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default
    - NFS: Correct size calculation for create reply length
    - net: hisilicon: hns: fix error return code of hns_nic_clear_all_rx_fetch()
    - net: wan: fix error return code of uhdlc_init()
    - atm: uPD98402: fix incorrect allocation
    - atm: idt77252: fix null-ptr-dereference
    - sparc64: Fix opcode filtering in handling of no fault loads
    - u64_stats,lockdep: Fix u64_stats_init() vs lockdep
    - drm/radeon: fix AGP dependency
    - nfs: we don't support removing system.nfs4_acl
    - ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
    - ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
    - squashfs: fix inode lookup sanity checks
    - squashfs: fix xattr id and id lookup sanity checks
    - arm64: dts: ls1046a: mark crypto engine dma coherent
    - arm64: dts: ls1012a: mark crypto engine dma coherent
    - arm64: dts: ls1043a: mark crypto engine dma coherent
    - ARM: dts: at91-sama5d27_som1: fix phy address to 7
    - dm ioctl: fix out of bounds array access when no devices
    - bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD
    - libbpf: Fix INSTALL flag order
    - macvlan: macvlan_count_rx() needs to be aware of preemption
    - net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port
    - e1000e: add rtnl_lock() to e1000_reset_task
    - e1000e: Fix error handling in e1000_set_d0_lplu_state_82571
    - net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template
    - ftgmac100: Restart MAC HW once
    - can: peak_usb: add forgotten supported devices
    - can: c_can_pci: c_can_pci_remove(): fix use-after-free
    - can: c_can: move runtime PM enable/disable to c_can_platform
    - can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning
    - mac80211: fix rate mask reset
    - net: cdc-pho...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Guilherme G. Piccoli (gpiccoli) wrote :

Observed on G/KVM, cycle sru-20210531.

tags: added: sru-20210531
Krzysztof Kozlowski (krzk) wrote :

groovy/azure 5.8.0-1038.40

tags: added: 5.8 azure sru-20210621
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in G/Oracle, cycle sru-20210621.

tags: added: oracle
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.6 (Ubuntu Focal):
status: Fix Committed → Won't Fix
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released