net test from ubuntu_kernel_selftest will timeout on B-5.0

Bug #1888381 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
Fix Released
Undecided
Unassigned
linux-oem-osp1 (Ubuntu)
Fix Released
Medium
Unassigned
Bionic
Fix Released
Undecided
Unassigned

Bug Description

[Impact]
Partial reads on TLS sockets will cause processes to wait indefinitively.

[Test case]
Run tls net selftests from the kernel source tree.

[Regression potential]
We only touch code on the TLS socket family itself, so regressions on subsystems other than the one we are fixing is highly unlikely.

--------------------

Issue found on:
linux-gke-5.0 - 5.0.0-1045.46 / 5.0.0-1044.45
linux-oem-osp1 - 5.0.0-1065.70 / 5.0.0-1064.69

The net test in ubuntu_kernel_selftests will timeout and gets killed, this issue does not exist in gke 5.0.0-1043.44 and oem-osp1 5.0.0-1063.68 (the net test passed with 5.0.0-1063.68 without any error)

The test will hang here:
 Opening 127.0.0.1:9999
 Opening INADDR_ANY:9999
 bind: Address already in use
 Opening in6addr_any:9999
 Opening INADDR_ANY:9999
 bind: Address already in use
 Opening INADDR_ANY:9999 after closing ipv6 socket
 bind: Address already in use
 Timer expired (1800 sec.), nuking pid 27394
        ERROR ubuntu_kernel_selftests.net ubuntu_kernel_selftests.net timestamp=1595327738 localtime=Jul 21 10:35:38 Test timeout expired, rc=15
    END ERROR ubuntu_kernel_selftests.net ubuntu_kernel_selftests.net

Please find the attachment for the complete test log.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew)
tags: added: 5.0 bionic kqa-blocker sru-20200629
tags: added: gke
Po-Hsu Lin (cypressyew)
description: updated
Revision history for this message
Sean Feole (sfeole) wrote :
Download full text (7.5 KiB)

I did see this failure in the logs, however was not able to reproduce it when ran manually on GKE.
I have attached the tarball of the logs.

Ftrace test output below:

07/21 13:32:26 DEBUG| utils:0116| Running 'sudo sh -c 'echo 1 > /proc/sys/net/ipv4/conf/all/accept_local''
07/21 13:32:26 DEBUG| utils:0116| Running 'sudo make -C linux/tools/testing/selftests TARGETS=ftrace run_tests'
07/21 13:32:26 DEBUG| utils:0153| [stdout] make: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests'
07/21 13:32:26 DEBUG| utils:0153| [stdout] make[1]: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/ftrace'
07/21 13:32:26 DEBUG| utils:0153| [stdout] make[1]: Nothing to be done for 'all'.
07/21 13:32:26 DEBUG| utils:0153| [stdout] make[1]: Leaving directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/ftrace'
07/21 13:32:26 DEBUG| utils:0153| [stdout] make[1]: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/ftrace'
07/21 13:32:26 DEBUG| utils:0153| [stdout] TAP version 13
07/21 13:32:26 DEBUG| utils:0153| [stdout] selftests: ftracetest
07/21 13:32:26 DEBUG| utils:0153| [stdout] ========================================
07/21 13:32:26 DEBUG| utils:0153| [stdout] === Ftrace unit tests ===
07/21 13:32:26 DEBUG| utils:0153| [stdout] [1] Basic trace file check [PASS]
07/21 13:32:28 DEBUG| utils:0153| [stdout] [2] Basic test for tracers [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [3] Basic trace clock test [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [4] Basic event tracing check [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [5] Snapshot and tracing setting [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [6] event tracing - enable/disable with event level files [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [7] event tracing - restricts events based on pid [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [8] event tracing - enable/disable with subsystem level files [PASS]
07/21 13:32:29 DEBUG| utils:0153| [stdout] [9] event tracing - enable/disable with top level files [PASS]
07/21 13:32:32 DEBUG| utils:0153| [stdout] [10] ftrace - function graph filters with stack tracer [PASS]
07/21 13:32:33 DEBUG| utils:0153| [stdout] [11] ftrace - function graph filters [PASS]
07/21 13:32:34 DEBUG| utils:0153| [stdout] [12] ftrace - function pid filters [PASS]
07/21 13:32:37 DEBUG| utils:0153| [stdout] [13] ftrace - test for function event triggers [PASS]
07/21 13:32:41 DEBUG| utils:0153| [stdout] [14] ftrace - function profiler with function tracing [PASS]
07/21 13:32:42 DEBUG| utils:0153| [stdout] [15] ftrace - test reading of set_ftrace_filter [PASS]
07/21 13:32:43 DEBUG| utils:0153| [stdout] [16] ftrace - test for function traceon/off triggers [PASS]
07/21 13:32:47 DEBUG| utils:0153| [stdout] [17] Test creation and deletion of trace instances while setting an event [PASS]
07/21 13:32:48 DEBUG| utils:0153| [st...

Read more...

Revision history for this message
Sean Feole (sfeole) wrote :
Download full text (95.2 KiB)

Net Test below:
07/21 13:21:56 DEBUG| utils:0116| Running 'sudo sh -c 'echo 1 > /proc/sys/net/ipv4/conf/all/accept_local''
07/21 13:21:56 DEBUG| utils:0116| Running 'make -C linux/tools/testing/selftests TARGETS=bpf'
07/21 13:21:56 DEBUG| utils:0153| [stdout] make: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests'
07/21 13:21:56 DEBUG| utils:0153| [stdout] make[1]: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf'
07/21 13:22:05 DEBUG| utils:0153| [stdout] make -C ../../../lib/bpf OUTPUT=/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/
07/21 13:22:05 DEBUG| utils:0153| [stdout] make[2]: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/lib/bpf'
07/21 13:22:07 DEBUG| utils:0153| [stdout]
07/21 13:22:07 DEBUG| utils:0153| [stdout] Auto-detecting system features:
07/21 13:22:07 DEBUG| utils:0153| [stdout] ... libelf: [ on ]
07/21 13:22:07 DEBUG| utils:0153| [stdout] ... bpf: [ on ]
07/21 13:22:07 DEBUG| utils:0153| [stdout]
07/21 13:22:07 DEBUG| utils:0153| [stdout] HOSTCC /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/fixdep.o
07/21 13:22:07 DEBUG| utils:0153| [stdout] HOSTLD /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/fixdep-in.o
07/21 13:22:07 DEBUG| utils:0153| [stdout] LINK /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/fixdep
07/21 13:22:07 ERROR| utils:0153| [stderr] Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
07/21 13:22:07 DEBUG| utils:0153| [stdout] CC /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/libbpf.o
07/21 13:22:08 DEBUG| utils:0153| [stdout] CC /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/bpf.o
07/21 13:22:08 DEBUG| utils:0153| [stdout] LD /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/libbpf-in.o
07/21 13:22:08 DEBUG| utils:0153| [stdout] LINK /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/libbpf.a
07/21 13:22:08 DEBUG| utils:0153| [stdout] LINK /home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/libbpf.so
07/21 13:22:08 DEBUG| utils:0153| [stdout] make[2]: Leaving directory '/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/lib/bpf'
07/21 13:22:08 DEBUG| utils:0153| [stdout] make -C ../../../lib/bpf OUTPUT=/home/jenkins/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/
07/21 13:22:08 DEBUG| utils:0153| [stdout] make[2]: Entering directory '/home/jenkins/autotest/client/tmp/ubuntu_kerne...

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

A manual test with B-oem-osp1 5.0.0-1065.70 with the source tree from 5.0.0-1063, the net test will hang with the tls test (no other significant output in syslog):

selftests: net: tls
========================================
[==========] Running 37 tests from 2 test cases.
[ RUN ] tls.sendfile
[ OK ] tls.sendfile
[ RUN ] tls.send_then_sendfile
[ OK ] tls.send_then_sendfile
[ RUN ] tls.recv_max
[ OK ] tls.recv_max
[ RUN ] tls.recv_small
[ OK ] tls.recv_small
[ RUN ] tls.msg_more
[ OK ] tls.msg_more
[ RUN ] tls.sendmsg_single
[ OK ] tls.sendmsg_single
[ RUN ] tls.sendmsg_fragmented
[ OK ] tls.sendmsg_fragmented
[ RUN ] tls.sendmsg_large
[ OK ] tls.sendmsg_large
[ RUN ] tls.sendmsg_multiple
[ OK ] tls.sendmsg_multiple
[ RUN ] tls.sendmsg_multiple_stress
[ OK ] tls.sendmsg_multiple_stress
[ RUN ] tls.splice_from_pipe
[ OK ] tls.splice_from_pipe
[ RUN ] tls.splice_from_pipe2
[ OK ] tls.splice_from_pipe2
[ RUN ] tls.send_and_splice
[ OK ] tls.send_and_splice
[ RUN ] tls.splice_to_pipe
[ OK ] tls.splice_to_pipe
[ RUN ] tls.recvmsg_single
[ OK ] tls.recvmsg_single
[ RUN ] tls.recvmsg_single_max
[ OK ] tls.recvmsg_single_max
[ RUN ] tls.recvmsg_multiple
[ OK ] tls.recvmsg_multiple
[ RUN ] tls.single_send_multiple_recv
[ OK ] tls.single_send_multiple_recv
[ RUN ] tls.multiple_send_single_recv
[ OK ] tls.multiple_send_single_recv
[ RUN ] tls.single_send_multiple_recv_non_align
(hang here)

Jul 21 13:56:42 gonzo systemd[1]: Starting Stop ureadahead data collection...
Jul 21 13:56:42 gonzo systemd[1]: Started Stop ureadahead data collection.
Jul 21 13:56:49 gonzo kernel: [ 213.788101] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.789053] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.790508] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.791065] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.791574] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.792045] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:49 gonzo kernel: [ 213.792538] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:50 gonzo kernel: [ 214.892627] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:50 gonzo kernel: [ 214.894322] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 13:56:50 gonzo kernel: [ 214.894919] tls_set_device_offload_rx: netdev lo with no TLS offload

Next is to test with source -1065 tree

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

With the 5.0.0-1065.70 kernel + 5.0.0-1065.70 tree on gonzo, it's still hanging on the same point:

[ RUN ] tls.single_send_multiple_recv_non_align

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Sean, for your comment #3, there is no such tls net test.
  selftests: net: tls

Which source tree you're using when testing this?

(I can see this test in 5.0.0-1043.44 too)

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

OK I have download the tar file provided in comment #2, it's for 4.15 GKE not for 5.0 GKE:
'apt-get source --download-only linux-modules-extra-4.15.0-1066-gke'

Can you give it a try on 5.0 GKE?
Thanks

Revision history for this message
Sean Feole (sfeole) wrote :

I was able to reproduce this with linux-gke-5.0 after another attempt and ensuring the correct packages were installed.

Confirmed the modules pkg and kernel is 5.0, Linux sfgke 5.0.0-58-generic #62~18.04.1-Ubuntu SMP Tue Jul 14 03:37:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[==========] Running 37 tests from 2 test cases.
[ RUN ] tls.sendfile
[ OK ] tls.sendfile
[ RUN ] tls.send_then_sendfile
[ OK ] tls.send_then_sendfile
[ RUN ] tls.recv_max
[ OK ] tls.recv_max
[ RUN ] tls.recv_small
[ OK ] tls.recv_small
[ RUN ] tls.msg_more
[ OK ] tls.msg_more
[ RUN ] tls.sendmsg_single
[ OK ] tls.sendmsg_single
[ RUN ] tls.sendmsg_fragmented
[ OK ] tls.sendmsg_fragmented
[ RUN ] tls.sendmsg_large
[ OK ] tls.sendmsg_large
[ RUN ] tls.sendmsg_multiple
[ OK ] tls.sendmsg_multiple
[ RUN ] tls.sendmsg_multiple_stress
[ OK ] tls.sendmsg_multiple_stress
[ RUN ] tls.splice_from_pipe
[ OK ] tls.splice_from_pipe
[ RUN ] tls.splice_from_pipe2
[ OK ] tls.splice_from_pipe2
[ RUN ] tls.send_and_splice
[ OK ] tls.send_and_splice
[ RUN ] tls.splice_to_pipe
[ OK ] tls.splice_to_pipe
[ RUN ] tls.recvmsg_single
[ OK ] tls.recvmsg_single
[ RUN ] tls.recvmsg_single_max
[ OK ] tls.recvmsg_single_max
[ RUN ] tls.recvmsg_multiple
[ OK ] tls.recvmsg_multiple
[ RUN ] tls.single_send_multiple_recv
[ OK ] tls.single_send_multiple_recv
[ RUN ] tls.multiple_send_single_recv
[ OK ] tls.multiple_send_single_recv
[ RUN ] tls.single_send_multiple_recv_non_align

Jul 21 19:33:25 sfgke kernel: [10842.325048] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.332940] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.340720] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.348469] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.356168] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.364229] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:25 sfgke kernel: [10842.372100] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:26 sfgke kernel: [10843.477618] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:27 sfgke kernel: [10843.529738] tls_set_device_offload_rx: netdev lo with no TLS offload
Jul 21 19:33:27 sfgke kernel: [10843.536771] tls_set_device_offload_rx: netdev lo with no TLS offload

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

The messages on dmesg "tls_set_device_offload_rx: netdev lo with no TLS offload" doesn't seem to be related to the hung. These messages are logged while running another net/tls testcase and can be seen also on previous kernel versions in which the hang can't be reproduced.

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

The issue is reproducible by the following tls testcases:
- single_send_multiple_recv_non_align
- recv_lowat

Commenting these out from ./tools/testing/selftests/net/tls.c the tls test completes successfully.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Kleber has identified the backport of commit 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") as the responsible for this regression.

It turns out this commit has 4 fixups upstream, 3 of which can be easily cherry picked, while the fourth one seems related to TLS 1.3 support, which is not included in our 5.0 kernels yet.

Picking those 3 commits allow the tests to pass without issues.

description: updated
Stefan Bader (smb)
Changed in linux-oem-osp1 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :
Changed in linux-oem-osp1 (Ubuntu Bionic):
status: New → In Progress
Changed in linux-oem-osp1 (Ubuntu Bionic):
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Changed in linux-oem-osp1 (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (18.9 KiB)

This bug was fixed in the package linux-oem-osp1 - 5.0.0-1067.72

---------------
linux-oem-osp1 (5.0.0-1067.72) bionic; urgency=medium

  * bionic/linux-oem-osp1: 5.0.0-1067.72 -proposed tracker (LP: #1889723)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync dkms-build and family

  [ Ubuntu: 5.0.0-60.64 ]

  * disco/linux: 5.0.0-60.64 -proposed tracker (LP: #1889602)
  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly
  * dkms artifacts may expire from the pool (LP: #1850958)
    - [Packaging] autoreconstruct -- manage executable debian files
    - [packaging] handle downloads from the librarian better

  [ Ubuntu: 5.0.0-59.63 ]

  * disco/linux: 5.0.0-59.63 -proposed tracker (LP: #1888293)
  * net test from ubuntu_kernel_selftest will timeout on B-5.0 (LP: #1888381)
    - net/tls: fix lowat calculation if some data came from previous record
    - net/tls: fix no wakeup on partial reads
    - net/tls: fix poll ignoring partially copied records
  * Regression in kernel 4.15.0-91 causes kernel panic with Bcache
    (LP: #1867916)
    - bcache: check and adjust logical block size for backing devices
  * use-after-free in af_alg_accept() due to bh_lock_sock() (LP: #1884766)
    - crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
  * Disco update: upstream stable patchset 2020-07-08 (LP: #1886876)
    - ARM: dts: renesas: Fix IOMMU device node names
    - scsi: core: free sgtables in case command setup fails
    - arm64: dts: meson: fixup SCP sram nodes
    - powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
    - pinctrl: ocelot: Fix GPIO interrupt decoding on Jaguar2
    - clk: renesas: cpg-mssr: Fix STBCR suspend/resume handling
    - coresight: tmc: Fix TMC mode read in tmc_read_prepare_etb()
    - scsi: hisi_sas: Do not reset phy timer to wait for stray phy up
    - PCI: pci-bridge-emul: Fix PCIe bit conflicts
    - usb: gadget: core: sync interrupt before unbind the udc
    - clk: zynqmp: fix memory leak in zynqmp_register_clocks
    - scsi: vhost: Notify TCM about the maximum sg entries supported per command
    - IB/mlx5: Fix DEVX support for MLX5_CMD_OP_INIT2INIT_QP command
    - cifs: set up next DFS target before generic_ip_connect()
    - ASoC: qcom: q6asm-dai: kCFI fix
    - sparc32: mm: Don't try to free page-table pages if ctor() fails
    - net: dsa: lantiq_gswip: fix and improve the unsupported interface error
    - bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free
    - RDMA/mlx5: Fix udata response upon SRQ creation
    - clk: meson: meson8b: Fix the polarity of the RESET_N lines
    - clk: meson: meson8b: Fix the vclk_div{1, 2, 4, 6, 12}_en gate bits
    - gpio: pca953x: fix handling of automatic address incrementing
    - ASoC: max98373: reorder max98373_reset() in resume
    - soundwire: slave: don't init debugfs on device registration error
    - arm64: dts: msm8996: Fix CSI IRQ types
    - scsi: target: loopback: Fix READ with data and sensebytes
    - SoC: rsnd: add interrupt support for SSI BUSIF buffer
    - ASoC: ux500: mop500:...

Changed in linux-oem-osp1 (Ubuntu Bionic):
status: In Progress → Fix Released
Po-Hsu Lin (cypressyew)
Changed in linux-oem-osp1 (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-kernel-tests:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.