ubuntu_nbd_smoke_test failed on P9 with Bionic kernel

Bug #1822247 reported by Po-Hsu Lin on 2019-03-29
This bug affects 1 person
Affects              Importance  Assigned to
ubuntu-kernel-tests  Undecided   Unassigned
linux (Ubuntu)       Medium      Colin Ian King
  Bionic             Undecided   Unassigned

Bug Description

== SRU Justification, BIONIC ==

[Impact]

Running the autotest regression test ubuntu_nbd_smoke occasionally trips I/O errors such as:

[ 700.668758] print_req_error: I/O error, dev nbd0, sector 0
[ 700.668840] Buffer I/O error on dev nbd0, logical block 0, async page read

This happens when the nbd client detaches and the backing store is removed. The fix is to ensure a rescan occurs.

[Fix]

Upstream fix fe1f9e6659ca6124f500a0f829202c7c902fab0c ("nbd: fix how we set bd_invalidated"). With this fix, when a disconnect happens the partition table is invalidated and rescanned properly.

[Test]

Without the fix, running the Ubuntu Kernel Team nbd smoke test occasionally trips the bug: the kernel error message appears and the test fails. With the fix, the test has been run tens of times without showing the failure.

To test, run:
autotest/client/local-test autotest/client/tests/ubuntu_nbd_smoke_test/control
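
Because the failure is intermittent, a single passing run says little, so the test has to be repeated. The following is a minimal sketch of a repeat-run helper, not part of autotest: the function name count_failures is hypothetical, and it assumes the command being run exits non-zero when the test fails, which is not confirmed here for the local-test wrapper.

```python
import subprocess


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero.

    Assumption: a non-zero exit status means the test failed.
    """
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is counted.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, count_failures(["autotest/client/local-test", "autotest/client/tests/ubuntu_nbd_smoke_test/control"], 20) would run the command above 20 times from the directory holding the autotest checkout; the run count of 20 is only an illustration.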

[Regression Potential]

This fix touches only the NBD driver and has been upstream since May 2018 without any subsequent bug reports against it, so it is a known good fix. At most it sets the bd_invalidated flag more aggressively so that the rescan occurs, so I believe the regression potential is limited to nbd client connect/disconnect activity.

---------

Reproduce rate: 2 out of 3 runs
This issue can be found on other arches as well. Normally it fails on the first run and passes on the second attempt.

  * Command:
      /home/ubuntu/autotest/client/tests/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.sh
  Exit status: 1
  Duration: 7.90551400185

  stdout:
  creating backing nbd image /tmp/nbd_image.img

  --------------------------------------------------------------------------------
  Image path: /tmp/nbd_image.img
  Mount point: /mnt/nbd-test-13924
  Date: Fri Mar 29 06:06:04 UTC 2019
  Host: baltar
  Kernel: 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:32:48 UTC 2019
  Machine: baltar ppc64le ppc64le
  CPUs online: 160
  CPUs total: 160
  Page size: 65536
  Pages avail: 1904339
  Pages total: 2089666
  Free space:
  Filesystem Size Used Avail Use% Mounted on
  udev 61G 0 61G 0% /dev
  tmpfs 13G 12M 13G 1% /run
  /dev/sda2 1.8T 11G 1.7T 1% /
  tmpfs 64G 0 64G 0% /dev/shm
  tmpfs 5.0M 0 5.0M 0% /run/lock
  tmpfs 64G 0 64G 0% /sys/fs/cgroup
  tmpfs 13G 0 13G 0% /run/user/1000
  --------------------------------------------------------------------------------

  NBD device /dev/nbd0 created
  found nbd export
  NBD exports found:
  test
  starting client with NBD device /dev/nbd0
  Negotiation: ..size = 128MB
  creating ext4 on /dev/nbd0
  mkfs on /dev/nbd0 succeeded after 0 attempt(s)
  checking ext4 on /dev/nbd0
  fsck from util-linux 2.31.1
  /dev/nbd0: clean, 11/32768 files, 9787/131072 blocks

  mount:
  /dev/nbd0 on /mnt/nbd-test-13924 type ext4 (rw,relatime,data=ordered)
  mounted on /dev/nbd0

  free:
  Filesystem 1K-blocks Used Available Use% Mounted on
  /dev/nbd0 122835 1550 112111 2% /mnt/nbd-test-13924

  creating large file /mnt/nbd-test-13924/largefile
  -rw-r--r-- 1 root root 100M Mar 29 06:06 /mnt/nbd-test-13924/largefile

  free:
  Filesystem 1K-blocks Used Available Use% Mounted on
  /dev/nbd0 122835 103951 9710 92% /mnt/nbd-test-13924

  removing file /mnt/nbd-test-13924/largefile
  unmounting /mnt/nbd-test-13924
  stopping client
  disconnect, sock, done
  Found kernel warning, IO error and/or call trace
  echo
  [ 694.662746] creating backing nbd image /tmp/nbd_image.img
  [ 695.821047] NBD device /dev/nbd0 created
  [ 696.271555] found nbd export
  [ 697.318393] starting client with NBD device /dev/nbd0
  [ 697.323210] creating ext4 on /dev/nbd0
  [ 697.589620] mkfs on /dev/nbd0 succeeded after 0 attempt(s)
  [ 697.821953] checking ext4 on /dev/nbd0
  [ 697.919173] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
  [ 697.925418] mounted on /dev/nbd0
  [ 697.927107] creating large file /mnt/nbd-test-13924/largefile
  [ 698.839327] removing file /mnt/nbd-test-13924/largefile
  [ 699.596736] unmounting /mnt/nbd-test-13924
  [ 700.664881] stopping client
  [ 700.667573] block nbd0: NBD_DISCONNECT
  [ 700.667733] block nbd0: shutting down sockets
  [ 700.668690] nbd0: detected capacity change from 0 to 134217728
  [ 700.668758] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.668840] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669068] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669124] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669203] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669243] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669315] ldm_validate_partition_table(): Disk read failed.
  [ 700.669324] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669364] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669542] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669580] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669646] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669686] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669747] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669785] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669850] Dev nbd0: unable to read RDB block 0
  [ 700.669895] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669932] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670014] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.670053] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670118] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.670161] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670237] nbd0: unable to read partition table
  [ 700.880422] Found kernel warning, IO error and/or call trace
  [ 700.880479] echo
  killing server
  ================================================================================

  Completed

  Kernel issues:

  Found kernel warning, IO error and/or call trace:

  TEST:

  [ 694.662746] creating backing nbd image /tmp/nbd_image.img
  [ 695.821047] NBD device /dev/nbd0 created
  [ 696.271555] found nbd export
  [ 697.318393] starting client with NBD device /dev/nbd0
  [ 697.323210] creating ext4 on /dev/nbd0
  [ 697.589620] mkfs on /dev/nbd0 succeeded after 0 attempt(s)
  [ 697.821953] checking ext4 on /dev/nbd0
  [ 697.919173] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
  [ 697.925418] mounted on /dev/nbd0
  [ 697.927107] creating large file /mnt/nbd-test-13924/largefile
  [ 698.839327] removing file /mnt/nbd-test-13924/largefile
  [ 699.596736] unmounting /mnt/nbd-test-13924
  [ 700.664881] stopping client
  [ 700.667573] block nbd0: NBD_DISCONNECT
  [ 700.667733] block nbd0: shutting down sockets
  [ 700.668690] nbd0: detected capacity change from 0 to 134217728
  [ 700.668758] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.668840] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669068] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669124] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669203] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669243] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669315] ldm_validate_partition_table(): Disk read failed.
  [ 700.669324] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669364] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669542] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669580] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669646] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669686] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669747] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669785] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.669850] Dev nbd0: unable to read RDB block 0
  [ 700.669895] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.669932] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670014] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.670053] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670118] print_req_error: I/O error, dev nbd0, sector 0
  [ 700.670161] Buffer I/O error on dev nbd0, logical block 0, async page read
  [ 700.670237] nbd0: unable to read partition table
  [ 700.880422] Found kernel warning, IO error and/or call trace
  [ 700.880479] echo
  stderr:
  bs=1024, sz=134217728 bytes
  timeout=30
  e2fsck 1.44.1 (24-Mar-2018)
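
The "Found kernel warning, IO error and/or call trace" verdict above comes from scanning the kernel log. A rough sketch of such a scan follows; it is not the test script's own check. The pattern list is an assumption modelled on the messages in this log, and the name find_io_errors is hypothetical.

```python
import re

# Assumed patterns, taken from the messages seen in the log above;
# the real test script may match a different set.
ERROR_PATTERNS = [
    re.compile(r"I/O error, dev nbd\d+, sector \d+"),
    re.compile(r"Buffer I/O error on dev nbd\d+"),
    re.compile(r"unable to read partition table"),
]

# Kernel timestamp prefix such as "[ 700.668758]".
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]")


def find_io_errors(dmesg_text):
    """Return (timestamp, line) pairs for lines matching any error pattern.

    The timestamp is a float in seconds, or None if the line has no
    "[ seconds ]" prefix.
    """
    hits = []
    for line in dmesg_text.splitlines():
        if any(p.search(line) for p in ERROR_PATTERNS):
            m = TIMESTAMP.match(line)
            stamp = float(m.group(1)) if m else None
            hits.append((stamp, line.strip()))
    return hits
```

In the log above, the first I/O error is stamped 700.668758, about 1.2 ms after the NBD_DISCONNECT message at 700.667573, which fits the description that the errors appear when the client detaches.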

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-46-generic 4.15.0-46.49
ProcVersionSignature: Ubuntu 4.15.0-46.49-generic 4.15.18
Uname: Linux 4.15.0-46-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 29 05:54 seq
 crw-rw---- 1 root audio 116, 33 Mar 29 05:54 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDmesg:

Date: Fri Mar 29 06:10:51 2019
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0451:80ff Texas Instruments, Inc.
 Bus 001 Device 004: ID 0557:2419 ATEN International Co., Ltd
 Bus 001 Device 002: ID 0557:7000 ATEN International Co., Ltd Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=acd1a0d7-f6fc-4130-928c-c8b11ad6e4be ro console=hvc0
ProcLoadAvg: 0.09 0.30 0.36 1/1356 14937
ProcLocks:
 1: POSIX ADVISORY WRITE 3926 00:17:565 0 EOF
 2: POSIX ADVISORY WRITE 3919 00:17:570 0 EOF
 3: POSIX ADVISORY WRITE 1802 00:17:340 0 EOF
 4: FLOCK ADVISORY WRITE 3961 00:17:572 0 EOF
 5: FLOCK ADVISORY WRITE 4492 00:17:335 0 EOF
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -2
ProcVersion: Linux version 4.15.0-46-generic (buildd@bos02-ppc64el-009) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #49-Ubuntu SMP Wed Feb 6 09:32:48 UTC 2019
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-46-generic N/A
 linux-backports-modules-4.15.0-46-generic N/A
 linux-firmware 1.173.3
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
VarLogDump_list: total 0
cpu_cores: Number of cores present = 40
cpu_coreson: Number of cores online = 40
cpu_dscr: DSCR is 16
cpu_freq:
 min: 2.862 GHz (cpu 159)
 max: 2.862 GHz (cpu 1)
 avg: 2.862 GHz
cpu_runmode:
 Could not retrieve current diagnostics mode,
 No kernel interface to firmware
cpu_smt: SMT=4

Po-Hsu Lin (cypressyew) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1822247

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote :

A similar failure can be seen on X-4.15 (Xenial with the 4.15 kernel), on both AMD64 (onibi) and i386 (pepe):

* Command:
/home/ubuntu/autotest/client/tests/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.sh
Exit status: 1
Duration: 8.1468269825

stdout:
creating backing nbd image /tmp/nbd_image.img

--------------------------------------------------------------------------------
Image path: /tmp/nbd_image.img
Mount point: /mnt/nbd-test-18217
Date: Fri Mar 29 07:44:02 UTC 2019
Host: onibi
Kernel: 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019
Machine: onibi x86_64 x86_64
CPUs online: 4
CPUs total: 4
Page size: 4096
Pages avail: 1340920
Pages total: 2038430
Free space:
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 797M 17M 780M 3% /run
/dev/sda1 917G 9.1G 861G 2% /
tmpfs 3.9G 4.0K 3.9G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 797M 0 797M 0% /run/user/1000
tmpfs 100K 0 100K 0% /var/lib/lxd/shmounts
tmpfs 100K 0 100K 0% /var/lib/lxd/devlxd
cgmfs 100K 0 100K 0% /run/cgmanager/fs
--------------------------------------------------------------------------------

NBD device /dev/nbd0 created
found nbd export
NBD exports found:
test
starting client with NBD device /dev/nbd0
Negotiation: ..size = 128MB
creating ext4 on /dev/nbd0
mkfs on /dev/nbd0 succeeded after 0 attempt(s)
checking ext4 on /dev/nbd0
fsck from util-linux 2.27.1
/dev/nbd0: clean, 11/32768 files, 9787/131072 blocks

mount:
/dev/nbd0 on /mnt/nbd-test-18217 type ext4 (rw,relatime,data=ordered)
mounted on /dev/nbd0

free:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nbd0 122835 1550 112111 2% /mnt/nbd-test-18217

creating large file /mnt/nbd-test-18217/largefile
-rw-r--r-- 1 root root 100M Mar 29 07:44 /mnt/nbd-test-18217/largefile

free:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nbd0 122835 103951 9710 92% /mnt/nbd-test-18217

removing file /mnt/nbd-test-18217/largefile
unmounting /mnt/nbd-test-18217
stopping client
disconnect, sock, done
Found kernel warning, IO error and/or call trace
echo
[ 4088.409864] creating backing nbd image /tmp/nbd_image.img
[ 4090.412514] NBD device /dev/nbd0 created
[ 4090.464858] found nbd export
[ 4091.510701] starting client with NBD device /dev/nbd0
[ 4091.512739] creating ext4 on /dev/nbd0
[ 4091.799747] mkfs on /dev/nbd0 succeeded after 0 attempt(s)
[ 4092.155113] checking ext4 on /dev/nbd0
[ 4092.232523] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
[ 4092.237839] mounted on /dev/nbd0
[ 4092.239111] creating large file /mnt/nbd-test-18217/largefile
[ 4093.429863] removing file /mnt/nbd-test-18217/largefile
[ 4093.685500] unmounting /mnt/nbd-test-18217
[ 4094.741235] stopping client
[ 4094.741850] block nbd0: NBD_DISCONNECT
[ 4094.741892] block nbd0: shutting down sockets
[ 4094.741988] nbd0: detected capacity change from 0 to 134217728
[ 4094.742025] print_req_error: 1 callbacks suppressed
[ 4094.742026] print_req_error: I/O error, dev nbd0, sector 0
[ 4094.742053] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4094.742091] print_req_error: I/O error, dev nbd0, sector 0
[ 4094.742113] Buffer I/O error on dev nbd0, logical block 0, a...


Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: Incomplete → In Progress
Colin Ian King (colin-king) wrote :

I've not been able to reproduce this on my VMs. Can I get access to one of these machines to figure out why it is failing and how to fix it?

description: updated
description: updated
Colin Ian King (colin-king) wrote :
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Po-Hsu Lin (cypressyew) wrote :

The nbd smoke test passed on Bionic P9 (4.15.0-49.53) without any issue.
Thanks!

tags: added: verification-done-bionic
removed: verification-needed-bionic
Colin Ian King (colin-king) wrote :

Thanks for testing!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-50.54

---------------
linux (4.15.0-50.54) bionic; urgency=medium

  * CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
    - Documentation/l1tf: Fix small spelling typo
    - x86/cpu: Sanitize FAM6_ATOM naming
    - kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
    - locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a
      new <linux/bits.h> file
    - tools include: Adopt linux/bits.h
    - x86/msr-index: Cleanup bit defines
    - x86/speculation: Consolidate CPU whitelists
    - x86/speculation/mds: Add basic bug infrastructure for MDS
    - x86/speculation/mds: Add BUG_MSBDS_ONLY
    - x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
    - x86/speculation/mds: Add mds_clear_cpu_buffers()
    - x86/speculation/mds: Clear CPU buffers on exit to user
    - x86/kvm/vmx: Add MDS protection when L1D Flush is not active
    - x86/speculation/mds: Conditionally clear CPU buffers on idle entry
    - x86/speculation/mds: Add mitigation control for MDS
    - x86/speculation/mds: Add sysfs reporting for MDS
    - x86/speculation/mds: Add mitigation mode VMWERV
    - Documentation: Move L1TF to separate directory
    - Documentation: Add MDS vulnerability documentation
    - x86/speculation/mds: Add mds=full,nosmt cmdline option
    - x86/speculation: Move arch_smt_update() call to after mitigation decisions
    - x86/speculation/mds: Add SMT warning message
    - x86/speculation/mds: Fix comment
    - x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
    - x86/speculation/mds: Add 'mitigations=' support for MDS

  * CVE-2017-5715 // CVE-2017-5753
    - s390/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5753 // CVE-2017-5754 // CVE-2018-3639
    - powerpc/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5754 // CVE-2018-3620 // CVE-2018-3639 //
    CVE-2018-3646
    - cpu/speculation: Add 'mitigations=' cmdline option
    - x86/speculation: Support 'mitigations=' cmdline option

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log

linux (4.15.0-49.53) bionic; urgency=medium

  * linux: 4.15.0-49.53 -proposed tracker (LP: #1826358)

  * Backport support for software count cache flush Spectre v2 mitigation. (CVE)
    (required for POWER9 DD2.3) (LP: #1822870)
    - powerpc/64s: Add support for ori barrier_nospec patching
    - powerpc/64s: Patch barrier_nospec in modules
    - powerpc/64s: Enable barrier_nospec based on firmware settings
    - powerpc: Use barrier_nospec in copy_from_user()
    - powerpc/64: Use barrier_nospec in syscall entry
    - powerpc/64s: Enhance the information in cpu_show_spectre_v1()
    - powerpc/64: Disable the speculation barrier from the command line
    - powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
    - powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
    - powerpc/64: Call setup_barrier_nospec() from setup_arch()
    - powerpc/64: Make meltdown reporting Book3S 64 specific
    - powerpc/lib/code-patching: refactor patch_instruction()
    - powerpc/lib/feature-fixups: use raw_patch_instruction()
    - powerpc/asm: Add a patch_site mac...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for linux-aws has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
