ubuntu_nbd_smoke_test failed with I/O error on X-4.15

Bug #1814237 reported by Po-Hsu Lin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ubuntu-kernel-tests
New
Undecided
Unassigned
linux (Ubuntu)
Incomplete
Undecided
Unassigned
linux-aws (Ubuntu)
New
Undecided
Unassigned

Bug Description

Kernel: 4.15.0-45.48~16.04.1

creating backing nbd image /tmp/nbd_image.img

--------------------------------------------------------------------------------
Image path: /tmp/nbd_image.img
Mount point: /mnt/nbd-test-25789
Date: Thu Jan 31 14:42:42 UTC 2019
Host: onibi
Kernel: 4.15.0-45-generic #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019
Machine: onibi x86_64 x86_64
CPUs online: 4
CPUs total: 4
Page size: 4096
Pages avail: 1351388
Pages total: 2038431
Free space:
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 797M 17M 780M 3% /run
/dev/sda1 917G 9.1G 861G 2% /
tmpfs 3.9G 4.0K 3.9G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 797M 0 797M 0% /run/user/1000
tmpfs 100K 0 100K 0% /var/lib/lxd/shmounts
tmpfs 100K 0 100K 0% /var/lib/lxd/devlxd
cgmfs 100K 0 100K 0% /run/cgmanager/fs
--------------------------------------------------------------------------------

NBD device /dev/nbd0 created
found nbd export
NBD exports found:
test
starting client with NBD device /dev/nbd0
Negotiation: ..size = 128MB
creating ext4 on /dev/nbd0
mkfs on /dev/nbd0 succeeded after 0 attempt(s)
checking ext4 on /dev/nbd0
fsck from util-linux 2.27.1
/dev/nbd0: clean, 11/32768 files, 9787/131072 blocks

mount:
/dev/nbd0 on /mnt/nbd-test-25789 type ext4 (rw,relatime,data=ordered)
mounted on /dev/nbd0

free:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nbd0 122835 1550 112111 2% /mnt/nbd-test-25789

creating large file /mnt/nbd-test-25789/largefile
-rw-r--r-- 1 root root 100M Jan 31 14:42 /mnt/nbd-test-25789/largefile

free:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nbd0 122835 103951 9710 92% /mnt/nbd-test-25789

removing file /mnt/nbd-test-25789/largefile
unmounting /mnt/nbd-test-25789
stopping client
disconnect, sock, done
Found kernel warning, IO error and/or call trace
echo
[ 4593.254552] creating backing nbd image /tmp/nbd_image.img
[ 4595.506134] NBD device /dev/nbd0 created
[ 4595.557990] found nbd export
[ 4596.604855] starting client with NBD device /dev/nbd0
[ 4596.606795] creating ext4 on /dev/nbd0
[ 4597.013721] mkfs on /dev/nbd0 succeeded after 0 attempt(s)
[ 4597.390241] checking ext4 on /dev/nbd0
[ 4597.455371] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
[ 4597.460552] mounted on /dev/nbd0
[ 4597.461810] creating large file /mnt/nbd-test-25789/largefile
[ 4599.056704] removing file /mnt/nbd-test-25789/largefile
[ 4599.276298] unmounting /mnt/nbd-test-25789
[ 4600.319460] stopping client
[ 4600.320083] block nbd0: NBD_DISCONNECT
[ 4600.320127] block nbd0: shutting down sockets
[ 4600.320267] nbd0: detected capacity change from 0 to 134217728
[ 4600.320306] print_req_error: 7 callbacks suppressed
[ 4600.320308] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.325890] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.333002] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.338653] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.345687] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.351253] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.358279] ldm_validate_partition_table(): Disk read failed.
[ 4600.358305] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.363892] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.371075] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.376645] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.383687] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.389230] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.396307] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.401819] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.408826] Dev nbd0: unable to read RDB block 0
[ 4600.413467] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.418955] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.425858] print_req_error: I/O error, dev nbd0, sector 24
[ 4600.431433] Buffer I/O error on dev nbd0, logical block 3, async page read
[ 4600.438422] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.443995] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.451003] nbd0: unable to read partition table
[ 4600.584152] Found kernel warning, IO error and/or call trace
[ 4600.584201] echo
killing server
================================================================================

Completed

Kernel issues:

Found kernel warning, IO error and/or call trace:

TEST:

[ 4593.254552] creating backing nbd image /tmp/nbd_image.img
[ 4595.506134] NBD device /dev/nbd0 created
[ 4595.557990] found nbd export
[ 4596.604855] starting client with NBD device /dev/nbd0
[ 4596.606795] creating ext4 on /dev/nbd0
[ 4597.013721] mkfs on /dev/nbd0 succeeded after 0 attempt(s)
[ 4597.390241] checking ext4 on /dev/nbd0
[ 4597.455371] EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
[ 4597.460552] mounted on /dev/nbd0
[ 4597.461810] creating large file /mnt/nbd-test-25789/largefile
[ 4599.056704] removing file /mnt/nbd-test-25789/largefile
[ 4599.276298] unmounting /mnt/nbd-test-25789
[ 4600.319460] stopping client
[ 4600.320083] block nbd0: NBD_DISCONNECT
[ 4600.320127] block nbd0: shutting down sockets
[ 4600.320267] nbd0: detected capacity change from 0 to 134217728
[ 4600.320306] print_req_error: 7 callbacks suppressed
[ 4600.320308] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.325890] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.333002] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.338653] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.345687] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.351253] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.358279] ldm_validate_partition_table(): Disk read failed.
[ 4600.358305] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.363892] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.371075] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.376645] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.383687] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.389230] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.396307] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.401819] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.408826] Dev nbd0: unable to read RDB block 0
[ 4600.413467] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.418955] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.425858] print_req_error: I/O error, dev nbd0, sector 24
[ 4600.431433] Buffer I/O error on dev nbd0, logical block 3, async page read
[ 4600.438422] print_req_error: I/O error, dev nbd0, sector 0
[ 4600.443995] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 4600.451003] nbd0: unable to read partition table
[ 4600.584152] Found kernel warning, IO error and/or call trace
[ 4600.584201] echo
stderr:
bs=1024, sz=134217728 bytes
timeout=30
e2fsck 1.42.13 (17-May-2015)

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

I can see this test has passed with 4.15.0-44.47~16.04.1, so this might be a regression.

description: updated
Revision history for this message
Colin Ian King (colin-king) wrote :

Can we re-run this to see if it's 100% reproduceable?

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1814237

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Some more useful info:

* This issue started to be caught by the testcase after the following test script change:
https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.sh?id=92ffd0ff14715392730943a898a95920b243ebef

* I have traced back these I/O errors on released from a couple of months ago. Checked xenial/linux-hwe and bionic/linux-azure.

* This doesn't seem to affect the master kernel, bionic/linux.

Revision history for this message
Sean Feole (sfeole) wrote :
Download full text (18.1 KiB)

This bug appeared to surface on aws Bionic linux-aws kernel-version": "4.15.0-1035.37", AMD64
We also see this occurring on arm64 instances a1.medium so it does not appear to be arch specific.

AMD64 Instances: c5.large , m4.large, m5a.large, r3.large , r5.large, t2.small and t3.medium.

This is not 100% reproducible, if affects different instance types during each SRU cycle and appears to be random. However the failure, although not identical bit for bit, is a buffer I/O error

I restarted a select few of the failed tests some passed completely and some failed again.

I attempted to reproduce on m5a.large.

1.) Deploy m5a.large using cloud prov script with Bionic, ensure kernel is 4.15.0-1035.37 (in proposed at the time of writing this comment)
2.) Clone autotest and autotest-client-tests:
    git clone https://github.com/autotest/autotest.git
    git clone git://kernel.ubuntu.com/ubuntu/autotest-client-tests

3.) sudo apt-get update && sudo apt-get install python-pkg-resources
4.)

untu@ip-172-31-10-8:~/autotest/client$ sudo ./autotest-local -d ~/autotest-client-tests/ ~/autotest-client-tests/ubuntu_nbd_smoke_test/control

19:42:55 INFO | Writing results to /home/ubuntu/autotest/client/results/default
19:42:55 INFO | START ---- ---- timestamp=1553283775 localtime=Mar 22 19:42:55
19:42:55 INFO | START ubuntu_nbd_smoke_test.nbd-smoke-test ubuntu_nbd_smoke_test.nbd-smoke-test timestamp=1553283775 timeout=600 localtime=Mar 22 19:42:55
19:42:58 ERROR| [stderr] bs=1024, sz=134217728 bytes
19:42:58 ERROR| [stderr] timeout=30
19:42:58 ERROR| [stderr] e2fsck 1.44.1 (24-Mar-2018)
19:43:01 ERROR| Exception escaping from test:
Traceback (most recent call last):
  File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
    self.run_once(*args, **dargs)
  File "/home/ubuntu/autotest-client-tests/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.py", line 51, in run_once
    self.results = utils.system_output(cmd, retain_output=True)
  File "/home/ubuntu/autotest/client/shared/utils.py", line 1267, in system_output
    verbose=verbose, args=args).stdout
  File "/home/ubuntu/autotest/client/shared/utils.py", line 918, in run
    "Command returned non-zero exit status")
CmdError: Command </home/ubuntu/autotest-client-tests/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.sh> failed, rc=1, Command returned non-zero exit status
* Command:
    /home/ubuntu/autotest-client-
    tests/ubuntu_nbd_smoke_test/ubuntu_nbd_smoke_test.sh
Exit status: 1
Duration: 6.14667606354

stdout:
creating backing nbd image /tmp/nbd_image.img

--------------------------------------------------------------------------------
Image path: /tmp/nbd_image.img
Mount point: /mnt/nbd-test-7736
Date: Fri Mar 22 19:42:56 UTC 2019
Host: ip-172-31-10-8
Kernel: 4.15.0-1034-aw...

Revision history for this message
Sean Feole (sfeole) wrote :

Affects , 4.15.0-1036.38~16.04.1 linux-aws-hwe

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Probably a dup with bug 1822247.

Let's see if we can still see this in the next cycle.

Po-Hsu Lin (cypressyew)
tags: added: ubuntu-nbd-smoke-test
Po-Hsu Lin (cypressyew)
tags: added: 4.15 xenial
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.