stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test failure with linux 4.9.0-12.13)

Bug #1658633 reported by Andy Whitcroft on 2017-01-23
This bug affects 2 people
Affects          Status    Importance   Assigned to      Milestone
Linux            Unknown   Unknown
linux (Ubuntu)             High         Colin Ian King
Trusty                     Undecided    Unassigned
Xenial                     Undecided    Unassigned
Yakkety                    Undecided    Unassigned

Bug Description

== SRU Request [ Trusty, Xenial, Yakkety ] + Zesty ==

When running the stress-ng --xattr stressor with several instances of the stressor on ext4 we can trip an xattr bug in the ext4 file system.

== Fix ==

Upstream commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dac7a4b4b1f664934e8b713f529b629f67db313c

ext4: lock the xattr block before checksuming it

We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661

Reported-by: Colin Ian King <email address hidden>
Signed-off-by: Theodore Ts'o <email address hidden>
Cc: <email address hidden>
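
To illustrate why the lock matters, here is a minimal user-space sketch of the pattern the commit message describes. It is not the kernel code: the XattrBlock class, run_race() and the use of zlib.crc32 as the checksum are all assumptions made for the illustration. The point is only that the writer and the verifier take the same lock, so the verifier never checksums a half-updated block.

```python
import threading
import zlib


class XattrBlock:
    """Toy stand-in for a shared metadata block with a stored checksum."""

    def __init__(self, payload: bytes) -> None:
        self.lock = threading.Lock()
        self.payload = bytearray(payload)
        self.checksum = zlib.crc32(self.payload)

    def update(self, payload: bytes) -> None:
        # Modify the data and recompute the checksum as one locked step.
        with self.lock:
            self.payload[:] = payload
            self.checksum = zlib.crc32(self.payload)

    def verify(self) -> bool:
        # Take the same lock so we never checksum a half-updated block.
        with self.lock:
            return zlib.crc32(self.payload) == self.checksum


def run_race(writers: int = 4, rounds: int = 2000) -> int:
    """Return the number of checksum mismatches seen by the verifier."""
    block = XattrBlock(b"user.attr=0")
    failures = 0
    stop = threading.Event()

    def writer(ident: int) -> None:
        for i in range(rounds):
            block.update(f"user.attr={ident}:{i}".encode())

    def verifier() -> None:
        nonlocal failures
        while not stop.is_set():
            if not block.verify():
                failures += 1

    threads = [threading.Thread(target=writer, args=(n,)) for n in range(writers)]
    checker = threading.Thread(target=verifier)
    checker.start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    stop.set()
    checker.join()
    return failures
```

With the lock held on both sides, run_race() should report no mismatches. Dropping the lock from verify() would allow it to observe new data with an old checksum, which is the kind of spurious failure the commit message refers to.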

== Test case ==

Fire up an x86 VM with 8 or more CPUs in the instance, run:

stress-ng --xattr 0 -t 60 -v

Without the fix, the file system reports broken xattrs and goes read-only.

With the fix, it runs without fault.
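
One way to check the outcome after the run is to look at the mount options in /proc/mounts. The following is a rough sketch only: the helper names and the SAMPLE text (including the /mnt/test mount point and device names) are made up for illustration and are not taken from a real failing system.

```python
from typing import List


def mount_options(mounts_text: str, mount_point: str) -> List[str]:
    """Return the option list for mount_point from /proc/mounts-style text."""
    options: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # the last matching entry wins
    return options


def is_read_only(mounts_text: str, mount_point: str) -> bool:
    """True if the mount point is currently mounted with the 'ro' option."""
    return "ro" in mount_options(mounts_text, mount_point)


# Hypothetical sample data, for illustration only.
SAMPLE = """\
/dev/vda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/vdb1 /mnt/test ext4 ro,relatime 0 0
"""
```

On a live system the text would come from reading /proc/mounts after the stress-ng run. Note that an option such as errors=remount-ro does not count as read-only, because the options are compared as whole comma-separated entries.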

== Regression Potential ==

This changes only the checksumming of ext4 xattr blocks, so the risk is contained to xattr handling on ext4. Tested with stress-ng and the generic file system tests without any regressions, so the risk is small.

---------------------------------------------------------

Testing failed on:
    ppc64el: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/ppc64el/l/linux/20170122_110123_770b2@/log.gz

Andy Whitcroft (apw) on 2017-01-23
tags: added: kernel-adt-failure

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1658633

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

If I am reading this right the stress-smoke-tests are passing and still exiting 9:

10:48:46 DEBUG| Running '/tmp/autopkgtest.Oam7xW/build.p0V/linux-4.9.0/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh'
10:48:56 DEBUG| [stdout] affinity PASSED
10:49:06 DEBUG| [stdout] af-alg PASSED
10:49:16 DEBUG| [stdout] aio PASSED
10:49:26 DEBUG| [stdout] aiol PASSED
10:49:36 DEBUG| [stdout] atomic PASSED
10:49:46 DEBUG| [stdout] bigheap PASSED
10:49:46 DEBUG| [stdout] bind-mount SKIPPED (test framework out of resources or test should not be run)
10:49:57 DEBUG| [stdout] brk PASSED
10:50:07 DEBUG| [stdout] bsearch PASSED
10:50:17 DEBUG| [stdout] cache PASSED
10:50:27 DEBUG| [stdout] cap PASSED
10:50:37 DEBUG| [stdout] chdir PASSED
10:50:47 DEBUG| [stdout] chmod PASSED
10:50:57 DEBUG| [stdout] chown PASSED
10:51:07 DEBUG| [stdout] clock PASSED
10:51:21 DEBUG| [stdout] clone PASSED
10:51:32 DEBUG| [stdout] context PASSED
10:51:42 DEBUG| [stdout] cpu PASSED
10:51:52 DEBUG| [stdout] crypt PASSED
10:52:02 DEBUG| [stdout] daemon PASSED
10:52:03 DEBUG| [stdout] dccp PASSED
10:52:13 DEBUG| [stdout] dentry PASSED
10:52:23 DEBUG| [stdout] dir PASSED
10:52:33 DEBUG| [stdout] dirdeep PASSED
10:52:43 DEBUG| [stdout] dnotify PASSED
10:52:53 DEBUG| [stdout] dup PASSED
10:53:03 DEBUG| [stdout] epoll PASSED
10:53:13 DEBUG| [stdout] eventfd PASSED
10:53:23 DEBUG| [stdout] fallocate PASSED
10:53:33 DEBUG| [stdout] fanotify PASSED
10:53:43 DEBUG| [stdout] fault PASSED
10:53:53 DEBUG| [stdout] fcntl PASSED
10:54:04 DEBUG| [stdout] fiemap PASSED
10:54:14 DEBUG| [stdout] fifo PASSED
10:54:24 DEBUG| [stdout] filename PASSED
10:54:34 DEBUG| [stdout] flock PASSED
10:54:44 DEBUG| [stdout] fork PASSED
10:54:54 DEBUG| [stdout] fp-error PASSED
10:55:04 DEBUG| [stdout] fstat PASSED
10:55:14 DEBUG| [stdout] full PASSED
10:55:24 DEBUG| [stdout] futex PASSED
10:55:34 DEBUG| [stdout] get PASSED
10:55:44 DEBUG| [stdout] getdent PASSED
10:55:54 DEBUG| [stdout] getrandom PASSED
10:56:04 DEBUG| [stdout] handle PASSED
10:56:16 DEBUG| [stdout] hdd PASSED
10:56:26 DEBUG| [stdout] heapsort PASSED
10:56:36 DEBUG| [stdout] hsearch PASSED
10:56:36 DEBUG| [stdout] icache PASSED
10:56:46 DEBUG| [stdout] icmp-flood PASSED
10:56:56 DEBUG| [stdout] inotify PASSED
10:57:06 DEBUG| [stdout] io PASSED
10:57:16 DEBUG| [stdout] ioprio PASSED
10:57:26 DEBUG| [stdout] itimer PASSED
10:57:27 DEBUG| [stdout] key PASSED
10:57:37 DEBUG| [stdout] kill PASSED
10:57:47 DEBUG| [stdout] klog PASSED
10:57:57 DEBUG| [stdout] lease PASSED
10:58:07 DEBUG| [stdout] link PASSED
10:58:07 DEBUG| [stdout] lockbus PASSED
10:58:17 DEBUG| [stdout] locka PASSED
10:58:27 DEBUG| [stdout] lockf PASSED
10:58:37 DEBUG| [stdout] lockofd PASSED
10:58:47 DEBUG| [stdout] longjmp PASSED
10:58:57 DEBUG| [stdout] lsearch PASSED
10:59:07 DEBUG| [stdout] madvise PASSED
10:59:18 DEBUG| [stdout] malloc PASSED
10:59:28 DEBUG| [stdout] matrix PASSED
10:59:38 DEBUG| [stdout] membarrier PASSED
10:59:48 DEBUG| [stdout] memcpy PASSED
10:59:58 DEBUG| [stdout] memfd PASSED
11:00:08 DEBUG| [stdout] mergesort PASSED
11:00:18 DEBUG| [stdout] mincore PASSED
11:00:28 DEBUG| [stdout] mknod PASSED
11:00:34 INFO | ERROR ubuntu_stress_smo...
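
For reading a log like this one, a small parser can tally the per-test results and report the last test that produced a result line before the error. This is a sketch that assumes the line layout shown above (timestamp, "DEBUG|", "[stdout]", test name, upper-case status); the summarize() helper is hypothetical and the EXCERPT is a few lines copied from the log.

```python
import re
from collections import Counter

# Matches e.g. "10:48:56 DEBUG| [stdout] affinity PASSED"
LINE = re.compile(r"^\d{2}:\d{2}:\d{2} DEBUG\| \[stdout\] (\S+) ([A-Z]+)\b")


def summarize(log_text):
    """Return (status counts, name of the last test with a result line)."""
    counts = Counter()
    last = None
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, status = match.groups()
            counts[status] += 1
            last = name
    return counts, last


EXCERPT = """\
10:49:46 DEBUG| [stdout] bigheap PASSED
10:49:46 DEBUG| [stdout] bind-mount SKIPPED (test framework out of resources or test should not be run)
11:00:18 DEBUG| [stdout] mincore PASSED
11:00:28 DEBUG| [stdout] mknod PASSED
11:00:34 INFO | ERROR ubuntu_stress_smo...
"""
```

The last test reported this way is only the last one whose output reached the log, so the test that actually failed may be a later one.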


description: updated
summary: - linux 4.9.0-12.13 ADT test failure with linux 4.9.0-12.13
+ stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test
+ failure with linux 4.9.0-12.13)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
status: Confirmed → In Progress
Colin Ian King (colin-king) wrote :

It is more likely that the mlock (or mmap) test, or a later one, broke the run, and that we are only seeing the last PASSED results that were fflushed to stdout before the ADT framework terminated.

Colin Ian King (colin-king) wrote :

Right, I think this occurs when ext4 goes read-only. A simple way to reproduce this on i386 systems with that kernel is:

sudo stress-ng --sockpair 10 && sudo stress-ng --xattr 10

The xattr test causes ext4 to detect xattr issues, the file system gets remounted read-only, and we can no longer write the stress-ng ADT test log.

Colin Ian King (colin-king) wrote :

Does not break on xfs, so this looks like an ext4 issue.

Colin Ian King (colin-king) wrote :

Only breaks with ext4, i386 and > 1 cpu. Can't break amd64 or uniprocessor configs.

Colin Ian King (colin-king) wrote :

Issue still in 4.10-rc6

Colin Ian King (colin-king) wrote :

OK, looks like I need a cleanly formatted ext4 file system before *each* bisect, otherwise I'm picking up xattr corruption from previous bisects.
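
A rough sketch of how a fresh file system per bisect step could be scripted, assuming a loop-backed image file rather than a real partition. The image path, mount point, size and helper names are hypothetical, the commands need root, and the previous image must already be unmounted.

```python
import subprocess
from typing import List


def fresh_ext4_commands(image: str, mount_point: str, size: str = "1G") -> List[List[str]]:
    """Build the commands that recreate and mount a clean ext4 image."""
    return [
        ["truncate", "-s", size, image],           # (re)size the backing file
        ["mkfs.ext4", "-F", "-q", image],          # force a new, empty ext4 fs
        ["mount", "-o", "loop", image, mount_point],
    ]


def recreate(image: str, mount_point: str, size: str = "1G") -> None:
    """Run the commands in order, stopping at the first failure."""
    for argv in fresh_ext4_commands(image, mount_point, size):
        subprocess.run(argv, check=True)
```

Reformatting before every step means any xattr corruption found afterwards was produced by the kernel under test, not left over from an earlier one.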

Colin Ian King (colin-king) wrote :

Still an issue with 4.10

description: updated
description: updated