xfstest fails with corrupt file /mnt/scratch/1 - non-zero size but no extents ( ext4 )

Bug #1704730 reported by bugproxy
Affects                          | Status    | Importance | Assigned to           | Milestone
The Ubuntu-power-systems project | Won't Fix | Medium     | Canonical Kernel Team |
linux (Ubuntu)                   | Won't Fix | Medium     | Colin Ian King        |

Bug Description

xfstests generic/044 fails on an ext4 filesystem with "non-zero size but no extents" errors.

Environment
------------------
Kernel Build: 4.12.1-041201-generic

Model : 8247-22L
Platform : PowerNV ( P8 )

Uname output
-------------------
# uname -a
Linux ltc-test-ci2 4.12.1-041201-generic #201707121132 SMP Wed Jul 12 17:03:25 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Steps to reproduce:
----------------------------------------
1. Create a loop device with ext4 filesystem
2. git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git; cd xfstests-dev
3. make
4. Create a local.config for running with created loop device
5. Run xfstests-dev test : ./check tests/generic/044
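
The report does not include the loop device setup or the local.config it used. A minimal sketch of steps 1 and 4 might look like the following. The image paths, image sizes and the /mnt/test mount point are assumptions for illustration; only /mnt/scratch and the ext4 filesystem type come from the output in this report. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: image paths, sizes and /mnt/test are assumptions, not values
# taken from the report (only /mnt/scratch and ext4 appear there).

# Create a sparse image file, attach it to a free loop device and print
# the device name (for example /dev/loop0). Needs root.
make_loop_dev() {
    img="$1"; size="$2"
    truncate -s "$size" "$img" || return 1
    losetup --find --show "$img"
}

# Write an xfstests local.config for an ext4 test/scratch device pair.
write_local_config() {
    out="$1"; test_dev="$2"; scratch_dev="$3"
    cat > "$out" <<EOF
export FSTYP=ext4
export TEST_DEV=$test_dev
export TEST_DIR=/mnt/test
export SCRATCH_DEV=$scratch_dev
export SCRATCH_MNT=/mnt/scratch
EOF
}

# Example (as root, from the xfstests-dev directory):
#   test_dev=$(make_loop_dev /var/tmp/test.img 4G)
#   scratch_dev=$(make_loop_dev /var/tmp/scratch.img 4G)
#   mkfs.ext4 -q "$test_dev"
#   mkdir -p /mnt/test /mnt/scratch
#   write_local_config local.config "$test_dev" "$scratch_dev"
#   ./check tests/generic/044
```

Keeping the config writer separate from the privileged loop device setup means the generated file can be inspected before anything is formatted.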

Test 044 fails with the following output:
generic/044 - output mismatch (see /root/harish/xfstests-dev/results//generic/044.out.bad)
    --- tests/generic/044.out 2017-07-13 06:04:36.208323135 -0400
    +++ /root/harish/xfstests-dev/results//generic/044.out.bad 2017-07-14 06:24:08.153731112 -0400
    @@ -1 +1,1000 @@
     QA output created by 044
    +corrupt file /mnt/scratch/1 - non-zero size but no extents
    +corrupt file /mnt/scratch/2 - non-zero size but no extents
    +corrupt file /mnt/scratch/3 - non-zero size but no extents
    +corrupt file /mnt/scratch/4 - non-zero size but no extents
    +corrupt file /mnt/scratch/5 - non-zero size but no extents
    +corrupt file /mnt/scratch/6 - non-zero size but no extents
    ...
    (Run 'diff -u tests/generic/044.out /root/harish/xfstests-dev/results//generic/044.out.bad' to see the entire diff)
Ran: generic/044
Failures: generic/044
Failed 1 of 1 tests
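
The diff above is truncated, and its hunk header (`@@ -1 +1,1000 @@`) shows 1000 lines of output where one was expected. To see how many files were flagged without reading the whole diff, the .out.bad file can be summarised. This is a rough sketch; the function name is hypothetical and it assumes the message format shown in the output above.

```shell
#!/bin/sh
# Count the files that a generic/044 .out.bad file reports as corrupt.
# Assumes each report line has the form shown in this bug:
#   corrupt file <path> - non-zero size but no extents
count_corrupt_files() {
    grep -c '^corrupt file .* - non-zero size but no extents$' "$1"
}

# Example:
#   count_corrupt_files results/generic/044.out.bad
```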

Dmesg:
----------
[17244.878673] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[17245.517227] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17245.697100] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17245.710634] run fstests generic/044 at 2017-07-14 06:23:49
[17246.534410] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17246.535534] EXT4-fs (loop2): shut down requested (1)
[17246.535625] Aborting journal on device loop2-8.
[17247.278467] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17259.888304] EXT4-fs (loop2): shut down requested (2)
[17259.995751] Aborting journal on device loop2-8.
[17260.113582] EXT4-fs (loop2): recovery complete
[17260.113902] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17260.190076] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[17264.821978] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: acl,user_xattr

== Comment: #2 - SEETEENA THOUFEEK <email address hidden> - 2017-07-17 02:10:52 ==
The issue does not occur when the same test is run on an XFS filesystem (i.e., when the loop device is created with XFS).

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-156699 severity-medium targetmilestone-inin1710
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → xfsprogs (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → Medium
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Steve Langasek (vorlon) wrote :

I do not know if it's reasonable to expect xfstests to pass against ext4. If it is, this is something that would need to be handled in the kernel (not in xfsprogs, which was certainly not involved in the creation of the filesystem). Reassigning.

affects: xfsprogs (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-power-systems:
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Joseph Salisbury (jsalisbury) wrote :

Kernel version 4.12.1-041201-generic appears to be an upstream kernel. Does this bug happen with the supported stock Ubuntu kernel?

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-07-20 01:12 EDT-------
(In reply to comment #7)
> Kernel version 4.12.1-041201-generic appears to be an upstream kernel. Does
> this bug happen with the supported stock Ubuntu kernel?

yes, the following are the results.

# ./check tests/generic/044
FSTYP -- ext4
PLATFORM -- Linux/ppc64le ltc-test-ci2 4.11.0-10-generic
MKFS_OPTIONS -- /dev/loop1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/loop1 /mnt/scratch

generic/044 14s ... - output mismatch (see /root/harish/xfstests-dev/results//generic/044.out.bad)
--- tests/generic/044.out 2017-07-13 06:04:36.208323135 -0400
+++ /root/harish/xfstests-dev/results//generic/044.out.bad 2017-07-20 01:12:00.395104813 -0400
@@ -1 +1,1000 @@
QA output created by 044
+corrupt file /mnt/scratch/1 - non-zero size but no extents
+corrupt file /mnt/scratch/2 - non-zero size but no extents
+corrupt file /mnt/scratch/3 - non-zero size but no extents
+corrupt file /mnt/scratch/4 - non-zero size but no extents
+corrupt file /mnt/scratch/5 - non-zero size but no extents
+corrupt file /mnt/scratch/6 - non-zero size but no extents
...
(Run 'diff -u tests/generic/044.out /root/harish/xfstests-dev/results//generic/044.out.bad' to see the entire diff)
Ran: generic/044
Failures: generic/044
Failed 1 of 1 tests

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-07-24 02:15 EDT-------
(In reply to comment #9)
> Did this issue start happening after an update/upgrade? Was there a prior
> kernel version where you were not having this particular problem?
>
Not sure whether any prior kernel was free of this issue, but it does occur on the 17.10 base kernel.

> Would it be possible for you to test the latest upstream kernel? Refer to
> https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13
> kernel[0].
Tested with 4.13-rc2 from the same link; the issue is still observed.

xfstests-dev# ./check tests/generic/044
FSTYP -- ext4
PLATFORM -- Linux/ppc64le ltc-test-ci2 4.13.0-041300rc2-generic
MKFS_OPTIONS -- /dev/loop1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/loop1 /mnt/scratch

generic/044
- output mismatch (see /root/harish/xfstests-dev/results//generic/044.out.bad)
--- tests/generic/044.out 2017-07-20 07:57:37.069755021 -0400
+++ /root/harish/xfstests-dev/results//generic/044.out.bad 2017-07-24 02:10:00.860000000 -0400
@@ -1 +1,1000 @@
QA output created by 044
+corrupt file /mnt/scratch/1 - non-zero size but no extents
+corrupt file /mnt/scratch/2 - non-zero size but no extents
+corrupt file /mnt/scratch/3 - non-zero size but no extents
+corrupt file /mnt/scratch/4 - non-zero size but no extents
+corrupt file /mnt/scratch/5 - non-zero size but no extents
+corrupt file /mnt/scratch/6 - non-zero size but no extents
...
(Run 'diff -u tests/generic/044.out /root/harish/xfstests-dev/results//generic/044.out.bad' to see the entire diff)
Ran: generic/044
Failures: generic/044
Failed 1 of 1 tests

Thanks,
Harish S

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Incomplete
tags: added: triage-g
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
status: Incomplete → Triaged
Manoj Iyer (manjo) wrote :

Kernel team, what is the next step on this bug? Should IBM open a bugzilla for this issue?

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.

Joseph Salisbury (jsalisbury) wrote :

Also, it might be worth testing v4.14-rc1:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc1/

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-20 23:29 EDT-------
(In reply to comment #12)
> Also, it might be worth testing v4.14-rc1:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc1/

It looks like the build failed for ppc. Can you check?

Harish

Joseph Salisbury (jsalisbury) wrote :

It looks like the mainline ppc builds have been failing since v4.13-rc5. The -rc4 kernel did build:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc4/

tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I'll investigate the mainline build failures.

tags: removed: kernel-da-key
tags: added: kernel-da-key
removed: kernel-key
tags: added: triage-a
removed: triage-g
tags: added: triage-r
removed: triage-a
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Colin Ian King (colin-king)
Changed in linux (Ubuntu):
status: Triaged → In Progress
Colin Ian King (colin-king) wrote :

Before 4.11 this xfstest was skipped because ext4 did not support the functionality required to run it. Since 4.11 this test has always failed, even right up to the present 4.15-rc1. I've checked this on a couple of other architectures, and the test fails across the board for ext4.
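
As a rough illustration of the 4.11 threshold described above, a kernel release string can be checked before interpreting a generic/044 result on ext4. The function name is hypothetical, and the sketch only encodes the version boundary stated in this comment; it does not probe the filesystem for the feature itself.

```shell
#!/bin/sh
# Return success if a kernel release string (as printed by `uname -r`)
# is version 4.11 or later. Assumes the string starts with "major.minor".
kernel_at_least_4_11() {
    rel="$1"
    major=${rel%%.*}           # text before the first dot
    rest=${rel#*.}             # text after the first dot
    minor=${rest%%[!0-9]*}     # leading digits of the remainder
    [ "$major" -gt 4 ] && return 0
    [ "$major" -eq 4 ] && [ "$minor" -ge 11 ]
}

# Example:
#   if kernel_at_least_4_11 "$(uname -r)"; then
#       echo "generic/044 is expected to run (and fail) on ext4"
#   else
#       echo "generic/044 is expected to be skipped on ext4"
#   fi
```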

Colin Ian King (colin-king) wrote :

Tests 044, 045 and 046 are known upstream to currently fail on ext4, as reported on February 27th, 2017: https://lkml.org/lkml/2017/2/27/674

The upstream maintainer followed up the report with: https://lkml.org/lkml/2017/2/28/4

"On Tue, Feb 28, 2017 at 11:25:56AM +0800, Xiong Zhou wrote:
>
> On latest Linus tree, xfstests generic/04{4,5,6} fails.

Yes, that's known issue. generic/04[456] were originally XFS specific
tests, and they have assumptions about the implementation of the
underlying file system.

We have a few of those at the moment in kvm-xfstests/gce-xfstests's
exclude file:

https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/test-appliance/files/root/fs/ext4/exclude

# generic/042 and generic/392 are failing because ext4 forces the
# resolution of all delayed allocation writes before allowing the
# punch operation to proceed. We probably want to see if we can avoid
# this for the future, but what ext4 is doing is legal, so just skip
# the test for now
generic/042
generic/392

# generic/04[456] tests how truncate and delayed allocation works
# ext4 uses the data=ordered to avoid exposing stale data, and
# so it uses a different mechanism than xfs. So these tests will fail
generic/044
generic/045
generic/046

# generic/223 tests file alignment, which works on ext4 only by
# accident because we're not RAID stripe aware yet, and works at all
# because we have bias towards aligning on power-of-two block numbers.
# It is a flaky test for some configurations, so skip it.
generic/223

# ext4/304 fails for all configurations, and this appears to be at
# test or fio bug.
#
ext4/304"

---

So I think these failures are false positive results for ext4.
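
For local runs, the same approach as the quoted exclude file can be sketched as below. This assumes the local copy of ./check accepts `-E <file>` to exclude individual tests listed one per line; the file path and function name are hypothetical, and only the three tests discussed in this bug are listed.

```shell
#!/bin/sh
# Write an exclude list covering the three tests named in the quoted file.
write_ext4_exclude() {
    cat > "$1" <<'EOF'
# generic/04[456] test how truncate and delayed allocation work on XFS;
# ext4 uses a different mechanism, so these tests fail there.
generic/044
generic/045
generic/046
EOF
}

# Example (from the xfstests-dev directory; -E is assumed to be supported
# by the local ./check script):
#   write_ext4_exclude /tmp/ext4.exclude
#   ./check -E /tmp/ext4.exclude -g quick
```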

Changed in ubuntu-power-systems:
status: Triaged → In Progress
tags: added: triage-g
removed: triage-r
Colin Ian King (colin-king) wrote :

Since I think these reported errors are false positives for ext4 at present, should this be marked as "Won't Fix"?

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-11 11:08 EDT-------
(In reply to comment #17)
> Since I think these reported errors are false positives for ext4 at present,
> should this be marked as "Won't Fix"?

That is Ok for me. If we don't hear anything from the other team, let's proceed with the Won't Fix path.

Manoj Iyer (manjo)
Changed in linux (Ubuntu):
status: In Progress → Won't Fix
Changed in ubuntu-power-systems:
status: Incomplete → Won't Fix