XFS driver crash

Bug #1971201 reported by Nemir
This bug affects 1 person
Affects         Status      Importance  Assigned to  Milestone
linux (Ubuntu)  Incomplete  Undecided   Unassigned

Bug Description

Hi,

I'm experiencing a driver crash in XFS when performing a very specific action (see the attachment for the storage layout).

Here's what I do to reproduce the issue:
Start an md check: echo check > /sys/block/md0/md/sync_action
Then trigger an action (free, write, alloc) on the XFS mount. The XFS driver freezes, while the BTRFS and ext4 disks keep working and can still be accessed.
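The reproduction steps above could be scripted roughly as follows. This is only a sketch under assumptions: the function names, file names, and the choice to rename over an existing file (to mimic the rename seen in the reported trace) are illustrative and not taken from the report. Writing to the real sync_action file requires root.

```python
import os
from pathlib import Path


def start_md_check(sync_action: Path) -> None:
    """Request an md consistency check by writing 'check' to sync_action."""
    sync_action.write_text("check\n")


def churn_files(directory: Path, count: int = 100, size: int = 4096) -> int:
    """Create, overwrite-by-rename, and delete small files; return the number of rounds."""
    rounds = 0
    for i in range(count):
        src = directory / f"churn-{i}.tmp"      # hypothetical file names
        dst = directory / f"churn-{i}.dat"
        for path in (dst, src):
            with open(path, "wb") as fh:
                fh.write(os.urandom(size))
                fh.flush()
                os.fsync(fh.fileno())           # push the write down to the device
        os.rename(src, dst)                     # replaces dst, so its old inode is freed
        os.unlink(dst)                          # frees the remaining inode
        rounds += 1
    return rounds
```

On the affected machine this would be called as start_md_check(Path("/sys/block/md0/md/sync_action")) followed by churn_files() on a directory inside the XFS mount; the mount point itself is not given in the report.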

lsb_release -rd
Description: Ubuntu 20.04.4 LTS
Release: 20.04

Nemir (nemirtingas) wrote :
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1971201

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Mauricio Faria de Oliveira (mfo) wrote :

Hi Nemir,

Thanks for reporting a bug and the detailed zip file.

Could you please test 5.4.0-110 [1] from focal-proposed? [2]

...

One of the stack traces in your `kernel_trace.txt` file [3]
similarly matches a stack trace from bug 1966803 comment 2 [4]
(from iput() to xfs_buf_lock()) that is fixed in that version.

I haven't checked the underlying mdraid/luks/crypt relation,
but the above is a quick next step forward.
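To compare hung-task traces like [3] and [4] without reading them frame by frame, the call trace can be reduced to a list of function names. The sketch below is an illustration only: the regular expression is an assumption based on the log lines quoted in this report, and the helper names are hypothetical.

```python
import re

# Matches e.g. "kernel: xfs_buf_lock+0x37/0xf0 [xfs]", with an optional
# "[291932.970796]" timestamp and an optional "? " marker for unreliable frames.
FRAME_RE = re.compile(
    r"kernel:\s+(?:\[\s*\d+\.\d+\]\s+)?(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return function names from a call trace, innermost first, skipping '?' frames."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


def blocked_between(frames, inner="xfs_buf_lock", outer="iput"):
    """True if `inner` sits closer to the top of the stack than `outer`."""
    return (
        inner in frames
        and outer in frames
        and frames.index(inner) < frames.index(outer)
    )
```

Running parse_frames() on both traces and checking blocked_between() on each gives a quick, rough signal that the two tasks are stuck on the same iput() to xfs_buf_lock() path; it does not prove the root cause is identical.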

cheers,
Mauricio

[1] buntu/+source/linux/5.4.0-110.124

[2] https://wiki.ubuntu.com/Testing/EnableProposed

[3]
mai 02 21:59:50 serverprive kernel: INFO: task minio:74769 blocked for more than 120 seconds.
mai 02 21:59:50 serverprive kernel: Not tainted 5.4.0-109-generic #123-Ubuntu
mai 02 21:59:50 serverprive kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 02 21:59:50 serverprive kernel: minio D 0 74769 74361 0x00000320
mai 02 21:59:50 serverprive kernel: Call Trace:
mai 02 21:59:50 serverprive kernel: __schedule+0x2e3/0x740
mai 02 21:59:50 serverprive kernel: schedule+0x42/0xb0
mai 02 21:59:50 serverprive kernel: schedule_timeout+0x10e/0x160
mai 02 21:59:50 serverprive kernel: __down+0x82/0xd0
mai 02 21:59:50 serverprive kernel: ? xfs_buf_find.isra.0+0x3bf/0x610 [xfs]
mai 02 21:59:50 serverprive kernel: down+0x47/0x60
mai 02 21:59:50 serverprive kernel: xfs_buf_lock+0x37/0xf0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_buf_find.isra.0+0x3bf/0x610 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_buf_get_map+0x43/0x2b0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_buf_read_map+0x2f/0x1d0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_trans_read_buf_map+0xca/0x350 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_imap_to_bp+0x66/0xd0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_iunlink_update_inode+0x55/0x110 [xfs]
mai 02 21:59:50 serverprive kernel: ? xfs_read_agi+0xcb/0x140 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_iunlink_remove+0x135/0x260 [xfs]
mai 02 21:59:50 serverprive kernel: ? xfs_trans_reserve+0x17a/0x1e0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_ifree+0x45/0x160 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_inactive_ifree+0xae/0x1c0 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_inactive+0xaf/0x160 [xfs]
mai 02 21:59:50 serverprive kernel: xfs_fs_destroy_inode+0xad/0x1d0 [xfs]
mai 02 21:59:50 serverprive kernel: destroy_inode+0x41/0x80
mai 02 21:59:50 serverprive kernel: evict+0x14c/0x1b0
mai 02 21:59:50 serverprive kernel: iput+0x148/0x210
mai 02 21:59:50 serverprive kernel: dentry_unlink_inode+0xc6/0x110
mai 02 21:59:50 serverprive kernel: __dentry_kill+0xdf/0x180
mai 02 21:59:50 serverprive kernel: dput+0x150/0x2f0
mai 02 21:59:50 serverprive kernel: do_renameat2+0x3ad/0x570
mai 02 21:59:50 serverprive kernel: __x64_sys_renameat+0x21/0x30
mai 02 21:59:50 serverprive kernel: do_syscall_64+0x57/0x190
mai 02 21:59:50 serverprive kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 02 21:59:50 serverprive kernel: RIP: 0033:0x48290a

[4]
Mar 4 05:41:40 host kernel: [291932.970796] INFO: task perl:50597 blocked for more than 120 seconds.
Mar 4 05:41:40 host kernel: [291932.970952] Tainted: P OE 5.4.0-100-generic #113~18.04.1-U...


Nemir (nemirtingas) wrote :

Hi Mauricio,

I can't test right now because I'm running integrity checks on my disks after the crashes, but I will test your proposed solution as soon as I can.

Does a newer kernel also have the fix you proposed? I ask because I reproduced the exact same conditions in a VirtualBox VM with Ubuntu 20 and kernel linux-image-5.13.0-39-generic but couldn't reproduce the crash.

Mauricio Faria de Oliveira (mfo) wrote :

No hurry, and thanks.

Yes, the fix has been applied upstream since v5.5, so later kernels include it.

~/git/linux$ git describe --contains 93597ae8dac0149b5c00b787cba6bf7ba213e666
v5.5-rc1~8^2~31
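The output above names the earliest tag that contains the commit (v5.5-rc1) followed by an ancestry path (~8^2~31) from that tag back to the commit. A minimal sketch of reading that output programmatically, with hypothetical helper names:

```python
import re


def containing_tag(describe_output: str) -> str:
    """Strip the ~N / ^N ancestry suffix from `git describe --contains` output."""
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


def release_series(tag: str):
    """Return (major, minor) for an upstream tag such as 'v5.5-rc1'."""
    match = re.match(r"v(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not an upstream version tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))
```

With these, release_series(containing_tag("v5.5-rc1~8^2~31")) gives (5, 5), and a 5.13 kernel compares as newer. The comparison only covers upstream series; whether an Ubuntu 5.4 build carries the fix depends on the backport, which this check cannot see.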

Nemir (nemirtingas) wrote :

OK, should I still try 5.4.0-110.124, or can I upgrade to a newer kernel (like linux-image-5.13.0-39-generic)?

It's a home computer used as a NAS, and I have full access to it.

Mauricio Faria de Oliveira (mfo) wrote :

If you could test 5.4.0-110, that would help confirm whether or not it's the same issue; if it is, we can mark this bug as a duplicate.

Theoretically this might be a different issue also fixed in 5.13.

Nemir (nemirtingas) wrote :

Hi Mauricio,

So, after my disk checks, I've installed:
linux-image-5.4.0-110-generic
linux-modules-5.4.0-110-generic
linux-modules-extra-5.4.0-110-generic

I restarted a RAID check (echo check > /sys/block/md0/md/sync_action), then copied, moved, and removed a 5 GB file, and I didn't encounter the driver crash.
I also tried intensive file creation and deletion.

Seems fine to me for now.
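Before repeating such a test, it can be worth confirming that the machine actually booted the newly installed kernel rather than the old one. A rough sketch, assuming the release string has the `5.4.0-110-generic` shape seen in this report; the helper names and the default threshold are illustrative:

```python
import platform
import re


def ubuntu_abi(release: str):
    """Split a release like '5.4.0-110-generic' into ((5, 4, 0), 110)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = map(int, match.groups())
    return (major, minor, patch), abi


def has_tested_build(release: str, minimum=((5, 4, 0), 110)) -> bool:
    """True if `release` is at or beyond the build tested in this thread."""
    return ubuntu_abi(release) >= minimum


def running_kernel_ok() -> bool:
    """Check the currently running kernel (same string as `uname -r`)."""
    return has_tested_build(platform.release())
```

For example, has_tested_build("5.4.0-109-generic") is False and has_tested_build("5.4.0-110-generic") is True. A plain version comparison like this says nothing about kernels from other vendors or about which patches a given build carries.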

Mauricio Faria de Oliveira (mfo) wrote :

Hey Nemir,

That's great news; thanks for testing!

I'll set this bug as a dup of 1966803.
