linux 5.3.0-46 and 5.4.0-24: XFS with reflinks and duperemove: in-memory data corruption

Bug #1873555 reported by Török Edwin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

I have reproduced this multiple times on both Ubuntu 19.10 (5.3.0-16) and Ubuntu 20.04 beta (5.4.0-24)

sudo duperemove --lookup-extents=yes -d -r --hashfile /mnt/storage/hashes
[...]
0x556dfa962980] Dedupe 2 extents (id: b285a8ff) with target: (0, 2255716352), "/mnt/storage/from-restic/10350278/tmp/fast/c14-sd-92016-2-home/2017-06-17-233341/2017-06-17-233341/edwin/.mu/xapian/postlist.DB"
[0x556de5dc7300] (154238/154243) Try to dedupe extents with id 3fc7b31b
[0x556de5dc7300] Dedupe 3 extents (id: 3fc7b31b) with target: (0, 2463842304), "/mnt/storage/restore/25147310-89a6-4a93-9cd8-a582125c12e7/home/edwin/Downloads/ubuntu-19.10-desktop-amd64.iso"
[0x556dfa962860] (154239/154243) Try to dedupe extents with id f5615456
[0x556dfa962860] Dedupe 1 extents (id: f5615456) with target: (0, 5477811089), "/mnt/storage/restore/25147310-89a6-4a93-9cd8-a582125c12e7/home/edwin/nokia/backup.ab"
[0x556dfa92bb00] (154240/154243) Try to dedupe extents with id a9595eaa
[0x556dfa92bb00] Dedupe 1 extents (id: a9595eaa) with target: (0, 11050147840), "/mnt/storage/restore/25147310-89a6-4a93-9cd8-a582125c12e7/home/edwin/laptop-backup.squashfs"
[0x556dfa962920] (154241/154243) Try to dedupe extents with id 229145cb
[0x556dfa962920] Dedupe 1 extents (id: 229145cb) with target: (0, 13606766116), "/mnt/storage/from-restic/0024a369/mnt/fast/listing-sorted0"
[0x556de5dc7180] (154242/154243) Try to dedupe extents with id 2eee56f0
[0x556de5dc7180] Dedupe 2 extents (id: 2eee56f0) with target: (0, 13606766163), "/mnt/storage/from-restic/0024a369/mnt/fast/listing-sorted"
[0x556dfa92bb60] (154243/154243) Try to dedupe extents with id b6a1de37
[0x556dfa92bb60] Dedupe 1 extents (id: b6a1de37) with target: (0, 13890395789), "/mnt/storage/from-restic/0024a369/mnt/fast/listing0"
[0x556dfa962980] Dedupe for file "/mnt/storage/from-restic/0fcf01c9/tmp/fast/c14-sd-92016-2-home/2017-05-23-233346/2017-05-23-233346/edwin/.mu/xapian/postlist.DB" had status (-5) "Input/output error".
[0x556dfa962980] Dedupe for file "/mnt/storage/from-restic/04683ed4/tmp/fast/c14-sd-92016-2-home/2017-06-07-233355/2017-06-07-233355/edwin/.mu/xapian/postlist.DB" had status (-5) "Input/output error".
Error 5: Input/output error while opening "/mnt/storage/from-restic/10350278/tmp/fast/c14-sd-92016-2-home/2017-06-17-233341/2017-06-17-233341/edwin/.mu/xapian/postlist.DB" (write=0)
Error 5: Input/output error while opening "/mnt/storage/from-restic/04683ed4/tmp/fast/c14-sd-92016-2-home/2017-06-07-233355/2017-06-07-233355/edwin/.mu/xapian/postlist.DB" (write=0)
Error 5: Input/output error while opening "/mnt/storage/from-restic/0fcf01c9/tmp/fast/c14-sd-92016-2-home/2017-05-23-233346/2017-05-23-233346/edwin/.mu/xapian/postlist.DB" (write=0)
[0x556de5dc7240] Dedupe for file "/mnt/storage/restore/root.squashfs/home/edwin/cycles_benchmark_20160228.zip" had status (-5) "Input/output error".
[0x556de5dc7240] Dedupe for file "/mnt/storage/restore/root.squashfs/home/edwin/.phoronix-test-suite/installed-tests/pts/blender-1.4.1/cycles_benchmark_20160228.zip" had status (-5) "Input/output error".
[0x556de5dc7240] Dedupe for file "/mnt/storage/restore/root.squashfs/home/edwin/.phoronix-test-suite/installed-tests/pts/blender-1.7.0/cycles_benchmark_20160228.zip" had status (-5) "Input/output error".
[0x556dfa9629e0] Dedupe for file "/mnt/storage/from-restic/0024a369/mnt/fast/photos/Camera/VID_20180825_121050.mp4" had status (-5) "Input/output error".
[...]

The kernel shows that XFS data structures got corrupted and it unmounted itself to protect it from further damage:

[104763.870005] INFO: task pool:97746 blocked for more than 724 seconds.
[104763.870008] Tainted: G W 5.4.0-24-generic #28-Ubuntu
[104763.870009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[104763.870011] pool D 0 97746 47956 0x00000000
[104763.870013] Call Trace:
[104763.870019] __schedule+0x2e3/0x740
[104763.870021] schedule+0x42/0xb0
[104763.870022] rwsem_down_read_slowpath+0x16c/0x4a0
[104763.870051] ? xfs_vn_fiemap+0x32/0x80 [xfs]
[104763.870052] down_read+0x85/0xa0
[104763.870071] xfs_ilock+0x8a/0x100 [xfs]
[104763.870088] xfs_vn_fiemap+0x32/0x80 [xfs]
[104763.870090] do_vfs_ioctl+0x2c0/0x670
[104763.870091] ksys_ioctl+0x67/0x90
[104763.870093] __x64_sys_ioctl+0x1a/0x20
[104763.870094] do_syscall_64+0x57/0x190
[104763.870096] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[104763.870097] RIP: 0033:0x7fe0d575637b
[104763.870101] Code: Bad RIP value.
[104763.870102] RSP: 002b:00007fe0c9441bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[104763.870103] RAX: ffffffffffffffda RBX: 00007fe090006370 RCX: 00007fe0d575637b
[104763.870104] RDX: 00007fe0c9441c30 RSI: 00000000c020660b RDI: 0000000000000012
[104763.870104] RBP: 0000556debad9350 R08: 0000000000000000 R09: 00007fe09004bf28
[104763.870105] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe09004bf28
[104763.870105] R13: 0000000000000000 R14: 0000556de3ef0b46 R15: 0000000012c60000
[108129.210852] XFS (dm-4): Internal error xfs_trans_cancel at line 1048 of file fs/xfs/xfs_trans.c. Caller xfs_reflink_remap_extent+0x100/0x560 [xfs]
[108129.210856] CPU: 15 PID: 97752 Comm: pool Tainted: G W 5.4.0-24-generic #28-Ubuntu
[108129.210857] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS, BIOS 1405 11/19/2019
[108129.210858] Call Trace:
[108129.210862] dump_stack+0x6d/0x9a
[108129.210883] xfs_error_report+0x36/0x40 [xfs]
[108129.210902] ? xfs_reflink_remap_extent+0x100/0x560 [xfs]
[108129.210920] xfs_trans_cancel+0x134/0x160 [xfs]
[108129.210937] xfs_reflink_remap_extent+0x100/0x560 [xfs]
[108129.210954] ? xfs_bmapi_read+0x202/0x340 [xfs]
[108129.210971] xfs_reflink_remap_blocks+0x11e/0x300 [xfs]
[108129.210989] xfs_file_remap_range+0x110/0x2c0 [xfs]
[108129.210992] vfs_dedupe_file_range_one+0x12d/0x150
[108129.210993] vfs_dedupe_file_range+0x156/0x1e0
[108129.210994] do_vfs_ioctl+0x340/0x670
[108129.210996] ksys_ioctl+0x67/0x90
[108129.210997] __x64_sys_ioctl+0x1a/0x20
[108129.210998] do_syscall_64+0x57/0x190
[108129.211000] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[108129.211001] RIP: 0033:0x7fe0d575637b
[108129.211003] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[108129.211004] RSP: 002b:00007fe089ffac38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[108129.211005] RAX: ffffffffffffffda RBX: 00007fe07004adc0 RCX: 00007fe0d575637b
[108129.211006] RDX: 00007fe07000c690 RSI: 00000000c0189436 RDI: 0000000000000017
[108129.211006] RBP: 00007fe070084920 R08: 0000000000000002 R09: 00007fe070084970
[108129.211007] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe070084958
[108129.211007] R13: 00007fe070084960 R14: 00007fe070084970 R15: 00007fe07000c690
[108129.230787] XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 1049 of file fs/xfs/xfs_trans.c. Return address = 00000000e903a147
[108129.230789] XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
[108129.230790] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
[124069.735928] XFS (dm-4): Unmounting Filesystem
[124073.911647] XFS (dm-4): Mounting V5 Filesystem
[124074.026243] XFS (dm-4): Starting recovery (logdev: internal)
[124074.382784] XFS (dm-4): Ending recovery (logdev: internal)
[124074.632650] xfs filesystem being mounted at /mnt/storage supports timestamps until 2038 (0x7fffffff)
[124076.224530] XFS (dm-4): Unmounting Filesystem

I've got a filesystem where I've unpacked some of my Restic backups to test the feasibility of using XFS+reflinks as a replacement for Restic.

xfs_info /mnt/storage:
meta-data=/dev/mapper/storage-backup isize=512 agcount=4, agsize=183141376 blks
         = sectsz=4096 attr=2, projid32bit=1
         = crc=1 finobt=1, sparse=1, rmapbt=0
         = reflink=1
data = bsize=4096 blocks=732565504, imaxpct=5
         = sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=357698, version=2
         = sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

Initially I had a write-through lvmcache device in front of the above, but I was able to reproduce it even after removing it and running xfs_repair and repeating the operation.
Takes about a day or so to reproduce the issue by running duperemove again, but so far it reproduces 100%.

The latest xfs_repair output looks like this (initially it had a lot more 'clearing reflink flag on inode N' output:
sudo xfs_repair /dev/mapper/storage-backup 2>&1| tee dedupe-bug/dmesg-5
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 3
        - agno = 0
        - agno = 2
        - agno = 1
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

I'll also report this upstream but wanted to report here first.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-24-generic 5.4.0-24.28
ProcVersionSignature: Ubuntu 5.4.0-24.28-generic 5.4.30
Uname: Linux 5.4.0-24-generic x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: edwin 2183 F.... pulseaudio
 /dev/snd/controlC2: edwin 2183 F.... pulseaudio
 /dev/snd/controlC0: edwin 2183 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Sat Apr 18 10:30:00 2020
InstallationDate: Installed on 2020-03-26 (22 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
IwConfig:
 enp6s0 no wireless extensions.

 lo no wireless extensions.
MachineType: System manufacturer System Product Name
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-24-generic root=/dev/mapper/storage-ubuntu--1910--root ro quiet splash audit=0 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-24-generic N/A
 linux-backports-modules-5.4.0-24-generic N/A
 linux-firmware 1.187
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-04-16 (1 days ago)
dmi.bios.date: 11/19/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1405
dmi.board.asset.tag: Default string
dmi.board.name: TUF GAMING X570-PLUS
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1405:bd11/19/2019:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnTUFGAMINGX570-PLUS:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Revision history for this message
Török Edwin (edwintorok) wrote :
Revision history for this message
Török Edwin (edwintorok) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Török Edwin (edwintorok) wrote :

Reported upstream: (didn't show up on linux-xfs mailing list yet)
https://lore<email address hidden>/#r

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.