linux 5.3.0-46 and 5.4.0-24: XFS with reflinks and duperemove: in-memory data corruption
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
I have reproduced this multiple times on both Ubuntu 19.10 (5.3.0-16) and Ubuntu 20.04 beta (5.4.0-24)
sudo duperemove --lookup-
[...]
0x556dfa962980] Dedupe 2 extents (id: b285a8ff) with target: (0, 2255716352), "/mnt/storage/
[0x556de5dc7300] (154238/154243) Try to dedupe extents with id 3fc7b31b
[0x556de5dc7300] Dedupe 3 extents (id: 3fc7b31b) with target: (0, 2463842304), "/mnt/storage/
[0x556dfa962860] (154239/154243) Try to dedupe extents with id f5615456
[0x556dfa962860] Dedupe 1 extents (id: f5615456) with target: (0, 5477811089), "/mnt/storage/
[0x556dfa92bb00] (154240/154243) Try to dedupe extents with id a9595eaa
[0x556dfa92bb00] Dedupe 1 extents (id: a9595eaa) with target: (0, 11050147840), "/mnt/storage/
[0x556dfa962920] (154241/154243) Try to dedupe extents with id 229145cb
[0x556dfa962920] Dedupe 1 extents (id: 229145cb) with target: (0, 13606766116), "/mnt/storage/
[0x556de5dc7180] (154242/154243) Try to dedupe extents with id 2eee56f0
[0x556de5dc7180] Dedupe 2 extents (id: 2eee56f0) with target: (0, 13606766163), "/mnt/storage/
[0x556dfa92bb60] (154243/154243) Try to dedupe extents with id b6a1de37
[0x556dfa92bb60] Dedupe 1 extents (id: b6a1de37) with target: (0, 13890395789), "/mnt/storage/
[0x556dfa962980] Dedupe for file "/mnt/storage/
[0x556dfa962980] Dedupe for file "/mnt/storage/
Error 5: Input/output error while opening "/mnt/storage/
Error 5: Input/output error while opening "/mnt/storage/
Error 5: Input/output error while opening "/mnt/storage/
[0x556de5dc7240] Dedupe for file "/mnt/storage/
[0x556de5dc7240] Dedupe for file "/mnt/storage/
[0x556de5dc7240] Dedupe for file "/mnt/storage/
[0x556dfa9629e0] Dedupe for file "/mnt/storage/
[...]
The kernel shows that XFS data structures got corrupted and it unmounted itself to protect it from further damage:
[104763.870005] INFO: task pool:97746 blocked for more than 724 seconds.
[104763.870008] Tainted: G W 5.4.0-24-generic #28-Ubuntu
[104763.870009] "echo 0 > /proc/sys/
[104763.870011] pool D 0 97746 47956 0x00000000
[104763.870013] Call Trace:
[104763.870019] __schedule+
[104763.870021] schedule+0x42/0xb0
[104763.870022] rwsem_down_
[104763.870051] ? xfs_vn_
[104763.870052] down_read+0x85/0xa0
[104763.870071] xfs_ilock+
[104763.870088] xfs_vn_
[104763.870090] do_vfs_
[104763.870091] ksys_ioctl+
[104763.870093] __x64_sys_
[104763.870094] do_syscall_
[104763.870096] entry_SYSCALL_
[104763.870097] RIP: 0033:0x7fe0d575637b
[104763.870101] Code: Bad RIP value.
[104763.870102] RSP: 002b:00007fe0c9
[104763.870103] RAX: ffffffffffffffda RBX: 00007fe090006370 RCX: 00007fe0d575637b
[104763.870104] RDX: 00007fe0c9441c30 RSI: 00000000c020660b RDI: 0000000000000012
[104763.870104] RBP: 0000556debad9350 R08: 0000000000000000 R09: 00007fe09004bf28
[104763.870105] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe09004bf28
[104763.870105] R13: 0000000000000000 R14: 0000556de3ef0b46 R15: 0000000012c60000
[108129.210852] XFS (dm-4): Internal error xfs_trans_cancel at line 1048 of file fs/xfs/xfs_trans.c. Caller xfs_reflink_
[108129.210856] CPU: 15 PID: 97752 Comm: pool Tainted: G W 5.4.0-24-generic #28-Ubuntu
[108129.210857] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS, BIOS 1405 11/19/2019
[108129.210858] Call Trace:
[108129.210862] dump_stack+
[108129.210883] xfs_error_
[108129.210902] ? xfs_reflink_
[108129.210920] xfs_trans_
[108129.210937] xfs_reflink_
[108129.210954] ? xfs_bmapi_
[108129.210971] xfs_reflink_
[108129.210989] xfs_file_
[108129.210992] vfs_dedupe_
[108129.210993] vfs_dedupe_
[108129.210994] do_vfs_
[108129.210996] ksys_ioctl+
[108129.210997] __x64_sys_
[108129.210998] do_syscall_
[108129.211000] entry_SYSCALL_
[108129.211001] RIP: 0033:0x7fe0d575637b
[108129.211003] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[108129.211004] RSP: 002b:00007fe089
[108129.211005] RAX: ffffffffffffffda RBX: 00007fe07004adc0 RCX: 00007fe0d575637b
[108129.211006] RDX: 00007fe07000c690 RSI: 00000000c0189436 RDI: 0000000000000017
[108129.211006] RBP: 00007fe070084920 R08: 0000000000000002 R09: 00007fe070084970
[108129.211007] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe070084958
[108129.211007] R13: 00007fe070084960 R14: 00007fe070084970 R15: 00007fe07000c690
[108129.230787] XFS (dm-4): xfs_do_
[108129.230789] XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
[108129.230790] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
[124069.735928] XFS (dm-4): Unmounting Filesystem
[124073.911647] XFS (dm-4): Mounting V5 Filesystem
[124074.026243] XFS (dm-4): Starting recovery (logdev: internal)
[124074.382784] XFS (dm-4): Ending recovery (logdev: internal)
[124074.632650] xfs filesystem being mounted at /mnt/storage supports timestamps until 2038 (0x7fffffff)
[124076.224530] XFS (dm-4): Unmounting Filesystem
I've got a filesystem where I've unpacked some of my Restic backups to test the feasibility of using XFS+reflinks as a replacement for Restic.
xfs_info /mnt/storage:
meta-data=
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=732565504, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=357698, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Initially I had a write-through lvmcache device in front of the above, but I was able to reproduce it even after removing it and running xfs_repair and repeating the operation.
Takes about a day or so to reproduce the issue by running duperemove again, but so far it reproduces 100%.
The latest xfs_repair output looks like this (initially it had a lot more 'clearing reflink flag on inode N' output:
sudo xfs_repair /dev/mapper/
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 3
- agno = 0
- agno = 2
- agno = 1
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
I'll also report this upstream but wanted to report here first.
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-
ProcVersionSign
Uname: Linux 5.4.0-24-generic x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/
/dev/snd/
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Sat Apr 18 10:30:00 2020
InstallationDate: Installed on 2020-03-26 (22 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
IwConfig:
enp6s0 no wireless extensions.
lo no wireless extensions.
MachineType: System manufacturer System Product Name
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.187
RfKill:
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-04-16 (1 days ago)
dmi.bios.date: 11/19/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1405
dmi.board.
dmi.board.name: TUF GAMING X570-PLUS
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.
dmi.sys.vendor: System manufacturer
This change was made by a bot.