XFS corruption on machine which never suffered a hard reset or disk failure
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Expired | High | Unassigned | |
Bug Description
We installed a machine running Ubuntu 12.04 Server with the following disk layout:
XFS => dm-crypt => RAID5.
We documented everything, so a complete record of the machine's configuration, including the setup steps, can be provided if you need it.
The hard disks are tested weekly with a full SMART self-test and report no problems.
The machine is attached to a UPS and has therefore never suffered a hard reset.
The memory was also tested with memtest86+.
Nevertheless, the kernel reports XFS problems:
Sep 10 10:01:00 server kernel: [379001.376989] XFS (dm-0): xfs_da_do_buf: bno 0 dir: inode 3045868
Sep 10 10:01:00 server kernel: [379001.377011] XFS (dm-0): [00] br_startoff 0 br_startblock -2 br_blockcount 1 br_state 0
Sep 10 10:01:00 server kernel: [379001.377032] XFS (dm-0): Internal error xfs_da_do_buf(1) at line 2011 of file /build/
Sep 10 10:01:00 server kernel: [379001.377033]
Sep 10 10:01:00 server kernel: [379001.377069] Pid: 26624, comm: updatedb.mlocat Tainted: G C 3.2.0-30-generic #48-Ubuntu
Sep 10 10:01:00 server kernel: [379001.377071] Call Trace:
Sep 10 10:01:00 server kernel: [379001.377089] [<ffffffffa01cb
Sep 10 10:01:00 server kernel: [379001.377099] [<ffffffffa01fe
Sep 10 10:01:00 server kernel: [379001.377108] [<ffffffffa01fe
Sep 10 10:01:00 server kernel: [379001.377117] [<ffffffffa01fe
Sep 10 10:01:00 server kernel: [379001.377124] [<ffffffffa01cb
Sep 10 10:01:00 server kernel: [379001.377127] [<ffffffff81175
Sep 10 10:01:00 server kernel: [379001.377133] [<ffffffffa01cb
Sep 10 10:01:00 server kernel: [379001.377136] [<ffffffff8129c
Sep 10 10:01:00 server kernel: [379001.377138] [<ffffffff81183
Sep 10 10:01:00 server kernel: [379001.377139] [<ffffffff81176
Sep 10 10:01:00 server kernel: [379001.377141] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377143] [<ffffffff81186
Sep 10 10:01:00 server kernel: [379001.377144] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377146] [<ffffffff81183
Sep 10 10:01:00 server kernel: [379001.377147] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377149] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377152] [<ffffffff81319
Sep 10 10:01:00 server kernel: [379001.377153] [<ffffffff81182
Sep 10 10:01:00 server kernel: [379001.377156] [<ffffffff8165a
Sep 10 10:01:00 server kernel: [379001.377158] [<ffffffff81194
Sep 10 10:01:00 server kernel: [379001.377159] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377161] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377163] [<ffffffff81662
Sep 10 10:01:00 server kernel: [379001.377170] BUG: unable to handle kernel paging request at 0000000001000008
Sep 10 10:01:00 server kernel: [379001.377197] IP: [<ffffffff81122
Sep 10 10:01:00 server kernel: [379001.377215] PGD 176937067 PUD 20eb89067 PMD 0
Sep 10 10:01:00 server kernel: [379001.377230] Oops: 0000 [#1] SMP
Sep 10 10:01:00 server kernel: [379001.377241] CPU 2
Sep 10 10:01:00 server kernel: [379001.377247] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 10:01:00 server kernel: [379001.377384]
Sep 10 10:01:00 server kernel: [379001.377390] Pid: 26624, comm: updatedb.mlocat Tainted: G C 3.2.0-30-generic #48-Ubuntu /DH67GD
Sep 10 10:01:00 server kernel: [379001.377419] RIP: 0010:[<
Sep 10 10:01:00 server kernel: [379001.377441] RSP: 0018:ffff8801d6
Sep 10 10:01:00 server kernel: [379001.377454] RAX: ffff880073981bc5 RBX: ffff880157dde800 RCX: 0000000000000001
Sep 10 10:01:00 server kernel: [379001.377471] RDX: 0000000000000001 RSI: 0000000000ffff88 RDI: ffff880157dde870
Sep 10 10:01:00 server kernel: [379001.377489] RBP: ffff8801d6a35c98 R08: 000000000000000a R09: 0000000000000000
Sep 10 10:01:00 server kernel: [379001.377506] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d6a35e38
Sep 10 10:01:00 server kernel: [379001.377523] R13: ffff880073981bc0 R14: ffff88013113d000 R15: 0000000000000000
Sep 10 10:01:00 server kernel: [379001.377541] FS: 00007fbaf36b670
Sep 10 10:01:00 server kernel: [379001.377560] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 10 10:01:00 server kernel: [379001.377575] CR2: 0000000001000008 CR3: 00000001e6fe3000 CR4: 00000000000406e0
Sep 10 10:01:00 server kernel: [379001.377592] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 10 10:01:00 server kernel: [379001.377609] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 10 10:01:00 server kernel: [379001.377626] Process updatedb.mlocat (pid: 26624, threadinfo ffff8801d6a34000, task ffff88020de15c00)
Sep 10 10:01:00 server kernel: [379001.377647] Stack:
Sep 10 10:01:00 server kernel: [379001.377654] ffff8801d6a35cf8 ffffffff81175bf9 ffffffffa01cbd60 ffffffff8129cdbc
Sep 10 10:01:00 server kernel: [379001.377676] ffff8801de5d4600 ffffffff8118389a ffff8801d6a35d18 ffff8801d6a35e38
Sep 10 10:01:00 server kernel: [379001.377698] 0000000000058000 0000000000000000 ffff88013113d000 0000000000000000
Sep 10 10:01:00 server kernel: [379001.377719] Call Trace:
Sep 10 10:01:00 server kernel: [379001.377727] [<ffffffff81175
Sep 10 10:01:00 server kernel: [379001.377747] [<ffffffffa01cb
Sep 10 10:01:00 server kernel: [379001.377763] [<ffffffff8129c
Sep 10 10:01:00 server kernel: [379001.377780] [<ffffffff81183
Sep 10 10:01:00 server kernel: [379001.377794] [<ffffffff81176
Sep 10 10:01:00 server kernel: [379001.377807] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377822] [<ffffffff81186
Sep 10 10:01:00 server kernel: [379001.377835] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377849] [<ffffffff81183
Sep 10 10:01:00 server kernel: [379001.377862] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377878] [<ffffffff81187
Sep 10 10:01:00 server kernel: [379001.377892] [<ffffffff81319
Sep 10 10:01:00 server kernel: [379001.377907] [<ffffffff81182
Sep 10 10:01:00 server kernel: [379001.377921] [<ffffffff8165a
Sep 10 10:01:00 server kernel: [379001.377935] [<ffffffff81194
Sep 10 10:01:00 server kernel: [379001.377949] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377963] [<ffffffff81177
Sep 10 10:01:00 server kernel: [379001.377975] [<ffffffff81662
Sep 10 10:01:00 server kernel: [379001.377990] Code: ff ff 48 c7 c2 e0 b7 c3 81 e9 d7 fe ff ff 48 8b 73 30 e9 65 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 66 66 66 66 90 <48> 8b 86 80 00 00 00 5d 48 8b 40 10 48 c7 47 18 ff ff ff ff 89
Sep 10 10:01:00 server kernel: [379001.378112] RIP [<ffffffff81122
Sep 10 10:01:00 server kernel: [379001.378129] RSP <ffff8801d6a35c98>
Sep 10 10:01:00 server kernel: [379001.378138] CR2: 0000000001000008
Sep 10 10:01:01 server kernel: [379001.501110] ---[ end trace 2e597406c2d3462c ]---
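For anyone triaging a longer log of this kind, here is a minimal Python sketch that pulls out the notable kernel events and the taint letters. The marker list, the event names, and the `summarize` helper are assumptions made for illustration; the meanings of the taint letters G, D and C are the ones given in the kernel's tainted-kernels documentation.

```python
import re

# Syslog prefix as it appears in this report, e.g.
# "Sep 10 10:01:00 server kernel: [379001.376989] message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s?(?P<msg>.*)$"
)

# Assumed, non-exhaustive markers taken from the messages in this report.
EVENT_MARKERS = {
    "Internal error": "xfs-internal-error",
    "BUG: unable to handle kernel paging request": "oops",
    "BUG: soft lockup": "soft-lockup",
}

# Only the taint letters that occur in this report.
TAINT_FLAGS = {
    "G": "no proprietary module loaded",
    "D": "kernel has died recently (an oops or BUG occurred)",
    "C": "a staging driver is loaded",
}

# Captures the run of single taint letters after "Tainted:".
TAINT_RE = re.compile(r"Tainted: ((?:[A-Z]\s+)+)")


def summarize(lines):
    """Return (events, taints) for an iterable of syslog lines.

    events is a list of (uptime_seconds, kind) in log order;
    taints is a sorted list of taint letters seen anywhere.
    """
    events = []
    taints = set()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a kernel line in the expected format
        uptime = float(match.group("uptime"))
        msg = match.group("msg")
        for marker, kind in EVENT_MARKERS.items():
            if marker in msg:
                events.append((uptime, kind))
        taint = TAINT_RE.search(msg)
        if taint:
            taints.update(taint.group(1).split())
    return events, sorted(taints)


if __name__ == "__main__":
    import sys

    events, taints = summarize(sys.stdin)
    for uptime, kind in events:
        print(f"{uptime:14.6f}  {kind}")
    for letter in taints:
        print(f"taint {letter}: {TAINT_FLAGS.get(letter, 'not covered by this sketch')}")
```

Run it with the log on standard input (the script name is up to you). A `D` taint on a later trace means the kernel had already oopsed earlier in the same boot, which is worth keeping in mind when reading any lockups that follow.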
---
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Sep 16 21:55 seq
crw-rw---T 1 root audio 116, 33 Sep 16 21:55 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu13
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 12.04
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
LANGUAGE=en_US:en
TERM=linux
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
StagingDrivers: mei
Tags: precise staging
Uname: Linux 3.2.0-30-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
dmi.bios.date: 06/15/2012
dmi.bios.vendor: Intel Corp.
dmi.bios.version: BLH6710H.
dmi.board.
dmi.board.name: DH67GD
dmi.board.vendor: Intel Corporation
dmi.board.version: AAG10206-210
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCor
After that, the following soft lockup occurred very often:
Sep 10 11:58:11 server kernel: [386031.913144] BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:35]
Sep 10 11:58:11 server kernel: [386031.913200] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.913512] CPU 0
Sep 10 11:58:11 server kernel: [386031.913526] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.921028]
Sep 10 11:58:11 server kernel: [386031.923600] Pid: 35, comm: kswapd0 Tainted: G D C 3.2.0-30-generic #48-Ubuntu /DH67GD
Sep 10 11:58:11 server kernel: [386031.926215] RIP: 0010:[<ffffffff8103dc4d>] [<ffffffff8103dc4d>] __ticket_spin_lock+0xd/0x30
Sep 10 11:58:11 server kernel: [386031.928810] RSP: 0018:ffff88020f911b80 EFLAGS: 00000286
Sep 10 11:58:11 server kernel: [386031.931399] RAX: 00000000ed7ded7d RBX: ffff88021f20ec40 RCX: ffff880073983d80
Sep 10 11:58:11 server kernel: [386031.933958] RDX: ffff88013113d740 RSI: 0000000000000001 RDI: ffff88013113d71c
Sep 10 11:58:11 server kernel: [386031.936490] RBP: ffff88020f911b80 R08: 0000000000000001 R09: dead000000200200
Sep 10 11:58:11 server kernel: [386031.939047] R10: 0000000000000000 R11: dead000000200200 R12: 0000000000000000
Sep 10 11:58:11 server kernel: [386031.941586] R13: 0000000000000000 R14: 0000000000000020 R15: ffffffff8112a74f
Sep 10 11:58:11 server kernel: [386031.944133] FS: 0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
Sep 10 11:58:11 server kernel: [386031.946716] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 10 11:58:11 server kernel: [386031.949245] CR2: 00007f6cd7267400 CR3: 0000000001c05000 CR4: 00000000000406f0
Sep 10 11:58:11 server kernel: [386031.951695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 10 11:58:11 server kernel: [386031.954102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 10 11:58:11 server kernel: [386031.956484] Process kswapd0 (pid: 35, threadinfo ffff88020f910000, task ffff88020f908000)
Sep 10 11:58:11 server kernel: [386031.958863] Stack:
Sep 10 11:58:11 server kernel: [386031.961214] ffff88020f911b90 ffffffff8165a41e ffff88020f911c00 ffffffff8118eadf
Sep 10 11:58:11 server kernel: [386031.963586] ffff88020c7f1000 ffff880073983d80 ffff88013113d740 ffff88018d115600
Sep 10 11:58:11 server kernel: [386031.965963] ffff88020f911bd0 ffff88013113d740 ffff88020f911c30 ffff8801765034dc
Sep 10 11:58:11 server kerne...