XFS: kernel hangs multiple times in xlog_cil_force_lsn

Bug #1495442 reported by B.
This bug affects 4 people
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Expired  High        Unassigned

Bug Description

Ubuntu 14.04.3 LTS with Kernel 3.19.0-26-generic

One of our servers, running rsnapshot and nfs-kernel-server on XFS partitions,
crashed or hung two Sundays in a row.

Kernel 3.19.0-26-generic hangs in xlog_cil_force_lsn (XFS), with multiple tasks reported as blocked at the same time.
I assume this bug is not related to bug report 979498.

I also had this issue with kernel 3.16.0-46-generic or 3.13.0-57-generic, but I have
no log from the crash to determine which of those kernels was running.
I switched to kernel 3.19.0-26-generic because 3.16.0-48-generic was not
working well with Intel Gigabit Ethernet (tx hang, bug 1492146).

 INFO: task kthreadd:2 blocked for more than 120 seconds.
       Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kthreadd D ffff8802362eb3a8 0 2 0 0x00000000
  ffff8802362eb3a8 ffff8802362c09d0 0000000000013e80 ffff8802362ebfd8
  0000000000013e80 ffff880235c28000 ffff8802362c09d0 ffff880200000001
  ffff8802362eb500 7fffffffffffffff ffff8802362eb4f8 ffff8802362c09d0
 Call Trace:
  [<ffffffff817b24f9>] schedule+0x29/0x70
  [<ffffffff817b55bc>] schedule_timeout+0x20c/0x280
  [<ffffffff810aa4a4>] ? update_curr+0xe4/0x180
  [<ffffffff817b3224>] wait_for_completion+0xa4/0x170
  [<ffffffff810a0a70>] ? wake_up_state+0x20/0x20
  [<ffffffff8108e7ed>] flush_work+0xed/0x1c0
  [<ffffffff8108ac40>] ? destroy_worker+0x90/0x90
  [<ffffffffc027451e>] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
  [<ffffffff8117d667>] ? __free_pages+0x27/0x30
  [<ffffffff811cd1a0>] ? __free_slab+0xd0/0x1f0
  [<ffffffffc0272c7e>] _xfs_log_force_lsn+0x5e/0x2d0 [xfs]
  [<ffffffffc0272f1e>] xfs_log_force_lsn+0x2e/0x90 [xfs]
  [<ffffffffc0264e79>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
  [<ffffffffc026198d>] __xfs_iunpin_wait+0x8d/0x120 [xfs]
  [<ffffffff810b4e50>] ? autoremove_wake_function+0x40/0x40
  [<ffffffffc0264e79>] xfs_iunpin_wait+0x19/0x20 [xfs]
  [<ffffffffc025a18c>] xfs_reclaim_inode+0x7c/0x340 [xfs]
  [<ffffffffc025a6a7>] xfs_reclaim_inodes_ag+0x257/0x370 [xfs]
  [<ffffffffc025b1e3>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
  [<ffffffffc0269f25>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
  [<ffffffff811efc79>] super_cache_scan+0x169/0x170
  [<ffffffff81188346>] shrink_node_slabs+0x1d6/0x370
  [<ffffffff8118af9a>] shrink_zone+0x20a/0x240
  [<ffffffff8118b335>] do_try_to_free_pages+0x155/0x440
  [<ffffffff8117b74f>] ? zone_watermark_ok+0x1f/0x30
  [<ffffffff8118b6da>] try_to_free_pages+0xba/0x150
  [<ffffffff8117f21b>] __alloc_pages_nodemask+0x61b/0xa60
  [<ffffffff8117f72a>] alloc_kmem_pages_node+0x6a/0x130
  [<ffffffff81072263>] ? copy_process.part.26+0xf3/0x1c00
  [<ffffffff81072283>] copy_process.part.26+0x113/0x1c00
  [<ffffffff810ab2ae>] ? dequeue_task_fair+0x44e/0x660
  [<ffffffff810ac001>] ? put_prev_entity+0x31/0x3f0
  [<ffffffff810af20c>] ? pick_next_task_fair+0x19c/0x880
  [<ffffffff81093730>] ? kthread_create_on_node+0x1c0/0x1c0
  [<ffffffff810ac3ef>] ? put_prev_task_fair+0x2f/0x50
  [<ffffffff81073f25>] do_fork+0xd5/0x340
  [<ffffffff810741b6>] kernel_thread+0x26/0x30
  [<ffffffff810941ea>] kthreadd+0x15a/0x1c0
  [<ffffffff81094090>] ? kthread_create_on_cpu+0x60/0x60
  [<ffffffff817b67d8>] ret_from_fork+0x58/0x90
  [<ffffffff81094090>] ? kthread_create_on_cpu+0x60/0x60
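
In x86-64 call traces like the one above, an entry prefixed with "?" is an address the stack unwinder found on the stack but could not confirm as part of the call chain, so the actual path is easier to follow once those entries are dropped. A minimal Python sketch, assuming the "[<address>] function+0xoffset/0xsize [module]" line layout shown above; reliable_frames and call_chain are hypothetical helper names, not part of any kernel tool:

```python
import re

# One x86-64 call-trace entry, e.g.
#   [<ffffffffc027451e>] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
# Group 1: optional "?" marker, group 2: function name, group 3: module (if any).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# Short excerpt of the kthreadd trace from this report.
SAMPLE = """\
  [<ffffffff817b24f9>] schedule+0x29/0x70
  [<ffffffff810aa4a4>] ? update_curr+0xe4/0x180
  [<ffffffff8108e7ed>] flush_work+0xed/0x1c0
  [<ffffffffc027451e>] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
  [<ffffffffc0264e79>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
  [<ffffffff811efc79>] super_cache_scan+0x169/0x170
  [<ffffffff81072283>] copy_process.part.26+0x113/0x1c00
"""


def reliable_frames(trace_text):
    """Return (function, module) pairs for entries not marked with '?'.

    Order is as printed: innermost (most recent) call first.
    module is None for functions built into the core kernel image.
    """
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            frames.append((m.group(2), m.group(3)))
    return frames


def call_chain(trace_text):
    """Reliable frames joined with arrows, outermost caller first."""
    return " -> ".join(name for name, _ in reversed(reliable_frames(trace_text)))


if __name__ == "__main__":
    print(call_chain(SAMPLE))
```

Applied to the full kthreadd trace, the reliable frames read, outermost first: ret_from_fork -> kthreadd -> kernel_thread -> do_fork -> copy_process.part.26 -> alloc_kmem_pages_node -> __alloc_pages_nodemask -> try_to_free_pages -> do_try_to_free_pages -> shrink_zone -> shrink_node_slabs -> super_cache_scan -> xfs_fs_free_cached_objects -> xfs_reclaim_inodes_nr -> xfs_reclaim_inodes_ag -> xfs_reclaim_inode -> xfs_iunpin_wait -> __xfs_iunpin_wait -> xfs_log_force_lsn -> _xfs_log_force_lsn -> xlog_cil_force_lsn -> flush_work -> wait_for_completion -> schedule_timeout -> schedule. Read that way, kthreadd entered direct memory reclaim while forking a new thread, the slab shrinker called into XFS inode reclaim, and that reclaim was waiting on a log force that had not completed within 120 seconds.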

10:17:01 xyz CRON[59020]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
11:17:01 xyz CRON[59177]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
12:00:01 xyz CRON[59282]: (root) CMD (/usr/bin/rsnapshot -c /etc/rsnapshot.conf hourly)
12:03:05 xyz kernel: [616881.456247] INFO: task kthreadd:2 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.456587] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.457934] INFO: task kswapd0:57 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.458271] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.485629] INFO: task nfsd:2002 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.492761] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.523050] INFO: task snmpd:2155 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.530783] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.562789] INFO: task kworker/1:0:59030 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.571082] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.605747] INFO: task kworker/3:1H:59110 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.615112] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:03:05 xyz kernel: [616881.652107] INFO: task rm:59284 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.661441] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:05:05 xyz kernel: [617001.782228] INFO: task kswapd0:57 blocked for more than 120 seconds.
12:05:05 xyz kernel: [617001.791580] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:05:05 xyz kernel: [617001.828636] INFO: task irqbalance:1353 blocked for more than 120 seconds.
12:05:05 xyz kernel: [617001.838019] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
12:05:05 xyz kernel: [617001.875425] INFO: task nfsd:2002 blocked for more than 120 seconds.
12:05:05 xyz kernel: [617001.884799] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
...
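
The excerpt above shows a mix of kernel threads (kthreadd, kswapd0, kworkers), NFS server threads, and user processes (snmpd, irqbalance, rm) all reported as blocked in the same second, which matches the description of several hangs at the same time. A small Python sketch to tabulate such reports from a log file; it assumes the "HH:MM:SS host kernel: [uptime] INFO: task name:pid blocked for more than N seconds." line shape shown here (optionally preceded by a syslog-style month and day), and hung_tasks_by_time is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Matches hung-task reports such as
#   12:03:05 xyz kernel: [616881.456247] INFO: task kthreadd:2 blocked for more than 120 seconds.
# optionally preceded by a syslog-style "Sep 13 " date. Task names may contain
# ':' and '/' (e.g. kworker/1:0), so the PID is taken as the last ":<digits>".
HUNG_RE = re.compile(
    r"^(?:[A-Z][a-z]{2}\s+\d{1,2}\s+)?"
    r"(\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+\[\s*[\d.]+\]\s+"
    r"INFO: task (.+):(\d+) blocked for more than \d+ seconds\."
)

SAMPLE_LOG = """\
12:03:05 xyz kernel: [616881.456247] INFO: task kthreadd:2 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.456587] Tainted: G C 3.19.0-26-generic #28~14.04.1-Ubuntu
12:03:05 xyz kernel: [616881.562789] INFO: task kworker/1:0:59030 blocked for more than 120 seconds.
12:03:05 xyz kernel: [616881.652107] INFO: task rm:59284 blocked for more than 120 seconds.
12:05:05 xyz kernel: [617001.782228] INFO: task kswapd0:57 blocked for more than 120 seconds.
"""


def hung_tasks_by_time(lines):
    """Group (task name, pid) pairs by the wall-clock time of the report."""
    grouped = defaultdict(list)
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            grouped[m.group(1)].append((m.group(2), int(m.group(3))))
    return dict(grouped)  # insertion order follows the log order


if __name__ == "__main__":
    for stamp, tasks in hung_tasks_by_time(SAMPLE_LOG.splitlines()).items():
        print(stamp, len(tasks), ", ".join(f"{n}:{p}" for n, p in tasks))
```

For the lines shown in the excerpt, this groups seven tasks under 12:03:05 and three under 12:05:05.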

REFERENCES
http://marc.info/?l=linux-xfs&w=2&r=1&s=xlog_cil_force_lsn%2B0x7e%2F0x1f0&q=b
http://marc.info/?l=linux-xfs&w=2&r=1&s=xlog_cil_force_lsn&q=b
http://oss.sgi.com/cgi-bin/namazu.cgi?query=xlog_cil_force_lsn&submit=Search!&idxname=xfs&max=100&result=short&sort=date%3Alate
http://oss.sgi.com/archives/xfs/2015-03/msg00174.html
http://oss.sgi.com/archives/xfs/2014-09/msg00102.html
http://marc.info/?t=133455915600002&r=1&w=2
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 14 07:30 seq
 crw-rw---- 1 root audio 116, 33 Sep 14 07:30 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.13
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=/dev/mapper/mainvg-swap
IwConfig: Error: [Errno 2] No such file or directory
MachineType: HP ProLiant DL380e Gen8
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.19.0-28-generic root=/dev/mapper/mainvg-root ro elevator=deadline nohz=off transparent_hugepage=always intel_iommu=on acpi_irq_nobalance cgroup_enable=memory swapaccount=1 nomdmonddf nomdmonisw
ProcVersionSignature: Ubuntu 3.19.0-28.30~14.04.1-generic 3.19.8-ckt5
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-28-generic N/A
 linux-backports-modules-3.19.0-28-generic N/A
 linux-firmware 1.127.15
RfKill: Error: [Errno 2] No such file or directory
StagingDrivers: visorutil
Tags: trusty staging
Uname: Linux 3.19.0-28-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 02/10/2014
dmi.bios.vendor: HP
dmi.bios.version: P73
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP73:bd02/10/2014:svnHP:pnProLiantDL380eGen8:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL380e Gen8
dmi.sys.vendor: HP

B. (b-deactivatedaccount-deactivatedaccount) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1495442

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: utopic
B. (b-deactivatedaccount-deactivatedaccount) wrote : BootDmesg.txt

apport information

tags: added: apport-collected staging trusty
description: updated
B. (b-deactivatedaccount-deactivatedaccount) wrote : CurrentDmesg.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : Lspci.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : Lsusb.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : ProcCpuinfo.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : ProcInterrupts.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : ProcModules.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : UdevDb.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : UdevLog.txt

apport information

B. (b-deactivatedaccount-deactivatedaccount) wrote : WifiSyslog.txt

apport information

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.2 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-unstable/

Changed in linux (Ubuntu):
importance: Undecided → High
Scottix (scottix) wrote :

We are also running into this bug on a couple of machines. It effectively causes a denial of service, since it blocks all I/O and requires a reboot.
We are running 3.19.0-28-generic; we also had the tx hang bug, so we can't use the 3.16 kernel.

I am not sure what triggers it, since it happens randomly and unpredictably. These are production machines, so I can't do much debugging on them, but I want to confirm that something is broken.

B. (b-deactivatedaccount-deactivatedaccount) wrote :

The bug doesn't seem to impact 3.13.0-63-generic.

(the server is in production, so we don't really want to test the 4.2 unstable kernel)

Scottix (scottix) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1492146 has been fixed in 3.16, so we have downgraded until we are confident the XFS bugs have been resolved. I know a lot of XFS work went into 3.19 and more into 4.1, so we may just have to wait for the next stable release.

I did try running xfstests on a dev machine but didn't hit the bug we are talking about, so any ideas on how to reproduce it would be welcome.
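
One possible starting point for a reproduction attempt, based only on what the original report shows: the first hung-task reports appeared at 12:03:05, about three minutes after the 12:00:01 rsnapshot hourly run, an rm task was among the blocked ones, and the kthreadd trace shows direct memory reclaim calling into XFS inode reclaim. The Python sketch below only generates an rsnapshot-like pattern of hard-link snapshot creation and mass deletion; it is not rsnapshot, it is not known to trigger this bug, and populate, rotate_once, the directory names, and the file counts are all hypothetical. Running it on an XFS mount would presumably also need separate memory pressure to push the kernel into direct reclaim.

```python
import os
import shutil
import tempfile

# HYPOTHETICAL workload sketch: mimics an rsnapshot-style rotation
# (delete the oldest snapshot, shift the rest, hard-link a new one).
# It is not rsnapshot and not a known reproducer for this bug.


def populate(source, n_files):
    """Create n_files small files in source (hypothetical test data)."""
    os.makedirs(source, exist_ok=True)
    for i in range(n_files):
        with open(os.path.join(source, f"f{i:06d}"), "w") as fh:
            fh.write("x")


def rotate_once(root, source, keep):
    """One rotation: remove snapshot.<keep-1>, rename snapshot.N to N+1,
    then create snapshot.0 as a hard-link copy of source (like cp -al).
    source and root must be on the same filesystem for hard links."""
    oldest = os.path.join(root, f"snapshot.{keep - 1}")
    if os.path.exists(oldest):
        shutil.rmtree(oldest)  # mass unlink, comparable to the rm seen in the log
    for i in range(keep - 2, -1, -1):
        cur = os.path.join(root, f"snapshot.{i}")
        if os.path.exists(cur):
            os.rename(cur, os.path.join(root, f"snapshot.{i + 1}"))
    # copy_function=os.link makes every file a new hard link instead of a copy
    shutil.copytree(source, os.path.join(root, "snapshot.0"), copy_function=os.link)


if __name__ == "__main__":
    # For a real attempt, tmp would be a directory on the XFS filesystem and
    # the counts would be far larger; a temp dir keeps this demo harmless.
    with tempfile.TemporaryDirectory() as tmp:
        src, root = os.path.join(tmp, "src"), os.path.join(tmp, "snaps")
        os.makedirs(root)
        populate(src, 1000)
        for _ in range(5):
            rotate_once(root, src, keep=3)
        print(sorted(os.listdir(root)))
```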

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired