[BTRFS] hard lockup on filserver
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | | High | Unassigned | |
Bug Description
Hi,
My Core i7 18TB BTRFS file server has been upgraded to 13.10 and has locked up frequently since. Previously it never crashed and was only taken down for maintenance. On Monday we upgraded to 13.10, and since then it locks up every 20 hours or so. We were doing a large backup last night and it locked up within about 6 hours.
On the 1st occurrence there was absolutely nothing in the logs; on the 2nd occurrence there were some NFS hang messages. When the system comes back up I will check and post any logging information gleaned.
| Changed in linux (Ubuntu): | |
| status: | New → Incomplete |
Output from:
ubuntu-bug linux
| Changed in linux (Ubuntu): | |
| status: | Incomplete → Confirmed |
| Sean Clarke (sean-clarke) wrote : | #3 |
Last night's/this morning's lockup had nothing logged:
Oct 9 20:20:48 enterprise sm-notify[867]: Unable to notify starbug.
Oct 9 20:20:48 enterprise sm-notify[867]: Unable to notify subversion.
Oct 9 20:27:49 enterprise kernel: [ 1391.065867] perf samples too long (2559 > 2500), lowering kernel.
Oct 9 21:17:01 enterprise CRON[2416]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 9 22:17:01 enterprise CRON[2479]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 9 23:00:12 enterprise kernel: [10530.153330] perf samples too long (5096 > 5000), lowering kernel.
Oct 9 23:17:01 enterprise CRON[2602]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 10 00:17:01 enterprise CRON[2652]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 10 01:17:01 enterprise CRON[2738]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 10 10:03:24 enterprise kernel: imklog 5.8.11, log source = /proc/kmsg started.
Oct 10 10:03:24 enterprise rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="1119" x-info="http://
Oct 10 10:03:24 enterprise rsyslogd: rsyslogd's groupid changed to 103
Oct 10 10:03:24 enterprise rsyslogd: rsyslogd's userid changed to 101
Oct 10 10:03:24 enterprise rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://
Oct 10 10:03:24 enterprise kernel: [ 0.000000] Initializing cgroup subsys cpuset
Oct 10 10:03:24 enterprise kernel: [ 0.000000] Initializing cgroup subsys cpu
Oct 10 10:03:24 enterprise kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Oct 10 10:03:24 enterprise kernel: [ 0.000000] Linux version 3.11.0-12-generic (buildd@allspice) (gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #18-Ubuntu SMP Tue Oct 8 20:51:28 UTC 2013 (Ubuntu 3.11.0-
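Even with no error logged, the excerpt above brackets the lockup: the hourly CRON entry at 01:17:01 is the last line before the reboot at 10:03:24, so the machine presumably hung before the next hourly run was due. A minimal sketch of finding such silent gaps in a syslog follows. The helper names `parse_stamp` and `find_gaps`, the 90-minute threshold, and the year 2013 (syslog lines carry no year; it is taken from the kernel build date in the log) are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Month lookup kept explicit so parsing does not depend on the system locale.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_stamp(line, year=2013):
    """Parse the leading 'Mon D HH:MM:SS' of a syslog line (the year is assumed)."""
    month, day, clock = line.split()[:3]
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(year, MONTHS[month], int(day), hour, minute, second)


def find_gaps(lines, threshold=timedelta(minutes=90)):
    """Return (last_seen, next_seen) pairs where the log was silent longer than threshold."""
    stamps = [parse_stamp(line) for line in lines if line.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]


# Lines copied from the excerpt above.
sample = [
    "Oct 9 23:17:01 enterprise CRON[2602]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)",
    "Oct 10 00:17:01 enterprise CRON[2652]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)",
    "Oct 10 01:17:01 enterprise CRON[2738]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)",
    "Oct 10 10:03:24 enterprise kernel: imklog 5.8.11, log source = /proc/kmsg started.",
]

gaps = find_gaps(sample)
for last_seen, next_seen in gaps:
    print("log silent from", last_seen, "until", next_seen)
```

The threshold sits above the one-hour cron interval so that normal quiet periods are not reported.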
| Joseph Salisbury (jsalisbury) wrote : | #4 |
Would it be possible for you to test the latest upstream kernel? Refer to https:/
If this bug is fixed in the mainline kernel, please add the following tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag: 'kernel-
If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
[0] http://
| tags: | added: saucy |
| Changed in linux (Ubuntu): | |
| importance: | Undecided → High |
| tags: | added: needs-bisect regression-release |
| Changed in linux (Ubuntu): | |
| status: | Confirmed → Incomplete |
| Sean Clarke (sean-clarke) wrote : | #5 |
Now running mainline kernel (3.12.0-999-generic #201310090426) and retesting.
| Sean Clarke (sean-clarke) wrote : | #6 |
OK, system has run perfectly since I installed the 3.12.0-999-generic #201310090426 kernel.
Changing status as instructed
| Sean Clarke (sean-clarke) wrote : | #7 |
Not status (sorry), adding tag.
| tags: | added: kernel-fixed-upstream |
| Changed in linux (Ubuntu): | |
| status: | Incomplete → Confirmed |
| Sean Clarke (sean-clarke) wrote : | #8 |
OK, after a week of further testing I don't think the issue is resolved. I am moving about 150GB of data around and the system gives hard locks.
Nothing in the syslog, however I had a top running at failure and the BTRFS processes went through the roof:
0 0 0 R 100.0 0.0 0:51.50 [btrfs-transacti]
0 0 0 R 72.1 0.0 0:11.68 [btrfs-flush_del]
0 0 0 S 72.1 0.0 0:17.36 [btrfs-flush_del]
0 0 0 R 57.8 0.0 0:12.34 [btrfs-flush_del]
0 0 0 S 57.5 0.0 0:14.28 [btrfs-flush_del]
I will remove tag
| tags: | removed: kernel-fixed-upstream |
| Sean Clarke (sean-clarke) wrote : | #9 |
Installed 3.12.0-999-generic #201310170405 and retrying.
| Sean Clarke (sean-clarke) wrote : | #10 |
OK, it is reproducible - I have a fileserver with 6x 3TB in a BTRFS RAID 1+0 configuration.
From a client (and using NFS) I copy a 95GB tar file from the fileserver to a USB HD.
It seems that at the very end (when BTRFS deletes the 95GB file on the server) it falls over - btrfs-transaction and btrfs-flush_del using 3 to 5 cores at 50 to 100%.
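A minimal local sketch of the delete step, assuming that removing one large file on the BTRFS volume is the trigger. The report only observes this over NFS, so a local reproduction is an assumption. The function name `time_large_delete`, the 4 MiB demo size, and the scratch directory are hypothetical; on an affected machine one would point it at a directory on the BTRFS mount with a much larger size, and only on a system that can safely lock up.

```python
import os
import tempfile
import time


def time_large_delete(directory, size_bytes, chunk=1 << 20):
    """Write a file of size_bytes in directory, flush it to disk, then time its removal."""
    fd, path = tempfile.mkstemp(dir=directory)
    block = b"\0" * chunk
    with os.fdopen(fd, "wb") as handle:
        remaining = size_bytes
        while remaining > 0:
            count = min(chunk, remaining)
            handle.write(block[:count])
            remaining -= count
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data is on disk before deleting
    start = time.monotonic()
    os.remove(path)
    return path, time.monotonic() - start


# Small demo in a throwaway directory; the size is far below the 95GB in the report.
with tempfile.TemporaryDirectory() as scratch:
    path, elapsed = time_large_delete(scratch, 4 * 1024 * 1024)
    removed = not os.path.exists(path)

print(f"removed={removed}, unlink took {elapsed:.3f}s")
```

Note that `os.remove` can return quickly while the filesystem does the real cleanup in the background, so watching `top` during and after the call is still needed.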
| Sean Clarke (sean-clarke) wrote : | #11 |
Upgraded to 3.12.0-999-generic #201310210405 and still get lockups, still with absolutely nothing in the kernel logs.
| Sean Clarke (sean-clarke) wrote : | #12 |
Situation is intolerable, crashed 2nd time in 12 hrs:
top - 12:06:37 up 1:51, 1 user, load average: 3.33, 0.87, 0.33
Tasks: 226 total, 5 running, 221 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 48.5 sy, 0.0 ni, 50.9 id, 0.1 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem: 12295372 total, 12138296 used, 157076 free, 312 buffers
KiB Swap: 7831536 total, 56 used, 7831480 free, 24644 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1005 root 20 0 0 0 0 R 100.1 0.0 0:50.16 btrfs-transacti
2462 root 20 0 0 0 0 R 75.8 0.0 0:06.66 btrfs-flush_del
2459 root 20 0 0 0 0 S 72.5 0.0 0:10.18 btrfs-flush_del
2463 root 20 0 0 0 0 S 70.5 0.0 0:14.41 btrfs-flush_del
2457 root 20 0 0 0 0 R 38.6 0.0 0:28.35 btrfs-flush_del
2458 root 20 0 0 0 0 S 17.0 0.0 0:39.31 btrfs-flush_del
1959 root 20 0 0 0 0 R 9.6 0.0 0:31.68 btrfs-flush_del
100 root 20 0 0 0 0 S 7.6 0.0 0:00.31 kswap
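To put a number on the snapshot above, the %CPU column of the BTRFS kernel threads can be added up. This is a small sketch; the function name `btrfs_cpu_total` is hypothetical and the rows are copied from the `top` output in this comment.

```python
def btrfs_cpu_total(top_lines):
    """Sum the %CPU column for rows whose COMMAND starts with 'btrfs-'."""
    total = 0.0
    for line in top_lines:
        fields = line.split()
        # Process rows have 12 columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[11].startswith("btrfs-"):
            total += float(fields[8])
    return total


rows = """\
1005 root 20 0 0 0 0 R 100.1 0.0 0:50.16 btrfs-transacti
2462 root 20 0 0 0 0 R 75.8 0.0 0:06.66 btrfs-flush_del
2459 root 20 0 0 0 0 S 72.5 0.0 0:10.18 btrfs-flush_del
2463 root 20 0 0 0 0 S 70.5 0.0 0:14.41 btrfs-flush_del
2457 root 20 0 0 0 0 R 38.6 0.0 0:28.35 btrfs-flush_del
2458 root 20 0 0 0 0 S 17.0 0.0 0:39.31 btrfs-flush_del
1959 root 20 0 0 0 0 R 9.6 0.0 0:31.68 btrfs-flush_del
100 root 20 0 0 0 0 S 7.6 0.0 0:00.31 kswap
""".splitlines()

total = btrfs_cpu_total(rows)
print(f"btrfs kernel threads: {total:.1f}% CPU (about {total / 100:.1f} cores)")
# prints: btrfs kernel threads: 384.1% CPU (about 3.8 cores)
```

That total is in line with the "3 to 5 cores" seen in the earlier failures.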
| Sean Clarke (sean-clarke) wrote : | #13 |
Filed a bug on Kernel.Org - https:/
| summary: | - hard lockup on filserver |
| | + [BTRFS] hard lockup on filserver |
| Sean Clarke (sean-clarke) wrote : | #14 |
Now running 3.13.0-
Will keep an eye on the situation, but I have also had backups running and so far it looks stable.
Will continue to monitor.
| Sean Clarke (sean-clarke) wrote : | #15 |
Now over 2 weeks without a failure, and up until this kernel it never lasted anywhere near this.
| Sean Clarke (sean-clarke) wrote : | #16 |
Happy to close as fixed in 3.13
| Sean Clarke (sean-clarke) wrote : | #17 |
Fixed in 14.04 (3.13 kernel)
| Changed in linux (Ubuntu): | |
| status: | Confirmed → Invalid |
| status: | Invalid → Fix Released |


This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1237794
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.