stacktrace in ext4: /build/buildd/linux-3.13.0/fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x1a2/0x1c0()

Bug #1298972 reported by Simon Déziel
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I was running an "apt-get dist-upgrade" when the FS suddenly remounted to RO. The 2 audit lines are from apt-dater running successively "apt-get update" and "apt-get dist-upgrade":

[153439.007924] type=1006 audit(1396010484.706:46): pid=16029 uid=0 old auid=4294967295 new auid=104 old ses=4294967295 new ses=9 res=1
[153502.372488] type=1006 audit(1396010548.070:47): pid=16086 uid=0 old auid=4294967295 new auid=104 old ses=4294967295 new ses=10 res=1
[153523.874714] EXT4-fs error (device vda1): ext4_mb_generate_buddy:756: group 1, 6475 clusters in bitmap, 6473 in gd; block bitmap corrupt.
[153523.876361] Aborting journal on device vda1-8.
[153523.889719] EXT4-fs error (device vda1): ext4_journal_check_start:56: Detected aborted journal
[153523.890881] EXT4-fs (vda1): Remounting filesystem read-only
[153523.891574] EXT4-fs (vda1): Remounting filesystem read-only
[153523.892301] ------------[ cut here ]------------
[153523.892341] WARNING: CPU: 0 PID: 16129 at /build/buildd/linux-3.13.0/fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x1a2/0x1c0()
[153523.892342] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_filter ip_tables x_tables psmouse serio_raw floppy
[153523.892400] CPU: 0 PID: 16129 Comm: dpkg Not tainted 3.13.0-19-generic #40-Ubuntu
[153523.892401] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[153523.892406] 0000000000000009 ffff880000a97ad0 ffffffff81711075 0000000000000000
[153523.892408] ffff880000a97b08 ffffffff810662cd ffff88000aca3d00 0000000000000000
[153523.892410] ffff88000a653000 ffffffff81835280 00000000000012ea ffff880000a97b18
[153523.892412] Call Trace:
[153523.892444] [<ffffffff81711075>] dump_stack+0x45/0x56
[153523.892461] [<ffffffff810662cd>] warn_slowpath_common+0x7d/0xa0
[153523.892463] [<ffffffff810663aa>] warn_slowpath_null+0x1a/0x20
[153523.892466] [<ffffffff81268112>] __ext4_handle_dirty_metadata+0x1a2/0x1c0
[153523.892483] [<ffffffff81240eca>] ? ext4_dirty_inode+0x2a/0x60
[153523.892486] [<ffffffff812707d6>] ext4_free_blocks+0x646/0xbf0
[153523.892488] [<ffffffff81261ec5>] ext4_ext_rm_leaf+0x4a5/0x860
[153523.892490] [<ffffffff81260f87>] ? __ext4_ext_check+0x197/0x330
[153523.892492] [<ffffffff812645e0>] ? ext4_ext_remove_space+0xc0/0x7e0
[153523.892494] [<ffffffff8126483c>] ext4_ext_remove_space+0x31c/0x7e0
[153523.892496] [<ffffffff81266bb0>] ext4_ext_truncate+0xb0/0xe0
[153523.892498] [<ffffffff8123f339>] ext4_truncate+0x379/0x3c0
[153523.892500] [<ffffffff8123fef1>] ext4_evict_inode+0x491/0x4f0
[153523.892514] [<ffffffff811d3ac0>] evict+0xb0/0x1b0
[153523.892516] [<ffffffff811d42d5>] iput+0xf5/0x180
[153523.892525] [<ffffffff811c8cfe>] do_unlinkat+0x18e/0x2b0
[153523.892537] [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
[153523.892539] [<ffffffff811c9c56>] SyS_unlink+0x16/0x20
[153523.892545] [<ffffffff81721c7f>] tracesys+0xe1/0xe6
[153523.892547] ---[ end trace 9b25f1487620e754 ]---
[153523.892582] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[153523.893579] IP: [<ffffffff81256fa1>] __ext4_error_inode+0x31/0x160
[153523.894350] PGD a06c067 PUD 819d067 PMD 0
[153523.894933] Oops: 0000 [#1] SMP
[153523.895397] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_filter ip_tables x_tables psmouse serio_raw floppy
[153523.896276] CPU: 0 PID: 16129 Comm: dpkg Tainted: G W 3.13.0-19-generic #40-Ubuntu
[153523.896276] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[153523.896276] task: ffff880009455fc0 ti: ffff880000a96000 task.ti: ffff880000a96000
[153523.896276] RIP: 0010:[<ffffffff81256fa1>] [<ffffffff81256fa1>] __ext4_error_inode+0x31/0x160
[153523.896276] RSP: 0018:ffff880000a97a88 EFLAGS: 00010296
[153523.896276] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000082
[153523.896276] RDX: 00000000000012ea RSI: ffffffff81a6a3e9 RDI: 0000000000000000
[153523.896276] RBP: ffff880000a97b18 R08: ffffffff81a74110 R09: 0000000000000005
[153523.896276] R10: 00000000ffffffe2 R11: ffff880000a977fe R12: 0000000000000082
[153523.896276] R13: ffffffff81835280 R14: 00000000000012ea R15: ffffffff81a74110
[153523.896276] FS: 00007ff43c02b840(0000) GS:ffff88000b800000(0000) knlGS:0000000000000000
[153523.896276] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[153523.896276] CR2: 0000000000000028 CR3: 0000000008df7000 CR4: 00000000000006f0
[153523.896276] Stack:
[153523.896276] ffff880000a97a90 0000000000000103 9b25f1487620e754 0000000000002e42
[153523.896276] 000000000000103e 0000000000000092 00000000000001fd ffff88000a653000
[153523.896276] ffff880000a97b18 ffffffff81267c22 ffffffff810662df ffff88000aca3d00
[153523.896276] Call Trace:
[153523.896276] [<ffffffff81267c22>] ? ext4_journal_abort_handle+0x42/0xc0
[153523.896276] [<ffffffff810662df>] ? warn_slowpath_common+0x8f/0xa0
[153523.896276] [<ffffffff8126807f>] __ext4_handle_dirty_metadata+0x10f/0x1c0
[153523.896276] [<ffffffff812707d6>] ext4_free_blocks+0x646/0xbf0
[153523.896276] [<ffffffff81261ec5>] ext4_ext_rm_leaf+0x4a5/0x860
[153523.896276] [<ffffffff81260f87>] ? __ext4_ext_check+0x197/0x330
[153523.896276] [<ffffffff812645e0>] ? ext4_ext_remove_space+0xc0/0x7e0
[153523.896276] [<ffffffff8126483c>] ext4_ext_remove_space+0x31c/0x7e0
[153523.896276] [<ffffffff81266bb0>] ext4_ext_truncate+0xb0/0xe0
[153523.896276] [<ffffffff8123f339>] ext4_truncate+0x379/0x3c0
[153523.896276] [<ffffffff8123fef1>] ext4_evict_inode+0x491/0x4f0
[153523.896276] [<ffffffff811d3ac0>] evict+0xb0/0x1b0
[153523.896276] [<ffffffff811d42d5>] iput+0xf5/0x180
[153523.896276] [<ffffffff811c8cfe>] do_unlinkat+0x18e/0x2b0
[153523.896276] [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
[153523.896276] [<ffffffff811c9c56>] SyS_unlink+0x16/0x20
[153523.896276] [<ffffffff81721c7f>] tracesys+0xe1/0xe6
[153523.896276] Code: 48 89 e5 41 57 4d 89 c7 41 56 41 89 d6 41 55 49 89 f5 48 c7 c6 e9 a3 a6 81 41 54 49 89 cc 53 48 89 fb 48 83 ec 68 4c 89 4c 24 60 <48> 8b 47 28 48 8b 57 40 48 8b 80 f8 02 00 00 48 8b 40 68 89 90
[153523.896276] RIP [<ffffffff81256fa1>] __ext4_error_inode+0x31/0x160
[153523.896276] RSP <ffff880000a97a88>
[153523.896276] CR2: 0000000000000028
[153523.933140] ---[ end trace 9b25f1487620e755 ]---
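
A minimal Python sketch for pulling the bitmap mismatch out of a log like the one above. The function name and the returned keys are illustrative assumptions, and the pattern only covers the ext4_mb_generate_buddy message format shown in this report:

```python
import re

# Matches the ext4_mb_generate_buddy message format seen in the log above.
BUDDY_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): ext4_mb_generate_buddy:\d+: "
    r"group (?P<group>\d+), (?P<bitmap>\d+) clusters in bitmap, "
    r"(?P<gd>\d+) in gd"
)


def find_bitmap_mismatches(log_text):
    """Return one dict per bitmap/group-descriptor mismatch found in log_text."""
    mismatches = []
    for match in BUDDY_ERROR.finditer(log_text):
        bitmap = int(match.group("bitmap"))
        gd = int(match.group("gd"))
        mismatches.append({
            "device": match.group("device"),
            "group": int(match.group("group")),
            "bitmap_clusters": bitmap,
            "gd_clusters": gd,
            "difference": bitmap - gd,
        })
    return mismatches


# The error line from this report, used as sample input.
sample = (
    "[153523.874714] EXT4-fs error (device vda1): ext4_mb_generate_buddy:756: "
    "group 1, 6475 clusters in bitmap, 6473 in gd; block bitmap corrupt."
)
print(find_bitmap_mismatches(sample))
```

Running this over several saved dmesg extracts could show whether the same block group is involved each time.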

$ apt-cache policy linux-image-virtual linux-image-3.13.0-19-generic
linux-image-virtual:
  Installed: 3.13.0.19.23
  Candidate: 3.13.0.19.23
  Version table:
 *** 3.13.0.19.23 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
linux-image-3.13.0-19-generic:
  Installed: 3.13.0-19.40
  Candidate: 3.13.0-19.40
  Version table:
 *** 3.13.0-19.40 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 28 04:55 seq
 crw-rw---- 1 root audio 116, 33 Mar 28 04:55 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.13.3-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: Bochs Bochs
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=UUID=65dff546-44c1-4418-9703-57fc37ea3296 ro console=tty0 console=ttyS0,38400 quiet
ProcVersionSignature: Ubuntu 3.13.0-19.40-generic 3.13.6
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-19-generic N/A
 linux-backports-modules-3.13.0-19-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-19-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 01/01/2007
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2007:svnBochs:pnBochs:pvr:cvnBochs:ct1:cvr:
dmi.product.name: Bochs
dmi.sys.vendor: Bochs

Simon Déziel (sdeziel) wrote :

I'm attaching the dmesg extract as it reads more easily (same content as what's in the issue description).

description: updated
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1298972

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Simon Déziel (sdeziel) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Simon Déziel (sdeziel) wrote : CurrentDmesg.txt

apport information

Simon Déziel (sdeziel) wrote : ProcCpuinfo.txt

apport information

Simon Déziel (sdeziel) wrote : ProcInterrupts.txt

apport information

Simon Déziel (sdeziel) wrote : ProcModules.txt

apport information

Simon Déziel (sdeziel) wrote : UdevDb.txt

apport information

Simon Déziel (sdeziel) wrote : UdevLog.txt

apport information

Simon Déziel (sdeziel) wrote : WifiSyslog.txt

apport information

Simon Déziel (sdeziel)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Are you able to reproduce the oops? If so, would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Russell Smith (mr-russ) wrote :

I see this with relative consistency on a number of 14.04 VMs. I will attempt to upgrade to 3.14 and see what happens.

Russell Smith (mr-russ) wrote :

https://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.14.2 contains ext4 patches related to the code in this stack dump;

ad6599ab3ac98a4474544086e048ce86ec15a4d1 specifically references incorrect freeing of the function.

I'm not able to reliably reproduce the stack dump and am not clear on whether the commit mentioned is the complete fix for this issue.

Any follow-up would be appreciated.

Changed in linux (Ubuntu):
status: Expired → Confirmed
tags: added: kernel-fixed-upstream
Simon Déziel (sdeziel) wrote : Re: [Bug 1298972] Re: stacktrace in ext4: /build/buildd/linux-3.13.0/fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x1a2/0x1c0()

Hi Russell,

On 09/28/2014 01:43 AM, Russell Smith wrote:
> https://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.14.2 contains
> ext4 patches related to the code in this stack dump;
>
> ad6599ab3ac98a4474544086e048ce86ec15a4d1 specifically references
> incorrect freeing of the function.
>
> I'm not able to reliably reproduce the stack dump and am not clear on
> whether the commit mentioned is the complete fix for this issue.

Just to be sure I understand properly, you were able to reproduce the
issue reliably when using the default kernel (3.13) but cannot reproduce
now that you are on 3.14? So that's why you marked the issue as fixed
upstream?

> Any follow-up would be appreciated.

Well, I was never able to reproduce the issue even while sticking to the
3.13 kernel so thank you for your input on this.

Regards,
Simon

Russell Smith (mr-russ) wrote :

Hi,

I behaved lazily... and have been trying to collect more data before posting back. Sorry for the delay.

1. I can only reproduce the bug about once a week on 3.13 kernel that is running on ubuntu 14.04.
2. I was lazy and after reading patches and kernel stuff, I marked it as fixed upstream. I didn't want to go compiling my own kernel as I need all the virtio options.

I have 5 virtual machines running on top of a md10 + md1 raid all put into a single lvm volume. The host is running ubuntu 12.04 with a 3.2 kernel series. The guests are all virtio and running 14.04 as they have been upgraded. Most of the 5 instances have experienced a file system read-only event. One server has experienced 3. It is my package mirror running approx.

So after further investigation this may not be resolved upstream. If you can provide me with a virtio/ubuntu compatible newer kernel I can put it on a machine and try it out. But because I can't reproduce this at will it's very hard to give a reproduction case.

I found http://lists.openwall.net/linux-ext4/2014/05/21/2 which indicates there is a bug somewhere if we are seeing failures where we are. The result of that thread was more debugging put into the section of code that reports this. I'm happy to track and patch any amount of debug.

As the filesystem re-mounts readonly, I suspect I should move /var/log onto another filesystem to increase the chances I'll log everything.

All the instances I've seen this on are upgraded from at least 12.04. Some from 10.04. So there are ext3 -> ext4 -> ext4+ options involved in the filesystem makeup. Reading some of the bugs on the kernel list, it feels like this might be related to that, and it will be complicated.

I've attached two examples from tune2fs to show that different features are enabled on the different servers that are experiencing this issue.
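
A rough Python sketch of how the feature lists from two `tune2fs -l` outputs could be compared. The helper names are illustrative, and the two sample strings are made-up inputs rather than output from the affected servers:

```python
def parse_features(tune2fs_output):
    """Extract the feature names from the text printed by `tune2fs -l`."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


def diff_features(output_a, output_b):
    """Return (features only in A, features only in B), each sorted."""
    a = parse_features(output_a)
    b = parse_features(output_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical sample inputs, for illustration only.
server_a = "Filesystem features:      has_journal ext_attr extent flex_bg\n"
server_b = "Filesystem features:      has_journal ext_attr dir_index\n"

only_a, only_b = diff_features(server_a, server_b)
print("only on A:", only_a)
print("only on B:", only_b)
```

Feeding it the saved output from each affected guest would show which features they share and which differ.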

If you can provide any debugging advice, that would be good. I've not really done kernel debugging before, but I've done a lot in the application space, so I understand the concepts.

Russell Smith (mr-russ) wrote :

Also, the two servers are running:

Linux mirror 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

If there is any other information you want me to attach, please ask.

Thanks

Russell.

Simon Déziel (sdeziel) wrote :

Hi Russell,

On 10/02/2014 06:06 AM, Russell Smith wrote:
> I behaved lazily... and have been trying to collect more data before
> posting back. Sorry for the delay.
>
> 1. I can only reproduce the bug about once a week on 3.13 kernel that is running on ubuntu 14.04.
> 2. I was lazy and after reading patches and kernel stuff, I marked it as fixed upstream. I didn't want to go compiling my own kernel as I need all the virtio options.

Ubuntu provides upstream kernels here:

 http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

And some documentation about it is available here:

 https://wiki.ubuntu.com/Kernel/MainlineBuilds

> I have 5 virtual machines running on top of a md10 + md1 raid all put
> into a single lvm volume. The host is running ubuntu 12.04 with a 3.2
> kernel series. The guests are all virtio and running 14.04 as they have
> been upgraded.

Interestingly, I also had a hypervisor running 12.04, possibly with
the 3.2 or the 3.11 kernel. IIRC, the guest was running a fresh install
of 14.04.

Since then, I reinstalled the hypervisor with 14.04 and the guest too. I
never experienced this problem after this.

> Most of the 5 instances have experienced a file system
> read-only event. One server has experienced 3. It is my package mirror
> running approx.

IIRC, my guest was an apt-cacher-ng instance.

> So after further investigation this may not be resolved upstream. If
> you can provide me with a virtio/ubuntu compatible newer kernel I can
> put it on a machine and try it out. But because I can't reproduce this
> at will it's very hard to give a reproduction case.

Indeed, but since I haven't reproduced this issue even once, I'm afraid I
can't help you here.

> I found http://lists.openwall.net/linux-ext4/2014/05/21/2 which
> indicates there is a bug somewhere if we are seeing failures where we
> are. The result of that thread was more debugging put into the section
> of code that reports this. I'm happy to track and patch any amount of
> debug.
>
> As the filesystem re-mounts readonly, I suspect I should move /var/log
> onto another filesystem to increase the chances I'll log everything.

You can also send the logs to a remote syslog server.
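
In practice a syslog daemon such as rsyslog would normally do the forwarding, but as a minimal sketch of the idea, the Python below sends log lines to a remote collector as BSD-syslog-style UDP datagrams. The function names, the `kernel` tag, the default kern.err priority, and port 514 over UDP are assumptions made for illustration:

```python
import socket
from datetime import datetime

KERN_FACILITY = 0  # syslog facility code for kernel messages
ERR_SEVERITY = 3   # syslog severity code for "error"


def format_syslog(message, hostname, when=None,
                  facility=KERN_FACILITY, severity=ERR_SEVERITY):
    """Build a datagram of the form <PRI>TIMESTAMP HOSTNAME kernel: MESSAGE."""
    when = when or datetime.now()
    priority = facility * 8 + severity
    # BSD syslog timestamps pad the day of month with a space, e.g. "Oct  2".
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{priority}>{stamp} {hostname} kernel: {message}".encode("utf-8")


def forward_lines(lines, server, port=514, hostname=None):
    """Send each non-empty line to server:port as one UDP syslog datagram."""
    hostname = hostname or socket.gethostname()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            line = line.rstrip("\n")
            if line:
                sock.sendto(format_syslog(line, hostname), (server, port))
```

For example, piping `dmesg --follow` into a small script that calls `forward_lines(sys.stdin, "loghost.example")` (a placeholder hostname) would keep a copy of the kernel messages off the guest even after its root filesystem goes read-only.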

Good luck and thanks!
Simon

Russell Smith (mr-russ) wrote :

I just upgraded my hypervisor to Ubuntu 14.04 in an attempt to resolve this issue, as Simon did. However, under the heavy I/O of 'aptitude upgrade', I've had 3 more occurrences of this within hours of upgrading the host to 14.04. So it appears not to be related to the hypervisor.

This leads me to suspect the upgrade path: most of the affected systems have been upgraded from 10.04 -> 12.04 -> 14.04, whereas Simon reinstalled all of his systems and no longer has the issue. I could try random kernels on machines, but some information from a kernel developer on how to debug this would be very useful.

Russell Smith (mr-russ) wrote :

Hi,

I've been running the following kernel for a while and have reproduced the error in this bug with it.

Linux mirror 3.15.0-031500rc2-generic #201404201435 SMP Sun Apr 20 18:36:18 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Regards

Russell

Russell Smith (mr-russ) wrote :

The filesystem details are as follows. This may be related to a historic bug in the filesystem that is now causing ongoing corruption, as the file system was created in 2010.

mr-russ@mirror:~$ sudo tune2fs -l /dev/vda1
tune2fs 1.42.9 (4-Feb-2014)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: 8c150a35-6a37-41c4-9155-a99e5acce610
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1962240
Block count: 7863809
Reserved block count: 392018
Free blocks: 2843701
Free inodes: 1509317
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1022
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8176
Inode blocks per group: 511
Flex block group size: 16
Filesystem created: Sat May 22 20:51:25 2010
Last mount time: Fri Apr 3 09:34:17 2015
Last write time: Fri Apr 3 09:34:16 2015
Mount count: 1
Maximum mount count: 23
Last checked: Fri Apr 3 09:33:20 2015
Check interval: 15552000 (6 months)
Next check after: Wed Sep 30 08:33:20 2015
Lifetime writes: 310 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: af2303c6-1959-47cd-9fbf-3478192a400a
Journal backup: inode blocks

wynnie (choiwy)
information type: Public → Public Security