Memory arena corruption with FUSE (was Memory allocation failure crashes kernel hard, presumably related to FUSE)

Bug #1505948 reported by Martin Gerhard Loschwitz on 2015-10-14
This bug affects 5 people
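Affects: linux (Fedora); Status: Won't Fix; Importance: Critical
Affects: linux (Ubuntu); Importance: High; Assigned to: Seth Forshee
Affects: linux (Ubuntu Wily); Importance: High; Assigned to: Seth Forshee
Affects: linux (Ubuntu Xenial); Importance: High; Assigned to: Seth Forshee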

Bug Description

== SRU Justification ==

Impact: Races in fuse's synchronous io handling can result in use-after-free bugs which are causing kernel crashes.

Fix: Two commits from fuse-next, one which simply caches the result of a test to avoid a use-after-free and another which adds reference counting to the fuse_io_priv struct to get rid of some convoluted rules for determining when this structure can be freed.

Test case: Tested on LP #1505948.
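
The first of the two fixes ("caches the result of a test") can be pictured with a small userspace sketch. This is only an illustration of the pattern under simplifying assumptions, not the kernel patch itself: struct request, submit() and submit_and_check() are hypothetical names, and the asynchronous completion is modelled as happening immediately inside submit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request control block; not a kernel struct. */
struct request {
    bool synchronous;
};

/* Counts frees done by the completion path so callers can observe them. */
static int freed_requests;

/*
 * Submitting an asynchronous request hands ownership to the completion
 * path. In this model the completion runs at once and frees the request.
 */
static void submit(struct request *req)
{
    if (!req->synchronous) {
        free(req);
        freed_requests++;
    }
}

/*
 * Returns true if the caller still owns req after submission and is
 * therefore responsible for waiting on it and freeing it.
 */
bool submit_and_check(struct request *req)
{
    assert(req != NULL);

    /* Read the flag BEFORE submit(): afterwards req may already be freed. */
    bool is_sync = req->synchronous;

    submit(req);

    /* The racy variant would evaluate req->synchronous here instead. */
    return is_sync;
}
```

A caller would free a request only when submit_and_check() returns true; for an asynchronous request the pointer must not be touched again after the call.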

---

Hello everybody,

Linux 4.1, 4.2 or 4.3-rc leads to an immediate kernel panic in our setup when trying to start a Qemu process on top of a fuse-based mount. Here is an example stacktrace:

[ 739.807817] BUG: unable to handle kernel paging request at ffff8800a4104ea0
[ 739.840201] IP: [<ffffffff811cc95a>] kmem_cache_alloc_trace+0x7a/0x1f0
[ 739.870309] PGD 2fee067 PUD 2fbf4dd063 PMD 0
[ 739.890418] Oops: 0000 [#1] SMP
[ 739.905265] Modules linked in: nbd vport_vxlan vport_gre gre ebtable_filter ebtables openvswitch ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter xt_CT iptable_raw ip_tables xt_tcpudp ip6t_REJECT nf_reject_ipv6 xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables dm_crypt ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd kvm_intel kvm ipmi_devintf vhost_net vhost macvtap macvlan joydev input_leds dm_multipath scsi_dh bonding sb_edac 8021q garp hpilo mrp stp ipmi_si llc edac_core lpc_ich ioatdma 8250_fintek ipmi_msghandler lp shpchp acpi_power_meter mac_hid parport nls_iso8859_1 sch_fq_codel xfs libcrc32c btrfs xor raid6_pq ixgbe ses enclosure hid_generic dca vxlan usbhid ip6_udp_tunnel tg3 udp_tunnel ptp hid pps_core hpsa mdio wmi
[ 740.345300] CPU: 8 PID: 10550 Comm: qemu-system-x86 Not tainted 4.2.0-040200-generic #201508301530
[ 740.386879] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015
[ 740.416827] task: ffff882f8e958dc0 ti: ffff882f28c20000 task.ti: ffff882f28c20000
[ 740.451672] RIP: 0010:[<ffffffff811cc95a>] [<ffffffff811cc95a>] kmem_cache_alloc_trace+0x7a/0x1f0
[ 740.494047] RSP: 0018:ffff882f28c23c68 EFLAGS: 00010286
[ 740.518425] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 00000000000026b3
[ 740.551611] RDX: 00000000000026b2 RSI: 00000000000000d0 RDI: ffff882fbf407840
[ 740.584846] RBP: ffff882f28c23ca8 R08: 0000000000019920 R09: ffffe8d000200ab0
[ 740.618287] R10: ffffffff812e8dcd R11: ffffea00bca0ac00 R12: 00000000000000d0
[ 740.651320] R13: ffff882fbf407840 R14: ffff8800a4104ea0 R15: ffff882fbf407840
[ 740.684195] FS: 00007f2642ffd700(0000) GS:ffff882fbfa00000(0000) knlGS:0000000000000000
[ 740.722030] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 740.749469] CR2: ffff8800a4104ea0 CR3: 0000002f26f83000 CR4: 00000000001426e0
[ 740.783390] Stack:
[ 740.792577] ffffffff812e8dcd 0000000000000048 0000000000000002 ffff882f908c8468
[ 740.827003] 0000000001bef000 ffff882f928e4600 ffff882f28c23e48 ffff882f28c23d70
[ 740.860971] ffff882f28c23d38 ffffffff812e8dcd 0000000000000001 ffff882f908c8300
[ 740.894994] Call Trace:
[ 740.906211] [<ffffffff812e8dcd>] ? fuse_direct_IO+0xdd/0x280
[ 740.932940] [<ffffffff812e8dcd>] fuse_direct_IO+0xdd/0x280
[ 740.958866] [<ffffffff8117750e>] generic_file_direct_write+0x9e/0x150
[ 740.989318] [<ffffffff812e96bc>] fuse_file_write_iter+0x15c/0x2e0
[ 741.017725] [<ffffffff811e94a7>] __vfs_write+0xa7/0xf0
[ 741.041787] [<ffffffff811e9b09>] vfs_write+0xa9/0x190
[ 741.065307] [<ffffffff811ea9d9>] SyS_pwrite64+0x69/0xa0
[ 741.090141] [<ffffffff81085b57>] ? SyS_rt_sigprocmask+0x67/0xb0
[ 741.135924] [<ffffffff817a8e32>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 741.183478] Code: 4c 03 05 32 d8 e3 7e 4d 8b 30 49 8b 40 10 4d 85 f6 0f 84 22 01 00 00 48 85 c0 0f 84 19 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[ 741.306817] RIP [<ffffffff811cc95a>] kmem_cache_alloc_trace+0x7a/0x1f0

The problem has also been documented by somebody else in the Fedora bug tracker at https://bugzilla.redhat.com/show_bug.cgi?id=1254310

This behaviour is 100% reproducible. I have asked the fuse-devel mailing list for advice, but so far without success:

http://sourceforge.net/p/fuse/mailman/message/34537139/

We are still investigating whether this issue also happens with 4.0 and will add that information to this bug report once we have it. Any help with debugging will be greatly appreciated.

Description of problem:

After upgrading a node from F20 to F21, the node crashes when accessing a glusterfs volume.
The remaining F20 nodes have no problem accessing the volume.

Aug 16 20:24:25 bagel kernel: [ 1810.077267] ------------[ cut here ]------------
Aug 16 20:24:25 bagel kernel: [ 1810.081945] kernel BUG at mm/slub.c:3413!
Aug 16 20:24:25 bagel kernel: [ 1810.085998] invalid opcode: 0000 [#1] SMP
Aug 16 20:24:25 bagel kernel: [ 1810.090177] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat fuse nfsv3 nfs_acl nfs lockd grace sunrpc fscache ebtable_filter ebtables ip6table_filter ip6_tables softdog scsi_transport_iscsi xt_physdev br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack vfat fat coretemp kvm_intel kvm bcache iTCO_wdt crct10dif_pclmul ipmi_devintf crc32_pclmul iTCO_vendor_support gpio_ich igb crc32c_intel ptp pps_core lpc_ich ghash_clmulni_intel i2c_i801 mfd_core ipmi_si dca ipmi_msghandler i2c_ismt tpm_tis shpchp tpm acpi_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm 8021q garp mrp tun bridge stp llc bonding
Aug 16 20:24:25 bagel kernel: [ 1810.149526] CPU: 1 PID: 4794 Comm: qemu-system-x86 Not tainted 4.1.4-100.fc21.x86_64 #1
Aug 16 20:24:25 bagel kernel: [ 1810.157603] Hardware name: Supermicro A1SRM-2758F/A1SRM-2758F, BIOS 1.2 02/16/2015
Aug 16 20:24:25 bagel kernel: [ 1810.165246] task: ffff88085a1313c0 ti: ffff8803b09b4000 task.ti: ffff8803b09b4000
Aug 16 20:24:25 bagel kernel: [ 1810.172800] RIP: 0010:[<ffffffff81208532>] [<ffffffff81208532>] kfree+0x152/0x160
Aug 16 20:24:25 bagel kernel: [ 1810.180467] RSP: 0018:ffff8803b09b7c98 EFLAGS: 00010246
Aug 16 20:24:25 bagel kernel: [ 1810.185833] RAX: 005ffff80000002c RBX: ffff8802008b9960 RCX: dead000000200200
Aug 16 20:24:25 bagel kernel: [ 1810.193032] RDX: 000077ff80000000 RSI: ffff88085a1313c0 RDI: ffff8802008b9960
Aug 16 20:24:25 bagel kernel: [ 1810.200231] RBP: ffff8803b09b7cb8 R08: ffff8803b09b7c80 R09: ffffea0008022e40
Aug 16 20:24:25 bagel kernel: [ 1810.207431] R10: 0000000000002fe4 R11: 0000000000000000 R12: 0000000149928000
Aug 16 20:24:25 bagel kernel: [ 1810.214629] R13: ffffffffa02e5c8c R14: ffff8803b09b7e50 R15: ffff8801009b5600
Aug 16 20:24:25 bagel kernel: [ 1810.221829] FS: 00007f35609ff700(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
Aug 16 20:24:25 bagel kernel: [ 1810.229992] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 16 20:24:25 bagel kernel: [ 1810.235799] CR2: 00007fbf24022a98 CR3: 0000000100a81000 CR4: 00000000001027e0
Aug 16 20:24:25 bagel kernel: [ 1810.243001] Stack:
Aug 16 20:24:25 bagel kernel: [ 1810.245037] ffff8802008b9960 ffff8802008b9960 0000000149928000 ffff8803b09b7da8
Aug 16 20:24:25 bagel kernel: [ 1810.252590] ffff8803b09b7d48 ffffffffa02e5c8c 0000000000004800 ffff8806eea842c0
Aug 16 20:24:25 bagel kernel: [ 1810.260145] 0000000000004800 00000001f4000000 000000014992c800 0000000000000000
Aug 16 20:24:25 bagel kernel: [ 1810.267699] Call Trace:
Aug 16 20:24:25 bagel kernel: [ 1810.270189] [<ffffffffa02e5c8c>] fuse_direct_IO+0x20c/0x340 [fuse]
Aug 16 20:24:25 bagel kernel: [ 1810.276525] [<ffffffff811ac2fa>] generic_file_read_iter+0x4ca/0x6...

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed

As already mentioned in my email to the fuse developer mailing list, we have also tried to generate direct I/O traffic on the affected mount itself but were not able to reproduce the issue. The problem only ever occurs once Qemu starts to run stuff on top of the FUSE mount. Other reports of this issue (identical or similar) have mentioned Qemu or VMware-based emulation as well.

description: updated
summary: - Memory allocation failure crashes kernel hard
+ Memory allocation failure crashes kernel hard, presumably related to
+ FUSE

We can now confirm that the issue does not happen with 4.0.9. This leads to the assumption that the problem has either been fixed between 4.0 and 4.0.9, or, and I consider this much more likely, the problem was introduced between 4.0 and 4.1 on the main branch.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key wily
Robert Doebbelin (2-robert-3) wrote :

Duplicating my post to the fuse developer mailing list here:

Hi all,

The kernel crash can be triggered when async direct IO is used, which comes with Fuse 3.0_pre0 (i.e. current head). My workload was to install CentOS7 on a newly created qcow2 disk. The kernel (Fedora 21; 4.1.8-100.fc21.x86_64) crashed in 2/2 runs using qemu/kvm on top of ntfs-3g built against fuse3:

1) Build fuse3 from current head
2) Build ntfs-3g against fuse3 (feel free to use the attached patch. It assumes that pkg-config is able to find fuse3, so install fuse3.pc in a PKG_CONFIG_PATH)
3) ntfs-3g: ./configure --with-fuse=external; make
4) "src/lowntfs-3g --version" should now print 'lowntfs-3g 2015.3.14 external FUSE 30'

5) create and mount an NTFS volume
6) create a VM disk: qemu-img create -f qcow2 disk.qcow2 20G
7) make sure that the VM actually uses async direct io (cache='none' io='native')

In my case the kernel crashed around 12 minutes after the VM was started.

Regards,
Robert
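
For context on step 7: with cache='none' and io='native', QEMU opens the disk image with O_DIRECT and submits requests through Linux native AIO (io_submit). The sketch below shows that I/O pattern in a small C program. It is not a confirmed reproducer (plain direct I/O on the mount did not trigger the crash for the original reporter), and aio_direct_write() and the wrapper names are made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Some build setups hide O_DIRECT; degrade to buffered I/O in that case. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

enum { BLOCK = 4096 };
static_assert(BLOCK % 512 == 0, "O_DIRECT needs sector-aligned sizes");

/* glibc does not wrap the native AIO system calls, so call them directly. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_submit(aio_context_t ctx, long n, struct iocb **list)
{
    return syscall(SYS_io_submit, ctx, n, list);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, NULL);
}

/*
 * Queue one BLOCK-sized O_DIRECT write at offset 0 through native AIO and
 * wait for its completion. Returns the number of bytes written, or a
 * negative value on failure.
 */
long aio_direct_write(const char *path)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    void *buf = NULL;
    long res = -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -1;

    /* O_DIRECT requires an aligned buffer. */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 0xab, BLOCK);

    if (sys_io_setup(8, &ctx) == 0) {
        memset(&cb, 0, sizeof(cb));
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_fildes = fd;
        cb.aio_buf = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes = BLOCK;
        cb.aio_offset = 0;

        if (sys_io_submit(ctx, 1, list) == 1 &&
            sys_io_getevents(ctx, 1, 1, &ev) == 1)
            res = (long)ev.res; /* bytes written or negative errno */

        syscall(SYS_io_destroy, ctx);
    }

    free(buf);
    close(fd);
    return res;
}
```

Pointing aio_direct_write() at a file on the FUSE mount would exercise the same kernel entry points that appear in the traces (the direct write path into fuse_direct_IO), although a VM workload issues far more concurrent requests than this single write.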

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 21 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Confirming that this still occurs on fresh Fedora 22 install with:

kernel-4.2.5-201.fc22.x86_64
glusterfs-fuse-3.7.5-1.fc22.x86_64
fuse-2.9.4-3.fc22.x86_64
fuse-libs-2.9.4-3.fc22.x86_64

Found a workaround (not a fix):

I recompiled kernel-4.2.5-201.fc22.x86_64 to use the older SLAB allocator instead of the default SLUB allocator. Problem avoided. No more crash when using glusterfs (fuse).

Now.. what the -bleep- is wrong with SLUB?

While using SLAB is a workaround (at least it seems to be working so far; knock-on-wood), I am uncertain what performance impacts it is going to have on my virtualization cluster. :-(

And without a run/boot-time method of switching between allocators, I am now going to have to compile my customized kernel from here on out.. not a big deal, but a nuisance.. and have to take extra care to make sure to never boot into a distro-built kernel by mistake and have everything come crashing down.

Andy Whitcroft (apw) on 2016-01-27
summary: - Memory allocation failure crashes kernel hard, presumably related to
- FUSE
+ Memory arena corruption with FUSE (was Memory allocation failure crashes
+ kernel hard, presumably related to FUSE)
Maik Zumstrull (m-zumstrull) wrote :

We've been able to confirm an out of bounds write in fuse_direct_io with the slub_debug boot option on linux-lts-wily.

Robert Doebbelin (2-robert-3) wrote :

Enabling KASAN on a Wily kernel prints the following:

Jan 27 12:02:05 ubuntu kernel: ==================================================================
Jan 27 12:02:05 ubuntu kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
Jan 27 12:02:05 ubuntu kernel: Read of size 8 by task qemu-system-x86/2784
Jan 27 12:02:05 ubuntu kernel: =============================================================================
Jan 27 12:02:05 ubuntu kernel: BUG kmalloc-128 (Tainted: G I ): kasan: bad access detected
Jan 27 12:02:05 ubuntu kernel: -----------------------------------------------------------------------------
Jan 27 12:02:05 ubuntu kernel: Disabling lock debugging due to kernel taint
Jan 27 12:02:05 ubuntu kernel: INFO: Slab 0xffffea000db10500 objects=32 used=26 fp=0xffff88036c414e80 flags=0x2ffff0000000080
Jan 27 12:02:05 ubuntu kernel: INFO: Object 0xffff88036c414380 @offset=896 fp=0x (null)
Jan 27 12:02:05 ubuntu kernel: Bytes b4 ffff88036c414370: 18 00 00 00 40 27 a3 1f 3b 56 00 00 00 00 00 00 ....@'..;V......
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c414380: 00 00 00 00 00 00 00 00 00 f0 75 35 00 00 00 00 ..........u5....
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c414390: 80 27 67 81 ff ff ff ff 00 00 00 00 00 00 00 00 .'g.............
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143a0: 05 00 00 00 00 00 00 00 80 82 44 ad 05 88 ff ff ..........D.....
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143b0: 00 00 00 00 00 00 00 00 10 e1 bc 56 49 56 00 00 ...........VIV..
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143d0: 00 00 00 00 00 00 00 00 80 f6 85 6d 03 88 ff ff ...........m....
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jan 27 12:02:05 ubuntu kernel: Object ffff88036c4143f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jan 27 12:02:05 ubuntu kernel: CPU: 0 PID: 2784 Comm: qemu-system-x86 Tainted: G B I 4.2.0-25-generic 0000030
Jan 27 12:02:05 ubuntu kernel: Hardware name: IBM System x3550 M2 -[794654G]-/49Y6512 , BIOS -[D6E131CUS-1.05]- 11/25/2009
Jan 27 12:02:05 ubuntu kernel: ffff88036c414380 00000000d939cde9 ffff8805adf0f7c8 ffffffff828cafee
Jan 27 12:02:05 ubuntu kernel: 0000000000000080 ffff880373803680 ffff8805adf0f7f8 ffffffff81546759
Jan 27 12:02:05 ubuntu kernel: ffff880373803680 ffffea000db10500 ffff88036c414380 ffff8805ad56d600
Jan 27 12:02:05 ubuntu kernel: Call Trace:

Jan 27 12:02:05 ubuntu kernel: [< inline >] __dump_stack linux-4.2.0/lib/dump_stack.c:15
Jan 27 12:02:05 ubuntu kernel: [<ffffffff828cafee>] dump_stack+0x45/0x57 linux-4.2.0/lib/dump_stack.c:50
Jan 27 12:02:05 ubuntu kernel: [<ffffffff81546759>] print_trailer+0xf9/0x150 linux-4.2.0/mm/slub.c:650
Jan 27 12:02:05 ubuntu kernel: [<ffffffff8154b9c8>] object_err+0x38/0x50 linux-4.2.0/mm/slub.c:657
Jan 27 12:02:05 ubuntu kernel: [< inline >] print_address_description linux-4.2.0/mm/kasan/report.c:120
Jan 27 12:02:05 ubuntu kernel: [<ffffffff8154e3d8>] kasan_report_error+0x1e8/0x3f0 linux-4.2.0/...

Andy Whitcroft (apw) wrote :

Interesting. That implies that we submitted some kind of async IO, and that the IO must have completed and freed io. This in turn implies that the io->req count is getting out of sync with the world. A quick eyeball says we are handling them right, but something is exploding. To try to confirm this, I have built a test kernel with a debugging patch applied. This bumps the io->req from 1 (the pending report for the submission of the IO) to 100. If the theory is right the io->req should go to 99 or fewer. If that occurs we should be able to detect it and report the type of the IO in flight. I have also tried to correct for it in the cases where that is possible.

Would you be able to test the kernel at the URL below and let me know what you see in dmesg? If the detection triggers we should see "fuse_direct_IO: io->reg would have gone negative" messages, and I would be interested in the content of those when it occurs:

    http://people.canonical.com/~apw/lp1505948-wily/

Builds will be there shortly. Please report any results back here.

Robert Doebbelin (2-robert-3) wrote :

The bug triggers with the debug kernel, however there is no message like "fuse_direct_IO: io->reg would have gone negative" in the journal:

Jan 29 16:22:18 ubuntu dnsmasq-dhcp[896]: DHCPREQUEST(virbr0) 192.168.122.93 52:54:00:45:1c:61
Jan 29 16:22:18 ubuntu dnsmasq-dhcp[896]: DHCPACK(virbr0) 192.168.122.93 52:54:00:45:1c:61
Jan 29 16:22:51 ubuntu kernel: BUG: unable to handle kernel paging request at ffff8800904b06c0
Jan 29 16:22:51 ubuntu kernel: IP: [<ffffffff811df264>] __kmalloc+0x94/0x250
Jan 29 16:22:51 ubuntu kernel: PGD 1ff0067 PUD 3738b6063 PMD 0
Jan 29 16:22:51 ubuntu kernel: Oops: 0000 [#1] SMP
Jan 29 16:22:51 ubuntu kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nls_iso8859_1 ipmi_ssif ipmi_devintf gpio_ich coretemp kvm_intel serio_raw kvm input_leds cdc_ether usbnet mii lpc_ich i7core_edac ioatdma edac_core i5500_temp shpchp dca 8250_fintek ipmi_si mac_hid ipmi_msghandler sunrpc autofs4 hid_generic mptsas mptscsih usbhid mptbase psmouse hid pata_acpi scsi_transport_sas bnx2
Jan 29 16:22:51 ubuntu kernel: CPU: 4 PID: 21954 Comm: qemu-system-x86 Tainted: G I 4.2.0-27-generic #32lp1505948v201601281755
Jan 29 16:22:51 ubuntu kernel: Hardware name: IBM System x3550 M2 -[794654G]-/49Y6512 , BIOS -[D6E131CUS-1.05]- 11/25/2009
Jan 29 16:22:51 ubuntu kernel: task: ffff880380e98c80 ti: ffff8803811d4000 task.ti: ffff8803811d4000
Jan 29 16:22:51 ubuntu kernel: RIP: 0010:[<ffffffff811df264>] [<ffffffff811df264>] __kmalloc+0x94/0x250
Jan 29 16:22:51 ubuntu kernel: RSP: 0018:ffff8803811d79c8 EFLAGS: 00010286
Jan 29 16:22:51 ubuntu kernel: RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 000000000009d36e
Jan 29 16:22:51 ubuntu kernel: RDX: 000000000009d36d RSI: 0000000000000000 RDI: 0000000000019aa0
Jan 29 16:22:51 ubuntu kernel: RBP: ffff8803811d7a08 R08: ffff88067fc19aa0 R09: ffffffff812f8d56
Jan 29 16:22:51 ubuntu kernel: R10: ffff8800904b06c0 R11: 000000000000081a R12: 00000000000000d0
Jan 29 16:22:51 ubuntu kernel: R13: 0000000000000058 R14: ffff8803738037c0 R15: ffff8803738037c0
Jan 29 16:22:51 ubuntu kernel: FS: 00007f384a78eb00(0000) GS:ffff88067fc00000(0000) knlGS:0000000000000000
Jan 29 16:22:51 ubuntu kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 29 16:22:51 ubuntu kernel: CR2: ffff8800904b06c0 CR3: 00000002da9d5000 CR4: 00000000000026e0
Jan 29 16:22:51 ubuntu kernel: Stack:
Jan 29 16:22:51 ubuntu kernel: ffff8803811d7a18 ffffffff812f8d56 ffff880371e2b200 ffff8805993ae0d0
Jan 29 16:22:51 ubuntu kernel: 000000000000000b 00000000000000d0 0000000000000058 ffff8805993ae210
Jan 29 16:22:51 ubuntu kernel: ffff8803811d7a58 ffffffff812f8d56 ffff8803811d7a38 ffff8805993ae0d0
Jan 29 16:22:51 ubuntu kernel: Call Trace:
Jan 29 16:22:51 ubuntu kernel: [<ffffffff812f8d56>] ? __fuse_request_alloc+0x56/0xd0
Jan 29 16:22:51 ubuntu kernel: [<ffffffff812f8d56>] __fuse_request_alloc+0x56/0xd0
Jan 29 16:22:51 ubuntu kernel: [<ffffffff812f9026>] _...

I am trying to get some traction on this bug, open for 6 months with no responses.

I have attempted to remove some variables from the equation to see what factors are potentially contributing to this kernel BUG.

First test:

I have replicated the issue on a host that does NOT run a glusterfsd, and thus only consumes a vm image from a separate server, eliminating any potential conflict from having both glusterfs server and client on the same node.

Also, the original hosts used when this bug was first reported were Supermicro Avoton Atom C2750/58. This new replication of the fault is on an older Dell PE2950 (Xeon E54xx), so the specific hardware does not seem to be a factor in the bug.

Reproduction steps:

- Fresh install of Fedora Server 22, minimal package set, with online updates.
- rpm -Uvh http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm.
- Add this node as a new host via oVirt WebAdmin.
- Start a VM on this new node, using a disk image that resides on a glusterfs storage domain.
- Boom!

kernel-4.3.4-200.fc22.x86_64
glusterfs-fuse-3.7.6-1.fc22.x86_64
fuse-2.9.4-3.fc22.x86_64
fuse-libs-2.9.4-3.fc22.x86_64

[ 316.458148] ------------[ cut here ]------------
[ 316.459052] kernel BUG at mm/slub.c:3517!
[ 316.459052] invalid opcode: 0000 [#1] SMP
[ 316.459052] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat tun nfsv3 nfs fscache fuse ebtable_filter ebtables ip6table_filter ip6_tables scsi_transport_iscsi xt_physdev br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack dm_service_time nf_conntrack coretemp kvm_intel iTCO_wdt ipmi_ssif iTCO_vendor_support gpio_ich kvm ipmi_devintf dcdbas bnx2 lpc_ich ipmi_si i5000_edac edac_core ipmi_msghandler i5k_amb shpchp fjes acpi_cpufreq tpm_tis tpm nfsd 8021q auth_rpcgss garp mrp bridge nfs_acl lockd stp grace llc sunrpc bonding dm_multipath amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic serio_raw pata_acpi megaraid_sas
[ 316.515263] CPU: 2 PID: 3055 Comm: qemu-system-x86 Not tainted 4.3.4-200.fc22.x86_64 #1
[ 316.515263] Hardware name: Dell Inc. PowerEdge 2950/0M332H, BIOS 2.7.0 10/30/2010
[ 316.515263] task: ffff88041cbbb980 ti: ffff880418e94000 task.ti: ffff880418e94000
[ 316.515263] RIP: 0010:[<ffffffff81203edc>] [<ffffffff81203edc>] kfree+0x12c/0x130
[ 316.515263] RSP: 0018:ffff880418e97cc8 EFLAGS: 00010246
[ 316.515263] RAX: 003ffff800000000 RBX: ffff88002a43fea0 RCX: dead000000000200
[ 316.515263] RDX: 000077ff80000000 RSI: ffff88041cbbb980 RDI: ffff88002a43fea0
[ 316.515263] RBP: ffff880418e97ce0 R08: ffff880418e97ca8 R09: ffffea0000a90fc0
[ 316.515263] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000006e30400
[ 316.515263] R13: ffffffffa054c60e R14: ffff88042b32e400 R15: ffff880418e97dc8
[ 316.515263] FS: 00007f1c4e3ff700(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000
[ 316.515263] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 316.515263] CR2: 0000000000000000 CR3: 0000000418c96000 CR4: 00000000000026e0
[ 316.515263] Stack:
[ 316.515263] ffff88002a43fea0 0000000006e30400 ffff880418e97e60 ffff880418e97d68
[ 316.515263] ffffffffa054c60e 0000000000007c00 ffff88041c944c...

This also affects the Xenial Standard Kernel.

Seth Forshee (sforshee) wrote :

I've been looking at the code, but I haven't found anything aside from the two races mentioned on the mailing list thread. Those could explain the original problems, but I don't have any ideas about the problems seen with the fixes applied yet.

I'm trying to reproduce now using the steps you provided in xenial but am not having any luck. My vm installed just fine and has been running for half an hour now with some synthesized disk IO. Anything you might have forgotten to mention in the steps - ntfs-3g mount options, specific version of ntfs-3g to use, etc?

Seth Forshee (sforshee) wrote :

I don't seem to be able to reproduce.

I did make a patch for you to try, though. It adds a reference count to fuse_io_priv that is separate from the request count. I don't know if it fixes anything that moving spin_unlock() doesn't, but to me this seems more straightforward and less error prone than having the request count serve kind of as a reference count but not really.

A build with my patch and the iocb use-after-free fix is at http://people.canonical.com/~sforshee/lp1505948/.
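
The idea of a reference count kept separate from the request count can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the actual patch: struct io_priv, its fields and the io_priv_* helpers are hypothetical names, and the model is single-threaded, whereas real code would guard the request bookkeeping with a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model of a per-IO bookkeeping struct; not the kernel type. */
struct io_priv {
    atomic_int refcnt; /* lifetime: how many users may still touch this */
    int reqs;          /* bookkeeping only: requests still in flight */
};

/* Counts frees so that callers can observe when the struct goes away. */
static int io_priv_frees;

/* Allocate with one reference, owned by the submitter. */
struct io_priv *io_priv_alloc(void)
{
    struct io_priv *io = calloc(1, sizeof(*io));

    if (io)
        atomic_init(&io->refcnt, 1);
    return io;
}

void io_priv_get(struct io_priv *io)
{
    atomic_fetch_add(&io->refcnt, 1);
}

/* Drop one reference; the struct is freed only when the last one goes. */
void io_priv_put(struct io_priv *io)
{
    int old = atomic_fetch_sub(&io->refcnt, 1);

    assert(old > 0);
    if (old == 1) {
        free(io);
        io_priv_frees++;
    }
}

/* Every queued request pins the struct with a reference of its own. */
void io_priv_queue_request(struct io_priv *io)
{
    io_priv_get(io);
    io->reqs++;
}

/* Completion updates the bookkeeping first, then drops its reference. */
void io_priv_complete_request(struct io_priv *io)
{
    io->reqs--;
    io_priv_put(io);
}
```

With this split, reqs reaching zero no longer decides when the memory is released. The submitter can keep reading the struct after all requests have completed, and whichever side calls io_priv_put() last performs the free.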

Robert Doebbelin (2-robert-3) wrote :

Thank you Seth for taking a close look at the problem and my proposed fix. As mentioned on the mailing list my test runs fine now with the two fixes.

However, I prefer your fix as it prevents us from running into this issue again. Our test system is happily installing VMs for two hours now using your build. Please propose your patch.

On Fri, Mar 11, 2016 at 01:03:32PM -0000, Robert Doebbelin wrote:
> Thank you Seth for taking a close look at the problem and my proposed
> fix. As mentioned on the mailing list my test runs fine now with the two
> fixes.
>
> However, I prefer your fix as it prevents us from running into this
> issue again. Our test system is happily installing VMs for two hours now
> using your build. Please propose your patch.

I'm not subscribed to fuse-devel and hadn't refreshed the mailing list
thread so I didn't realize that you had discovered that the hang was
unrelated. That's good.

I'm happy to send the patches, I'll go ahead and send both my patch and
your iocb patch after I make sure it all applies/builds okay on 4.5.

Robert Doebbelin (2-robert-3) wrote :

Great, thanks!

Robert
On 11.03.2016 at 15:01, "Seth Forshee" <email address hidden> wrote:

> On Fri, Mar 11, 2016 at 01:03:32PM -0000, Robert Doebbelin wrote:
> > Thank you Seth for taking a close look at the problem and my proposed
> > fix. As mentioned on the mailing list my test runs fine now with the two
> > fixes.
> >
> > However, I prefer your fix as it prevents us from running into this
> > issue again. Our test system is happily installing VMs for two hours now
> > using your build. Please propose your patch.
>
> I'm not subscribed to fuse-devel and hadn't refreshed the mailing list
> thread so I didn't realize that you had discovered that the hang was
> unrelated. That's good.
>
> I'm happy to send the patches, I'll go ahead and send both my patch and
> your iocb patch after I make sure it all applies/builds okay on 4.5.

Created attachment 1137049
proposed patch #1

Created attachment 1137050
proposed patch #2

Could you please test with these two patches?

Miklos,

Those patches look promising. I will endeavour to test them ASAP. If not today, then by the end of the week.

In the interest of not introducing any additional variables into the tests at this point, I will switch my current in-production kernel (kernel-4.2.5-201.fc22.x86_64 recompiled to use SLAB) back to the default/broken SLUB-based allocator, with your two patches applied and test that.

Whether that works or not, I will then apply the patches against the latest kernel-4.4.4-200.fc22 and test that as well.

Thank you for your work on this. I am very pleased to see this bug finally get some attention.

Miklos,

Yahoo! The above two patches have allowed me to return to the SLUB allocator without fuse crashing. VMs started up with no problem, just as they should. This is with kernel 4.2.5.

Having one test node with the patches running VMs for only a few minutes now, I am tentatively calling this one a success. I will try the patches on 4.4.4 shortly, but I expect that to work as well.

What are the odds that the Fedora kernel team will incorporate these patches without waiting for it to hit mainline/stable upstream first?

Seth Forshee (sforshee) on 2016-03-22
description: updated
Changed in linux (Ubuntu Wily):
assignee: nobody → Seth Forshee (sforshee)
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Seth Forshee (sforshee)
status: Confirmed → In Progress
Seth Forshee (sforshee) on 2016-03-22
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-16.32

---------------
linux (4.4.0-16.32) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1561727

  * fix thermal throttling due to commit "Thermal: initialize thermal zone
    device correctly" (LP: #1561676)
    - Thermal: Ignore invalid trip points

  * Thinkpad T460: Trackpoint mouse buttons instantly generate "release" event
    on press (LP: #1553811)
    - SAUCE: (noup) Input: synaptics - handle spurious release of trackstick
      buttons, again

  * reading /sys/kernel/security/apparmor/profiles requires CAP_MAC_ADMIN
    (LP: #1560583)
    - SAUCE: apparmor: Allow ns_root processes to open profiles file
    - SAUCE: apparmor: Consult sysctl when reading profiles in a user ns

  * linux: sync virtualbox drivers to 5.0.16-dfsg-2 (LP: #1561492)
    - ubuntu: vbox -- update to 5.0.16-dfsg-2

  * s390/kconfig: CONFIG_NUMA without CONFIG_NUMA_EMU does not make any sense on
    s390x (LP: #1557690)
    - [Config] CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=n for s390x

  * spl/zfs fails to build on s390x (LP: #1519814)
    - [Config] s390x -- re-enable zfs
    - [Config] zfs -- disable powerpc until the test failures can be resolved

  * linux: sync to ZFS 0.6.5.6 stable release (LP: #1561483)
    - SAUCE: (noup) Update spl to 0.6.5.6-0ubuntu1, zfs to 0.6.5.6-0ubuntu1

  * zfs: enable zfs for 64bit powerpc kernels (LP: #1558871)
    - [Packaging] zfs -- handle rprovides via dpkg-gencontrol
    - [Config] powerpc -- convert zfs configuration to custom_override

  * Memory arena corruption with FUSE (was Memory allocation failure crashes
    kernel hard, presumably related to FUSE) (LP: #1505948)
    - SAUCE: (noup) fuse: do not use iocb after it may have been freed
    - SAUCE: (noup) fuse: Add reference counting for fuse_io_priv

  * cgroup namespaces: add a 'nsroot=' mountinfo field (LP: #1560489)
    - SAUCE: (noup) cgroup namespaces: add a 'nsroot=' mountinfo field

  * linux packaging: clear remaining redundant delta (LP: #1560445)
    - [Debian] Remove generated intermediate files on clean

  * arm64: guest hangs when ntpd is running (LP: #1549494)
    - Revert "hrtimer: Add support for CLOCK_MONOTONIC_RAW"
    - Revert "hrtimer: Catch illegal clockids"
    - Revert "KVM: arm/arm64: timer: Switch to CLOCK_MONOTONIC_RAW"

  * Need enough contiguous memory to support GICv3 ITS table (LP: #1558828)
    - [Config] CONFIG_FORCE_MAX_ZONEORDER=13 on arm64
    - SAUCE: (no-up) arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium
      ThunderX

  * update arcmsr to version v1.30.00.22-20151126 to fix card timeouts
    (LP: #1559609)
    - arcmsr: fixed getting wrong configuration data
    - arcmsr: fixes not release allocated resource
    - arcmsr: make code more readable
    - arcmsr: adds code to support new Areca adapter ARC1203
    - arcmsr: changes driver version number
    - arcmsr: more readability improvements
    - arcmsr: Split dma resource allocation to a new function
    - arcmsr: change driver version to v1.30.00.22-20151126

  * server image has no keyboard, desktop image works (LP: #1559692)
    - [Config] Rework input-modules (d-i) list

  * PMU sup...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2016-03-29
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
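
When verifying, it helps to confirm that the installed kernel package is at least the version carrying the fix. The following is a rough sketch in Python, assuming the fixed versions are the ones named in the changelog entries of this report (4.4.0-16.32 for Xenial, 4.2.0-36.41 for Wily). The helper names version_key and has_fix are made up for illustration, and the comparison only handles purely numeric version strings. It does not implement full Debian version ordering, for which `dpkg --compare-versions` is the authoritative tool.

```python
import re

# Fixed package versions as stated in the changelog entries of this report.
FIXED = {"xenial": "4.4.0-16.32", "wily": "4.2.0-36.41"}


def version_key(version):
    """Split a numeric version such as '4.2.0-36.41' into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def has_fix(series, installed):
    """Return True if 'installed' is at or beyond the fixed version."""
    return version_key(installed) >= version_key(FIXED[series])
```

For example, has_fix("wily", "4.2.0-35.40") is False under this scheme, while has_fix("wily", "4.2.0-36.41") is True. A version string containing letters makes version_key raise ValueError rather than guess at an ordering.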

tags: added: verification-needed-wily

Done.

tags: added: verification-done-wily
removed: verification-needed-wily

Confirmed fixed on all nodes of my production cluster with the FUSE patches included in kernel-4.4.8-200.fc22.x86_64.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-36.41

---------------
linux (4.2.0-36.41) wily; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1571667

  [ Benjamin Tissoires ]

  * SAUCE: Input: synaptics - handle spurious release of trackstick
    buttons, again
    - LP: #1553811

  [ dann frazier ]

  * Revert "SAUCE: arm64, numa, dt: adding dt based numa support using dt
    node property arm, associativity"
    - LP: #1558828
  * Revert "SAUCE: Documentation: arm64/arm: dt bindings for numa."
    - LP: #1558828
  * Revert "SAUCE: arm64, numa: adding numa support for arm64 platforms."
    - LP: #1558828
  * Revert "[Config] Enable NUMA on ARM64"
    - LP: #1558828

  [ K. Y. Srinivasan ]

  * SAUCE: (noup): Drivers: hv: vmbus: Fix a bug in
    hv_need_to_signal_on_read()
    - LP: #1556264

  [ Kamal Mostafa ]

  * [debian] BugLink: close LP: bugs only for Launchpad urls
  * [Config] updateconfigs after v4.2.8-ckt7

  [ Upstream Kernel Changes ]

  * Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
    - LP: #1561677
  * tipc: fix connection abort during subscription cancel
    - LP: #1561677
  * tipc: fix nullptr crash during subscription cancel
    - LP: #1561677
  * s390/mm: four page table levels vs. fork
    - LP: #1561677
  * Input: aiptek - fix crash on detecting device without endpoints
    - LP: #1561677
  * wext: fix message delay/ordering
    - LP: #1561677
  * cfg80211/wext: fix message ordering
    - LP: #1561677
  * mac80211: fix use of uninitialised values in RX aggregation
    - LP: #1561677
  * mac80211: minstrel: Change expected throughput unit back to Kbps
    - LP: #1561677
  * libata: fix HDIO_GET_32BIT ioctl
    - LP: #1561677
  * iwlwifi: mvm: inc pending frames counter also when txing non-sta
    - LP: #1561677
  * [media] adv7604: fix tx 5v detect regression
    - LP: #1561677
  * ahci: add new Intel device IDs
    - LP: #1561677
  * ahci: Order SATA device IDs for codename Lewisburg
    - LP: #1561677
  * Adding Intel Lewisburg device IDs for SATA
    - LP: #1561677
  * ASoC: samsung: Use IRQ safe spin lock calls
    - LP: #1561677
  * mac80211: minstrel_ht: set default tx aggregation timeout to 0
    - LP: #1561677
  * usb: chipidea: otg: change workqueue ci_otg as freezable
    - LP: #1561677
  * jffs2: Fix page lock / f->sem deadlock
    - LP: #1561677
  * Fix directory hardlinks from deleted directories
    - LP: #1561677
  * iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
    - LP: #1561677
  * iommu/amd: Apply workaround for ATS write permission check
    - LP: #1561677
  * libata: Align ata_device's id on a cacheline
    - LP: #1561677
  * can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()
    - LP: #1561677
  * fbcon: set a default value to blink interval
    - LP: #1561677
  * KVM: x86: fix root cause for missed hardware breakpoints
    - LP: #1561677
  * arm64: vmemmap: use virtual projection of linear region
    - LP: #1561677
  * vfio: fix ioctl error handling
    - LP: #1561677
  * ALSA: ctl: Fix ioctls for X32 ABI
    - LP: #1561677
  * ALSA: pcm: Fix ioctls for X32 ABI
    - LP: #1561677
  * ALSA: rawmidi: Fix ioct...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Changed in linux (Fedora):
importance: Unknown → Critical
status: Unknown → Won't Fix