trusty bcache NULL pointer exception

Bug #1754581 reported by Ryan Harper on 2018-03-09
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
linux (Ubuntu)           Medium      Unassigned
Trusty                   Medium      Unassigned

Bug Description

successfully wiped device /dev/md0 on attempt 1/4
shutdown running on holder type: 'bcache' syspath: '/sys/class/block/bcache0'
stopping bcache cacheset at: /sys/fs/bcache/7834b3df-d029-49a3-9c1b-3fd628db10f2
waiting for /sys/fs/bcache/7834b3df-d029-49a3-9c1b-3fd628db10f2 to be removed
/sys/fs/bcache/7834b3df-d029-49a3-9c1b-3fd628db10f2 has been removed
[ 70.788038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000a00
[ 70.789098] IP: [<ffffffffa0272440>] journal_write_unlocked+0x130/0x570 [bcache]
[ 70.790167] PGD 800000000dfa0067 PUD dfb2067 PMD 0
[ 70.790831] Oops: 0000 [#1] SMP
[ 70.791314] Modules linked in: bcache ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr dm_crypt kvm_intel kvm ppdev serio_raw parport_pc parport mac_hid i2c_piix4 overlayfs squashfs iscsi_ibft iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear psmouse virtio_scsi floppy pata_acpi
[ 70.792016] CPU: 0 PID: 10996 Comm: kworker/0:3 Not tainted 3.13.0-142-generic #191-Ubuntu
[ 70.792016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 70.792016] Workqueue: events journal_write_work [bcache]
[ 70.792016] task: ffff88000df76000 ti: ffff88003a14c000 task.ti: ffff88003a14c000
[ 70.792016] RIP: 0010:[<ffffffffa0272440>] [<ffffffffa0272440>] journal_write_unlocked+0x130/0x570 [bcache]
[ 70.792016] RSP: 0000:ffff88003a14dd88 EFLAGS: 00010202
[ 70.792016] RAX: 0000000000000000 RBX: ffff88003bf4cba0 RCX: 0000000000000000
[ 70.792016] RDX: ffff88003bf40c48 RSI: ffff88003bf4cad8 RDI: ffff88003bec8040
[ 70.792016] RBP: ffff88003a14dde0 R08: 2000efd32f400000 R09: 5e80000000000000
[ 70.792016] R10: dffe982d0cb4cbd0 R11: 0000000000000005 R12: 0000000000000031
[ 70.792016] R13: ffff88003bf4ccc8 R14: 0000000000000001 R15: ffff88003bf40000
[ 70.792016] FS: 0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[ 70.792016] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 70.792016] CR2: 0000000000000a00 CR3: 000000003a1ae000 CR4: 0000000000000670
[ 70.792016] Stack:
[ 70.792016] ffff88003a14ddb0 ffff88003bf40000 ffff88003a14dde8 ffffffff81013558
[ 70.792016] 000000000df71868 ffff88000df764e8 ffff88003bf40000 ffff88003bf4cba0
[ 70.792016] ffff88003e217300 0000000000000000 ffff88003bf4cbd0 ffff88003a14de00
[ 70.792016] Call Trace:
[ 70.792016] [<ffffffff81013558>] ? __switch_to+0xe8/0x4f0
[ 70.792016] [<ffffffffa02728d0>] journal_try_write+0x50/0x60 [bcache]
[ 70.792016] [<ffffffffa0272902>] journal_write_work+0x22/0x30 [bcache]
[ 70.792016] [<ffffffff81087cb8>] process_one_work+0x178/0x470
[ 70.792016] [<ffffffff81088ad1>] worker_thread+0x121/0x410
[ 70.792016] [<ffffffff810889b0>] ? rescuer_thread+0x430/0x430
[ 70.792016] [<ffffffff8108f8b9>] kthread+0xc9/0xe0
[ 70.792016] [<ffffffff8108f7f0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 70.792016] [<ffffffff8173f6a8>] ret_from_fork+0x58/0x90
[ 70.792016] [<ffffffff8108f7f0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 70.792016] Code: 10 00 00 00 e8 a2 7a 10 e1 31 c0 66 83 bb 94 38 ff ff 00 48 8b 8b a0 40 ff ff 49 8d 97 48 0c 00 00 74 3b 0f 1f 84 00 00 00 00 00 <48> 8b b9 00 0a 00 00 0f b7 89 ce 00 00 00 83 c0 01 49 8b 75 00
[ 70.792016] RIP [<ffffffffa0272440>] journal_write_unlocked+0x130/0x570 [bcache]
[ 70.792016] RSP <ffff88003a14dd88>
[ 70.792016] CR2: 0000000000000a00
[ 70.792016] ---[ end trace c5387a4b27c41667 ]---
Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
[ 70.831831] BUG: unable to handle kernel paging request at ffffffffffffffd8
[ 70.832822] IP: [<ffffffff8108ff90>] kthread_data+0x10/0x20
[ 70.833396] PGD 1c13067 PUD 1c15067 PMD 0
[ 70.833396] Oops: 0000 [#2] SMP
[ 70.833396] Modules linked in: bcache ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr dm_crypt kvm_intel kvm ppdev serio_raw parport_pc parport mac_hid i2c_piix4 overlayfs squashfs iscsi_ibft iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear psmouse virtio_scsi floppy pata_acpi
[ 70.833396] CPU: 0 PID: 10996 Comm: kworker/0:3 Tainted: G D 3.13.0-142-generic #191-Ubuntu
[ 70.833396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 70.833396] task: ffff88000df76000 ti: ffff88003a14c000 task.ti: ffff88003a14c000
[ 70.833396] RIP: 0010:[<ffffffff8108ff90>] [<ffffffff8108ff90>] kthread_data+0x10/0x20
[ 70.833396] RSP: 0018:ffff88003a14d9c0 EFLAGS: 00010002
[ 70.833396] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000f
[ 70.833396] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff88000df76000
[ 70.833396] RBP: ffff88003a14d9c0 R08: 0000000000000000 R09: ffff88003e216f20
[ 70.833396] R10: ffffffff8106887c R11: ffffea0000357200 R12: ffff88003e213b00
[ 70.833396] R13: 0000000000000000 R14: ffff88000df75ff0 R15: ffff88000df76000
[ 70.833396] FS: 0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[ 70.833396] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 70.833396] CR2: 0000000000000028 CR3: 00000000384d0000 CR4: 0000000000000670
[ 70.833396] Stack:
[ 70.833396] ffff88003a14d9d8 ffffffff81089291 ffff88000df76000 ffff88003a14da38
[ 70.833396] ffffffff817329b9 ffff88000df76000 0000000000013b00 ffff88003a14dfd8
[ 70.833396] 0000000000013b00 ffff88000df76000 ffff88000df76650 ffff88000df75ff0
[ 70.833396] Call Trace:
[ 70.833396] [<ffffffff81089291>] wq_worker_sleeping+0x11/0x90
[ 70.833396] [<ffffffff817329b9>] __schedule+0x539/0x730
[ 70.833396] [<ffffffff81732bd9>] schedule+0x29/0x70
[ 70.833396] [<ffffffff8106d95f>] do_exit+0x6cf/0xa60
[ 70.833396] [<ffffffff817381c9>] oops_end+0xa9/0x150
[ 70.833396] [<ffffffff81727106>] no_context+0x27e/0x28b
[ 70.833396] [<ffffffff81727186>] __bad_area_nosemaphore+0x73/0x1ca
[ 70.833396] [<ffffffff817272f0>] bad_area_nosemaphore+0x13/0x15
[ 70.833396] [<ffffffff8173ac67>] __do_page_fault+0xa7/0x560
[ 70.833396] [<ffffffff810a861c>] ? check_preempt_wakeup+0x17c/0x270
[ 70.833396] [<ffffffff8173757a>] ? error_entry+0x12a/0x179
[ 70.833396] [<ffffffff81737573>] ? error_entry+0x123/0x179
[ 70.833396] [<ffffffff8173756c>] ? error_entry+0x11c/0x179
[ 70.833396] [<ffffffff81737565>] ? error_entry+0x115/0x179
[ 70.833396] [<ffffffff8173755e>] ? error_entry+0x10e/0x179
[ 70.833396] [<ffffffff81737557>] ? error_entry+0x107/0x179
[ 70.833396] [<ffffffff81737550>] ? error_entry+0x100/0x179
[ 70.833396] [<ffffffff81737549>] ? error_entry+0xf9/0x179
[ 70.833396] [<ffffffff81737542>] ? error_entry+0xf2/0x179
[ 70.833396] [<ffffffff8173753b>] ? error_entry+0xeb/0x179
[ 70.833396] [<ffffffff81737534>] ? error_entry+0xe4/0x179
[ 70.833396] [<ffffffff8173752d>] ? error_entry+0xdd/0x179
[ 70.833396] [<ffffffff81737526>] ? error_entry+0xd6/0x179
[ 70.833396] [<ffffffff8173751f>] ? error_entry+0xcf/0x179
[ 70.833396] [<ffffffff81737518>] ? error_entry+0xc8/0x179
[ 70.833396] [<ffffffff81737511>] ? error_entry+0xc1/0x179
[ 70.833396] [<ffffffff8173750a>] ? error_entry+0xba/0x179
[ 70.833396] [<ffffffff81737503>] ? error_entry+0xb3/0x179
[ 70.833396] [<ffffffff8173b13a>] do_page_fault+0x1a/0x70
[ 70.833396] [<ffffffff817374cb>] ? error_entry+0x7b/0x179
[ 70.833396] [<ffffffff817374c4>] ? error_entry+0x74/0x179
[ 70.833396] [<ffffffff8173a7c9>] do_async_page_fault+0x29/0xe0
[ 70.833396] [<ffffffff817372e8>] async_page_fault+0x28/0x30
[ 70.833396] [<ffffffffa0272440>] ? journal_write_unlocked+0x130/0x570 [bcache]
[ 70.833396] [<ffffffffa027241e>] ? journal_write_unlocked+0x10e/0x570 [bcache]
[ 70.833396] [<ffffffff81013558>] ? __switch_to+0xe8/0x4f0
[ 70.833396] [<ffffffffa02728d0>] journal_try_write+0x50/0x60 [bcache]
[ 70.833396] [<ffffffffa0272902>] journal_write_work+0x22/0x30 [bcache]
[ 70.833396] [<ffffffff81087cb8>] process_one_work+0x178/0x470
[ 70.833396] [<ffffffff81088ad1>] worker_thread+0x121/0x410
[ 70.833396] [<ffffffff810889b0>] ? rescuer_thread+0x430/0x430
[ 70.833396] [<ffffffff8108f8b9>] kthread+0xc9/0xe0
[ 70.833396] [<ffffffff8108f7f0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 70.833396] [<ffffffff8173f6a8>] ret_from_fork+0x58/0x90
[ 70.833396] [<ffffffff8108f7f0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 70.833396] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 87 c0 03 00 00 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 70.833396] RIP [<ffffffff8108ff90>] kthread_data+0x10/0x20
[ 70.833396] RSP <ffff88003a14d9c0>
[ 70.833396] CR2: ffffffffffffffd8
[ 70.833396] ---[ end trace c5387a4b27c41668 ]---
[ 70.833396] Fixing recursive fault but reboot is needed!
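
For anyone reading the first oops, here is a minimal sketch (not part of the original report) that decodes the instruction at the faulting RIP from the "Code:" line and combines it with the register dump. The helper name decode_mov_load is hypothetical, and it handles only the one encoding seen here; it is not a general disassembler.

```python
# Bytes starting at the faulting RIP, copied from the "Code:" line of the
# first oops (the byte shown in <...> marks where RIP points).
FAULT_BYTES = bytes.fromhex("488bb9000a0000")

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code: bytes):
    """Decode a 'REX.W 8B /r' load of the form [base + disp32].

    Hypothetical helper: it covers only the single encoding that appears
    in this oops and rejects everything else.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without a SIB byte is handled")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load(FAULT_BYTES)
rcx = 0x0  # RCX value from the register dump of the first oops
fault_addr = rcx + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> faults at {fault_addr:#018x}")
```

The decoded load is relative to RCX, which the register dump shows as zero, so the computed address matches the CR2 value of 0000000000000a00 in the first oops. In other words, journal_write_unlocked is reading a field at offset 0xa00 through a NULL pointer; which structure that pointer refers to is not established by the log alone.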

Ryan Harper (raharper) on 2018-03-09
tags: added: curtin

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1754581

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
tags: added: trusty
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4

Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key

On Fri, Mar 9, 2018 at 9:49 AM, Joseph Salisbury
<email address hidden> wrote:
> Did this issue start happening after an update/upgrade? Was there a
> prior kernel version where you were not having this particular problem?

I don't think so. What has changed is how we use, clear, and stop bcache.
I can reproduce this on multiple Trusty kernels:

3.13.0-101-generic (buildd@lgw01-40) #148-Ubuntu SMP Thu Oct 20 22:08:32 UTC 2016
3.13.0-107-generic (buildd@lcy01-09) #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016
3.13.0-137-generic (buildd@lgw01-amd64-058) #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017
3.13.0-143-generic (buildd@lcy01-amd64-010) #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018

>
> Would it be possible for you to test the latest upstream kernel? Refer
> to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
> v4.16 kernel[0].

I can confirm that newer kernels do not have this issue:

4.4.0-116-generic (buildd@lcy01-amd64-023) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.4))

Xenial, Artful and Bionic are fine.
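
As a rough aid for triage, the sketch below classifies a kernel release string by series, based only on the versions mentioned in this report (3.13.0 builds reproduce the oops, 4.4.0-116 does not). The function names are hypothetical, and the rule is a heuristic drawn from this thread, not a statement about which upstream commit fixed the problem.

```python
import re


def kernel_series(release: str):
    """Return (major, minor) parsed from a release string such as
    '3.13.0-142-generic'."""
    match = re.match(r"(\d+)\.(\d+)\.", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def reproduced_in_this_report(release: str) -> bool:
    """True if the release belongs to the 3.13 series, the only series for
    which this report shows the oops. Heuristic, not a definitive check."""
    return kernel_series(release) == (3, 13)


for release in ("3.13.0-101-generic", "3.13.0-143-generic", "4.4.0-116-generic"):
    print(release, reproduced_in_this_report(release))
```

On a live system the release string could be taken from platform.release() in the Python standard library instead of a hard-coded value.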


Joseph Salisbury (jsalisbury) wrote :

Do you happen to know if this is a regression in Trusty, or if this bug always existed in Trusty and was fixed in Xenial and newer kernels?

Changed in linux (Ubuntu Trusty):
status: Incomplete → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Ryan Harper (raharper) wrote :

I think it may have always been present: I can recreate this on Trusty GA kernels. We've changed how we clean and remove bcache devices, which I believe is triggering a new code path on Trusty kernels.
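
To make the sequence in the bug description concrete, here is a minimal sketch of a stop-and-wait step like the one the log lines suggest ("stopping bcache cacheset at: ...", "waiting for ... to be removed"). This is not curtin's actual code. It assumes the cache set is stopped by writing to the stop attribute under /sys/fs/bcache/<uuid>/, as described in the kernel's bcache documentation; the function name, retry count, and delay are hypothetical.

```python
import os
import time


def stop_cacheset_and_wait(cset_path: str, attempts: int = 60, delay: float = 1.0) -> bool:
    """Ask bcache to stop a cache set, then poll until its sysfs directory
    disappears. Returns True once the directory is gone, False on timeout.

    Assumption: cset_path looks like /sys/fs/bcache/<uuid> and contains a
    writable 'stop' attribute. Needs root on a real system.
    """
    stop_file = os.path.join(cset_path, "stop")
    if os.path.exists(stop_file):
        with open(stop_file, "w") as handle:
            handle.write("1")
    for _ in range(attempts):
        if not os.path.exists(cset_path):
            return True
        time.sleep(delay)
    return False
```

In the log above, the first oops in journal_write_work appears right after the cache set directory is reported as removed, so a poll like this can return successfully while journal work is still queued on the affected kernels.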
