kernel oops in bcache module
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Medium | Unassigned | |
Trusty | Fix Released | Medium | Unassigned | |
Xenial | Fix Released | Medium | Unassigned | |
Bionic | Fix Released | Medium | Unassigned | |
Cosmic | Fix Released | Medium | Unassigned | |
Bug Description
SRU Justification
=================
[Impact]
Some users see panics like the following when performing fstrim on a bcached volume:
[ 529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 530.183928] #PF error: [normal kernel read fault]
[ 530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
[ 530.750887] Oops: 0000 [#1] SMP PTI
[ 530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
[ 531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[ 531.693137] RIP: 0010:blk_
[ 531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48 8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
[ 532.838634] RSP: 0018:ffffb9b708
[ 533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
[ 533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
[ 533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
[ 534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
[ 534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
[ 534.833319] FS: 00007efec26e384
[ 535.224098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
[ 535.851759] Call Trace:
[ 535.970308] ? mempool_
[ 536.174152] ? bch_data_
[ 536.403399] blk_mq_
[ 536.607036] generic_
[ 536.819164] submit_
[ 536.980168] ? submit_
[ 537.149731] ? bio_associate_
[ 537.391595] ? _cond_resched+
[ 537.573774] submit_
[ 537.756105] blkdev_
[ 537.959590] ext4_trim_
[ 538.137636] ? ext4_trim_
[ 538.324087] ext4_ioctl+
[ 538.497712] ? _copy_to_
[ 538.679632] do_vfs_
[ 538.853127] ? __do_sys_
[ 539.051951] ksys_ioctl+
[ 539.212785] __x64_sys_
[ 539.394918] do_syscall_
[ 539.568674] entry_SYSCALL_
[Fix]
Under certain conditions, the test for whether an operation should be written back to the underlying device was incorrect. Specifically, in should_writeback(), we were hitting a case where an optimisation for partial stripe conditions was returning true and so should_writeback() was returning true early. This caused the code to go down an incorrect path and create bios that contained NULL pointers.
To fix this issue, make sure that should_writeback() on a discard op never returns true.
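The decision order described above can be sketched as a small userspace model in C. This is only an illustration, not the kernel code: the types, field names, the `discard_guard` switch and the cutoff values are hypothetical stand-ins chosen for the example, and the real should_writeback() checks more state than is shown here. With the guard off, a discard that touches a dirty stripe on a device with expensive partial stripes returns true early (the problematic path); with the guard on, a discard never returns true.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical, simplified request types; not the kernel's REQ_OP_* values. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_DISCARD };

/* Hypothetical stand-in for the parts of a bio the decision looks at. */
struct model_req {
    enum model_op op;
    bool sync;                 /* request is synchronous */
    bool touches_dirty_stripe; /* request overlaps an already dirty stripe */
};

/* Hypothetical stand-in for the cached device state. */
struct model_dev {
    bool writeback_mode;            /* cache_mode is writeback */
    bool partial_stripes_expensive; /* backing device penalises partial stripes */
    unsigned in_use;                /* cache fill level in percent */
};

/* Illustrative thresholds, assumed for this sketch only. */
#define MODEL_CUTOFF_WRITEBACK      40u
#define MODEL_CUTOFF_WRITEBACK_SYNC 70u

/*
 * Model of the writeback decision.
 * discard_guard == false models the behaviour before the fix,
 * discard_guard == true models the behaviour after the fix.
 */
bool model_should_writeback(const struct model_dev *dev,
                            const struct model_req *req,
                            bool would_skip, bool discard_guard)
{
    /* Never write back when not in writeback mode or the cache is too full. */
    if (!dev->writeback_mode || dev->in_use > MODEL_CUTOFF_WRITEBACK_SYNC)
        return false;

    /* The fix: a discard must be rejected before any early "true" return. */
    if (discard_guard && req->op == MODEL_OP_DISCARD)
        return false;

    /* Partial stripe optimisation: this is the early return a discard could hit. */
    if (dev->partial_stripes_expensive && req->touches_dirty_stripe)
        return true;

    if (would_skip)
        return false;

    return req->sync || dev->in_use <= MODEL_CUTOFF_WRITEBACK;
}
```

Calling `model_should_writeback()` for a `MODEL_OP_DISCARD` request with `touches_dirty_stripe` set shows the difference between the two behaviours, while ordinary writes are unaffected by the guard.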
[Test Case]
We have observed it on some systems where both:
1) LVM/devmapper is involved (bcache backing device is LVM volume) and
2) writeback cache is involved (bcache cache_mode is writeback)
Not every machine exhibits the bug. On one machine that does exhibit the bug, we can reliably reproduce it with:
# echo writeback > /sys/block/
# mount /dev/bcache0 /test
# for i in {0..10}; do file="$(mktemp /test/zero.XXX)"; dd if=/dev/zero of="$file" bs=1M count=256; sync; rm "$file"; done; fstrim -v /test
[Regression Potential]
This could affect any device where bcache is used.
In mitigation, however, the patch is simple and only changes how discard operations are handled. The patch has been accepted upstream [1] and the maintainer will be including it in SuSE kernels [2]. A Gentoo user validated the upstream patch independently [3].
[1] https:/
[2] https:/
[3] https:/
[Original Description]
This was on an 18.04.1 install running the 4.15.0-34-generic kernel image, booted from a normal ext4 root device.
Shortly before, I had created a new bcache device that was mounted but to which no data had been written yet. Then, for no apparent reason, an apport error popped up reporting a bcache kernel oops. The crash log was uploaded, but I have no idea how to link it, so I am attaching it as well.
Mostly I would like to know how concerned I should be, since after a previous, successful test I wanted to move the whole install to bcache. If this is a bug or similar, it would be nice if it could get fixed.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.15.0-34-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.9-0ubuntu7.3
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Sat Sep 22 18:20:22 2018
HibernationDevice: RESUME=
InstallationDate: Installed on 2014-07-29 (1515 days ago)
InstallationMedia: It
IwConfig:
zthnhe3w6d no wireless extensions.
eth1 no wireless extensions.
lo no wireless extensions.
MachineType: System manufacturer System Product Name
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=de_DE.UTF-8
SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.173.1
RfKill:
0: hci0: Bluetooth
Soft blocked: yes
Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-09-07 (15 days ago)
dmi.bios.date: 10/22/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0604
dmi.board.
dmi.board.name: H170I-PLUS D3
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: Default string
dmi.product.name: System Product Name
dmi.product.
dmi.sys.vendor: System manufacturer
tags: added: invalid
description: updated
tags: removed: amd64 bionic invalid needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Reposted with ubuntu-bug against a different package, as this one might be more generally applicable. The problem is also reproducible: it happens every time, some time after mount, even after a reboot. A new crash log is attached.