Regression in kernel 4.15.0-91 causes kernel panic with Bcache

Bug #1867916 reported by Sebastian Marsching
This bug affects 1 person
Affects         Status        Importance  Assigned to                 Milestone
Linux           Confirmed     Medium
linux (Ubuntu)  Fix Released  Medium      Mauricio Faria de Oliveira
Xenial          Fix Released  Medium      Mauricio Faria de Oliveira
Bionic          Fix Released  Medium      Mauricio Faria de Oliveira
Eoan            Won't Fix     Medium      Mauricio Faria de Oliveira
Focal           Fix Released  Medium      Mauricio Faria de Oliveira
Groovy          Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * Users of bcache who manually specified a block size
   greater than the page size when creating the device
   with 'make-bcache' started to hit a kernel BUG/oops
   after kernel upgrades. (This is not widely used.)

 * The issue has been exposed with commit ad6bf88a6c19
   ("block: fix an integer overflow in logical block size")
   because it increased the range of values accepted as
   logical block size, which used to overflow to zero,
   and thus receive a default of 512 via block layer.

 * The issue existed previously, but with fewer values
   exposed (e.g. 8k, 16k, 32k); the regression reports
   happened with larger values (512k) for RAID stripes.
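
The overflow described above can be illustrated with plain shell arithmetic. This is only a sketch: it assumes, as the commit message of ad6bf88a6c19 describes, that the logical block size used to be stored in a 16-bit unsigned field, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Illustration only: emulate storing a block size in a 16-bit unsigned
# field (assumed behaviour before ad6bf88a6c19).

# Value that survives truncation to 16 bits.
truncate_u16() {
    echo $(( $1 & 0xFFFF ))
}

# Effective logical block size before the commit, per the description:
# a value that overflowed to zero received the block layer default of 512.
old_effective_lbs() {
    t=$(truncate_u16 "$1")
    if [ "$t" -eq 0 ]; then
        echo 512
    else
        echo "$t"
    fi
}

# After the commit the requested value is kept as-is ("new").
for size in 8192 16384 32768 524288; do
    echo "requested=$size old=$(old_effective_lbs "$size") new=$size"
done
```

Under these assumptions 8k, 16k and 32k were never hidden by the overflow, while 512k (524288) silently became 512 before the commit and is kept as 524288 after it.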

[Fix]

 * The upstream commit dcacbc1242c7 ("bcache: check and
   adjust logical block size for backing devices") checks
   the block size and adjusts it if needed, to the value
   of the underlying device's logical block size.

 * It is merged as of v5.8-rcN, and sent to v5.7 stable.
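
The check-and-adjust behaviour described above can be sketched as a small function. This is not the kernel code, only an illustration under assumptions: the function name and its three arguments are hypothetical, and the "greater than page size" condition is taken from the [Impact] section.

```shell
#!/bin/sh
# Sketch of the decision made by the fix (hypothetical helper, not kernel code).
# $1: block size recorded for the bcache backing device, in bytes
# $2: page size, in bytes
# $3: logical block size of the underlying device, in bytes
adjusted_lbs() {
    if [ "$1" -gt "$2" ]; then
        # Too large to serve as a logical block size: fall back to the
        # underlying device's value.
        echo "$3"
    else
        echo "$1"
    fi
}

adjusted_lbs 524288 4096 512   # prints 512
adjusted_lbs 4096 4096 512     # prints 4096
```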

[Test Case]

 * Run make-bcache with block size greater than page size.
   $ sudo make-bcache --bdev $DEV --block 8k

 * Expected results: bcache device registered; no BUG/oops.
 * Detailed steps in comment #43.
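
Before running the test case it may help to confirm that the chosen block size really is greater than the page size of the test system. The helpers below are a hypothetical sketch; they only understand plain byte counts and the k/K suffix used in this report.

```shell
#!/bin/sh
# Hypothetical pre-check for the test case: does a make-bcache style size
# (plain bytes, or with a k/K suffix) exceed the page size?

# Convert "8k" or "8192" to bytes. Only the k/K suffix is handled here.
to_bytes() {
    case "$1" in
        *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Succeeds (exit status 0) if size $1 is greater than page size $2.
# $2 defaults to the running system's page size.
exceeds_page_size() {
    page=${2:-$(getconf PAGESIZE)}
    [ "$(to_bytes "$1")" -gt "$page" ]
}

if exceeds_page_size 8k; then
    echo "8k is greater than the page size: suitable for the test case"
else
    echo "8k is not greater than the page size: pick a larger --block value"
fi
```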

[Regression Potential]

 * Restricted to users who specify a bcache block size
   greater than page size.

 * Regressions could theoretically manifest on bcache
   device probe/register, if the underlying device's
   logical block size, for whatever reason, triggers
   issues not seen previously with the overflow/default
   of 512 bytes.

[Other Info]

 * Unstable has the patch on both master/master-5.7.
 * Groovy should get it on rebase.

[Original Bug Description]
After upgrading from kernel 4.15.0-88 to 4.15.0-91 one of our systems does not boot any longer. It always crashes during boot with a kernel panic.

I suspect that this crash might be related to Bcache because this is the only one of our systems where we use Bcache and the kernel panic appears right after Bcache initialization.

I already checked that this bug still exists in the 4.15.0-92.93 kernel from proposed.

Unfortunately, I cannot do a bisect because this is a critical production system and we do not have any other system with a similar configuration.

I attached a screenshot with the trace of the kernel panic.

The last message that appears before the kernel panic (or rather the last one that I can see - there is a rather long pause between that message and the panic and I cannot scroll up far enough to ensure that there are no other messages in between) is:

bcache: register_bcache() error /dev/dm-0: device already registered

When booting with kernel 4.15.0-88 that does not have this problem, the next message is

bcache: register_bcache() error /dev/dm-12: device already registered (emitting change event)

After that the next message is:

Begin: Loading essential drivers ... done

This message also appears after the kernel panic, but the boot process stalls and the system can only be recovered by doing a hardware reset.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-88-generic 4.15.0-88.88
ProcVersionSignature: Ubuntu 4.15.0-88.88-generic 4.15.18
Uname: Linux 4.15.0-88-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 17 21:08 seq
 crw-rw---- 1 root audio 116, 33 Mar 17 21:08 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.11
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Mar 18 12:55:18 2020
HibernationDevice: RESUME=UUID=40512ea2-9fce-40f5-8362-5daf955cc26a
InstallationDate: Installed on 2013-07-02 (2450 days ago)
InstallationMedia: Ubuntu-Server 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
MachineType: HP ProLiant DL160 G6
PciMultimedia:

ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-88-generic root=/dev/mapper/vg0-root ro nosmt nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-88-generic N/A
 linux-backports-modules-4.15.0-88-generic N/A
 linux-firmware 1.173.16
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-09-23 (541 days ago)
dmi.bios.date: 11/06/2009
dmi.bios.vendor: HP
dmi.bios.version: O33
dmi.chassis.asset.tag: 0191525
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrO33:bd11/06/2009:svnHP:pnProLiantDL160G6:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL160 G6
dmi.sys.vendor: HP

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian,

Thanks for reporting this bug.
It indeed seems related to bcache per the stack trace in the screenshot.

Could you please collect and upload a sosreport of the failing system (w/ any kernel version)?

$ sudo sosreport --batch --case-id lp1867916

This should generate a /tmp/sosreport-* tarball, mentioned in the output.

Thanks,
Mauricio

Sebastian Marsching (sebastian-marsching) wrote :

Hi Mauricio,

there is the problem that this tarball contains an awful lot of sensitive data, some of that personal data that is protected under the GDPR.

Maybe you could tell me which specific parts / sections you are interested in, so that I can anonymize and upload them.

Thanks,
Sebastian

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian,

Sure, that's certainly understandable.
For starters, the topology of the block/bcache devices would do it:

(set -x; lsblk; sudo dmsetup table; grep -r ^ /sys/block/bcache*/bcache) > lp1867916.1.out 2>&1

Then please upload the output file (or copy/paste, if file access is tough.)

Thanks,
Mauricio

Sebastian Marsching (sebastian-marsching) wrote :

Hi Mauricio,

thanks for your support. I have attached the requested information to the bug. I had to pseudonymize some of the logical volume names because they are considered sensitive, but I did so in a consistent way so the structure should still be clear.

So that you don’t have to look through all details, here is a rough overview:

There are six physical disks: 4 HDDs (sda, sdb, sdc, sdd) and 2 SSDs (sde, sdf).

There are three MD RAID devices:

- md0 is a RAID 1 that uses /dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1. This device stores the boot partition.
- md1 is a RAID 6 that uses /dev/sda2, /dev/sdb2, /dev/sdc2, and /dev/sdd2. This device is the only physical volume backing volume group vg0.
- md2 is a RAID 1 that uses /dev/sde1 and /dev/sdf1. This device is used as the cache device (read/write) for Bcache.

There are three LVM volume groups:

- vg0 is backed by /dev/md1 and stores the root FS, the swap partition, a few LVs that are used for VMs, and an LV called vg2-backend that is used by Bcache.
- vg1 is backed by /dev/md2 and only has an LV called vg2-cache that is used by Bcache.
- vg2 is backed by /dev/bcache0. It only contains LVs that are used for VMs.

There is one Bcache device:

- bcache0 is backed by /dev/vg0/vg2-backend (backend device) and /dev/vg1/vg2-cache (cache device). This device is used as the PV for vg2.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian,

Thanks for collecting the data and describing the block device topology, that's very helpful.

I'll try to reproduce the problem and analyze it.

cheers,
Mauricio

Changed in linux (Ubuntu):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
importance: Undecided → Medium
status: Confirmed → In Progress
Ryan Finnie (fo0bar) wrote :

I can confirm going from 4.15.0-88 to 4.15.0-91 on my bcache system panics in the same way. Here's my layout:

-> sd{c,d,f,g,h}: each 4TB gpt, sd{c,d,f,g,h}1: each type linux_raid_member
--> md0: raid6, sd{c,d,f,g,h}1
--> sda: 512GB gpt, sda1: type bcache
---> bcache0: md0 + sda1
----> whatadisk_crypt: LUKS on bcache0
-----> LVM VG whatadisk: whatadisk_crypt
------> multiple LVs

There are also a handful of other disks (for example, nvme0n1 is the encrypted boot disk), but the layout above is the only part which bcache touches.

Happy to help with additional info or testing, please let me know.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian and Ryan,

I've created the setup that each of you described on a VM,
but unfortunately I wasn't able to reproduce the problem.

I double checked the output of lsblk with Sebastian's and
it's the same (also the bcache sysfs configuration/status),
and with Ryan's description, and both matched.

So apparently there's something else involved with bcache,
maybe a timing or corner case specific to actual hardware.

Ryan, since you're able to help w/ additional info/testing
(thank you!), could you please collect a kernel crashdump?

I'll provide the configuration/test steps in another comment.

Should you have any questions or need assistance with those,
please just let me know.

cheers,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Ryan,

Part 1)
------

First, please try to reproduce the problem later, not so early in boot,
by disabling the bcache module on the kernel boot parameters, and then
loading it after the system has booted successfully.
(This should be possible as you mentioned the boot disk isn't involved.)

1) Edit '/etc/fstab' and either comment out the mounts depending on bcache
or add the 'noauto' option to them, so that systemd doesn't delay on boot.

For example,

$ sudo vim /etc/fstab
From: /dev/mapper/*whatadisk* /mountpoint ext4 defaults 0 0
To: /dev/mapper/*whatadisk* /mountpoint ext4 defaults,noauto 0 0
Esc, :x, Enter

2) Edit '/etc/default/grub' and add the 'modprobe.blacklist=bcache' option
to GRUB_CMDLINE_LINUX_DEFAULT.

For example,

$ sudo vim /etc/default/grub
From: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"
To: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 modprobe.blacklist=bcache"
Esc, :x, Enter

Update and check grub config:

$ sudo update-grub

$ grep modprobe.blacklist=bcache /boot/grub/grub.cfg
                linux /boot/vmlinuz-4.15.0-91-generic ... modprobe.blacklist=bcache
                linux /boot/vmlinuz-4.15.0-88-generic ... modprobe.blacklist=bcache

3) Reboot the system in 4.15.0-91, it should not fail, as bcache is not loaded.

4) Now load bcache, retrigger device events, and check if the problem reproduces.

$ sudo modprobe bcache
$ sudo udevadm trigger

This should register the bcache devices, e.g., /dev/bcache0.

If you can see /dev/bcache0 and the problem did NOT happen,
please stop here and let me know.

If the problem reproduced, please proceed after your system
rebooted (it should boot normally as it has bcache disabled.)

...

Part 2)
------

1) Install linux-crashdump:

$ sudo apt install linux-crashdump

Answer these questions:

- Should kexec-tools handle reboots (sysvinit only)? No
- Should kdump-tools be enabled by default? Yes

2) Increase the reserved memory size for the crashdump kernel:

Edit '/etc/default/grub.d/kdump-tools.cfg' and change the crashkernel size from 192M to 512M or 768M if possible:

For example,

$ sudo vim /etc/default/grub.d/kdump-tools.cfg
from: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:192M"
to: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:768M"
Esc, :x, Enter

4) Update grub and reboot

$ sudo update-grub
$ sudo reboot

5) Check kdump status is 'ready' and that panic_on_oops is enabled (1) by default:

$ sudo kdump-config status
current state: ready to kdump

$ cat /proc/sys/kernel/panic_on_oops
1

6) Trigger a test crashdump

$ echo 1 | sudo tee /proc/sys/kernel/sysrq
$ echo c | sudo tee /proc/sysrq-trigger

This apparently 'reboots' the system, and collects a memory dump:

[ 8.510809] kdump-tools[781]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202004081540/dump-incomplet$
...
Copying data : [100.0 %] - eta: 0s
...
[ 15.964149] kdump-tools[781]: * kdump-tools: saved vmcore in /var/crash/202004081540
...
[ 16.176388] kdump-tools[781]: * kdump-tools: saved dmesg content in /var/crash/202004081540
...
[ 17.187848] kdump-tools[781]: Re...


Ryan Finnie (fo0bar) wrote :

Here you go: https://www.finnie.org/stuff/lp1867916-crashdump.tar.xz (138MB)

Some notes on the process:
- Also blacklisted it87 (DKMS) so the running kernel wasn't "tainted"
- Also disabled the relevant crypttab entry for this group
- 768M produced "crashkernel reservation failed - No suitable area found" in dmesg, used 512M instead
- Had to disable secure boot (https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1869672 which you also appear to be working on, heh)

Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan,

Thanks a bunch for the crashdump; that helps.

I can confirm there's a valid and matching stack trace in the dmesg file.

I'll take a look.

cheers,
Mauricio

Ryan Finnie (fo0bar) wrote :

BTW, for future searchers, I've uploaded dmesg.202004081903 separately, and pasted the crash here:

[ 194.444436] bcache: bch_journal_replay() journal replay done, 3 keys in 6 entries, seq 23285862
[ 194.444622] bcache: register_cache() registered cache device sdb1
[ 194.448381] bcache: register_bdev() registered backing device md0
[ 194.602075] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 194.602100] IP: create_empty_buffers+0x29/0xf0
[ 194.602110] PGD 0 P4D 0
[ 194.602121] Oops: 0000 [#1] SMP NOPTI
[ 194.602137] Modules linked in: bcache ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter pps_ldisc aufs overlay cmac bnep bonding bridge stp llc arc4 snd_hda_codec_hdmi nls_iso8859_1 edac_mce_amd kvm_amd kvm irqbypass rtl8xxxu mac80211 btusb btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth cfg80211 eeepc_wmi snd_hda_core asus_wmi joydev sparse_keymap snd_hwdep wmi_bmof input_leds snd_pcm ecdh_generic snd_timer snd soundcore ccp k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core nfsd iscsi_tcp libiscsi_tcp auth_rpcgss libiscsi nfs_acl scsi_transport_iscsi lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress algif_skcipher af_alg dm_crypt raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 194.602323] xor raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc mxm_wmi video ttm drm_kms_helper aesni_intel syscopyarea sysfillrect igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd glue_helper dca cryptd i2c_algo_bit drm i2c_piix4 ptp ahci nvme pps_core libahci nvme_core gpio_amdpt wmi gpio_generic
[ 194.602406] CPU: 1 PID: 4403 Comm: bcache-register Not tainted 4.15.0-91-generic #92-Ubuntu
[ 194.602425] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4207 12/08/2018
[ 194.602447] RIP: 0010:create_empty_buffers+0x29/0xf0
[ 194.602459] RSP: 0018:ffffa833cc37f7a8 EFLAGS: 00010246
[ 194.602471] RAX: 0000000000000000 RBX: fffff6227c3f4c00 RCX: 0000000000000013
[ 194.602487] RDX: 0000000000000000 RSI: 0000000000080000 RDI: fffff6227c3f4c00
[ 194.602503] RBP: ffffa833cc37f7c0 R08: 0000000000000001 R09: dead0000000000ff
[ 194.602519] R10: ffff8976ca4f7aa0 R11: 0000000000000000 R12: 0000000000000000
[ 194.602535] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
[ 194.602551] FS: 00007f51c3bb2500(0000) GS:ffff89771ec40000(0000) knlGS:0000000000000000
[ 194.602569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 194.602582] CR2: 0000000000000008 CR3: 0000000f24aea000 CR4: 00000000003406e0
[ 194.602598] Call Trace:
[ 194.602606] create_page_buffers+0x51/0x60
[ 194.602616] block_read_full_page+0x4e/0x370
[ 194.602626] ? set_init_blocksize+0x80/0x80
[ 194.602636] blkdev_readpage+0x18/0x20
[ 194.602646] do_read_cache_page+0x2a2/0x580
[ 194.602656] ? blkdev_writepages+0x40/0x40
[ 194.602667] ? update_load_avg+0x57f/0x6e0
[ 194.602676] read_cache_page+0x15/0x20
[ 194.602687] read_dev_sector+0x2d/0xd0
[ 194.602696] read_lba+0x130/0x220
[ 194.6027...


tags: added: seg
Ryan Finnie (fo0bar) wrote :

I've bisected the problem down to commit c35a4a858d0616e7817026d88f377c7201ad449a ("block: fix an integer overflow in logical block size", upstream ad6bf88a6c19a39fb3b0045d78ea880325dfcf15).

I don't know what the exact problem is with the commit, but it seems to be in the area of fs/block_dev.c set_init_blocksize() calling bdev_logical_block_size(), which the offending commit touches.

# bad: [a78d21bd8bb58c158f73108eb7d7402619fcae3d] UBUNTU: Ubuntu-4.15.0-91.92
# good: [d5b8ff45eabff3cb2232a2eea38a862edc647ab8] UBUNTU: Ubuntu-4.15.0-88.88
git bisect start 'Ubuntu-4.15.0-91.92' 'Ubuntu-4.15.0-88.88'
# bad: [c04c6d87ca3a7e2c8019e0a2349fcf1175ffcce0] KVM: PPC: Release all hardware TCE tables attached to a group
git bisect bad c04c6d87ca3a7e2c8019e0a2349fcf1175ffcce0
# good: [9a6b3cea5792f94f03f579c668a60eb4ddd209f0] dt-bindings: reset: meson8b: fix duplicate reset IDs
git bisect good 9a6b3cea5792f94f03f579c668a60eb4ddd209f0
# bad: [6783f1bf22bb55bcd2628d7cabeacc9c720971e6] ALSA: usb-audio: update quirk for B&W PX to remove microphone
git bisect bad 6783f1bf22bb55bcd2628d7cabeacc9c720971e6
# skip: [c4099e7e88621e82a5b3a5b0a2e3d6d8eee4e8e0] ptp: free ptp device pin descriptors properly
git bisect skip c4099e7e88621e82a5b3a5b0a2e3d6d8eee4e8e0
# bad: [3222d8b5d803ed733dba2da03b595cf90c871fe6] drm/nouveau/mmu: qualify vmm during dtor
git bisect bad 3222d8b5d803ed733dba2da03b595cf90c871fe6
# skip: [3db4efba8dea644e54e339ac2e46b497aea93638] NFC: pn533: fix bulk-message timeout
git bisect skip 3db4efba8dea644e54e339ac2e46b497aea93638
# skip: [211ff6ef5f2f19d86d35dcbb94d4b09ab7de00fc] batman-adv: Fix DAT candidate selection on little endian systems
git bisect skip 211ff6ef5f2f19d86d35dcbb94d4b09ab7de00fc
# skip: [3cb4cbc154b57c2fb842fbacc6ea4bd625ce680f] perf hists: Fix variable name's inconsistency in hists__for_each() macro
git bisect skip 3cb4cbc154b57c2fb842fbacc6ea4bd625ce680f
# good: [1d9d2aae71b2d1c18d7d69c4850ca57ebd1e6fc1] clk: Don't try to enable critical clocks if prepare failed
git bisect good 1d9d2aae71b2d1c18d7d69c4850ca57ebd1e6fc1
# skip: [37f1c7c4bf8eaf03f0e07152f55e0dda97bd52e8] cfg80211: check for set_wiphy_params
git bisect skip 37f1c7c4bf8eaf03f0e07152f55e0dda97bd52e8
# good: [43cdcecf3a1bd5fdf9cbce0f3908cdde789c7ffd] ALSA: seq: Fix racy access for queue timer in proc read
git bisect good 43cdcecf3a1bd5fdf9cbce0f3908cdde789c7ffd
# bad: [7a10b28e6122e12323da46732fc345d476a21829] ARM: dts: am571x-idk: Fix gpios property to have the correct gpio number
git bisect bad 7a10b28e6122e12323da46732fc345d476a21829
# good: [31e3c075719a3afe5ffd2b9b8e1f18e07255bd96] Fix built-in early-load Intel microcode alignment
git bisect good 31e3c075719a3afe5ffd2b9b8e1f18e07255bd96
# bad: [c35a4a858d0616e7817026d88f377c7201ad449a] block: fix an integer overflow in logical block size
git bisect bad c35a4a858d0616e7817026d88f377c7201ad449a
# first bad commit: [c35a4a858d0616e7817026d88f377c7201ad449a] block: fix an integer overflow in logical block size

Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan, those are really great findings, thanks.
I'm out on holiday, and will check it next week.

Michael Dundas (michaeldundas) wrote :

For what it is worth, I have a kernel issue that appears after upgrading from 4.15.0-88 to 4.15.0-91 and it still is present in 4.15.0-96.
For my system the kernel does not panic, however it eventually starts erroring and then mounts /tmp as ro.
If I reboot into maintenance mode, fsck the disk, fix the errors and reboot, the problem repeats. However, after booting back into 4.15.0-88 the system comes back up and remains stable.

Once booted with kernel 4.15.0-91 or 4.15.0-96, syslog shows:

Apr 12 00:06:00 Brushpass kernel: [34615.321379] sd 0:0:0:0: [sda] tag#1 Sense Key : Medium Error [current]
Apr 12 00:06:00 Brushpass kernel: [34615.321383] sd 0:0:0:0: [sda] tag#1 Add. Sense: Unrecovered read error - auto reallocate failed
Apr 12 00:06:00 Brushpass kernel: [34615.321386] sd 0:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 04 28 90 00 00 d0 00
Apr 12 00:06:00 Brushpass kernel: [34615.321389] print_req_error: I/O error, dev sda, sector 272728
Apr 12 00:06:00 Brushpass kernel: [34615.321427] ata1: EH complete
Apr 12 00:06:03 Brushpass kernel: [34618.285416] ata1.00: exception Emask 0x0 SAct 0x6000 SErr 0x0 action 0x0
Apr 12 00:06:03 Brushpass kernel: [34618.285426] ata1.00: irq_stat 0x40000001
Apr 12 00:06:03 Brushpass kernel: [34618.285433] ata1.00: failed command: READ FPDMA QUEUED
Apr 12 00:06:03 Brushpass kernel: [34618.285445] ata1.00: cmd 60/08:68:58:29:04/00:00:00:00:00/40 tag 13 ncq dma 4096 in
Apr 12 00:06:03 Brushpass kernel: [34618.285445] res 41/40:08:58:29:04/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 12 00:06:03 Brushpass kernel: [34618.285451] ata1.00: status: { DRDY ERR }
Apr 12 00:06:03 Brushpass kernel: [34618.285455] ata1.00: error: { UNC }
Apr 12 00:06:03 Brushpass kernel: [34618.285459] ata1.00: failed command: WRITE FPDMA QUEUED
Apr 12 00:06:03 Brushpass kernel: [34618.285469] ata1.00: cmd 61/48:70:c0:88:86/01:00:1c:00:00/40 tag 14 ncq dma 167936 out
Apr 12 00:06:03 Brushpass kernel: [34618.285469] res 41/40:70:58:29:04/00:00:00:00:00/40 Emask 0x9 (media error)
Apr 12 00:06:03 Brushpass kernel: [34618.285474] ata1.00: status: { DRDY ERR }
Apr 12 00:06:03 Brushpass kernel: [34618.285477] ata1.00: error: { UNC }
Apr 12 00:06:03 Brushpass kernel: [34618.288690] ata1.00: configured for UDMA/100
Apr 12 00:06:03 Brushpass kernel: [34618.288737] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 12 00:06:03 Brushpass kernel: [34618.288750] sd 0:0:0:0: [sda] tag#13 Sense Key : Medium Error [current]
Apr 12 00:06:03 Brushpass kernel: [34618.288760] sd 0:0:0:0: [sda] tag#13 Add. Sense: Unrecovered read error - auto reallocate failed

Not sure if this will help.

-mike

Ryan Finnie (fo0bar) wrote :

Mike: Sorry, that is not related. I'd suggest filing a new bug.

Mauricio: Any update on this? This also affects the 5.4 line, so after upgrading to focal I needed to remain on 4.15.0-88.

In , ryan (ryan-linux-kernel-bugs) wrote :

Downstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867916

The downstream bug involves regression from Ubuntu's 4.15.0-88 to 4.15.0-91, but I've determined it also affects Ubuntu's 5.4 kernel line, as well as HEAD (tl;dr: since commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15, which was backported to Ubuntu's 4.15.0-91).

The relevant part of my block setup:

-> sd{c,d,f,g,h}: each 4TB gpt, sd{c,d,f,g,h}1: each type linux_raid_member
--> md0: raid6, sd{c,d,f,g,h}1
--> sda: 512GB gpt, sda1: type bcache
---> bcache0: md0 + sda1
----> whatadisk_crypt: LUKS on bcache0
-----> LVM VG whatadisk: whatadisk_crypt
------> multiple LVs

When bcache0 is attempted to be activated, I get:

[ 194.444436] bcache: bch_journal_replay() journal replay done, 3 keys in 6 entries, seq 23285862
[ 194.444622] bcache: register_cache() registered cache device sdb1
[ 194.448381] bcache: register_bdev() registered backing device md0
[ 194.602075] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 194.602100] IP: create_empty_buffers+0x29/0xf0
[ 194.602110] PGD 0 P4D 0
[ 194.602121] Oops: 0000 [#1] SMP NOPTI
[ 194.602137] Modules linked in: bcache ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter pps_ldisc aufs overlay cmac bnep bonding bridge stp llc arc4 snd_hda_codec_hdmi nls_iso8859_1 edac_mce_amd kvm_amd kvm irqbypass rtl8xxxu mac80211 btusb btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth cfg80211 eeepc_wmi snd_hda_core asus_wmi joydev sparse_keymap snd_hwdep wmi_bmof input_leds snd_pcm ecdh_generic snd_timer snd soundcore ccp k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core nfsd iscsi_tcp libiscsi_tcp auth_rpcgss libiscsi nfs_acl scsi_transport_iscsi lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress algif_skcipher af_alg dm_crypt raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 194.602323] xor raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc mxm_wmi video ttm drm_kms_helper aesni_intel syscopyarea sysfillrect igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd glue_helper dca cryptd i2c_algo_bit drm i2c_piix4 ptp ahci nvme pps_core libahci nvme_core gpio_amdpt wmi gpio_generic
[ 194.602406] CPU: 1 PID: 4403 Comm: bcache-register Not tainted 4.15.0-91-generic #92-Ubuntu
[ 194.602425] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4207 12/08/2018
[ 194.602447] RIP: 0010:create_empty_buffers+0x29/0xf0
[ 194.602459] RSP: 0018:ffffa833cc37f7a8 EFLAGS: 00010246
[ 194.602471] RAX: 0000000000000000 RBX: fffff6227c3f4c00 RCX: 0000000000000013
[ 194.602487] RDX: 0000000000000000 RSI: 0000000000080000 RDI: fffff6227c3f4c00
[ 194.602503] RBP: ffffa833cc37f7c0 R08: 0000000000000001 R09: dead0000000000ff
[ 194.602519] R10: ffff8976ca4f7aa0 R11: 0000000000000000 R12: 0000000000000000
[ 194.602535] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
[ 194.602551] FS: 00007f51c3bb2500(0000) GS:ffff89771ec40000(0000) knlGS:0000000000000000
[ 194.602569] CS: 0010 DS: 0000 ES: 0000 CR0: ...


Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan,

Sorry, I couldn't look at this earlier, but I can put some cycles into it this week.
I'm looking at the stack trace, crashdump, and the offending patch.
Thanks again for your work on this.

cheers,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Ryan,

Good progress with your crashdump; thanks!

I could root cause the source of the BUG() / NULL pointer address 0x8,
and its relation to the offending commit you identified.

Could you please provide another crashdump, with the _working_ kernel 4.15.0-88,
with the bcache device _mounted_? I'd like to check a few values in its structs.

You should be able to force/trigger a crashdump in a working system with:
$ echo c | sudo tee /proc/sysrq-trigger

Please make sure you run 'sync' before triggering it, to commit any data to disk and avoid losing it.

Thank you!
Mauricio

Ryan Finnie (fo0bar) wrote :

The bcache device is raid+bcache, with luks below it (and then lvm), so it's not directly mountable. Do you just want the stack set up and functional like normal?

Ryan Finnie (fo0bar) wrote :

Here's a crashdump from the working kernel, with everything enabled: https://www.finnie.org/stuff/lp1867916-crashdump-20200528.tar.xz

Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan,

Thanks. I've found the problem; working on a fix next week.

Could you please provide/upload the output of these commands?
I can get those from the crashdump, but need to double check.

$ sudo grep ^ /sys/block/*/queue/*_block_size > lp1867916-queue_block_size 2>&1

$ sudo bcache-super-show /dev/md0 > lp1867916-bcache-super-show.md 2>&1
$ sudo bcache-super-show /dev/sda1 > lp1867916-bcache-super-show.sd 2>&1

P.S.: using md0 and sda1 per your comment #9, "---> bcache0: md0 + sda1"
please adjust if needed.

cheers,
Mauricio

Ryan Finnie (fo0bar) wrote :

Done, please see attached. (The sd* devices do tend to move around; the bcache backing device is currently at sdb1.)

Mauricio Faria de Oliveira (mfo) wrote :

Ryan,

Thanks! That matches the expected output.

There's a bit more that I got curious about,
which is quicker to get with lsblk than in kdump:

$ lsblk --ascii > lp1867916-lsblk 2>&1

cheers,
Mauricio

Ryan Finnie (fo0bar) wrote :

Sure, done.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan and Sebastian,

Could you please verify whether these test kernels resolve the problem?
There's 5.4 for Focal/20.04 and 4.15 for Bionic/18.04.

https://people.canonical.com/~mfo/lp1867916/focal/
https://people.canonical.com/~mfo/lp1867916/bionic/

Thank you,
Mauricio

Ryan Finnie (fo0bar) wrote :

Works on focal!

Linux nibbler 5.4.0-34-generic #38+lp1867916b1 SMP Sun May 31 21:41:06 -03 2020 x86_64 x86_64 x86_64 GNU/Linux

Thanks! I'm curious to see the patch; I tried root causing it myself and suspected it had to do with something like an overflow in an unanticipated block size, but never got anywhere on that.

Also keep in mind I filed an upstream bug (https://bugzilla.kernel.org/show_bug.cgi?id=207811), so please make sure to reference that when submitting the fix upstream.

Again, thank you Mauricio!

Sebastian Marsching (sebastian-marsching) wrote :

Hi Mauricio,

I can confirm that your patched kernel fixes the problem for me on Bionic (kernel 4.15).

Thanks for your help!

-Sebastian

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Mauricio Faria de Oliveira (mfo) wrote :

Hi Ryan and Sebastian,

Thanks for testing! Glad it helps.

This is the patch [1] to address the issue on a general level,
but it had a suggestion to be done at the driver level, which
may or may not interfere with the patch being applied -- both
approaches are not mutually exclusive.

I'm working on the driver-level patch for bcache as well (it's
a bit more complex), and should provide you another test kernel.

Hopefully you may be able to test this one too; I can reproduce
the bug now, but feedback from real reporters is always welcome.

...

Ryan, yes, I noticed you had filed an upstream bug (thanks!)
I should post on it with the new patch approach.

Sebastian, BTW, when you have a chance, could you please upload
the output of this command? Should be safe from sensitive data.

  $ sudo grep ^ /sys/block/*/queue/*_block_size > lp1867916-queue_block_size.seb 2>&1

cheers,
Mauricio

[1] https://lore.kernel.org<email address hidden>/T/#t

Sebastian Marsching (sebastian-marsching) wrote :

Hi Mauricio,

I have attached the requested information.

Best regards,
Sebastian

Mauricio Faria de Oliveira (mfo) wrote :

Sebastian, thanks for the information!

Sebastian, Ryan,

Do you remember specifying a block size of 512 kB for 'make-bcache' when creating it?

e.g., make-bcache --bdev|-B /dev/mdN --block|-w 512k # or similarly.

Thanks,
Mauricio

Sebastian Marsching (sebastian-marsching) wrote :

Hi Mauricio,

According to my Bash history, I used the following two commands to create the Bcache device:

make-bcache -B -b 524288 -w 524288 -o 8192 /dev/vg0/vg2-backend
make-bcache -C -b 4194304 -w 4096 -o 8192 /dev/vg1/vg2-cache

I think the reason why I used this block size was that the MD RAID device backing vg0 uses a stripe size of 512K and I wanted to ensure that everything is neatly aligned.

Best regards,
Sebastian

Ryan Finnie (fo0bar) wrote : Re: [Bug 1867916] Re: Regression in kernel 4.15.0-91 causes kernel panic with Bcache

On 6/2/20 10:19 AM, Mauricio Faria de Oliveira wrote:
> Do you remember specifying a block size of 512 kB for 'make-bcache' when
> creating it?
>
> e.g., make-bcache --bdev|-B /dev/mdN --block|-w 512k # or similarly.

My set was created by hand, and it's entirely possible I specified
"--block 512k" instead of "--block 512" (though the backing device is
4096, so it should have actually been 4k).
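
The difference between "512" and "512k" is easy to miss, so converting the argument to bytes before running make-bcache can help. A rough sketch, assuming plain numbers mean bytes (as in the commands quoted earlier in this thread) and handling only the k and M suffixes; `size_to_bytes` is a made-up helper, not make-bcache's own parser:

```shell
# Convert a size such as "512", "4k", "512k" or "2M" to bytes.
# Simplified parser for illustration only.
size_to_bytes() {
    num="${1%[kKmM]}"                    # numeric part, one optional suffix removed
    case "$num" in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    case "$1" in
        *[kK]) echo $(( $num * 1024 )) ;;
        *[mM]) echo $(( $num * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}
```

With this, `size_to_bytes 512k` prints 524288 while `size_to_bytes 512` prints 512, which makes the thousandfold difference obvious before the superblock is written.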

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian, Ryan,

Thanks for checking and clarifying; it's very helpful.

Sebastian,
Yes, I think that makes sense for the physical block size (but I'm not an expert :)

The problem is that bcache also uses it as the logical block size,
which leads to that error, as the logical block size cannot be greater than the page size.

The next patch should check for that, and adjust accordingly to either
the page size or the underlying device's logical block size, as needed.

cheers,
Mauricio
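
The check described above can be illustrated outside the kernel. This is only a sketch of the decision, not the kernel code: it assumes the fallback is the underlying device's logical block size, which matches the "falling back to device logical block size" messages logged by the patched kernels in this report. The function name and argument order are made up:

```shell
# $1 = block size from the bcache superblock, in bytes
# $2 = page size, in bytes
# $3 = logical block size of the underlying (backing) device, in bytes
# Prints the logical block size the bcache device would end up with.
effective_logical_block_size() {
    sb_bs="$1"; page_size="$2"; dev_lbs="$3"
    if [ "$sb_bs" -gt "$page_size" ]; then
        echo "$dev_lbs"                  # too large: fall back to the device value
    else
        echo "$sb_bs"                    # acceptable: keep the superblock value
    fi
}
```

For the reproducer values (8192, 4096, 512) this prints 512; a superblock block size equal to the page size is kept unchanged.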

In , mauricio.foliveira (mauricio.foliveira-linux-kernel-bugs) wrote :

Hi Coly,

I sent a patch for this problem for your review; hope it helps.

[PATCH] bcache: check and adjust logical block size for backing devices
https://www.spinics.net/lists/linux-bcache/msg08411.html

cheers,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

For documentation purposes,

The bcache-specific patch has been sent today.
Waiting on review before providing test kernels.

[PATCH] bcache: check and adjust logical block size for backing devices
https://www.spinics.net/lists/linux-bcache/msg08411.html

cheers,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Test-case
=========

echo 9 | sudo tee /proc/sys/kernel/printk # all messages on console

IMG="$HOME/disk.img"
rm -f $IMG
truncate --size 1G $IMG
DEV="$(sudo losetup --find --show $IMG)"

sudo modprobe bcache # just in case

sudo make-bcache --bdev $DEV --block 8k
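
After running the test case, the kernel log tells the two outcomes apart. A small sketch that classifies log text read from stdin, assuming the message strings seen in this report ("BUG:", "Oops:", "registered backing device"); `classify_bcache_log` is a hypothetical helper name:

```shell
# Read kernel log text from stdin and print one of:
#   oops     a BUG/Oops line is present
#   ok       the backing device was registered and no BUG/Oops was seen
#   unknown  neither pattern was found
classify_bcache_log() {
    log="$(cat)"
    if printf '%s\n' "$log" | grep -Eq 'BUG:|Oops:'; then
        echo oops
    elif printf '%s\n' "$log" | grep -q 'registered backing device'; then
        echo ok
    else
        echo unknown
    fi
}
```

Typical use would be `sudo dmesg | classify_bcache_log`, provided the system survives long enough to run it; on an affected kernel the console output may be the only record.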

Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Eoan):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Groovy):
status: In Progress → Won't Fix
importance: Medium → Undecided
assignee: Mauricio Faria de Oliveira (mfo) → nobody
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
description: updated
description: updated
Mauricio Faria de Oliveira (mfo) wrote :

Xenial / Testing
======

modified
--------

$ uname -rv
4.4.0-186-generic #216+lp1867916.1 SMP Mon Jul 6 18:45:47 -03 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 60.860259] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 60.860312] bcache: register_bdev() registered backing device loop0

original
--------

$ uname -rv
4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 22.192801] bcache: register_bdev() registered backing device loop0
[ 22.197141] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 22.198983] IP: [<ffffffff8125acb0>] bdev_read_page+0x10/0xb0
[ 22.200283] PGD 0
[ 22.200843] Oops: 0000 [#1] SMP
[ 22.201796] Modules linked in: bcache isofs kvm_intel input_leds kvm irqbypass joydev serio_raw ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs xor raid6_pq psmouse floppy
[ 22.208162] CPU: 1 PID: 1301 Comm: systemd-udevd Not tainted 4.4.0-186-generic #216-Ubuntu
[ 22.209642] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 22.211045] task: ffff8800b97ea700 ti: ffff8800b9b78000 task.ti: ffff8800b9b78000
[ 22.212244] RIP: 0010:[<ffffffff8125acb0>] [<ffffffff8125acb0>] bdev_read_page+0x10/0xb0
[ 22.213616] RSP: 0018:ffff8800b9b7ba80 EFLAGS: 00010283
[ 22.214513] RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000004
[ 22.215730] RDX: ffffea0004e55280 RSI: fff8800b9b7bb580 RDI: 0000000000000000
[ 22.217220] RBP: ffff8800b9b7baa0 R08: ffff8800b9b7bbb8 R09: ffff8800b9b7bbb0
[ 22.218640] R10: ffff8800368b4df8 R11: fff8800b9b7bb580 R12: 0000000000000000
[ 22.220114] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 22.221574] FS: 00007f5df7bd28c0(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[ 22.223416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.224584] CR2: 0000000000000098 CR3: 00000000b9b76000 CR4: 0000000000000670
[ 22.225945] Stack:
[ 22.226534] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 22.228516] ffff8800b9b7bb78 ffffffff81261bbe ffff8800368b4f80 ffff8800b9b7bba8
[ 22.230532] 0000000000000000 ffff8800b9b7bbb8 ffff8800b9b7bbb0 ffffffff8125a6b0
[ 22.232485] Call Trace:
[ 22.233148] [<ffffffff81261bbe>] do_mpage_readpage+0x52e/0x7a0
[ 22.234384] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
[ 22.235487] [<ffffffff811abebe>] ? lru_cache_add+0xe/0x10
[ 22.236675] [<ffffffff81261f42>] mpage_readpages+0x112/0x190
[ 22.237903] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
[ 22.238960] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
[ 22.240025] [<ffffffff811f09dc>] ? alloc_pages_current+0x8c/0x110
[ 22.241308] [<ffffffff8125b1cd>] blkdev_readpages+0x1d/0x20
[ 22.242463] [<ffffffff811a9f39>] __do_page_cache_readahead+0x199/0x240
[ 22.243843] [<ffffffff811aa34a>] force_page_cache_readahead+0xaa/0x100
[ 22.245267] [<ffffffff811aa3df>] page_cache_sync_readahead+0x3f/0x50
[ 22.246572] [<ffffffff8119d0ba>] ...

Mauricio Faria de Oliveira (mfo) wrote :

Bionic / Testing
======

modified
--------

$ uname -rv
4.15.0-110-generic #111+lp1867916.1 SMP Mon Jul 6 19:09:14 -03 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 22.066760] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 22.070587] bcache: register_bdev() registered backing device loop0

original
--------

$ uname -rv
4.15.0-110-generic #111-Ubuntu SMP Fri Jul 3 09:14:40 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 16.457709] bcache: register_bdev() registered backing device loop0
[ 16.460954] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 16.463150] IP: create_empty_buffers+0x29/0xf0
[ 16.464548] PGD 800000012aa90067 P4D 800000012aa90067 PUD 12aa91067 PMD 0
[ 16.466582] Oops: 0000 [#1] SMP PTI
[ 16.467702] Modules linked in: bcache isofs kvm_intel kvm irqbypass joydev input_leds serio_raw ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs xor zstd_compress raid6_pq psmouse virtio_net virtio_blk floppy
[ 16.472651] CPU: 0 PID: 1467 Comm: bcache-register Not tainted 4.15.0-110-generic #111-Ubuntu
[ 16.474238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 16.475756] RIP: 0010:create_empty_buffers+0x29/0xf0
[ 16.476715] RSP: 0018:ffffa9a740abb838 EFLAGS: 00010246
[ 16.477745] RAX: 0000000000000000 RBX: ffffebd404aa77c0 RCX: 000000000000000d
[ 16.479113] RDX: 0000000000000000 RSI: 0000000000002000 RDI: ffffebd404aa77c0
[ 16.480422] RBP: ffffa9a740abb850 R08: 0000000000026d60 R09: 0000000000000687
[ 16.481742] R10: 0000000000000002 R11: ffff91e6bffd2000 R12: 0000000000000000
[ 16.482976] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
[ 16.484304] FS: 00007f2c87adb700(0000) GS:ffff91e6bfc00000(0000) knlGS:0000000000000000
[ 16.485929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.487084] CR2: 0000000000000008 CR3: 000000012a994000 CR4: 00000000000006f0
[ 16.488488] Call Trace:
[ 16.489042] create_page_buffers+0x51/0x60
[ 16.489924] block_read_full_page+0x4e/0x3a0
[ 16.490820] ? set_init_blocksize+0x80/0x80
[ 16.491680] ? pagevec_lru_move_fn+0xc3/0xe0
[ 16.492560] ? __lru_cache_add+0x58/0x70
[ 16.493418] blkdev_readpage+0x18/0x20
[ 16.494219] do_read_cache_page+0x2a3/0x580
[ 16.495090] ? blkdev_writepages+0x40/0x40
[ 16.495937] ? __switch_to_asm+0x35/0x70
[ 16.496768] ? __switch_to_asm+0x41/0x70
[ 16.497611] ? __switch_to_asm+0x35/0x70
[ 16.498455] ? __switch_to_asm+0x41/0x70
[ 16.499297] read_cache_page+0x15/0x20
[ 16.500086] read_dev_sector+0x2d/0xe0
[ 16.500858] read_lba+0x130/0x220
[ 16.501603] ? kmem_cache_alloc_trace+0x15d/0x1d0
[ 16.502557] efi_partition+0x138/0x790
[ 16.503344] ? string+0x60/0x90
[ 16.504003] ? vsnprintf+0xfb/0x510
[ 16.504729] ? snprintf+0x45/0x70
[ 16.505436] ? is_gpt_valid.part.6+0x420/0x420
[ 16.506310] check_partition+0x130/0x230
[ 16.507116] ? is_gpt_valid.part.6+0x420/0x420
[ 16.508045] ? check_partition+0x130/0x230
[ 16.508891] rescan_par...

Mauricio Faria de Oliveira (mfo) wrote :

Disco / Testing
=====

* Using the linux-hwe-5.0 from "Disco" (EOL) on Bionic for the 5.0 kernel.

modified
--------

$ uname -rv
5.0.0-57-generic #61~18.04.1+lp1867916.1 SMP Mon Jul 6 19:27:05 -03 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 109.818171] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 109.822055] bcache: register_bdev() registered backing device loop0

original
--------

$ uname -rv
5.0.0-57-generic #61~18.04.1-Ubuntu SMP Mon Jul 6 09:40:52 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 112.148300] bcache: register_bdev() registered backing device loop0
[ 112.150575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 112.153000] #PF error: [normal kernel read fault]
[ 112.154474] PGD 800000012a5df067 P4D 800000012a5df067 PUD 137558067 PMD 0
[ 112.156614] Oops: 0000 [#1] SMP PTI
[ 112.157742] CPU: 1 PID: 1649 Comm: bcache-register Not tainted 5.0.0-57-generic #61~18.04.1-Ubuntu
[ 112.161386] RIP: 0010:create_empty_buffers+0x29/0x110
[ 112.162321] Code: 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 49 89 d5 ba 01 00 00 00 48 89 fb e8 72 fe ff ff 49 89 c4 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 2a 48 85 c9 75 f1 4c 89 62 08 48 8b 43 18 48 8d
[ 112.165333] RSP: 0018:ffff9fac40dd77f8 EFLAGS: 00010286
[ 112.166198] RAX: 0000000000000000 RBX: ffffc8b744a73700 RCX: ffff9051b779d000
[ 112.167430] RDX: 0000000000000000 RSI: ffff9051b779d000 RDI: ffffc8b744a73700
[ 112.168767] RBP: ffff9fac40dd7810 R08: dead0000000000ff R09: 0000000000000003
[ 112.169992] R10: 0000000000000000 R11: 00003748bb58c8ff R12: 0000000000000000
[ 112.171515] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
[ 112.172649] FS: 00007f4344d58700(0000) GS:ffff9051bba80000(0000) knlGS:0000000000000000
[ 112.174052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 112.175056] CR2: 0000000000000008 CR3: 0000000129e2a000 CR4: 00000000000006e0
[ 112.176259] Call Trace:
[ 112.176773] create_page_buffers+0x52/0x60
[ 112.177525] block_read_full_page+0x4e/0x3c0
[ 112.178304] ? check_disk_change+0x70/0x70
[ 112.179098] ? count_shadow_nodes+0x130/0x130
[ 112.179881] blkdev_readpage+0x18/0x20
[ 112.180571] do_read_cache_page+0x37b/0x790
[ 112.181324] ? blkdev_writepages+0x10/0x10
[ 112.182064] ? get_page_from_freelist+0x154e/0x1560
[ 112.182924] ? update_load_avg+0x8b/0x5f0
[ 112.183657] read_cache_page+0x12/0x20
[ 112.184354] read_dev_sector+0x2d/0xe0
[ 112.185041] read_lba+0x130/0x220
[ 112.185665] efi_partition+0x131/0x770
[ 112.186360] ? string+0x60/0x90
[ 112.186971] ? vsnprintf+0xfb/0x510
[ 112.187625] ? snprintf+0x45/0x70
[ 112.188252] ? is_gpt_valid.part.7+0x420/0x420
[ 112.189056] check_partition+0x13f/0x250
[ 112.189759] ? is_gpt_valid.part.7+0x420/0x420
[ 112.190542] ? check_partition+0x13f/0x250
[ 112.191292] rescan_partitions+0xaf/0x360
[ 112.192015] bdev_disk_changed+0x5a/0x60
[ 112.192723] __blkdev_get+0x354/0x560
[ 112.193440] ? inode_init_always+0x131/0x1f0
[ 112.194365] blkdev_get+0x131/0x340
[ 112.195106] ? wake_up_bit+0x42/0x50
[ ...

Mauricio Faria de Oliveira (mfo) wrote :

Eoan / Testing
====

modified
--------

$ uname -rv
5.3.0-63-generic #57+lp1867916.1 SMP Mon Jul 6 18:33:27 -03 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 29.620685] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 29.624382] bcache: register_bdev() registered backing device loop0

original
--------

$ uname -rv
5.3.0-63-generic #57-Ubuntu SMP Thu Jul 2 10:38:35 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 40.416094] bcache: register_bdev() registered backing device loop0
[ 40.418547] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 40.423539] #PF: supervisor write access in kernel mode
[ 40.424608] #PF: error_code(0x0002) - not-present page
[ 40.425610] PGD 80000001374fb067 P4D 80000001374fb067 PUD 138349067 PMD 0
[ 40.426901] Oops: 0002 [#1] SMP PTI
[ 40.427542] CPU: 3 PID: 1546 Comm: bcache-register Not tainted 5.3.0-63-generic #57-Ubuntu
[ 40.428937] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 40.430476] RIP: 0010:create_empty_buffers+0x24/0x110
[ 40.431373] Code: 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 ba 01 00 00 00 41 54 53 48 89 fb e8 52 f4 ff ff 49 89 c4 48 89 c2 <4c> 09 2a 48 89 d0 48 8b 52 08 48 85 d2 75 f1 4c 89 60 08 48 8b 43
[ 40.434484] RSP: 0018:ffffb2ec803ab820 EFLAGS: 00010286
[ 40.435397] RAX: 0000000000000000 RBX: fffff0c884a2d000 RCX: ffff9f3af9141000
[ 40.436602] RDX: 0000000000000000 RSI: ffff9f3af9141000 RDI: fffff0c884a2d000
[ 40.437820] RBP: ffffb2ec803ab838 R08: ffff9f3af9141000 R09: 0000000000000000
[ 40.439040] R10: 0000000000000001 R11: ffff9f3af602f198 R12: 0000000000000000
[ 40.440260] R13: 0000000000000000 R14: ffff9f3af64d1c60 R15: 0000000000000000
[ 40.441480] FS: 00007fed73c25700(0000) GS:ffff9f3afbb80000(0000) knlGS:0000000000000000
[ 40.443000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 40.444068] CR2: 0000000000000000 CR3: 0000000128bd2000 CR4: 00000000000006e0
[ 40.445325] Call Trace:
[ 40.445858] create_page_buffers+0x52/0x60
[ 40.446656] block_read_full_page+0x4e/0x3a0
[ 40.447477] ? blkdev_direct_IO+0x70/0x70
[ 40.448255] ? __add_to_page_cache_locked+0x2e5/0x340
[ 40.449193] ? scan_shadow_nodes+0x30/0x30
[ 40.449982] blkdev_readpage+0x18/0x20
[ 40.450722] do_read_cache_page+0x2f6/0x830
[ 40.451527] ? update_load_avg+0x7c/0x670
[ 40.452300] ? prep_new_page+0x128/0x160
[ 40.453064] read_cache_page+0x12/0x20
[ 40.453801] read_dev_sector+0x27/0xd0
[ 40.454551] read_lba+0xbd/0x220
[ 40.455207] ? kmem_cache_alloc_trace+0x16c/0x240
[ 40.456074] efi_partition+0x1e0/0x6fd
[ 40.456796] ? vsnprintf+0x39e/0x4e0
[ 40.457494] ? snprintf+0x49/0x60
[ 40.458155] check_partition+0x154/0x244
[ 40.458916] rescan_partitions+0xae/0x280
[ 40.459675] bdev_disk_changed+0x5f/0x70
[ 40.460426] __blkdev_get+0x3f8/0x550
[ 40.461138] blkdev_get+0x3d/0x140
[ 40.461807] __device_add_disk+0x388/0x480
[ 40.462603] device_add_disk+0x13/0x20
[ 40.463346] bch_cached_dev_run+0x66/0x190 [bcache]
[ 40.464265] register_bc...

Mauricio Faria de Oliveira (mfo) wrote :

Focal / Testing
=====

modified
--------

$ uname -rv
5.4.0-41-generic #45+lp1867916.1 SMP Mon Jul 6 16:41:46 -03 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 29.593270] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 29.596872] bcache: register_bdev() registered backing device loop0

original
--------

$ uname -rv
5.4.0-41-generic #45-Ubuntu SMP Fri Jul 3 10:57:47 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
[ 37.880016] bcache: register_bdev() registered backing device loop0
[ 37.883376] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 37.884789] #PF: supervisor write access in kernel mode
[ 37.885862] #PF: error_code(0x0002) - not-present page
[ 37.886899] PGD 8000000129ee4067 P4D 8000000129ee4067 PUD 129ee5067 PMD 0
[ 37.888273] Oops: 0002 [#1] SMP PTI
[ 37.889014] CPU: 0 PID: 1585 Comm: bcache-register Not tainted 5.4.0-41-generic #45-Ubuntu
[ 37.890614] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 37.892334] RIP: 0010:create_empty_buffers+0x24/0x110
[ 37.893317] Code: 00 00 00 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 ba 01 00 00 00 41 54 53 48 89 fb e8 32 f4 ff ff 49 89 c4 48 89 c2 <4c> 09 2a 48 89 d0 48 8b 52 08 48 85 d2 75 f1 4c 89 60 08 48 8b 43
[ 37.896260] RSP: 0018:ffffb4cb40347820 EFLAGS: 00010286
[ 37.897132] RAX: 0000000000000000 RBX: fffff9a944a7ad80 RCX: ffffa032b78e6000
[ 37.898304] RDX: 0000000000000000 RSI: ffffa032b78e6000 RDI: fffff9a944a7ad80
[ 37.899451] RBP: ffffb4cb40347838 R08: ffffa032b78e6000 R09: 0000000000000000
[ 37.900597] R10: 0000000000000001 R11: ffffa032bac48758 R12: 0000000000000000
[ 37.901747] R13: 0000000000000000 R14: ffffa032b7205c60 R15: 0000000000000000
[ 37.902901] FS: 00007f08ed221700(0000) GS:ffffa032bba00000(0000) knlGS:0000000000000000
[ 37.904327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.905347] CR2: 0000000000000000 CR3: 0000000129eb0000 CR4: 00000000000006f0
[ 37.906560] Call Trace:
[ 37.907083] create_page_buffers+0x52/0x60
[ 37.907841] block_read_full_page+0x4e/0x3b0
[ 37.908650] ? blkdev_direct_IO+0x70/0x70
[ 37.909402] ? __add_to_page_cache_locked+0x2e5/0x340
[ 37.910299] ? scan_shadow_nodes+0x30/0x30
[ 37.911050] blkdev_readpage+0x18/0x20
[ 37.911753] do_read_cache_page+0x2f6/0x830
[ 37.912523] ? prep_new_page+0x128/0x160
[ 37.913260] read_cache_page+0x12/0x20
[ 37.913957] read_dev_sector+0x27/0xd0
[ 37.914655] read_lba+0xbd/0x220
[ 37.915276] ? kmem_cache_alloc_trace+0x16c/0x240
[ 37.916114] efi_partition+0x1e0/0x6fd
[ 37.916818] ? vsnprintf+0x39e/0x4e0
[ 37.917506] ? snprintf+0x49/0x60
[ 37.918142] check_partition+0x154/0x244
[ 37.918876] rescan_partitions+0xae/0x280
[ 37.919612] bdev_disk_changed+0x5f/0x70
[ 37.920325] __blkdev_get+0x3f8/0x550
[ 37.921011] blkdev_get+0x3d/0x140
[ 37.921654] __device_add_disk+0x329/0x480
[ 37.922396] device_add_disk+0x13/0x20
[ 37.923103] bch_cached_dev_run+0x66/0x190 [bcache]
[ 37.923969] register_bcache.cold+0x17a/0x1c6 [bcache]
[ 37.92486...

Mauricio Faria de Oliveira (mfo) wrote :

[X/B/D/E/F][PATCH 0/1] bcache: fix oops for block size > page size
https://lists.ubuntu.com/archives/kernel-team/2020-July/111846.html

Mauricio Faria de Oliveira (mfo) wrote :

Test kernels with the fix are available in:
https://people.canonical.com/~mfo/lp1867916/

Sebastian Marsching (sebastian-marsching) wrote :

Thank you, Mauricio! I can confirm that your test kernel for Bionic (4.15.0-110.111+lp1867916.1) fixes the problem for me.

Mauricio Faria de Oliveira (mfo) wrote :

Hey Sebastian! Great; thanks for testing!

Ryan Finnie (fo0bar) wrote :

Works on focal, thank you

Ryan Finnie (fo0bar) wrote :

With the regression fix tested, I've gone ahead and fixed the core problem on my system by switching the backing device to a 4k block size. The steps are documented below and worked for me, but PLEASE don't take them at face value, since part of the procedure literally destroys the bcache header area of the backing disk.

# WARNING! Loss of data, life, liberty, etc. Do not blindly copy/paste!
# You need to unmount and make sure the bcache device is not being used,
# and specifically wait for the cache state to go from dirty to clean
# if using writeback.

DEV_BACKING=/dev/sdx1 # /dev/md0 for me
DEV_CACHE=/dev/sdy1 # /dev/sdb1 for me
DEV_BCACHE=bcache0
OFFSET="$(bcache-super-show ${DEV_BACKING?} | awk '$1=="dev.data.first_sector" {print $2}')"
CSET_UUID="$(bcache-super-show ${DEV_CACHE?} | awk '$1=="cset.uuid" {print $2}')"
file -s ${DEV_CACHE?}
file -s ${DEV_BACKING?}
dd if=${DEV_BACKING?} bs=512 skip=${OFFSET?} count=8 | file -

# Remove and wipe cache device
echo ${CSET_UUID?} >/sys/block/${DEV_BCACHE?}/bcache/detach
cat /sys/block/${DEV_BCACHE?}/bcache/state
echo 1 >/sys/fs/bcache/${CSET_UUID?}/stop
while [ -e /sys/fs/bcache/${CSET_UUID?}/stop ]; do sleep 1; done
wipefs -a ${DEV_CACHE?}

# Stop bcache0, wipe and recreate bcache portion of backing device
echo 1 >/sys/block/${DEV_BCACHE?}/bcache/stop
while [ -e /sys/block/${DEV_BCACHE?}/bcache/stop ]; do sleep 1; done
# Removes bcache magic bytes from beginning and (hopefully) leaves all data from
# ${OFFSET?} onward (the actual data). But don't trust me here.
wipefs -a ${DEV_BACKING?}
dd if=/dev/zero of=${DEV_BACKING?} bs=512 count=${OFFSET?}
make-bcache --block 4k --bucket 2M --data-offset ${OFFSET?} -B ${DEV_BACKING?}
while [ ! -e /sys/block/${DEV_BCACHE?}/bcache/attach ]; do sleep 1; done
file -s /dev/${DEV_BCACHE?}

# Safe to remount and use bcache device now

# Recreate and attach cache
make-bcache --block 4k --bucket 2M -C ${DEV_CACHE?}
CSET_UUID="$(bcache-super-show ${DEV_CACHE?} | awk '$1=="cset.uuid" {print $2}')"
echo ${CSET_UUID?} >/sys/block/${DEV_BCACHE?}/bcache/attach
echo writeback >/sys/block/${DEV_BCACHE?}/bcache/cache_mode
cat /sys/block/${DEV_BCACHE?}/bcache/state
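
The `while ... sleep 1` loops above wait forever if a sysfs entry never appears or disappears. One possible variation is a wait with a timeout; a minimal sketch, where `wait_for_path`, its mode argument, and the 30 second default are all assumptions for illustration:

```shell
# wait_for_path MODE PATH [TIMEOUT_SECONDS]
#   MODE "present": succeed once PATH exists
#   MODE "absent":  succeed once PATH no longer exists
# Returns 1 if the condition is not met within the timeout (default 30).
wait_for_path() {
    mode="$1"; path="$2"; timeout="${3:-30}"
    waited=0
    while :; do
        if [ -e "$path" ]; then state=present; else state=absent; fi
        if [ "$state" = "$mode" ]; then return 0; fi
        if [ "$waited" -ge "$timeout" ]; then return 1; fi
        sleep 1
        waited=$(( $waited + 1 ))
    done
}

# Example (not executed here): give the cache set 60 seconds to stop.
# wait_for_path absent "/sys/fs/bcache/${CSET_UUID?}/stop" 60 || echo "timed out" >&2
```

A timeout does not make the procedure any safer for the data; it only avoids a script that hangs silently.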

Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Sebastian Marsching (sebastian-marsching) wrote :

We don’t have any systems with Xenial any longer, so I can’t test there. In fact, just this week we decommissioned the Bionic-based system where I originally discovered this bug, so even there, I can’t test it any longer.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sebastian, don't worry, we'll use the synthetic reproducer for the verification steps.

Mauricio Faria de Oliveira (mfo) wrote :

Verification done on xenial-proposed.

The kernel in -proposed logs the block size change.
The kernel in -updates fails.

xenial-proposed:
---

 $ uname -rv
 4.4.0-187-generic #217-Ubuntu SMP Tue Jul 21 04:18:15 UTC 2020

 $ apt-cache madison linux-image-4.4.0-187-generic
 linux-image-4.4.0-187-generic | 4.4.0-187.217 | http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 Packages

 $ sudo make-bcache --bdev $DEV --block 8k
 ...
 [ 88.514012] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
 [ 88.516852] bcache: register_bdev() registered backing device loop0

xenial-updates:
---

 $ uname -rv
 4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020

 $ apt-cache madison linux-image-4.4.0-186-generic | grep updates
 linux-image-4.4.0-186-generic | 4.4.0-186.216 | http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages

 $ sudo make-bcache --bdev $DEV --block 8k
 ...
 [ 56.341127] bcache: register_bdev() registered backing device loop0
 [ 56.344996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 [ 56.346996] IP: [<ffffffff8125acb0>] bdev_read_page+0x10/0xb0
 ...
 [ 56.379801] Call Trace:
 [ 56.380315] [<ffffffff81261bbe>] do_mpage_readpage+0x52e/0x7a0
 [ 56.381452] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
 [ 56.382550] [<ffffffff811abebe>] ? lru_cache_add+0xe/0x10
 [ 56.383544] [<ffffffff81261f42>] mpage_readpages+0x112/0x190
 [ 56.384508] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
 [ 56.385369] [<ffffffff8125a6b0>] ? I_BDEV+0x20/0x20
 [ 56.386196] [<ffffffff811f09dc>] ? alloc_pages_current+0x8c/0x110
 [ 56.387279] [<ffffffff8125b1cd>] blkdev_readpages+0x1d/0x20
 [ 56.388316] [<ffffffff811a9f39>] __do_page_cache_readahead+0x199/0x240
 [ 56.389394] [<ffffffff811aa34a>] force_page_cache_readahead+0xaa/0x100
 [ 56.390445] [<ffffffff811aa3df>] page_cache_sync_readahead+0x3f/0x50
 [ 56.391600] [<ffffffff8119d0ba>] generic_file_read_iter+0x54a/0x6b0
 [ 56.392734] [<ffffffff8125bcc5>] blkdev_read_iter+0x35/0x40
 [ 56.393658] [<ffffffff8121e14e>] new_sync_read+0x9e/0xe0
 [ 56.394620] [<ffffffff8121e1b9>] __vfs_read+0x29/0x40
 [ 56.395540] [<ffffffff8121e926>] vfs_read+0x86/0x130
 [ 56.396373] [<ffffffff8121f67c>] SyS_read+0x5c/0xe0
 [ 56.397192] [<ffffffff8186991b>] entry_SYSCALL_64_fastpath+0x22/0xd0
 ...

tags: added: verification-done-xenial
removed: verification-needed-xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Mauricio Faria de Oliveira (mfo) wrote :

Verification done for Bionic.

$ uname -rv
4.15.0-113-generic #114-Ubuntu SMP Sun Aug 9 07:27:58 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
...
[ 18.467465] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 18.470409] bcache: register_bdev() registered backing device loop0

tags: added: verification-done-bionic
removed: verification-needed-bionic
Mauricio Faria de Oliveira (mfo) wrote :

Verification done for Focal.

$ uname -rv
5.4.0-43-generic #47-Ubuntu SMP Sat Aug 8 06:34:35 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
...
[ 71.251993] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 71.255329] bcache: register_bdev() registered backing device loop0

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-187.217

---------------
linux (4.4.0-187.217) xenial; urgency=medium

  * xenial/linux: 4.4.0-187.217 -proposed tracker (LP: #1888274)

  * Regression in kernel 4.15.0-91 causes kernel panic with Bcache
    (LP: #1867916)
    - bcache: check and adjust logical block size for backing devices

  * Xenial update: v4.4.230 upstream stable release (LP: #1887011)
    - btrfs: cow_file_range() num_bytes and disk_num_bytes are same
    - btrfs: fix data block group relocation failure due to concurrent scrub
    - mm: fix swap cache node allocation mask
    - EDAC/amd64: Read back the scrub rate PCI register on F15h
    - mm/slub: fix stack overruns with SLUB_STATS
    - usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect
    - kgdb: Avoid suspicious RCU usage warning
    - crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
    - sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in
      milliseconds
    - hwmon: (max6697) Make sure the OVERT mask is set correctly
    - hwmon: (acpi_power_meter) Fix potential memory leak in
      acpi_power_meter_add()
    - virtio-blk: free vblk-vqs in error path of virtblk_probe()
    - i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
    - Revert "ALSA: usb-audio: Improve frames size computation"
    - SMB3: Honor 'seal' flag for multiuser mounts
    - SMB3: Honor persistent/resilient handle flags for multiuser mounts
    - cifs: Fix the target file was deleted when rename failed.
    - MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
    - netfilter: nf_conntrack_h323: lost .data_len definition for Q.931/ipv6
    - Linux 4.4.230

  * Xenial update: v4.4.229 upstream stable release (LP: #1885932)
    - s390: fix syscall_get_error for compat processes
    - clk: sunxi: Fix incorrect usage of round_down()
    - i2c: piix4: Detect secondary SMBus controller on AMD AM4 chipsets
    - clk: qcom: msm8916: Fix the address location of pll->config_reg
    - ALSA: isa/wavefront: prevent out of bounds write in ioctl
    - scsi: qla2xxx: Fix issue with adapter's stopping state
    - i2c: pxa: clear all master action bits in i2c_pxa_stop_message()
    - usblp: poison URBs upon disconnect
    - ps3disk: use the default segment boundary
    - vfio/pci: fix memory leaks in alloc_perm_bits()
    - mfd: wm8994: Fix driver operation if loaded as modules
    - scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
    - nfsd: Fix svc_xprt refcnt leak when setup callback client failed
    - powerpc/crashkernel: Take "mem=" option into account
    - yam: fix possible memory leak in yam_init_driver
    - mksysmap: Fix the mismatch of '.L' symbols in System.map
    - scsi: sr: Fix sr_probe() missing deallocate of device minor
    - scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM
    - ALSA: usb-audio: Improve frames size computation
    - s390/qdio: put thinint indicator after early error
    - tty: hvc: Fix data abort due to race in hvc_open
    - staging: sm750fb: add missing case while setting FB_VISUAL
    - i2c: pxa: fix i2c_pxa_scream_blue_murder() debug output
    - serial: ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Mauricio Faria de Oliveira (mfo) wrote :

Verification done on "eoan" (5.3/linux-hwe on Bionic)

$ uname -rv
5.3.0-65-generic #59-Ubuntu SMP Tue Jul 28 07:27:41 UTC 2020

$ sudo make-bcache --bdev $DEV --block 8k
...
[ 103.766185] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 103.775319] bcache: register_bdev() registered backing device loop0
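
The fallback reported in the log above can be modelled with a minimal sketch. This is not the kernel code: the function name, parameter names and the PAGE_SIZE constant are hypothetical and chosen only for illustration, assuming a 4096-byte page as in the log. It shows the decision described in the [Fix] section: a superblock block size larger than the page size is replaced by the backing device's logical block size.

```python
# Illustrative model only; the real check lives in the bcache driver (C).
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as reported in the log above


def effective_block_size(sb_block_size, bdev_logical_block_size,
                         page_size=PAGE_SIZE):
    """Return the logical block size the bcache device would expose.

    sb_block_size: block size requested via make-bcache --block (bytes)
    bdev_logical_block_size: logical block size of the backing device (bytes)
    """
    if sb_block_size > page_size:
        # Too large for the page size: fall back to the backing device value.
        return bdev_logical_block_size
    # Within the page size: keep the requested value unchanged.
    return sb_block_size


if __name__ == "__main__":
    print(effective_block_size(8192, 512))  # 8k request on a 512-byte device
    print(effective_block_size(4096, 512))  # request equal to the page size
```

With the values from the log (8192 requested, 512 on the loop device) the sketch returns 512, matching the "falling back to device logical block size (512)" message.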

Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Won't Fix
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-115.116

---------------
linux (4.15.0-115.116) bionic; urgency=medium

  * bionic/linux: 4.15.0-115.116 -proposed tracker (LP: #1893055)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (4.15.0-114.115) bionic; urgency=medium

  * bionic/linux: 4.15.0-114.115 -proposed tracker (LP: #1891052)

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (4.15.0-113.114) bionic; urgency=medium

  * bionic/linux: 4.15.0-113.114 -proposed tracker (LP: #1890705)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Reapply "usb: handle warm-reset port requests on hub resume" (LP: #1859873)
    - usb: handle warm-reset port requests on hub resume

  * Bionic update: upstream stable patchset 2020-07-29 (LP: #1889474)
    - gpio: arizona: handle pm_runtime_get_sync failure case
    - gpio: arizona: put pm_runtime in case of failure
    - pinctrl: amd: fix npins for uart0 in kerncz_groups
    - mac80211: allow rx of mesh eapol frames with default rx key
    - scsi: scsi_transport_spi: Fix function pointer check
    - xtensa: fix __sync_fetch_and_{and,or}_4 declarations
    - xtensa: update *pos in cpuinfo_op.next
    - drivers/net/wan/lapbether: Fixed the value of hard_header_len
    - net: sky2: initialize return of gm_phy_read
    - drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
    - irqdomain/treewide: Keep firmware node unconditionally allocated
    - SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO
      compeletion")
    - spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
    - IB/umem: fix reference count leak in ib_umem_odp_get()
    - uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix
      GDB regression
    - ALSA: info: Drop WARN_ON() from buffer NULL sanity check
    - ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
    - btrfs: fix double free on ulist after backref resolution failure
    - btrfs: fix mount failure caused by race with umount
    - btrfs: fix page leaks after failure to lock page for delalloc
    - bnxt_en: Fix race when modifying pause settings.
    - hippi: Fix a size used in a 'pci_free_consistent()' in an error handling
      path
    - ax88172a: fix ax88172a_unbind() failures
    - net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual
      configuration
    - drm: sun4i: hdmi: Fix inverted HPD result
    - net: smc91x: Fix possible memory leak in smc_drv_probe()
    - bonding: check error value of register_netdevice() immediately
    - mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
    - ipvs: fix the connection sync failed in some cases
    - i2c: rcar: always clear ICSAR to avoid side effects
    - bonding: check return value of register_netdevice() in bond_newlink()
    - serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
    - scripts/decode_stacktrace: strip basepath from all paths
    - HID: i...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Mauricio Faria de Oliveira (mfo) wrote :

Also marking Ubuntu / Groovy as Fix Released, as Groovy/devel has the 5.8 kernel already, which ships the fix.

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Groovy):
status: Won't Fix → Fix Released