Kernel Oops: Rsyncing to bcache device w/o backing cache kernel panic

Bug #1895563 reported by Brendan Boerner
This bug affects 1 person

Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Platform details:

1) version.log attached
2) lspci-vnvn.log attached
3) lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04
4) apt-cache policy bcache-tools
bcache-tools:
  Installed: 1.0.8-2ubuntu0.18.04.1
  Candidate: 1.0.8-2ubuntu0.18.04.1
  Version table:
 *** 1.0.8-2ubuntu0.18.04.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.0.8-2build1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

5) uname -a:
Linux timber4 4.15.0-117-generic #118-Ubuntu SMP Fri Sep 4 20:02:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Short problem description:

Rsyncing from a 1.4TB home dir to a new filesystem on a bcache device without an attached cache eventually results in a kernel panic.

What I expected to happen: no kernel panic.

Long problem description:

1) Devices / Filesystems:

bfd01: 8*4TB configured as RAID6 using md, with an LVM2 volume group.
bfd02: 8*4TB configured as RAID6 using Dell PERC H810, with an LVM2 volume group.

The tree contains 1069685 files / dirs.

2) Steps to reproduce:

# Make the device, format and mount it
make-bcache -B /dev/vg-bfd02/delme01_bc
mkfs.xfs -f -L delme01bc /dev/bcache0
mount /dev/bcache0 /mnt/bfd02/delme01bc

# Rsync
flags="-a --delete" ; info_cmd="--info=progress2" ; excludes="";
src=/mnt/bfd01/delme01 ; tgt=/mnt/bfd02/delme01bc/ ;
time rsync $flags $info_cmd $excludes $src/ $tgt/
...
# eventually kernel panic

Using incremental rsync it will kernel panic after reading about 21GB.

# If instead I use non-incremental rsync it will kernel panic after reading about 1.2-1.3TB.

flags="--no-inc-recursive -ax -HAXS --delete" ; info_cmd="--info=progress2" ; excludes="";
...
# eventually kernel panic

Additional detail:

1) I was *not* able to reproduce if bfd01/delme01bc was the target and bfd02 was the source, e.g.

# Rsync
flags="-a --delete" ; info_cmd="--info=progress2" ; excludes="";
src=/mnt/bfd02/delme01 ; tgt=/mnt/bfd01/delme01bc/ ;
time rsync $flags $info_cmd $excludes $src/ $tgt/
...
# rsync completes successfully

2) I was *not* able to reproduce if I attached and then detached a cache set:

echo 'd3b93488-714e-4efa-af94-cd80fd2db11f' > /sys/block/bcache0/bcache/attach
echo 1 > /sys/block/bcache0/bcache/detach
...
time rsync $flags $info_cmd $excludes $src/ $tgt/
...
# rsync completes successfully
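
Since the outcome depends on whether a cache set has ever been attached, it may help to record the bcache state before each rsync run. The following is a minimal sketch, assuming the kernel exposes the `state` attribute under /sys/block/<dev>/bcache/ (values such as "no cache", "clean", "dirty"). The function names and the SYSFS_ROOT override are hypothetical and only there for illustration and testing.

```shell
# Print the bcache state of a backing device, e.g. "no cache", "clean", "dirty".
# SYSFS_ROOT is a hypothetical override so the function can be pointed at a
# test directory tree; it defaults to the real /sys.
bcache_state() {
    dev="$1"
    state_file="${SYSFS_ROOT:-/sys}/block/${dev}/bcache/state"
    if [ ! -r "$state_file" ]; then
        echo "unknown"
        return 1
    fi
    cat "$state_file"
}

# Succeed only if the device currently reports an attached cache set.
# Returns 2 if the state could not be read at all.
bcache_has_cache() {
    state="$(bcache_state "$1")" || return 2
    [ "$state" != "no cache" ]
}
```

For example, `bcache_state bcache0` could be logged right before `time rsync ...` so each crash dump can be matched to the state the device was in.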

I have 11 compressed kernel crash dumps. I will upload 3 after filing this. Let me know if you want the other 8.

ProblemType: KernelCrash
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-117-generic 4.15.0-117.118
ProcVersionSignature: Ubuntu 4.15.0-117.118-generic 4.15.18
Uname: Linux 4.15.0-117-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 13 12:53 seq
 crw-rw---- 1 root audio 116, 33 Sep 13 12:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Sat Sep 12 09:51:37 2020
HibernationDevice: RESUME=none
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Dell Inc. PowerEdge R620
PciMultimedia:

ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-117-generic root=UUID=de68d15b-3948-4666-9bc5-5cbecc83971c ro maybe-ubiquity crashkernel=512M-:512M
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-117-generic N/A
 linux-backports-modules-4.15.0-117-generic N/A
 linux-firmware 1.173.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/06/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.9.0
dmi.board.name: 0KCKR5
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.9.0:bd12/06/2019:svnDellInc.:pnPowerEdgeR620:pvr:rvnDellInc.:rn0KCKR5:rvrA00:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R620
dmi.sys.vendor: Dell Inc.
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 13 12:53 seq
 crw-rw---- 1 root audio 116, 33 Sep 13 12:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=none
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Dell Inc. PowerEdge R620
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-117-generic root=UUID=de68d15b-3948-4666-9bc5-5cbecc83971c ro maybe-ubiquity crashkernel=512M-:512M
ProcVersionSignature: Ubuntu 4.15.0-117.118-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-117-generic N/A
 linux-backports-modules-4.15.0-117-generic N/A
 linux-firmware 1.173.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic uec-images
Uname: Linux 4.15.0-117-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 12/06/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.9.0
dmi.board.name: 0KCKR5
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.9.0:bd12/06/2019:svnDellInc.:pnPowerEdgeR620:pvr:rvnDellInc.:rn0KCKR5:rvrA00:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R620
dmi.sys.vendor: Dell Inc.
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 13 12:53 seq
 crw-rw---- 1 root audio 116, 33 Sep 13 12:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=none
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Dell Inc. PowerEdge R620
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-117-generic root=UUID=de68d15b-3948-4666-9bc5-5cbecc83971c ro maybe-ubiquity crashkernel=512M-:512M
ProcVersionSignature: Ubuntu 4.15.0-117.118-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-117-generic N/A
 linux-backports-modules-4.15.0-117-generic N/A
 linux-firmware 1.173.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic uec-images
Uname: Linux 4.15.0-117-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: False
dmi.bios.date: 12/06/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.9.0
dmi.board.name: 0KCKR5
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.9.0:bd12/06/2019:svnDellInc.:pnPowerEdgeR620:pvr:rvnDellInc.:rn0KCKR5:rvrA00:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R620
dmi.sys.vendor: Dell Inc.

Brendan Boerner (brendan-karakhorum) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Brendan Boerner (brendan-karakhorum) wrote :

Brendan Boerner (brendan-karakhorum) wrote : CRDA.txt

apport information

tags: added: apport-collected
description: updated

Brendan Boerner (brendan-karakhorum) wrote : CurrentDmesg.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : Lspci.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : Lsusb.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : ProcCpuinfo.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : ProcCpuinfoMinimal.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : ProcEnviron.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : ProcInterrupts.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : ProcModules.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : UdevDb.txt

apport information

Brendan Boerner (brendan-karakhorum) wrote : WifiSyslog.txt

apport information

description: updated

Mauricio Faria de Oliveira (mfo) wrote :

Hey Brendan,

Thanks for reporting this bug.
It looks like the crashdump files didn't come through?

Could you please try to reproduce with the 5.8 kernel from Groovy on your Bionic install?

These are the steps to install it.
Please remove the .list file as soon as you install the packages, to avoid other packages from Groovy being installed unintentionally.

$ echo 'deb http://archive.ubuntu.com/ubuntu groovy main restricted' | sudo tee /etc/apt/sources.list.d/groovy.list
deb http://archive.ubuntu.com/ubuntu groovy main restricted
$ sudo apt update
$ sudo apt install --dry-run linux-{image,modules-extra,headers}-5.8.0-18-generic
$ sudo rm /etc/apt/sources.list.d/groovy.list

Once you're done with testing, you can remove the installed packages with:

$ sudo apt purge linux-{image,modules,modules-extra,headers}-5.8.0-18-generic

Please note there's an additional linux-modules package in this list (which is pulled in as a dependency by the install command).
You can also specify other packages if any are also pulled during the install step, but testing it here it doesn't look like any other packages are pulled in.

Thanks,
Mauricio

Mauricio Faria de Oliveira (mfo) wrote :

Oops, please remove the --dry-run from apt install.
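
After installing a test kernel and rebooting, it can be worth confirming that the machine actually booted into it before starting a multi-hour rsync. A minimal sketch follows; the function name and the KERNEL_RELEASE override are hypothetical, and the expected string is assumed to be the full `uname -r` value (for example 5.8.0-18-generic).

```shell
# Succeed only if the running kernel release matches the expected one exactly.
# KERNEL_RELEASE is a hypothetical override for testing; by default the value
# comes from `uname -r`.
running_kernel_is() {
    expected="$1"
    actual="${KERNEL_RELEASE:-$(uname -r)}"
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "running $actual, expected $expected" >&2
    return 1
}
```

Usage would be along the lines of `running_kernel_is 5.8.0-18-generic && time rsync $flags $info_cmd $excludes $src/ $tgt/`.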

Brendan Boerner (brendan-karakhorum) wrote :

I'll test the 5.8 kernel this weekend.

I'll upload a couple of tarballs next.

Brendan Boerner (brendan-karakhorum) wrote :

Cannot upload crash dumps due to a timeout, presumably due to size (1.6GB).

Here're links instead:

1. http://temp.karakhorum.com/dl/crash_202009120944.tar.bz2
2. http://temp.karakhorum.com/dl/crash_202009121459.tar.bz2

Brendan Boerner (brendan-karakhorum) wrote :

groovy: 20.10: 5.8.0-19-generic: incremental, no dry-run: Success (no panic).

Kernel version bisection:

cosmic: 18.10: 4.18.0-25-generic: incremental, dry-run: panic
disco: 19.04: 5.0.0-13-generic: incremental, dry-run: success. 105m19s (*insanely* fast).

Attempted revision bisection

Within disco tree:

git checkout -b mybisect origin/master
git bisect start
git bisect good Ubuntu-4.18.0-12.13
git bisect bad Ubuntu-5.0.0-10.11

This set my HEAD to:

commit 94710cac0ef4ee177a63b5227664b38c95bbf703 (HEAD, tag: v4.18)
Author: Linus Torvalds <email address hidden>
Date: Sun Aug 12 13:41:04 2018 -0700

    Linux 4.18

I then built the kernel, e.g.:

cp -a /boot/config-`uname -r` .config
make oldconfig
make clean
custom=bbb01
make -j $(nproc) deb-pkg LOCALVERSION=-${custom}

The resulting kernel would boot but I could not ssh to it. systemctl would return an error about not being able to connect to dbus.

I assumed there are Ubuntu-specific bits in the kernel which systemctl requires, so I aborted.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Brendan,

Thanks for testing! Glad that 5.0 and 5.8 work, so this is already fixed in newer kernels.

For now, I'd suggest bisecting with the already available/built kernel
packages from the launchpad archive, which should be faster;
then eventually get down to git to identify it more closely.

Since you know that 4.18.0-25 fails and 5.0.0-13 works, I'd suggest you
take the available built deb files from the Disco linux package archive,
which has some 4.18, 4.19, and 5.0 packages.

https://launchpad.net/ubuntu/disco/+source/linux
Click on the desired version number in 'Releases in Ubuntu' (duplicates are OK)
Click on 'amd64' in 'Builds'
Download the required .deb files (watch out for .ddeb and .udeb) in 'Built files'
Install them with 'sudo dpkg -i *.deb' as usual

And then you can iterate/bisect with the available builds,
which should help reduce the range of version tags to find
the fix in git.

As for building the kernel packages and the systemd failure,
I would suggest you use the Ubuntu kernel build process
and configs instead of 'make deb-pkg' from upstream, which
doesn't consider Ubuntu's stuff/configs in the tree, and
might then give you some trouble.

Also, build the Disco packages in a Disco container,
so that build deps and release stuff are as expected.

Steps follow below. Hope this helps.

cheers,
Mauricio

...

# start a disco container

lxc launch ubuntu:19.04 disco
lxc shell disco

# fixup apt sources and install build dependencies

sed -i 's,http://.*.ubuntu.com,http://old-releases.ubuntu.com,' /etc/apt/sources.list
sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
apt build-dep -y linux

# select source/tag you'll build

git clone ... # --reference ~/git/linux is handy/speedy if you have it already.
cd ubuntu-disco
git checkout Ubuntu-<version>

# I'd suggest manually bisecting with tags
# as a complete tag/package release is more
# likely to work; git bisect might land you
# in a commit that doesn't build/work well.

# always build using the ubuntu configs and
# stuff, w/ this procedure (reference below)

# build

LANG=C fakeroot debian/rules clean
LANG=C fakeroot debian/rules binary-headers binary-generic binary-perarch

# packages in ../*.deb
# you'll likely need linux-image-unsigned, linux-modules,
# possibly linux-modules-extra (storage/raid controllers mostly)
# and linux-headers (two pkgs) if you build modules/dkms/nvidia.

reference: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

Mauricio Faria de Oliveira (mfo) wrote :

Ah, there's a more recent 4.18.0-26 package, in the cosmic linux archive,
if it comes down to this.

https://launchpad.net/ubuntu/cosmic/+source/linux

Brendan Boerner (brendan-karakhorum) wrote :

Kernel release bisection:

cosmic: 4.18.0-26: incremental, dry-run: panic about 303GB
cosmic: 4.18.0-26: incremental, dry-run: panic about 21GB
cosmic: 4.18.0-26: incremental, dry-run: panic about 22GB

disco: 4.19.0-13: incremental, dry-run: testing: success

The first run using 4.18.0-26, which got to 303GB before the panic, is intriguing (it's why I ran it again). All previous panics when doing an incremental rsync were ~20-22GB in (as in the subsequent two runs).

The only thing we can say with certainty is the problem did not reproduce using 4.19.0-13. If the problem is a stack overflow then any changes in stack size, or different paths which use less stack, would result in the problem being less visible using the amount of data I'm copying (~1.4TB).
