bcache is unstable on ppc64el

Bug #1602299 reported by Ryan Harper
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Stopping and unregistering bcache cache and backing devices can destroy the bcache superblock on those devices, leaving the bcache devices corrupt or unusable.

1. Ubuntu 4.4.0-28.47-generic 4.4.13
2. attaching lspci log
3.

Description: Ubuntu 16.04 LTS
Release: 16.04

4. # apt-cache policy linux-image
linux-image:
  Installed: (none)
  Candidate: (none)
  Version table:
root@rharper-vm1:~# apt-cache policy linux-image-generic
linux-image-generic:
  Installed: (none)
  Candidate: 4.4.0.28.30
  Version table:
     4.4.0.28.30 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial-updates/main ppc64el Packages
        500 http://ports.ubuntu.com/ubuntu-ports xenial-security/main ppc64el Packages
     4.4.0.21.22 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial/main ppc64el Packages

5. Expected: after creating bcache devices, they remain usable as bcache devices across a reboot.
6. Actual: sometimes after a reboot either the cache device or the backing device loses its bcache superblock, making it useless to the bcache module (which fails to detect it).

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-28-generic 4.4.0-28.47
ProcVersionSignature: Ubuntu 4.4.0-28.47-generic 4.4.13
Uname: Linux 4.4.0-28-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 11 21:30 seq
 crw-rw---- 1 root audio 116, 33 Jul 11 21:30 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
Date: Tue Jul 12 14:20:24 2016
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 003: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 OFfb vga
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.4.0-28-generic root=LABEL=cloudimg-rootfs earlyprintk
ProcLoadAvg: 6.28 4.78 4.12 1/133 4838
ProcLocks:
 1: POSIX ADVISORY WRITE 1306 00:11:541 0 EOF
 2: FLOCK ADVISORY WRITE 918 00:11:521 0 EOF
 3: POSIX ADVISORY WRITE 875 00:11:507 0 EOF
 4: POSIX ADVISORY WRITE 870 00:11:123 0 EOF
 5: POSIX ADVISORY WRITE 682 00:11:358 0 EOF
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.4.0-28-generic (buildd@bos01-ppc64el-018) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #47-Ubuntu SMP Fri Jun 24 10:09:20 UTC 2016
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-28-generic N/A
 linux-backports-modules-4.4.0-28-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
cpu_cores: Number of cores present = 1
cpu_coreson: Number of cores online = 1
cpu_dscr: DSCR is 0
cpu_freq:
 min: 3.684 GHz (cpu 0)
 max: 3.684 GHz (cpu 0)
 avg: 3.684 GHz
cpu_runmode: run-mode=0
cpu_smt: Error: command ['ppc64_cpu', '--smt'] failed with exit code 255: Machine is not SMT capable

Ryan Harper (raharper) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Ryan Harper (raharper) wrote :

Here's my recreate:

On Power8 system with Xenial, KVM enabled.

1. sudo apt-get install uvtool uvtool-libvirt
2. wget http://people.canonical.com/~rharper/bugs/lp_1602299/lp_1602299.tgz
3. tar xzvf lp_1602299.tgz
4. cd lp_1602299
5. uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily "release~(xenial|yakkety)" "arch=ppc64el"
6. uvt-kvm create --template ppc64le-template.xml --cpu 1 --disk 5 rharper-vm1 release=xenial
7. virsh destroy rharper-vm1
8. sudo qemu-img create -f raw /var/lib/uvtool/libvirt/images/rharper-vm1-disk2.raw 5G
9. virsh edit rharper-vm1
10. Add to the xml under the vdb disk element:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/uvtool/libvirt/images/rharper-vm1-disk2.raw'/>
      <target dev='vdc' bus='virtio'/>
      <serial>bcache</serial>
    </disk>

11. virsh start rharper-vm1
12. uvt-kvm ssh --insecure rharper-vm1

# in guest
1. wget http://people.canonical.com/~rharper/bugs/lp_1602299/lp_1602299.tgz
2. tar xzvf lp_1602299.tgz
3. cd lp_1602299
4. sudo ./mkbcache.sh /dev/disk/by-id/virtio-bcache
5. sudo ./break-backing.sh /dev/disk/by-id/virtio-bcache
6. sudo ./break-cache.sh /dev/disk/by-id/virtio-bcache

The break-* scripts emit PASS or FAIL as appropriate.
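
The contents of the break-* scripts are not reproduced in this report. As a rough illustration only of what a PASS/FAIL check on the superblock could look like, the sketch below tests whether the bcache magic bytes are present on a device or image file. The offsets and magic value are assumptions taken from the kernel's bcache on-disk layout as I understand it (superblock at 512-byte sector 8, magic after three 8-byte fields); `has_bcache_magic` and `verdict` are hypothetical helper names, not part of the tarball.

```python
# Assumed layout: superblock at byte 4096 (sector 8), magic after the
# csum, offset and version fields (8 bytes each).
SB_OFFSET = 4096
MAGIC_OFFSET = 24
BCACHE_MAGIC = bytes([0xc6, 0x85, 0x73, 0xf6, 0x4e, 0x1a, 0x45, 0xca,
                      0x82, 0x65, 0xf5, 0x7f, 0x48, 0xba, 0x6d, 0x81])


def has_bcache_magic(path):
    """Return True if the 16 magic bytes sit at the assumed offset."""
    with open(path, "rb") as dev:
        dev.seek(SB_OFFSET + MAGIC_OFFSET)
        return dev.read(len(BCACHE_MAGIC)) == BCACHE_MAGIC


def verdict(path):
    """Map the check to the PASS/FAIL wording used in this report."""
    return "PASS" if has_bcache_magic(path) else "FAIL"
```

Calling `verdict("/dev/disk/by-id/virtio-bcache")` as root before and after stopping the device would show whether the magic survived; bcache-super-show remains the authoritative tool.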

Ryan Harper (raharper) wrote :
Stefan Bader (smb) wrote :

Recreated on a test system. The fact that the superblock gets written to is expected, but on ppc64el this seems to go wrong at an early stage. I am comparing the results on a ppc64el VM and an x86 VM. After setting up bcache and attaching the cache to the backing dev, the output of bcache-super-show already differs from the sysfs information on ppc64el. On x86 the two are consistent (which means the superblock was already updated by the kernel at that stage).

On ppc64el the differences are:
- cache dev
  * sb.version is still 0 (should be 3)
  * dev.cache.ordered no (is yes on x86)
- backing dev
  * dev.data.cache_state detached (should be clean)
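
A comparison like the one above can be automated with a small field diff. This is only a sketch: the simplified "key value" line format is an assumption and not the exact output of bcache-super-show, the helper names are hypothetical, and the sample values are just the three fields listed above (cache dev and backing dev fields are merged into one listing for brevity).

```python
def parse_fields(text):
    """Parse 'key value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def diff_fields(expected, actual):
    """Return {key: (expected, actual)} for every field that differs."""
    return {key: (value, actual.get(key))
            for key, value in expected.items()
            if actual.get(key) != value}


# Values reported in this comment (x86 as the reference).
X86 = """
sb.version 3
dev.cache.ordered yes
dev.data.cache_state clean
"""
PPC64EL = """
sb.version 0
dev.cache.ordered no
dev.data.cache_state detached
"""

for key, (want, got) in sorted(diff_fields(parse_fields(X86),
                                           parse_fields(PPC64EL)).items()):
    print(f"{key}: expected {want}, got {got}")
```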

Created a vm dump to investigate deeper into this.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Stefan Bader (smb) wrote :

While I have not found out where exactly things go wrong, I could confirm that this is related to ppc64el builds using a 64k page size. I built a test kernel with a 4k page size and installed it inside the VM. With that kernel, bcache-super-show reflects the correct status after activation, and the superblock magic remains intact when breaking the cache and backing device.
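
To check which page size a running kernel uses, something like the following should work on any Linux system with Python 3. The helper names are hypothetical; based on this report one would expect 65536 on the stock ppc64el kernel and 4096 on the 4k test kernel or on x86.

```python
import os


def page_size():
    """Return the page size of the running kernel in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


def uses_4k_pages():
    """True when the kernel uses 4 KiB pages (the unaffected case here)."""
    return page_size() == 4096


print(f"page size: {page_size()} bytes")
```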

Stefan Bader (smb) wrote :

So I believe the problem is the hackish way buffer pages are acquired for the internal biovec structures. This is done by taking a reference on the page which is returned by __bread() in read_super(). With 4k pages and reading 4k from sector 1, bh->b_data will always be the start of a page.
But with 64k pages I can see that bh->b_data is at offset 4k from the start of the page. This will definitely result in inconsistent data being written, as __write_super overlays the sb data from the page start and only updates some fields. It could be even worse if the page returned by __bread() is actually shared with other buffer_heads when the requested read size is only a fraction of a page... but I do not know that for sure.
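
The suspected failure can be modelled in user space. The following is a toy simulation of the hypothesis above, not the kernel code: the magic value is a 16-byte placeholder, the field offsets are assumptions, and `simulate` is a hypothetical name. It reads the page that contains block 1, then writes 4k back to block 1 starting from the page start rather than from where the data actually sits.

```python
SB_SIZE = 4096                 # superblock size, read from block 1
MAGIC_OFF = 24                 # assumed offset of the magic inside the sb
VERSION_OFF = 16               # assumed offset of the version field
MAGIC = b"PLACEHOLDERMAGIC"    # 16-byte stand-in, not the real magic


def simulate(page_size):
    """Return (offset of the sb data in the page, magic intact after write)."""
    disk = bytearray(65536)
    sb_start = 1 * SB_SIZE
    disk[sb_start + MAGIC_OFF:sb_start + MAGIC_OFF + 16] = MAGIC

    # Read: the page holds the page-aligned disk region containing block 1.
    page_base = (sb_start // page_size) * page_size
    page = bytearray(disk[page_base:page_base + page_size])
    data_off = sb_start - page_base   # where b_data would point in the page

    # Suspected bug: treat the page start as the superblock, update one
    # field there, and write SB_SIZE bytes from the page start to block 1.
    page[VERSION_OFF:VERSION_OFF + 8] = (3).to_bytes(8, "little")
    disk[sb_start:sb_start + SB_SIZE] = page[0:SB_SIZE]

    intact = disk[sb_start + MAGIC_OFF:sb_start + MAGIC_OFF + 16] == MAGIC
    return data_off, intact


for size in (4096, 65536):
    print(size, simulate(size))
```

With a 4k page the data offset is 0 and the magic survives; with a 64k page the data offset is 4096 and the written block no longer carries the magic, which matches the symptom in this bug under the stated assumptions.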

Ryan Harper (raharper) wrote : Re: [Bug 1602299] Re: bcache is unstable on ppc64el

This looks relevant:

  https://www.redhat.com/archives/dm-devel/2016-June/msg00015.html

On Tue, Jul 19, 2016 at 5:19 AM, Stefan Bader <email address hidden>
wrote:

> While I have not found out where exactly things go wrong, I could
> confirm that this is related to ppc64el builds using a 64k page size. I
> built a test kernel with a 4k page size and installed it inside the VM.
> With that kernel, bcache-super-show reflects the correct status after
> activation, and the superblock magic remains intact when
> breaking the cache and backing device.

Stefan Bader (smb) wrote :

Not sure. That thread is about the location of the superblock; it does not seem to change its size. It could still paper over the problem, though, if the offset of the data within the page returned by __bread() depends on where the disk block sits within a page-sized area. Reading block one with a 4k block size would then give a 4k offset with pages larger than 4k...
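
The arithmetic behind that guess can be written down directly. This sketch assumes the data for a block lands in its page at the block's byte position modulo the page size; `offset_in_page` is a hypothetical helper, not a kernel function.

```python
def offset_in_page(block, block_size, page_size):
    """Byte offset of a block's data within the page that holds it."""
    return (block * block_size) % page_size


# Block 1 with a 4k block size, for a few page sizes.
for page_size in (4096, 16384, 65536):
    print(page_size, offset_in_page(1, 4096, page_size))

# A block that happens to start on a 64k boundary has offset 0 again.
print(offset_in_page(16, 4096, 65536))
```

Under this assumption, moving the superblock to a page-aligned location would hide the problem on 64k pages without fixing the way the page is reused.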

I find the way they attach the page used by the buffer head to their internal bio structures rather unclean...

Stefan Bader (smb) wrote :

This is hopefully the last iteration of a stable-targeted fix for bcache on non-4k architectures. Note that it possibly applies to 4.6 (if that gets used for Yakkety) but needs at least some rework for 4.7, as it looks like the submit_bio* functions changed arguments with rc2.

tags: added: patch