ceph-disk-prepare command always fails; new partition table not available until reboot

Bug #1371526 reported by James Page
This bug affects 1 person
Affects          Status    Importance  Assigned to  Milestone
ceph (Ubuntu)    Invalid   Undecided   Unassigned
linux (Ubuntu)   Expired   Undecided   Unassigned

Bug Description

$ sudo ceph-disk-prepare --fs-type xfs --zap-disk /dev/vdb
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
mkfs.xfs: cannot open /dev/vdb1: Device or resource busy
ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/vdb1']' returned non-zero exit status 1

I can reproduce this consistently across ceph nodes; it also impacts the way we use Swift for storage.
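
For what it's worth, the "kernel is still using the old partition table" warnings appear to mean the kernel refused sgdisk's request to re-read the table. A rough way to confirm the mismatch and to request a re-read by hand (assuming the device really is unmounted; the re-read is refused with "device busy" otherwise) would be:

$ cat /proc/partitions        # the kernel's current view of vdb
$ sudo sgdisk -p /dev/vdb     # the new GPT as written on disk
$ sudo partprobe /dev/vdb     # ask the kernel to re-read the table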

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: ceph 0.80.5-1
ProcVersionSignature: Ubuntu 3.16.0-16.22-generic 3.16.2
Uname: Linux 3.16.0-16-generic x86_64
ApportVersion: 2.14.7-0ubuntu2
Architecture: amd64
Date: Fri Sep 19 09:39:18 2014
Ec2AMI: ami-00000084
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: aki-00000002
Ec2Ramdisk: ari-00000002
SourcePackage: ceph
UpgradeStatus: No upgrade log present (probably fresh install)
---
ApportVersion: 2.14.7-0ubuntu2
Architecture: amd64
DistroRelease: Ubuntu 14.10
Ec2AMI: ami-00000084
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.small
Ec2Kernel: aki-00000002
Ec2Ramdisk: ari-00000002
Package: linux
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 3.16.0-16.22-generic 3.16.2
Tags: utopic ec2-images
Uname: Linux 3.16.0-16-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True

Revision history for this message
James Page (james-page) wrote :

This is what things should look like:

$ sudo ceph-disk-prepare --fs-type xfs --zap-disk /dev/vdb
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/vdb1              isize=2048   agcount=4, agsize=1245119 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=4980475, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.

(that was after a reboot).

Revision history for this message
James Page (james-page) wrote :

It's probably worth noting that this is in a cloud instance which automatically formats and mounts /dev/vdb on first boot; however, we do unmount it prior to running the ceph-disk-prepare command.
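
If it helps anyone working around this: that first-boot mount of the ephemeral device comes from cloud-init's mounts module. A sketch of user-data that should keep /dev/vdb from being mounted at all (untested here, and assuming the image uses the standard ephemeral0 alias) would be:

#cloud-config
mounts:
  - [ ephemeral0, null ]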

Revision history for this message
James Page (james-page) wrote :

I poked at this with some additional volumes attached to the cloud instance:

Offending device:

ubuntu@juju-t-machine-22:~$ sudo umount /dev/vdb
ubuntu@juju-t-machine-22:~$ sudo lsof | grep vdb
jbd2/vdb- 1268 root cwd DIR 253,1 4096 2 /
jbd2/vdb- 1268 root rtd DIR 253,1 4096 2 /
jbd2/vdb- 1268 root txt unknown /proc/1268/exe

Additional device:

ubuntu@juju-t-machine-22:~$ sudo mount /dev/vdc /mnt2
ubuntu@juju-t-machine-22:~$ sudo lsof | grep vdc
jbd2/vdc- 16058 root cwd DIR 253,1 4096 2 /
jbd2/vdc- 16058 root rtd DIR 253,1 4096 2 /
jbd2/vdc- 16058 root txt unknown /proc/16058/exe
ubuntu@juju-t-machine-22:~$ sudo umount /dev/vdc
ubuntu@juju-t-machine-22:~$ sudo lsof | grep vdc

As you can see, the jbd2 process for vdb appears to hang around, which I think is what is keeping the partition table locked in the kernel and hence stale.
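
A couple of extra data points that might be worth grabbing while the device is in that state (all standard tools; the sysfs path is for this instance's layout):

$ sudo fuser -vm /dev/vdb          # anything that still has the device, or a mount on it, open
$ lsblk /dev/vdb                   # the partition layout the kernel currently exposes
$ cat /sys/block/vdb/vdb1/size     # whether the old partition node is still registered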

Revision history for this message
James Page (james-page) wrote :

sudo lshw -class disk -class storage
  *-ide
       description: IDE interface
       product: 82371SB PIIX3 IDE [Natoma/Triton II]
       vendor: Intel Corporation
       physical id: 1.1
       bus info: pci@0000:00:01.1
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: ide bus_master
       configuration: driver=ata_piix latency=0
       resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) ioport:376 ioport:c0e0(size=16)
  *-scsi:0
       description: SCSI storage controller
       product: Virtio block device
       vendor: Red Hat, Inc
       physical id: 4
       bus info: pci@0000:00:04.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: scsi msix bus_master cap_list
       configuration: driver=virtio-pci latency=0
       resources: irq:11 ioport:c000(size=64) memory:febd2000-febd2fff
  *-scsi:1
       description: SCSI storage controller
       product: Virtio block device
       vendor: Red Hat, Inc
       physical id: 5
       bus info: pci@0000:00:05.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: scsi msix bus_master cap_list
       configuration: driver=virtio-pci latency=0
       resources: irq:10 ioport:c040(size=64) memory:febd3000-febd3fff
  *-scsi:2
       description: SCSI storage controller
       product: Virtio block device
       vendor: Red Hat, Inc
       physical id: 7
       bus info: pci@0000:00:07.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: scsi msix bus_master cap_list
       configuration: driver=virtio-pci latency=0
       resources: irq:10 ioport:1000(size=64) memory:80000000-80000fff

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1371526

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
James Page (james-page) wrote :

This only appears to happen with the device on first boot; after a reboot, mount/umount drops all jbd2 processes as I think it should.
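
As a crude first-boot stopgap (untested, and the 60-second timeout is arbitrary), something like this could wait for the lingering jbd2 thread to exit before preparing the disk:

#!/bin/sh
# Wait up to 60s after umount for the jbd2/vdb kernel thread to go away,
# then run the prepare step.
for i in $(seq 1 60); do
    pgrep -f 'jbd2/vdb' >/dev/null || break
    sleep 1
done
sudo ceph-disk-prepare --fs-type xfs --zap-disk /dev/vdb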

Revision history for this message
James Page (james-page) wrote : Dependencies.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
James Page (james-page) wrote : ProcEnviron.txt

apport information

Revision history for this message
Stefan Bader (smb) wrote :

So, to gather more pointers, I tried a Trusty host with Utopic KVM guests, created both manually (with virt-manager, so not involving cloud-init) and with uvtool, which at least uses a vdb for cloud-init data (in some way, though the image is a read-only ISO). Both ways the jbd2 process goes away after unmount.

Revision history for this message
Stefan Bader (smb) wrote :

We would really need information about setting up a system that runs into these issues. In particular, how is the cloud-init ephemeral disk created? I still cannot reproduce this (making an ext3 fs outside the guest and putting something into it, then mounting it by label from the guest, unmounting it, and mounting it again and writing something; all works ok). So we need as much info as possible about what is done to the ephemeral disk, from the start to the point where it fails.
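
For the record, the repro attempt was roughly this sequence (image path and label are illustrative):

host$ mkfs.ext3 -L ephemeral0 scratch.img
host$ sudo mount -o loop scratch.img /mnt && echo seed | sudo tee /mnt/seed && sudo umount /mnt
host$ # attach the image to the guest as vdb, then inside the guest:
guest$ sudo mount LABEL=ephemeral0 /mnt
guest$ sudo umount /mnt
guest$ sudo mount LABEL=ephemeral0 /mnt
guest$ echo test | sudo tee /mnt/file    # all works ok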

Revision history for this message
James Page (james-page) wrote :

I can't reproduce this problem any longer, so I'm assuming that something changed to fix it, as I was seeing it 100% of the time before.

Revision history for this message
shawnggraham (shawnggraham) wrote :

I believe this is affecting live migration of Ubuntu 14.04 KVM guests from one KVM host to another. When I attempt this now, I get read/write errors on the guest and the file system goes into read-only mode. I have a test environment where I can verify this.
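
In case it helps anyone trying to reproduce the migration side of this, a typical invocation would be something like the following (guest and host names are illustrative):

$ virsh migrate --live my-guest qemu+ssh://other-kvm-host/system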

James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Invalid
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired