VM fails to boot after evacuation when it uses a Ceph disk

Bug #1781878 reported by Vahid Ashrafian
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
If we use Ceph RBD as the storage backend and the Ceph disks (images) have the exclusive-lock feature enabled, the evacuation process works fine when a compute node goes down: nova detects that the VM has a disk on shared storage, so it rebuilds the VM on another node. But after the evacuation, although nova marks the instance as active, the instance fails to boot and hits a kernel panic caused by the kernel's inability to write to the disk.

It is possible to disable the exclusive-lock feature on the Ceph side, in which case the evacuation process works fine, but the feature needs to stay enabled in some use cases.
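For completeness, this is roughly how the feature can be turned off per image with the rbd CLI (the pool/image name is a placeholder; features such as object-map, fast-diff, and journaling depend on exclusive-lock and, if enabled, have to be disabled first):

  # check which features are enabled on the image
  rbd info volumes/volume-<uuid>
  # disable exclusive-lock (after any dependent features have been disabled)
  rbd feature disable volumes/volume-<uuid> exclusive-lock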

There is also a workaround for this problem: we were able to evacuate an instance successfully by removing the old instance's lock on the disk using the rbd command line, but I think this should be done in the code of the RBD driver in Nova and Cinder.
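A sketch of that manual workaround with the rbd CLI, assuming a volume in the usual "volumes" pool (all names are placeholders; the lock id and locker come from the list output):

  # show the lock still held by the dead client; the output includes the locker
  # (e.g. client.4567) and the lock id (e.g. "auto 139643345791728")
  rbd lock list volumes/volume-<uuid>
  # break the stale lock using the values printed above
  rbd lock remove volumes/volume-<uuid> "auto <id>" client.<num>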

The problem seems to be with the exclusive-lock feature. When a disk has exclusive-lock enabled, as soon as a client (the VM) connects and writes to the disk, Ceph locks the disk for that client (lock-on-write); if we also enable lock-on-read in the Ceph conf, the disk is locked on the first read as well. In the evacuation process there is no defined step to remove the exclusive lock held by the old VM, so when the new VM tries to write to the disk, the write fails because the new VM can't acquire the lock.

I found a similar problem reported for Kubernetes, where a node goes down and the system tries to attach its volume to a new Pod:
https://github.com/openshift/origin/issues/7983#issuecomment-243736437
There, some people proposed that before bringing up the new instance, the old client should first be blacklisted, then the disk unlocked and locked for the new one, as sketched below.
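A rough sketch of that proposed sequence using the ceph and rbd CLIs (the client address, lock id, and locker are placeholders taken from the lock list output):

  # fence the dead client so it can no longer write to the image
  ceph osd blacklist add <old-client-addr>:0/<nonce>
  # then break its lock so the new VM can acquire it
  rbd lock list volumes/volume-<uuid>
  rbd lock remove volumes/volume-<uuid> "auto <id>" client.<num>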

Steps to reproduce
==================
* Create an instance (with the Ceph storage backend) and wait for it to boot
* Power off the host of the instance
* Evacuate the instance
* Check the console in the dashboard
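For reference, a minimal reproduction with the OpenStack and nova CLIs might look like this (image, flavor, and network names are examples from a test environment, not fixed values):

  openstack server create --image cirros --flavor m1.tiny --network private evac-test
  # power off the compute host running evac-test out-of-band (e.g. via IPMI), then:
  nova evacuate evac-test
  openstack console log show evac-test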

Expected result
===============
The instance should boot without any problem.

Actual result
=============
The instance encounters a kernel panic and fails to boot.

Environment
===========
1. OpenStack Queens, Nova 17.0.2
2. Hypervisor: Libvirt (v4.0.0) + KVM
3. Storage: Ceph 12.2.4

Logs & Configs
==============
Console log of the instance after its evacuation:

[ 2.352586] blk_update_request: I/O error, dev vda, sector 18436
[ 2.357199] Buffer I/O error on dev vda1, logical block 2, lost async page write
[ 2.363736] blk_update_request: I/O error, dev vda, sector 18702
[ 2.431927] Buffer I/O error on dev vda1, logical block 135, lost async page write
[ 2.442673] blk_update_request: I/O error, dev vda, sector 18708
[ 2.449862] Buffer I/O error on dev vda1, logical block 138, lost async page write
[ 2.460061] blk_update_request: I/O error, dev vda, sector 18718
[ 2.468022] Buffer I/O error on dev vda1, logical block 143, lost async page write
[ 2.477360] blk_update_request: I/O error, dev vda, sector 18722
[ 2.484106] Buffer I/O error on dev vda1, logical block 145, lost async page write
[ 2.493227] blk_update_request: I/O error, dev vda, sector 18744
[ 2.499642] Buffer I/O error on dev vda1, logical block 156, lost async page write
[ 2.505792] blk_update_request: I/O error, dev vda, sector 35082
[ 2.510281] Buffer I/O error on dev vda1, logical block 8325, lost async page write
[ 2.516296] Buffer I/O error on dev vda1, logical block 8326, lost async page write
[ 2.522749] blk_update_request: I/O error, dev vda, sector 35096
[ 2.527483] Buffer I/O error on dev vda1, logical block 8332, lost async page write
[ 2.533616] Buffer I/O error on dev vda1, logical block 8333, lost async page write
[ 2.540085] blk_update_request: I/O error, dev vda, sector 35104
[ 2.545149] blk_update_request: I/O error, dev vda, sector 36236
[ 2.549948] JBD2: recovery failed
[ 2.552989] EXT4-fs (vda1): error loading journal
[ 2.557228] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
[ 2.563139] EXT4-fs (vda1): couldn't mount as ext2 due to feature incompatibilities
[ 2.704190] JBD2: recovery failed
[ 2.708709] EXT4-fs (vda1): error loading journal
[ 2.714963] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /newroot failed: Invalid argument
umount: can't umount /dev/vda1: Invalid argument
mcb [info=LABEL=cirros-rootfs dev=/dev/vda1 target=/newroot unmount=cbfail callback=check_sbin_init ret=1: failed to unmount
[ 2.886773] JBD2: recovery failed
[ 2.892670] EXT4-fs (vda1): error loading journal
[ 2.900580] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
[ 2.911330] EXT4-fs (vda1): couldn't mount as ext2 due to feature incompatibilities
[ 3.044295] JBD2: recovery failed
[ 3.050363] EXT4-fs (vda1): error loading journal
[ 3.058689] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /newroot failed: Invalid argument
info: copying initramfs to /dev/vda1
mount: can't find /newroot in /proc/mounts
info: initramfs loading root from /dev/vda1
BusyBox v1.23.2 (2017-11-20 02:37:12 UTC) multi-call binary.

Usage: switch_root [-c /dev/console] NEW_ROOT NEW_INIT [ARGS]

Free initramfs and switch to another root fs:
chroot to NEW_ROOT, delete all in /, move NEW_ROOT to /,
execute NEW_INIT. PID must be 1. NEW_ROOT must be a mountpoint.

 -c DEV Reopen stdio to DEV after switch

[ 3.170388] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 3.170388]
[ 3.186305] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.4.0-28-generic #47-Ubuntu
[ 3.198826] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
[ 3.213538] 0000000000000086 000000004cbc7242 ffff88001f63be10 ffffffff813eb1a3
[ 3.227588] ffffffff81cb10d8 ffff88001f63bea8 ffff88001f63be98 ffffffff8118bf57
[ 3.241405] ffff880000000010 ffff88001f63bea8 ffff88001f63be40 000000004cbc7242
[ 3.251820] Call Trace:
[ 3.254191] [<ffffffff813eb1a3>] dump_stack+0x63/0x90
[ 3.258257] [<ffffffff8118bf57>] panic+0xd3/0x215
[ 3.261865] [<ffffffff81184e1e>] ? perf_event_exit_task+0xbe/0x350
[ 3.266173] [<ffffffff81084541>] do_exit+0xae1/0xaf0
[ 3.269989] [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
[ 3.274408] [<ffffffff810845d3>] do_group_exit+0x43/0xb0
[ 3.278557] [<ffffffff81084654>] SyS_exit_group+0x14/0x20
[ 3.282693] [<ffffffff818276b2>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 3.290709] Kernel Offset: disabled
[ 3.293770] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 3.293770]

Tags: ceph evacuate
melanie witt (melwitt) wrote:

We discussed this bug on IRC in #openstack-nova last week [1].

We don't think this is an issue with nova -- based on past issues that sound similar [2][3][4], it sounds like you have not enabled the capabilities to blacklist other clients in Ceph. Please see the Ceph documentation [5][6] (step #6 on [6], specifically) for more information.

Please confirm whether setting the ceph auth capabilities properly fixes your issue.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-08-01.log.html#t2018-08-01T21:15:59
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020722.html
[3] https://bugs.launchpad.net/nova/+bug/1773449
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1591434#c9
[5] http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
[6] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken

Changed in nova:
status: New → Incomplete
Cong Tran (congtt2801) wrote:

Same case here.

Setting the ceph auth capabilities properly fixes this issue.

Command: ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'

In my case, I updated client.nova and client.cinder in Ceph.
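To make that concrete, here is a hedged example for a typical OpenStack deployment; the osd caps shown are the common defaults from the Ceph/OpenStack integration docs, and the real values must be copied from the existing auth entry so no caps are lost:

  # inspect the current caps first
  ceph auth get client.cinder
  # re-grant the existing osd caps while adding the "osd blacklist" mon command
  ceph auth caps client.cinder mon 'allow r, allow command "osd blacklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'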

melanie witt (melwitt) wrote:

Hi Cong, thank you for confirming the fix in your case.

I'm going to go ahead and close this bug as Invalid for nova since it's not a nova issue, but a ceph configuration issue. If there are any changes or fixes needed in deployment tools to do the ceph configuration, please add those projects to this bug.

Changed in nova:
status: Incomplete → Invalid