VM fails to boot after evacuation when it uses a Ceph disk

Bug #1781878 reported by Vahid Ashrafian
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
If we use Ceph RBD as the storage backend and the Ceph disks (images) have the exclusive-lock feature enabled, the evacuation process works fine when a compute node goes down: nova detects that the VM has a disk on shared storage, so it rebuilds the VM on another node. But after the evacuation, although nova marks the instance as active, the instance fails to boot and hits a kernel panic caused by the kernel's inability to write to the disk.

It is possible to disable the exclusive-lock feature on the Ceph side, in which case the evacuation process works fine, but the feature needs to stay enabled in some use cases.
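For completeness, this is roughly how the feature can be turned off per image with the rbd CLI (the pool/image name is a placeholder; features such as object-map, fast-diff, and journaling depend on exclusive-lock and, if enabled, have to be disabled first):

  # check which features are enabled on the image
  rbd info volumes/volume-<uuid>
  # disable exclusive-lock (after any dependent features have been disabled)
  rbd feature disable volumes/volume-<uuid> exclusive-lock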

There is also a workaround for this problem: we were able to evacuate an instance successfully by removing the old instance's lock on the disk using the rbd command line, but I think this should be done in the code of the RBD driver in Nova and Cinder.
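A sketch of that manual workaround with the rbd CLI, assuming a volume in the usual "volumes" pool (all names are placeholders; the lock id and locker come from the list output):

  # show the lock still held by the dead client; the output includes the locker
  # (e.g. client.4567) and the lock id (e.g. "auto 139643345791728")
  rbd lock list volumes/volume-<uuid>
  # break the stale lock using the values printed above
  rbd lock remove volumes/volume-<uuid> "auto <id>" client.<num>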

The problem seems to be with the exclusive-lock feature. When a disk has exclusive-lock enabled, as soon as a client (the VM) connects and writes to the disk, Ceph locks the disk for that client (lock-on-write); if we also enable lock-on-read in the Ceph conf, the disk is locked on the first read as well. In the evacuation process there is no defined step to remove the exclusive lock held by the old VM, so when the new VM tries to write to the disk, the write fails because the new VM can't acquire the lock.

I found a similar problem reported for Kubernetes, where a node goes down and the system tries to attach its volume to a new Pod:
https://github.com/openshift/origin/issues/7983#issuecomment-243736437
There, some people proposed that before bringing up the new instance, the old client should first be blacklisted, then the disk unlocked and locked for the new one, as sketched below.
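A rough sketch of that proposed sequence using the ceph and rbd CLIs (the client address, lock id, and locker are placeholders taken from the lock list output):

  # fence the dead client so it can no longer write to the image
  ceph osd blacklist add <old-client-addr>:0/<nonce>
  # then break its lock so the new VM can acquire it
  rbd lock list volumes/volume-<uuid>
  rbd lock remove volumes/volume-<uuid> "auto <id>" client.<num>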

Steps to reproduce
==================
* Create an instance (with the Ceph storage backend) and wait for it to boot
* Power off the host of the instance
* Evacuate the instance
* Check the console in the dashboard
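For reference, a minimal reproduction with the OpenStack and nova CLIs might look like this (image, flavor, and network names are examples from a test environment, not fixed values):

  openstack server create --image cirros --flavor m1.tiny --network private evac-test
  # power off the compute host running evac-test out-of-band (e.g. via IPMI), then:
  nova evacuate evac-test
  openstack console log show evac-test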

Expected result
===============
The instance should boot without any problem.

Actual result
=============
The instance encounters a kernel panic and fails to boot.

Environment
===========
1. OpenStack Queens, Nova 17.0.2
2. Hypervisor: Libvirt (v4.0.0) + KVM
3. Storage: Ceph 12.2.4

Logs & Configs
==============
Console log of the instance after its evacuation:

[ 2.352586] blk_update_request: I/O error, dev vda, sector 18436
[ 2.357199] Buffer I/O error on dev vda1, logical block 2, lost async page write
[ 2.363736] blk_update_request: I/O error, dev vda, sector 18702
[ 2.431927] Buffer I/O error on dev vda1, logical block 135, lost async page write
[ 2.442673] blk_update_request: I/O error, dev vda, sector 18708
[ 2.449862] Buffer I/O error on dev vda1, logical block 138, lost async page write
[ 2.460061] blk_update_request: I/O error, dev vda, sector 18718
[ 2.468022] Buffer I/O error on dev vda1, logical block 143, lost async page write
[ 2.477360] blk_update_request: I/O error, dev vda, sector 18722
[ 2.484106] Buffer I/O error on dev vda1, logical block 145, lost async page write
[ 2.493227] blk_update_request: I/O error, dev vda, sector 18744
[ 2.499642] Buffer I/O error on dev vda1, logical block 156, lost async page write
[ 2.505792] blk_update_request: I/O error, dev vda, sector 35082
[ 2.510281] Buffer I/O error on dev vda1, logical block 8325, lost async page write
[ 2.516296] Buffer I/O error on dev vda1, logical block 8326, lost async page write
[ 2.522749] blk_update_request: I/O error, dev vda, sector 35096
[ 2.527483] Buffer I/O error on dev vda1, logical block 8332, lost async page write
[ 2.533616] Buffer I/O error on dev vda1, logical block 8333, lost async page write
[ 2.540085] blk_update_request: I/O error, dev vda, sector 35104
[ 2.545149] blk_update_request: I/O error, dev vda, sector 36236
[ 2.549948] JBD2: recovery failed
[ 2.552989] EXT4-fs (vda1): error loading journal
[ 2.557228] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
[ 2.563139] EXT4-fs (vda1): couldn't mount as ext2 due to feature incompatibilities
[ 2.704190] JBD2: recovery failed
[ 2.708709] EXT4-fs (vda1): error loading journal
[ 2.714963] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /newroot failed: Invalid argument
umount: can't umount /dev/vda1: Invalid argument
mcb [info=LABEL=cirros-rootfs dev=/dev/vda1 target=/newroot unmount=cbfail callback=check_sbin_init ret=1: failed to unmount
[ 2.886773] JBD2: recovery failed
[ 2.892670] EXT4-fs (vda1): error loading journal
[ 2.900580] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
[ 2.911330] EXT4-fs (vda1): couldn't mount as ext2 due to feature incompatibilities
[ 3.044295] JBD2: recovery failed
[ 3.050363] EXT4-fs (vda1): error loading journal
[ 3.058689] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /newroot failed: Invalid argument
info: copying initramfs to /dev/vda1
mount: can't find /newroot in /proc/mounts
info: initramfs loading root from /dev/vda1
BusyBox v1.23.2 (2017-11-20 02:37:12 UTC) multi-call binary.

Usage: switch_root [-c /dev/console] NEW_ROOT NEW_INIT [ARGS]

Free initramfs and switch to another root fs:
chroot to NEW_ROOT, delete all in /, move NEW_ROOT to /,
execute NEW_INIT. PID must be 1. NEW_ROOT must be a mountpoint.

 -c DEV Reopen stdio to DEV after switch

[ 3.170388] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 3.170388]
[ 3.186305] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.4.0-28-generic #47-Ubuntu
[ 3.198826] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
[ 3.213538] 0000000000000086 000000004cbc7242 ffff88001f63be10 ffffffff813eb1a3
[ 3.227588] ffffffff81cb10d8 ffff88001f63bea8 ffff88001f63be98 ffffffff8118bf57
[ 3.241405] ffff880000000010 ffff88001f63bea8 ffff88001f63be40 000000004cbc7242
[ 3.251820] Call Trace:
[ 3.254191] [<ffffffff813eb1a3>] dump_stack+0x63/0x90
[ 3.258257] [<ffffffff8118bf57>] panic+0xd3/0x215
[ 3.261865] [<ffffffff81184e1e>] ? perf_event_exit_task+0xbe/0x350
[ 3.266173] [<ffffffff81084541>] do_exit+0xae1/0xaf0
[ 3.269989] [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
[ 3.274408] [<ffffffff810845d3>] do_group_exit+0x43/0xb0
[ 3.278557] [<ffffffff81084654>] SyS_exit_group+0x14/0x20
[ 3.282693] [<ffffffff818276b2>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 3.290709] Kernel Offset: disabled
[ 3.293770] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 3.293770]

Tags: ceph evacuate
melanie witt (melwitt) wrote:

We discussed this bug on IRC in #openstack-nova last week [1].

We don't think this is an issue with nova -- based on past issues that sound similar [2][3][4], it sounds like you have not enabled the capabilities to blacklist other clients in Ceph. Please see the Ceph documentation [5][6] (step #6 on [6], specifically) for more information.

Please confirm whether setting the ceph auth capabilities properly fixes your issue.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-08-01.log.html#t2018-08-01T21:15:59
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020722.html
[3] https://bugs.launchpad.net/nova/+bug/1773449
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1591434#c9
[5] http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
[6] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken

Changed in nova:
status: New → Incomplete
Cong Tran (congtt2801) wrote:

Same case here.

Setting the ceph auth capabilities properly fixes this issue.

Command: ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'

In my case, I updated client.nova and client.cinder in Ceph.
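To make that concrete, here is a hedged example for a typical OpenStack deployment; the osd caps shown are the common defaults from the Ceph/OpenStack integration docs, and the real values must be copied from the existing auth entry so no caps are lost:

  # inspect the current caps first
  ceph auth get client.cinder
  # re-grant the existing osd caps while adding the "osd blacklist" mon command
  ceph auth caps client.cinder mon 'allow r, allow command "osd blacklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'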

melanie witt (melwitt) wrote:

Hi Cong, thank you for confirming the fix in your case.

I'm going to go ahead and close this bug as Invalid for nova since it's not a nova issue, but a ceph configuration issue. If there are any changes or fixes needed in deployment tools to do the ceph configuration, please add those projects to this bug.

Changed in nova:
status: Incomplete → Invalid