Exclusive-Lock Issue

Bug #1607694 reported by Luis
This bug affects 3 people
Affects: ceph (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

After a few days of running a VM over RBD under heavy load, the VM freezes and we get this log:

2016-07-26 16:20:47.069+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Fri, 15 Apr 2016 08:05:47 +0000), qemu version: 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.24), hostname: tckvmhv01.todoencloud
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name opennebula -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 7733f4c6-c72d-af36-2e84-cbc44c5c8da0 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-opennebula/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=c,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 'file=rbd:sata/opennebula.img:id=sata:key=AQB69U1Xq+bxHhAAcw7ZWZlOjE0Bf88QMsTqMQ==:auth_supported=cephx\;none:mon_host=10.255.128.238\:6789,format=raw,if=none,id=drive-virtio-disk0' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/usr/src/systemrescuecd.iso,format=raw,if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:07:7f,bus=pci.0,addr=0x3 -netdev tap,fd=35,id=hostnet1,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:42:b6:74,bus=pci.0,addr=0x6 -netdev tap,fd=37,id=hostnet2,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:2b:bb:89,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
librbd/ExclusiveLock.cc: In function 'std::string librbd::ExclusiveLock<ImageCtxT>::encode_lock_cookie() const [with ImageCtxT = librbd::ImageCtx; std::string = std::basic_string<char>]' thread 7fded06a2a00 time 2016-07-28 23:44:09.845595
librbd/ExclusiveLock.cc: 197: FAILED assert(m_watch_handle != 0)
 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
 1: (()+0x24c4bb) [0x7fdebcbde4bb]
 2: (()+0x7cf54) [0x7fdebca0ef54]
 3: (()+0x7dadb) [0x7fdebca0fadb]
 4: (()+0x7e827) [0x7fdebca10827]
 5: (()+0x65d46) [0x7fdebc9f7d46]
 6: (()+0x66218) [0x7fdebc9f8218]
 7: (rbd_aio_write()+0x29) [0x7fdebc9ec629]
 8: (()+0xde816) [0x7fded0108816]
 9: (()+0xde910) [0x7fded0108910]
 10: (()+0xb69b0) [0x7fded00e09b0]
 11: (()+0xbbc44) [0x7fded00e5c44]
 12: (()+0xbbc44) [0x7fded00e5c44]
 13: (()+0xbc8fc) [0x7fded00e68fc]
 14: (()+0xfc36a) [0x7fded012636a]
 15: (()+0x49800) [0x7fdeb84ac800]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-07-28 21:44:10.296+0000: shutting down

root@tckvmhv01:/var/log/libvirt/qemu# dpkg -l | grep ceph
ii ceph-common 10.2.0-0ubuntu0.16.04.1~cloud0 amd64 common utilities to mount and interact with a ceph storage cluster
ii libcephfs1 10.2.0-0ubuntu0.16.04.1~cloud0 amd64 Ceph distributed file system client library
ii python-ceph 10.2.0-0ubuntu0.16.04.1~cloud0 all Meta-package for python libraries for the Ceph libraries
ii python-cephfs 10.2.0-0ubuntu0.16.04.1~cloud0 amd64 Python libraries for the Ceph libcephfs library

root@tckvmhv01:/var/log/libvirt/qemu# dpkg -l | grep libvirt
ii libvirt-bin 1.3.1-1ubuntu10.1~cloud0 amd64 programs for the libvirt library
ii libvirt0:amd64 1.3.1-1ubuntu10.1~cloud0 amd64 library for interfacing with different virtualization systems
ii python-libvirt 1.3.1-1ubuntu1~cloud0 amd64 libvirt Python bindings

root@tckvmhv01:/var/log/libvirt/qemu# uname -a
Linux tckvmhv01 3.13.0-86-generic #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Tags: ceph libvirtd rbd
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

We have had this on two VMs in the last 2 days; they run Ubuntu 12.04 and 14.04 via virtio-scsi with 8 queues. Around the time they were killed, unattended-upgrades was also running on the host.

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

We had this again last night; it hit the 14.04 machine again. This bug seems to have a preference for that particular VM...

librbd/ExclusiveLock.cc: In function 'std::__cxx11::string librbd::ExclusiveLock<ImageCtxT>::encode_lock_cookie() const [with ImageCtxT = librbd::ImageCtx; std::__cxx11::string = std::__cxx11::basic_string<char>]' thread 7efe88abfc00 time 2016-08-03 22:40:44.069765
librbd/ExclusiveLock.cc: 217: FAILED assert(m_watch_handle != 0)
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x28d5d0) [0x7efe703335d0]
 2: (()+0x8c785) [0x7efe70132785]
 3: (()+0x8deab) [0x7efe70133eab]
 4: (()+0x8d6c7) [0x7efe701336c7]
 5: (()+0x74167) [0x7efe7011a167]
 6: (()+0x7456d) [0x7efe7011a56d]
 7: (rbd_aio_write()+0x4c) [0x7efe701067fc]
 8: (()+0x2359) [0x7efe7999c359]
 9: (()+0x241b) [0x7efe7999c41b]
 10: (()+0x495ada) [0x564d75434ada]
 11: (()+0x49809d) [0x564d7543709d]
 12: (()+0x498a87) [0x564d75437a87]
 13: (()+0x4633b6) [0x564d754023b6]
 14: (()+0x49809d) [0x564d7543709d]
 15: (()+0x498a87) [0x564d75437a87]
 16: (()+0x499730) [0x564d75438730]
 17: (()+0x4dd5ca) [0x564d7547c5ca]
 18: (()+0x4a590) [0x7efe83724590]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Might this correlate with creating snapshots or reading from them? Four seconds prior to this error, we took a snapshot of this block device and created an rbd diff --whole-diff between the old snapshot and the new snapshot...

Revision history for this message
Jason Dillaman (jdillaman) wrote :

@Luis or Daniel: Are you seeing an "image watch failed" error message in your QEMU librbd logs?

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

@jdillaman nope, there's no such entry.

Revision history for this message
Luis (luis-ramirez) wrote : Re: [Bug 1607694] Re: Exclusive-Lock Issue

No, we don't have any message like that.

Revision history for this message
Jason Dillaman (jdillaman) wrote :

Hmm -- any chance either one of you could reproduce with debug logging enabled? In your ceph configuration file, add the following to the "[client]" section:

debug rbd = 20
log path = /path/that/is/writable/by/qemu/process/ceph-$pid.log

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

Sure - but that's a bit heavy because these are production machines ;)
That'll take some time for a maintenance window.
Would this setting be active as soon as I `virsh restart` the machine?

Revision history for this message
Jason Dillaman (jdillaman) wrote :

As an alternative, could you provide the core dump? You can use ceph-post-file to upload the file.

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

I have added

[client]
debug rbd = 20
log path = /tmp/ceph-$pid.log

to /etc/ceph/ceph.conf
but this seems to have no effect on logging. At least, I can't find anything relevant under /tmp.

We're using virsh/libvirt for KVM virtualization in a hybrid configuration (OSD hosts == VM hosts). Any hints?

Revision history for this message
Jason Dillaman (jdillaman) wrote :

I think the config option is "log file" instead of "log path". Also, make sure your QEMU process has access to the directory (both permissions and AppArmor).
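
For reference, the corrected snippet would then look roughly like this (using the same /tmp path Daniel tried; note that librbd only reads ceph.conf when the client process starts, so a running QEMU process has to be restarted -- e.g. via a full shutdown/start of the domain -- before the setting takes effect):

[client]
debug rbd = 20
log file = /tmp/ceph-$pid.log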

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

OK, it's logging now for a machine that died 3 times in the last 3 nights.
All deaths directly followed a backup run, where we do this (a sketch of the corresponding commands follows below):

1. create a snapshot
2. create an rbd fast-diff between this snapshot and the one from 24 hours earlier

The log file is growing *very* fast, so let's see if we can afford to wait until tomorrow night.
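
For reference, that nightly sequence corresponds roughly to commands like the following (pool, image, and snapshot names are placeholders, and --whole-object is assumed to be the flag behind the "whole-diff" mentioned earlier):

rbd snap create <pool>/<image>@snap-today
rbd diff --from-snap snap-yesterday --whole-object <pool>/<image>@snap-today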

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

Got it. This is the log output leading up to the dead KVM process.

2016-08-12 22:29:44.499285 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db400b020 C_NotifyAck start: id=234071422725522, handle=140096867143248
2016-08-12 22:29:44.518673 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db4003b60 image header updated
2016-08-12 22:29:44.518680 7f6dcffff700 20 librbd::ImageState: refresh_seq = 5, last_refresh = 4
2016-08-12 22:29:44.518689 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db400b020 C_NotifyAck finish: id=234071422725522, handle=140096867143248
2016-08-12 22:29:44.573373 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db400b020 C_NotifyAck start: id=234071422725523, handle=140096867143248
2016-08-12 22:29:44.573386 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db4003b60 image header updated
2016-08-12 22:29:44.573389 7f6dcffff700 20 librbd::ImageState: refresh_seq = 6, last_refresh = 4
2016-08-12 22:29:44.573393 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db400b020 C_NotifyAck finish: id=234071422725523, handle=140096867143248
2016-08-12 22:29:45.249895 7f6dcffff700 -1 librbd::ImageWatcher: 0x7f6db4003b60 image watch failed: 140096867143248, (107) Transport endpoint is not connected
2016-08-12 22:29:45.262607 7f6dcffff700 10 librbd::ImageWatcher: 0x7f6db4003b60 current lock owner: [0,0]
2016-08-12 22:29:45.319110 7f6dcd6f9700 10 librbd::ImageWatcher: 0x7f6db4003b60 re-registering image watch
2016-08-12 22:29:45.319145 7f6dcd6f9700 10 librbd::ExclusiveLock: 0x7f6db40029f0 release_lock
2016-08-12 22:29:45.319150 7f6dcd6f9700 10 librbd::ExclusiveLock: 0x7f6db40029f0 send_release_lock
2016-08-12 22:29:45.319251 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: send_cancel_op_requests
2016-08-12 22:29:45.319263 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: handle_cancel_op_requests: r=0
2016-08-12 22:29:45.319267 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: send_block_writes
2016-08-12 22:29:45.319271 7f6dcf7fe700 5 librbd::AioImageRequestWQ: block_writes: 0x55ee44904f50, num=1
2016-08-12 22:29:45.319285 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: handle_block_writes: r=0
2016-08-12 22:29:45.319352 7f6dcf7fe700 10 librbd::ExclusiveLock: 0x7f6db40029f0 handle_releasing_lock
2016-08-12 22:29:45.319361 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: send_flush_notifies
2016-08-12 22:29:45.319366 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: handle_flush_notifies
2016-08-12 22:29:45.319368 7f6dcf7fe700 10 librbd::exclusive_lock::ReleaseRequest: send_close_object_map
2016-08-12 22:29:45.319378 7f6dcf7fe700 10 librbd::object_map::UnlockRequest: 0x7f6dc407b8c0 send_unlock: oid=rbd_object_map.40c5612ae8944a
2016-08-12 22:29:45.320613 7f6dcffff700 10 librbd::object_map::UnlockRequest: 0x7f6dc407b8c0 handle_unlock: r=-108
2016-08-12 22:29:45.320626 7f6dcffff700 -1 librbd::object_map::UnlockRequest: failed to release object map lock: (108) Cannot send after transport endpoint shutdown
2016-08-12 22:29:45.320639 7f6dcffff700 10 librbd::exclusive_lock::ReleaseRequest: handle_close_object_map: r=0
2016-08-12 22:29:45.320647 7f6dcffff700 10 librbd::exclusive_lock::ReleaseRequest: send_unlock
2016-08-12 22:29:45....


Revision history for this message
Jason Dillaman (jdillaman) wrote :

Thanks for the assistance. It appears the issue you are hitting is due to a failed watch:

2016-08-12 22:29:45.249895 7f6dcffff700 -1 librbd::ImageWatcher: 0x7f6db4003b60 image watch failed: 140096867143248, (107) Transport endpoint is not connected

There is a heartbeat that your client is supposed to send to the OSD every 5 seconds to prevent the watch from timing out after 30 seconds; a failure like this is indicative of an overloaded client or cluster.

The good news is that the fix is already available [1] and should be included in the next Ceph point release. This won't prevent the watch failure but should prevent the race condition between failure and recovery.

[1] http://tracker.ceph.com/issues/16923
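
As a side note, whether a client currently holds a watch on an image can be checked by listing the watchers on its header object; for a format 2 image that is roughly (image id looked up via rbd info; names are placeholders):

rbd info <pool>/<image> | grep block_name_prefix   # shows rbd_data.<id>
rados -p <pool> listwatchers rbd_header.<id>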

Revision history for this message
Jason Dillaman (jdillaman) wrote :

I should also mention that, in the meantime, you may want to disable the exclusive-lock feature on the affected images via the rbd CLI:

rbd feature disable <image-spec> exclusive-lock,object-map,fast-diff
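
A sketch of the reverse operation, in case the features are wanted back after upgrading to a fixed release: they have to be re-enabled in dependency order (object-map needs exclusive-lock, fast-diff needs object-map), and the object map should be rebuilt afterwards:

rbd feature enable <image-spec> exclusive-lock
rbd feature enable <image-spec> object-map
rbd feature enable <image-spec> fast-diff
rbd object-map rebuild <image-spec>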

Revision history for this message
Philipp Noack (philipp-noack-b) wrote :

Thanks for the possible workaround. But fast-diff is a very important feature for us, and unfortunately disabling it is not an option. The virtual machine gets killed almost every night at the moment. Is there an estimated time frame for the fix?

Revision history for this message
Jason Dillaman (jdillaman) wrote :

If you are interested, I can build a Jewel 10.2.2 + this fix "development" release package [1] that you can install and verify that it resolves the issue. I am going out of town for a few days so if it's desired, I would need to know ASAP in order to start the build.

[1] http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development

Revision history for this message
Philipp Noack (philipp-noack-b) wrote :

@jdillaman That would be awesome, I'd test this version on our servers to see if it resolves our problems.

Revision history for this message
Jason Dillaman (jdillaman) wrote :

The v10.2.2 branch with the cherry-picked fix is available under the 'wip-16950-jewel' branch name. See the instructions above for how to install test/development packages.
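
For anyone following along, installing such a test build roughly amounts to adding the development repository for the 'wip-16950-jewel' branch as described in the linked instructions and then upgrading the client-side packages, e.g.:

sudo apt-get update
sudo apt-get install --only-upgrade librados2 librbd1 ceph-common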

Revision history for this message
Christian Theune (ctheune) wrote :

Now that 10.2.3 has been released - any news on whether this is fixed? Daniel?

Revision history for this message
Daniel Kraft (daniel-kraft) wrote :

Until Monday I'd have said: yes, fixed. We have had 10.2.3 running for a few weeks now and it looks good.

However, on Monday we had 2 VMs die in one night with the error message 'INFO: task jbd2/sda1-8:171 blocked for more than 120 seconds.'
This was during snapshot create/delete times.

Also, we had one VM where virsh/qemu must have lost the RBD/Ceph connection. It is hard to say what happened exactly, but the running VM logged to stdout that sda was gone, and after a hard reset the VM had no disk to boot from. A shutdown/start cycle resolved that.

Both errors look like different problems right now, which is why I wasn't reporting anything here. So from that perspective I'd say:

"solved".

Revision history for this message
Christian Theune (ctheune) wrote :

Thanks. An interesting note from my side: I'm currently debugging a different hang that we're experiencing with XFS inside the guest. See http://tracker.ceph.com/issues/17536, which will be updated with detailed logs soon.

Revision history for this message
James Page (james-page) wrote :

Included in the 10.2.3 release, which is currently in xenial-proposed for testing, so marking Fix Committed.

Changed in ceph (Ubuntu):
status: Confirmed → Fix Committed
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

10.2.3 has now been released to updates - marking 'Fix Released'.

Changed in ceph (Ubuntu):
status: Fix Committed → Fix Released