qemu segfaults after re-attaching ceph volume to instance

Bug #1763649 reported by Crazik
This bug affects 2 people
Affects               Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive  New           Undecided   Unassigned
qemu (Ubuntu)         Fix Released  Undecided   Unassigned
  Xenial              Incomplete    Undecided   Unassigned
  Artful              Incomplete    Undecided   Unassigned

Bug Description

I have OpenStack compute nodes running qemu-system-x86, with Ceph as the storage backend for base disks and volumes (no local storage).

When I create a new volume on Ceph and attach it to an instance, it works.
When I detach the volume and re-attach it, I can crash the instance within a limited number of repetitions. Sometimes it crashes on the second try, sometimes on the 6th or 9th. In most cases it does not survive 10 cycles.

Steps to reproduce:

- create an instance
- create a volume in Ceph
- define the volume in disk.xml: http://paste.openstack.org/show/719130/
- run this loop:

while true; do
  # hot-plug the volume described in disk.xml
  virsh attach-device instance-000022e8 disk.xml;
  sleep 5;
  # hot-unplug it again by its target device name
  virsh detach-disk instance-000022e8 vdb --live;
  sleep 5;
done

After a few iterations, the instance crashes.
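
An endless loop keeps issuing virsh commands after the instance has died, so it does not show which cycle killed it. A minimal sketch of a bounded variant follows. It assumes that `virsh domstate` stops reporting "running" once the qemu process has crashed. The function name `stress_attach` and its arguments are hypothetical and not part of the original report.

```shell
# Hypothetical helper (not from the report): run a bounded number of
# attach/detach cycles and stop as soon as the domain is no longer running.
# Arguments: domain, disk XML file, max cycles, target device, pause (seconds).
stress_attach() {
  sa_domain=$1; sa_xml=$2; sa_max=${3:-10}; sa_target=${4:-vdb}; sa_pause=${5:-5}
  sa_i=1
  while [ "$sa_i" -le "$sa_max" ]; do
    virsh attach-device "$sa_domain" "$sa_xml"
    sleep "$sa_pause"
    virsh detach-disk "$sa_domain" "$sa_target" --live
    sleep "$sa_pause"
    # virsh domstate prints the state, e.g. "running" or "shut off"
    sa_state=$(virsh domstate "$sa_domain" 2>/dev/null)
    if [ "$sa_state" != "running" ]; then
      echo "domain not running after cycle $sa_i (state: ${sa_state:-unknown})"
      return 1
    fi
    sa_i=$((sa_i + 1))
  done
  echo "survived $sa_max cycles"
}
```

It could be called as `stress_attach instance-000022e8 disk.xml 10`. The reported cycle number can then be matched against the timestamp of the kernel log entry.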

Logs:

kernel: [3866704.245319] traps: qemu-system-x86[23382] general protection ip:558690860750 sp:7faaf36f6ea8 error:0 in qemu-system-x86_64[5586902a7000+842000]

or

kernel: [7252748.718834] qemu-system-x86[30720]: segfault at 100 ip 000056258ba78144 sp 00007fca010c1eb0 error 4 in qemu-system-x86_64[56258b47a000+842000]
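
Because the binary is mapped at a different base address in each process, the raw ip values of the two crashes cannot be compared directly. A rough sketch of how to normalise them: subtract the mapping base (the first number in the square brackets) from the ip. The helper name `fault_offset` is made up for illustration. Whether the result can be passed straight to a symbolizer depends on how the binary was built, which the report does not state.

```shell
# Hypothetical helper: offset of the faulting instruction inside the mapping.
# Arguments: ip and mapping base, both as hex digits without the 0x prefix.
fault_offset() {
  printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values copied from the two kernel log lines in this report
fault_offset 558690860750 5586902a7000     # prints 0x5b9750
fault_offset 000056258ba78144 56258b47a000 # prints 0x5fe144
```

Both offsets lie within the mapping size of 0x842000 but differ from each other, so the two crashes did not hit the same instruction. On x86, page-fault error code 4 means a user-mode read of an unmapped page, so the second crash at address 0x100 may be a dereference through a NULL or near-NULL pointer. That reading is an inference from the log line and has not been confirmed.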

Ubuntu Xenial 16.04.3 with cloud-archive@Ocata repositories
kernel: 4.4.0-109-generic
qemu-system-x86 1:2.8+dfsg-3ubuntu2.9~cloud1
libvirt-bin 2.5.0-3ubuntu5.6~cloud0
ceph/rados: 10.2.10-1xenial

Tags: ceph qemu rgw
Crazik (crazik)
description: updated
Christian Ehrhardt  (paelzer) wrote :

@Corey / James - I have no Ceph around at all, and this is reported against a cloud-archive qemu (Ocata, if I read it correctly).

Can you confirm this issue, and if so, do you have any insights on how to proceed?

@Crazik - to what extent could you test your existing setup with different qemu and libvirt versions, such as those in Ubuntu Cloud Archive Pike (2.10) and Queens (2.11) from [1]?
If you can, it might be worth updating the storage nodes (Ceph) independently of the qemu/libvirt on the compute nodes; that way we might more easily get a feeling for the area in which a potential fix lies.

[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive

Crazik (crazik) wrote :

The problem was solved by upgrading to Queens.
It looks like it was caused by an issue between the ceph/rados libraries and qemu.

Christian Ehrhardt  (paelzer) wrote :

Thanks, Crazik, for reporting that. At least it means the newer versions on Bionic/Queens are good.

Changed in qemu (Ubuntu):
status: New → Fix Released
Changed in qemu (Ubuntu Xenial):
status: New → Incomplete
Changed in qemu (Ubuntu Artful):
status: New → Incomplete
Crazik (crazik) wrote :

Well, it is still Xenial, with the Cloud Archive repos for Queens.

Christian Ehrhardt  (paelzer) wrote :

So it was not fixed by your upgrade to Queens, as you first thought?
=> Ubuntu 16.04 + UCA-Queens fails, contrary to what comment #2 stated?

We have these different versions to consider:
- Xenial as-is (no report on it yet)
- Artful as-is (no report on it yet)
- Xenial-Ocata (initial report, fails)
- Xenial-Queens (comment #2, reported good)

Is the summary above correct?

Crazik (crazik) wrote :

I was confused by your "versions on Bionic/Queens are good" statement.

OpenStack was upgraded to Queens, that's correct. I am using the Cloud Archive repositories for the OpenStack packages, while the base system is still Xenial.

So the final summary is correct.

Gaudenz Steinlin (gaudenz-debian) wrote :

I can confirm that upgrading from Xenial-Ocata to Xenial-Pike (both Cloud Archive) solves the issue.

Gaudenz Steinlin (gaudenz-debian) wrote :

Would it be possible to get a fixed QEMU version into the Xenial-Ocata cloud archive?
