[ppc64el] [ocata] Instance shutdown by itself. Calling the stop API.

Bug #1716469 reported by Ryan Beisner
This bug report is a duplicate of: Bug #1709784: KVM on 16.04.3 throws an error.
This bug affects 2 people
Affects                        Status     Importance  Assigned to  Milestone
OpenStack Nova Compute Charm   New        Undecided   Unassigned
libvirt (Ubuntu)               Confirmed  Undecided   Unassigned

Bug Description

On a ppc64el Ocata deployment, instances launch but are unreachable, then the instances shut down.

#### Juju status
http://paste.ubuntu.com/25516162/

#### libvirt
2017-09-11 18:49:28.053+0000: starting up libvirt version: 2.5.0, package: 3ubuntu5.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Fri, 28 Jul 2017 14:04:18 +0000), qemu version: 2.8.0(Debian 1:2.8+dfsg-3ubuntu2.3~cloud0), hostname: node-loudred.maas
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name guest=instance-00000001,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-instance-00000001/master-key.aes -machine pseries-zesty,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 768 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 79d6f9b9-ce0e-4ba4-9742-206eb89562ed -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-instance-00000001/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -device pci-ohci,id=usb,bus=pci.0,addr=0x2 -drive file=/var/lib/nova/instances/79d6f9b9-ce0e-4ba4-9742-206eb89562ed/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/79d6f9b9-ce0e-4ba4-9742-206eb89562ed/disk.swap,format=qcow2,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:ee:cc:52,bus=pci.0,addr=0x1 -add-fd set=2,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on -device spapr-vty,chardev=charserial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
Domain id=4 is tainted: host-cpu
char device redirected to /dev/pts/6 (label charserial0)
qemu:qemu_cpu_kick_thread: No such process2017-09-11 18:52:37.405+0000: shutting down, reason=crashed
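
For triage across several compute nodes, the shutdown marker in the per-domain qemu log can be pulled out with a small script. This is only a sketch: the function name `find_shutdowns` is made up for illustration, and the pattern is modelled on the single log line above rather than on any documented libvirt log format. It deliberately does not anchor at the start of a line, because the marker here is fused onto the preceding qemu message.

```python
import re

# Pattern modelled on the "shutting down, reason=..." line in the log above.
# Not anchored to line start: the marker may follow other output directly.
SHUTDOWN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\+\d{4}): "
    r"shutting down, reason=(\w+)"
)


def find_shutdowns(log_text):
    """Return a list of (timestamp, reason) tuples found in log_text."""
    return SHUTDOWN_RE.findall(log_text)


# Sample copied from the qemu log in this report.
sample = (
    "qemu:qemu_cpu_kick_thread: No such process"
    "2017-09-11 18:52:37.405+0000: shutting down, reason=crashed\n"
)

for timestamp, reason in find_shutdowns(sample):
    print(timestamp, reason)
```

A reason of `crashed`, as opposed to a guest-initiated shutdown, points at the hypervisor side rather than the guest OS.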

#### nova-compute
2017-09-11 18:49:26.399 257659 WARNING nova.virt.libvirt.driver [req-29a9770e-612c-42c3-a997-cc47e58538ba 991f1e1738ab48039a28f9e4b3e0054e 842add06e52e41bd923c3d4476e3747b - - -] USB tablet requested for guests by host configuration. In order to accept this request VNC should be enabled or SPICE and SPICE agent disabled on host.
2017-09-11 18:52:53.114 257659 WARNING nova.compute.manager [req-37f4920a-2156-406d-8cf1-f02ba416d310 - - - - -] [instance: 79d6f9b9-ce0e-4ba4-9742-206eb89562ed] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
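
The numeric power states in the last nova-compute warning are easier to read once decoded. The sketch below maps them to names; the table reflects the constants in `nova/compute/power_state.py` as I understand them (worth verifying against the Nova release in use), and the helper name `decode_power_states` is hypothetical.

```python
import re

# Assumed subset of Nova's power_state constants; check against your release.
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}

STATE_RE = re.compile(
    r"original DB power_state: (\d+), current VM power_state: (\d+)"
)


def decode_power_states(line):
    """Return (db_state, vm_state) names from a nova-compute warning, or None."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    db_state, vm_state = (int(group) for group in match.groups())
    return (
        POWER_STATES.get(db_state, "UNKNOWN"),
        POWER_STATES.get(vm_state, "UNKNOWN"),
    )


# Tail of the warning from the nova-compute log in this report.
line = (
    "Instance shutdown by itself. Calling the stop API. Current vm_state: "
    "active, current task_state: None, original DB power_state: 1, "
    "current VM power_state: 4"
)
print(decode_power_states(line))
```

Under that mapping, the database still believed the instance was running while the hypervisor reported it as shut down, which is what triggers Nova's call to the stop API.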

Ryan Beisner (1chb1n) wrote :

I manually stopped and started the libvirtd service on the compute nodes, and confirmed that the service was active (running).

As soon as I start instances, there is a kernel OOPS, the corresponding qemu instance log shows a crash, and the libvirtd logs show errors.

See attached.

Ryan Beisner (1chb1n) wrote :

Retrying with HWE kernel, will report back.

Christian Ehrhardt  (paelzer) wrote :

ppc tests were blocked by bug 1715397 for a while, but before that there were some valid manual tests of the qemu-rc/libvirt by IBM and myself.

I started a debuggable (interactive) regression test myself and look forward to what you find.

Christian Ehrhardt  (paelzer) wrote :

Actually, reading deeper into your logs, I think I know your case:

Sep 11 19:26:10 node-loudred kernel: [266093.954128] Facility 'TM' unavailable, exception at 0xd0000000243b7f10, MSR=9000000000009033
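
To check other hosts for the same signature, kernel log lines of this shape can be matched mechanically. The following is a rough sketch: the helper `parse_facility_fault` is hypothetical, and the pattern is derived only from the one line quoted above. TM is, to my knowledge, the POWER Transactional Memory facility.

```python
import re

# Pattern derived from the single kernel message quoted in this report.
FACILITY_RE = re.compile(
    r"Facility '(\w+)' unavailable, exception at (0x[0-9a-fA-F]+), "
    r"MSR=([0-9a-fA-F]+)"
)


def parse_facility_fault(line):
    """Return a dict describing the facility fault in line, or None."""
    match = FACILITY_RE.search(line)
    if match is None:
        return None
    facility, address, msr = match.groups()
    return {"facility": facility, "address": address, "msr": msr}


# Kernel message copied from the comment above.
kernel_line = (
    "Sep 11 19:26:10 node-loudred kernel: [266093.954128] Facility 'TM' "
    "unavailable, exception at 0xd0000000243b7f10, MSR=9000000000009033"
)
print(parse_facility_fault(kernel_line))
```

Feeding it the output of `dmesg` or the syslog line by line and keeping the non-`None` results gives a quick list of affected hosts.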

Christian Ehrhardt  (paelzer) wrote :

"domain-1-instance-00000001" in your tests isn't a good name to see what it actually does :-/
If you happen to run Xenial guests, then you are very likely affected by bug 1709784.

Please close as a dup if you can confirm that this is the case (it most likely is IMHO if your setup has current Xenial guests).

Christian Ehrhardt  (paelzer) wrote :

I just re-verified: it seems that even the host kernel must not be the bad one.
That said, the HWE kernel, which you were about to test anyway, should get you out of the issue.

Christian Ehrhardt  (paelzer) wrote :

Still a dup IMHO

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed