crash on s390 in kvm run due to background load on postcopy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Invalid | Undecided | bugproxy |
qemu (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Hi,
I have run into an issue rather often (though not 100% reproducibly) that I wanted to document, and to ask whether it is a known issue.
On migration with options like:
$ virsh migrate --live --postcopy --postcopy-
Note: All other migration types we test are working.
Even postcopy is good without background workload.
Just this combination of postcopy-
FYI - the background load runs in a 4-vCPU guest:
- nohup stress-ng -m 1 --vm-keep --vm-bytes 256M 1>/dev/null 2>&1 &
- nohup md5sum /dev/urandom 1>/dev/null 2>&1 &
- nohup bash -c "while /bin/true; do dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=100; done" 1>/dev/null 2>&1
That load runs in 3 of those guests on an 8-CPU host.
So the load keeps more than the 8 CPUs we have busy.
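As a rough sketch of that oversubscription, using only the numbers above and assuming (my assumption, not measured) that each of the three load commands keeps about one CPU busy:

```shell
# Numbers taken from the description above.
guests=3               # guests running the background load
vcpus_per_guest=4      # vCPUs per guest
load_procs_per_guest=3 # stress-ng, md5sum, dd loop (assumed ~1 busy CPU each)
host_cpus=8            # CPUs on the s390x host

total_vcpus=$((guests * vcpus_per_guest))
total_load=$((guests * load_procs_per_guest))

echo "vCPUs: $total_vcpus, busy load tasks: $total_load, host CPUs: $host_cpus"
```

Either way you count it, the demand exceeds the 8 host CPUs before the migration itself adds any work.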
Migration is accounted as success on initiator, but as paused on target
46577 State: paused
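For anyone trying to confirm the same symptom, a minimal sketch of how the state could be checked on the target host. The function name `domain_is_paused` and the guest name in the usage line are hypothetical; `virsh domstate --reason` is the only real command used.

```shell
# Hypothetical helper: succeed if the given domain is reported as paused.
# "virsh domstate --reason" prints the state and the reason for it.
domain_is_paused() {
    state=$(virsh domstate --reason "$1") || return 2
    echo "$1: $state"
    case "$state" in
        paused*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Usage on the migration target would be something like `domain_is_paused my-guest`, where `my-guest` is a placeholder for the migrated domain.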
The error I get from qemu is on kvm run, like:
cat /var/log/
[...]
46164 error: kvm run failed Bad address
46165 PSW=mask 0404d00180000000 addr 0000000000831996 cc 00
46166 R00=00000000211
46167 R04=000000005a4
46168 R08=00000000ac4
46169 R12=00000000d91
46170 F00=000003ffc0f
46171 F04=00000000000
46172 F08=000002aa10c
46173 F12=0000000021d
46174 V00=000003ffc0f
46175 V02=000002aa10c
46176 V04=00000000000
46177 V06=000002aa112
46178 V08=000002aa10c
46179 V10=0000000021d
46180 V12=0000000021d
46181 V14=000003ffefc
46182 V16=00000000000
46183 V18=40404040404
46184 V20=0f0e0d0c0b0
46185 V22=0000ff00000
46186 V24=00000000000
46187 V26=00000000000
46188 V28=00000000000
46189 V30=000002aa0ba
46190 C00=00800000148
46191 C04=00000000000
46192 C08=00000000000
46193 C12=00000000000
FYI: Our machine is generally very slow, especially on I/O, but also on CPU when the builders are busy. The same test ran fine a few days ago, so the failure seems to depend on the overall machine load adding to the background load of the migration test, and that combination is what breaks it on s390x.
Note: It is also a very unfair comparison: we have 8 cores on s390x, while on x86 and ppc we have far more.
I haven't caught it "live" so far to debug it any further. Only from automated testing did I realize that this occurs at least once every other week.
Affected releases seem to be Yakkety (libvirt 2.1 / qemu 2.6.1) and Zesty (libvirt 2.5 / qemu 2.8).
As soon as our Artful stack is fully done I'll add those.
For now, a check against known issues would be nice.
Changed in ubuntu-z-systems: | |
assignee: | nobody → bugproxy (bugproxy) |
tags: | added: architecture-s39064 bugnameltc-156764 severity-high targetmilestone-inin1704 |
tags: | added: s390x |
------- Comment From <email address hidden> 2017-07-18 03:33 EDT-------
I assume the crash is on the target (not the source). Do you have any dmesg messages from that system? Does the target system have enough memory/swap?