[Potential Regression] ubuntu_kvm_smoke_test failed with X-4.4.0-255.289 on Oracle instances

Bug #2069243 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status         Importance   Assigned to   Milestone
ubuntu-kernel-tests   Fix Released   Undecided    Unassigned
linux (Ubuntu)        Invalid        Undecided    Unassigned
Xenial                Fix Released   Undecided    Unassigned

Bug Description

Issue found on Oracle instances VM.DenseIO2.8 and VM.Standard2.1 with 4.4.0-255.289.

The KVM guest failed to start, and the test failed with:
  uvt-kvm: error: timed out waiting for dnsmasq lease for 52:54:00:9e:29:7e.

Test log:
 Running 'sudo -u ubuntu /home/ubuntu/autotest/client/tests/ubuntu_kvm_smoke_test/the-test amd64'
 + SUT=bjf-test
 + SSH_KEY=/home/ubuntu/.ssh/id_rsa
 + SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -i /home/ubuntu/.ssh/id_rsa'
 ++ lsb_release -c
 ++ awk '{print$2}'
 + DISTRO=xenial
 + ARCHITECTURE=amd64
 + trap cleanup EXIT
 + '[' -z amd64 ']'
 + kvm-ok
 INFO: /dev/kvm exists
 KVM acceleration can be used
 + '[' 0 '!=' 0 ']'
 + set -e
 + '[' '!' -f /home/ubuntu/.ssh/id_rsa ']'
 + ssh-keygen -f /home/ubuntu/.ssh/id_rsa -t rsa -N ''
 Generating public/private rsa key pair.
 Your identification has been saved in /home/ubuntu/.ssh/id_rsa.
 Your public key has been saved in /home/ubuntu/.ssh/id_rsa.pub.
 The key fingerprint is:
 SHA256:sDO+6ZjLP+nlVSse7DVFfjs171zBcpTNuU/7o9nwiCs ubuntu@x-l-gen-s2-vmdio28-u-kvm-smk-test
 The key's randomart image is:
 +---[RSA 2048]----+
 | |
 | .+|
 | . .+o|
 | o oo .|
 | + S ..o*+|
 | . o . . oooO|
 | ... = +. ++|
 | . oo= E +..*++|
 | ==*.. +o.+.o=|
 +----[SHA256]-----+
 + '[' amd64 = ppc64el ']'
 ++ uvt-simplestreams-libvirt query
 ++ grep -P 'xenial.*amd64'
 ++ true
 + image=
 + '[' -z '' ']'
 + uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily release=xenial arch=amd64
 + uvt-kvm create bjf-test release=xenial arch=amd64
 + uvt-kvm wait bjf-test --insecure --ssh-private-key-file /home/ubuntu/.ssh/id_rsa
 uvt-kvm: error: timed out waiting for dnsmasq lease for 52:54:00:9e:29:7e.
 + cleanup
 + uvt-kvm destroy bjf-test
 + '[' amd64 = ppc64el ']'

Console output:
[ 829.922430] BUG: unable to handle kernel paging request at 00000000000049d8
[ 829.929523] IP: [<ffffffffc04ce54f>] vmx_vcpu_run+0xff/0x5d0 [kvm_intel]
[ 829.931108] PGD 0
[ 829.931616] Oops: 0000 [#1] SMP
[ 829.932442] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat bridge stp llc ebtable_filter ebtables kvm_intel ip6table_filter ip6_tables xt_comment ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_filter ip_tables x_tables edac_core kvm irqbypass joydev input_leds shpchp i2c_piix4 serio_raw pvpanic 8250_fintek mac_hid ib_iser rdma_cm iw_cm ib_cm sunrpc ib_sa ib_mad ib_core ib_addr autofs4 btrfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi nvme pata_acpi floppy [last unloaded: kvm_intel]
[ 829.954484] CPU: 9 PID: 6108 Comm: qemu-system-x86 Not tainted 4.4.0-255-generic #289-Ubuntu
[ 829.956393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023
[ 829.958270] task: ffff881db8e8d600 ti: ffff881db8174000 task.ti: ffff881db8174000
[ 829.960011] RIP: 0010:[<ffffffffc04ce54f>] [<ffffffffc04ce54f>] vmx_vcpu_run+0xff/0x5d0 [kvm_intel]
[ 829.962153] RSP: 0018:ffff881db8177ce8 EFLAGS: 00010046
[ 829.963397] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 829.965051] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 829.966690] RBP: ffff881db8177d40 R08: 0000000000000000 R09: 0000000000000000
[ 829.968338] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 829.970020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 829.971655] FS: 00007f680a6f9700(0000) GS:ffff881dc7440000(0000) knlGS:0000000000000000
[ 829.973517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 829.974825] CR2: 00000000000049d8 CR3: 0000001dba3ba000 CR4: 0000000000362670
[ 829.976482] Stack:
[ 829.976959] 0000000000000000 ffff881db8288000 0000000000000000 ffff881db8288000
[ 829.978801] 00000000b8177d28 cd7740a309bdf11f ffff881db8288000 0000000000000000
[ 829.980616] 0000000000000000 000000c0dcf13e97 00000000001e7230 ffff881db8177db8
[ 829.982445] Call Trace:
[ 829.983128] [<ffffffffc0420442>] vcpu_enter_guest+0x782/0x11e0 [kvm]
[ 829.984622] [<ffffffff810934d1>] ? __set_task_blocked+0x41/0xa0
[ 829.986038] [<ffffffffc0427186>] kvm_arch_vcpu_ioctl_run+0xe6/0x410 [kvm]
[ 829.987626] [<ffffffffc040c10d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
[ 829.989028] [<ffffffff8110b5b8>] ? do_futex+0x108/0x480
[ 829.990288] [<ffffffff8123109f>] do_vfs_ioctl+0x2af/0x4b0
[ 829.991567] [<ffffffff81231319>] SyS_ioctl+0x79/0x90
[ 829.992738] [<ffffffff8186d105>] ? clear_bhb_loop+0x15/0x70
[ 829.994052] [<ffffffff8186a108>] entry_SYSCALL_64_fastpath+0x22/0xc3
[ 829.995531] Code: 48 8b b9 c8 44 00 00 e8 80 e2 ff ff 48 8b 4d b0 0f 1f 44 00 00 e9 33 03 00 00 0f 1f 44 00 00 e9 01 01 00 00 31 db e8 a1 eb 39 c1 <80> b9 d8 49 00 00 00 0f 85 f9 03 00 00 48 8b 7d c0 48 89 4d b0
[ 830.001696] RIP [<ffffffffc04ce54f>] vmx_vcpu_run+0xff/0x5d0 [kvm_intel]
[ 830.003298] RSP <ffff881db8177ce8>
[ 830.004131] CR2: 00000000000049d8
[ 830.006429] ---[ end trace a1cf19df2cdd1499 ]---
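
To cross-check the oops by hand, the faulting instruction can be decoded from the "Code:" line. The following is a rough Python sketch, not part of the original report. The helper names faulting_bytes and decode_group1_imm8 are made up for illustration, and the decoder only handles the one encoding seen here (opcode 0x80 with a 32-bit displacement, no REX prefix, no SIB byte). The RCX and CR2 values are copied from the register dump above.

```python
# Illustrative sketch only: decodes the single faulting instruction
# from the "Code:" line of the oops above. Helper names are hypothetical.
CODE_LINE = (
    "48 8b b9 c8 44 00 00 e8 80 e2 ff ff 48 8b 4d b0 0f 1f 44 00 00 "
    "e9 33 03 00 00 0f 1f 44 00 00 e9 01 01 00 00 31 db e8 a1 eb 39 c1 "
    "<80> b9 d8 49 00 00 00 0f 85 f9 03 00 00 48 8b 7d c0 48 89 4d b0"
)

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def faulting_bytes(code_line):
    """Return the bytes starting at the one marked <..> (the byte at RIP)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_group1_imm8(insn):
    """Decode 'op byte [reg + disp32], imm8' (opcode 0x80, ModRM mod=10)."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x80 or mod != 2 or rm == 4:
        raise ValueError("encoding not handled by this sketch")
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    imm = insn[6]
    return GROUP1[reg], REG64[rm], disp, imm


op, base, disp, imm = decode_group1_imm8(faulting_bytes(CODE_LINE))
rcx = 0x0      # RCX from the register dump
cr2 = 0x49d8   # faulting address from the CR2 line

print(f"{op}b $0x{imm:x}, 0x{disp:x}(%{base})")
print("effective address matches CR2:", rcx + disp == cr2)
```

The marked bytes decode to a one-byte compare against memory at offset 0x49d8 from RCX. With RCX equal to 0 in the dump, the effective address is 0x49d8, which matches CR2. That is consistent with a NULL base pointer being dereferenced in vmx_vcpu_run, though the sketch says nothing about why the pointer was NULL.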

A manual test shows that 4.4.0-254-generic works fine on VM.DenseIO2.8.

Therefore this seems to be a regression to me.

Po-Hsu Lin (cypressyew) wrote :

The same "BUG: unable to handle kernel paging request" issue can be triggered by the "kvm" stressor in ubuntu_stress_smoke_test on these instances as well.

description: updated
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: New → Invalid
Juerg Haefliger (juergh) wrote :

Confirmed: the VM-exit BHI mitigation in Xenial is broken.

Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Po-Hsu Lin (cypressyew) wrote :

The respun Xenial kernel 4.4.0-256.290 works fine with this test.

Changed in linux (Ubuntu Xenial):
status: Confirmed → Fix Released
Changed in ubuntu-kernel-tests:
status: New → Fix Released