Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed

Bug #1661386 reported by Matwey V. Kornilov
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
QEMU      Fix Released   Undecided    Unassigned

Bug Description

Hello,

I see the following when I try to run QEMU from master:

# ./x86_64-softmmu/qemu-system-x86_64 --version
QEMU emulator version 2.8.50 (v2.8.0-1006-g4e9f524)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
# ./x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -nodefaults \
    -no-reboot -nographic -cpu host -vga none -kernel .build.kernel.kvm \
    -initrd .build.initrd.kvm -append 'panic=1 no-kvmclock console=ttyS0 loglevel=7' \
    -m 1024 -serial stdio
qemu-system-x86_64: /home/matwey/lab/qemu/target/i386/kvm.c:1849: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

First broken commit has been bisected:

commit 48e1a45c3166d659f781171a47dabf4a187ed7a5
Author: Paolo Bonzini <email address hidden>
Date: Wed Mar 30 22:55:29 2016 +0200

    target-i386: assert that KVM_GET/SET_MSRS can set all requested MSRs

    This would have caught the bug in the previous patch.

    Signed-off-by: Paolo Bonzini <email address hidden>

My cpuinfo is the following:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
stepping : 2
microcode : 0x14
cpu MHz : 3066.775
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat epb dtherm tpr_shadow vnmi ept vpid
bugs :
bogomips : 6133.55
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hi Matwey,
  That shouldn't happen! The patch you've bisected to is just the one that complains when the ioctl fails instead of silently ignoring the failure. That means the failure probably existed before and was ignored, and ignoring it causes random other problems.

What kernel are you using on the host?

We need to figure out which MSR it's objecting to; probably the easiest way is to :

1) Edit kvm_msr_entry_add in target/i386/kvm.c to something like:

    assert((void *)(entry + 1) <= limit);
    /* value is a uint64_t, so use PRIx64 rather than %lx */
    fprintf(stderr, "kvm_msr_entry_add: @%d index=%x value=%" PRIx64 "\n",
            msrs->nmsrs, index, value);
    entry->index = index;

2) Edit kvm_put_msrs near the bottom:

    fprintf(stderr,"kvm_put_msrs: ret=%d expected=%d\n", ret, cpu->kvm_msr_buf->nmsrs);
    assert(ret == cpu->kvm_msr_buf->nmsrs);

With any luck, the 'ret' value will tell you which entry is bad, and you can match that to the @%d value (or maybe it's the entry before that one that failed?). Then we take the index, look it up in the Intel docs, and figure out which MSR it's complaining about.
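The matching step relies on how KVM_SET_MSRS and KVM_GET_MSRS report partial failure. The KVM API documentation describes them as returning the number of MSRs successfully set or returned, and KVM walks the entries in order, stopping at the first one it cannot handle. Under that assumption the failing entry is the one at index `ret` (0-based), and every entry from `ret` onward was never processed. The sketch below models this without a real vCPU: `fake_msr_ioctl()`, `msr_rejected()` and `report()` are hypothetical helpers, and the choice of 0x38d as the rejected MSR is purely an assumption for the demo.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION for the demo: pretend the hypervisor rejects only MSR 0x38d. */
static int msr_rejected(uint32_t index)
{
    return index == 0x38d;
}

/* Hypothetical model of KVM_SET_MSRS / KVM_GET_MSRS: walk the entries in
 * order, stop at the first rejected MSR, and return how many entries were
 * processed successfully before it. */
static int fake_msr_ioctl(const uint32_t *indices, int nmsrs)
{
    int i;
    for (i = 0; i < nmsrs; i++) {
        if (msr_rejected(indices[i])) {
            break;
        }
    }
    return i;
}

/* Explain a short return value: entry @ret is the first failure and
 * entries @ret..@(nmsrs-1) were never written (or read). */
static void report(const uint32_t *indices, int nmsrs, int ret)
{
    if (ret >= nmsrs) {
        printf("all %d MSRs processed\n", nmsrs);
        return;
    }
    printf("ret=%d expected=%d: first failing entry @%d index=%x, "
           "%d entries not processed\n",
           ret, nmsrs, ret, (unsigned)indices[ret], nmsrs - ret);
}
```

For a list such as `{0x174, 0x4b564d02, 0x4b564d03, 0x38d, 0x38f}`, `fake_msr_ioctl()` returns 3, so `report()` points at entry @3 (index `38d`) and counts two entries as not processed.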

Dave

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote : Re: [Bug 1661386] Re: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed

Hello,

The output is the following:

kvm_msr_entry_add: @0 index=174 value=0
kvm_msr_entry_add: @1 index=175 value=0
kvm_msr_entry_add: @2 index=176 value=0
kvm_msr_entry_add: @3 index=277 value=7040600070406
kvm_msr_entry_add: @4 index=c0000081 value=0
kvm_msr_entry_add: @5 index=c0010117 value=0
kvm_msr_entry_add: @6 index=3b value=0
kvm_msr_entry_add: @7 index=1a0 value=1
kvm_msr_entry_add: @8 index=c0000083 value=0
kvm_msr_entry_add: @9 index=c0000102 value=0
kvm_msr_entry_add: @10 index=c0000084 value=0
kvm_msr_entry_add: @11 index=c0000082 value=0
kvm_msr_entry_add: @12 index=10 value=0
kvm_msr_entry_add: @13 index=12 value=0
kvm_msr_entry_add: @14 index=11 value=0
kvm_msr_entry_add: @15 index=4b564d02 value=0
kvm_msr_entry_add: @16 index=4b564d04 value=0
kvm_msr_entry_add: @17 index=4b564d03 value=0
kvm_msr_entry_add: @18 index=38d value=0
kvm_msr_entry_add: @19 index=38f value=0
kvm_msr_entry_add: @20 index=309 value=0
kvm_msr_entry_add: @21 index=30a value=0
kvm_msr_entry_add: @22 index=30b value=0
kvm_msr_entry_add: @23 index=c1 value=0
kvm_msr_entry_add: @24 index=186 value=0
kvm_msr_entry_add: @25 index=c2 value=0
kvm_msr_entry_add: @26 index=187 value=0
kvm_msr_entry_add: @27 index=c3 value=0
kvm_msr_entry_add: @28 index=188 value=0
kvm_msr_entry_add: @29 index=c4 value=0
kvm_msr_entry_add: @30 index=189 value=0
kvm_msr_entry_add: @31 index=38e value=0
kvm_msr_entry_add: @32 index=390 value=0
kvm_msr_entry_add: @33 index=38d value=0
kvm_msr_entry_add: @34 index=38f value=0
kvm_msr_entry_add: @35 index=2ff value=0
kvm_msr_entry_add: @36 index=250 value=0
kvm_msr_entry_add: @37 index=258 value=0
kvm_msr_entry_add: @38 index=259 value=0
kvm_msr_entry_add: @39 index=268 value=0
kvm_msr_entry_add: @40 index=269 value=0
kvm_msr_entry_add: @41 index=26a value=0
kvm_msr_entry_add: @42 index=26b value=0
kvm_msr_entry_add: @43 index=26c value=0
kvm_msr_entry_add: @44 index=26d value=0
kvm_msr_entry_add: @45 index=26e value=0
kvm_msr_entry_add: @46 index=26f value=0
kvm_msr_entry_add: @47 index=200 value=0
kvm_msr_entry_add: @48 index=201 value=0
kvm_msr_entry_add: @49 index=202 value=0
kvm_msr_entry_add: @50 index=203 value=0
kvm_msr_entry_add: @51 index=204 value=0
kvm_msr_entry_add: @52 index=205 value=0
kvm_msr_entry_add: @53 index=206 value=0
kvm_msr_entry_add: @54 index=207 value=0
kvm_msr_entry_add: @55 index=208 value=0
kvm_msr_entry_add: @56 index=209 value=0
kvm_msr_entry_add: @57 index=20a value=0
kvm_msr_entry_add: @58 index=20b value=0
kvm_msr_entry_add: @59 index=20c value=0
kvm_msr_entry_add: @60 index=20d value=0
kvm_msr_entry_add: @61 index=20e value=0
kvm_msr_entry_add: @62 index=20f value=0
kvm_msr_entry_add: @63 index=17a value=0
kvm_msr_entry_add: @64 index=17b value=ffffffffffffffff
kvm_msr_entry_add: @65 index=400 value=ffffffffffffffff
kvm_msr_entry_add: @66 index=401 value=0
kvm_msr_entry_add: @67 index=402 value=0
kvm_msr_entry_add: @68 index=403 value=0
kvm_msr_entry_add: @69 index=404 value=ffffffffffffffff
kvm_msr_entry_add: @70 index=405 value=0
kvm_msr_entry_add: @71 index=406 value=0
kvm_msr_entry_add: @72 index=407 value=0
kvm_msr_entry_add: @73 index=408 value=ffffffffffffffff
kvm_msr_entry_add: @74 index=...


Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hi,
  OK, let's see:

 kvm_put_msrs: ret=18 expected=105

so I think it's one of the MSRs around 18 that it's upset at:

kvm_msr_entry_add: @17 index=4b564d03 value=0

     #define MSR_KVM_STEAL_TIME 0x4b564d03

kvm_msr_entry_add: @18 index=38d value=0

     #define MSR_CORE_PERF_FIXED_CTR_CTRL 0x38d

So my guess is it's the steal time thing.
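Mapping a logged index to a name is a simple table lookup. A minimal sketch, with a hypothetical `msr_lookup()` helper and a deliberately incomplete table of indices that appear in this thread; the names follow the Linux kernel's `msr-index.h` and `kvm_para.h` definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct msr_name {
    uint32_t index;
    const char *name;
};

/* A few MSR indices from the kvm_msr_entry_add logs in this thread. */
static const struct msr_name known_msrs[] = {
    { 0x4b564d02, "MSR_KVM_ASYNC_PF_EN" },
    { 0x4b564d03, "MSR_KVM_STEAL_TIME" },
    { 0x4b564d04, "MSR_KVM_PV_EOI_EN" },
    { 0x0c1,      "MSR_P6_PERFCTR0" },               /* first GP counter */
    { 0x186,      "MSR_P6_EVNTSEL0" },               /* first event select */
    { 0x309,      "MSR_CORE_PERF_FIXED_CTR0" },
    { 0x38d,      "MSR_CORE_PERF_FIXED_CTR_CTRL" },
    { 0x38e,      "MSR_CORE_PERF_GLOBAL_STATUS" },
    { 0x38f,      "MSR_CORE_PERF_GLOBAL_CTRL" },
    { 0x390,      "MSR_CORE_PERF_GLOBAL_OVF_CTRL" },
};

/* Return the name for an MSR index, or NULL if it is not in the table. */
static const char *msr_lookup(uint32_t index)
{
    for (size_t i = 0; i < sizeof(known_msrs) / sizeof(known_msrs[0]); i++) {
        if (known_msrs[i].index == index) {
            return known_msrs[i].name;
        }
    }
    return NULL;
}
```

For example, `msr_lookup(0x4b564d03)` yields `MSR_KVM_STEAL_TIME` (entry @17) and `msr_lookup(0x38d)` yields `MSR_CORE_PERF_FIXED_CTR_CTRL` (entry @18); an index missing from the table yields `NULL`.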

1) You didn't say which kernel your host is running; please tell me.
   I think the steal time support went into the kernel around 3.0.
2) try starting qemu with -cpu host,-kvm_steal_time and/or -cpu host,-perfctr_core
3) If those don't work, in kvm_put_msrs try hacking out the lines:

        if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_STEAL_TIME)) {
            kvm_msr_entry_add(cpu, MSR_KVM_STEAL_TIME, env->steal_time_msr);
        }

and turning the:

        if (has_msr_architectural_pmu) {

into if (0) {

Dave

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-03 21:34 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
> Hi,
> OK, lets see:
>
> kvm_put_msrs: ret=18 expected=105
>
> so I think it's one of the MSRs around 18 that it's upset at:
>
> kvm_msr_entry_add: @17 index=4b564d03 value=0
>
> 41:#define MSR_KVM_STEAL_TIME 0x4b564d03
>
> kvm_msr_entry_add: @18 index=38d value=0
>
> #define MSR_CORE_PERF_FIXED_CTR_CTRL 0x38d
>
> So my guess is it's the steal time thing.
>
> 1) You didn't say what kernel your host was running - please tell me
> I think that steal time thing went into the kernel ~3.0

Sorry, I missed that. I tested both 3.16 and 4.1.

> 2) try starting qemu with -cpu host,-kvm_steal_time and/or -cpu host,-perfctr_core

Neither of these helps.

> 3) If those don't work, in kvm_put_msrs try hacking out the lines:
>
> if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_STEAL_TIME)) {
> kvm_msr_entry_add(cpu, MSR_KVM_STEAL_TIME, env->steal_time_msr);
> }
>
> and turning the :
>
> if (has_msr_architectural_pmu) {
>
> into if (0) {
>

This doesn't help either, but it now seems to fail at a different line.

kvm_msr_entry_add: @0 index=174 value=0
kvm_msr_entry_add: @1 index=175 value=0
kvm_msr_entry_add: @2 index=176 value=0
kvm_msr_entry_add: @3 index=277 value=7040600070406
kvm_msr_entry_add: @4 index=c0000081 value=0
kvm_msr_entry_add: @5 index=c0010117 value=0
kvm_msr_entry_add: @6 index=3b value=0
kvm_msr_entry_add: @7 index=1a0 value=1
kvm_msr_entry_add: @8 index=c0000083 value=0
kvm_msr_entry_add: @9 index=c0000102 value=0
kvm_msr_entry_add: @10 index=c0000084 value=0
kvm_msr_entry_add: @11 index=c0000082 value=0
kvm_msr_entry_add: @12 index=10 value=0
kvm_msr_entry_add: @13 index=12 value=0
kvm_msr_entry_add: @14 index=11 value=0
kvm_msr_entry_add: @15 index=4b564d02 value=0
kvm_msr_entry_add: @16 index=4b564d04 value=0
kvm_msr_entry_add: @17 index=2ff value=0
kvm_msr_entry_add: @18 index=250 value=0
kvm_msr_entry_add: @19 index=258 value=0
kvm_msr_entry_add: @20 index=259 value=0
kvm_msr_entry_add: @21 index=268 value=0
kvm_msr_entry_add: @22 index=269 value=0
kvm_msr_entry_add: @23 index=26a value=0
kvm_msr_entry_add: @24 index=26b value=0
kvm_msr_entry_add: @25 index=26c value=0
kvm_msr_entry_add: @26 index=26d value=0
kvm_msr_entry_add: @27 index=26e value=0
kvm_msr_entry_add: @28 index=26f value=0
kvm_msr_entry_add: @29 index=200 value=0
kvm_msr_entry_add: @30 index=201 value=0
kvm_msr_entry_add: @31 index=202 value=0
kvm_msr_entry_add: @32 index=203 value=0
kvm_msr_entry_add: @33 index=204 value=0
kvm_msr_entry_add: @34 index=205 value=0
kvm_msr_entry_add: @35 index=206 value=0
kvm_msr_entry_add: @36 index=207 value=0
kvm_msr_entry_add: @37 index=208 value=0
kvm_msr_entry_add: @38 index=209 value=0
kvm_msr_entry_add: @39 index=20a value=0
kvm_msr_entry_add: @40 index=20b value=0
kvm_msr_entry_add: @41 index=20c value=0
kvm_msr_entry_add: @42 index=20d value=0
kvm_msr_entry_add: @43 index=20e value=0
kvm_msr_entry_add: @44 index=20f value=0
kvm_msr_entry_add: @45 index=17a value=0
kvm_msr_entry_add: @46 index=17b value=ffffffffffffffff
kvm_msr_entry_add: @47 index=400 value=ffffffffffffffff
kvm_msr_entry_add: @48 i...

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Ah well, that is a bit better: it's now failing in kvm_**get**_msrs rather than put, so the question is which of the two changes made it survive kvm_put_msrs.

I'd hoped that the flags in (2) would have turned off the CPU flag and thus made it work in both of them.

kvm_msr_entry_add: @103 index=20f value=0
qemu-system-x86_64: /home/matwey/lab/qemu/target/i386/kvm.c:2218:
kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

1) Was it the steal time or the pmu change that made it flip over to the get_msrs?
2) Can you get it to flip over to the get_msrs with the flag rather than the code change?

Dave

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-03 22:51 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
> Ah well that is a bit better; you see now it's failing in kvm_**get**_msrs rather
> than put; so the question is which of the two changes made it survive kvm_put_msrs
>
> I'd hoped that the flags in (2) would have turned off the CPU flag and
> thus made it go in both of them.
>
> kvm_msr_entry_add: @103 index=20f value=0
> qemu-system-x86_64: /home/matwey/lab/qemu/target/i386/kvm.c:2218:
> kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
>
> 1) Was it the steal time or the pmu change that made it flip over to the get_msrs?

It was has_msr_architectural_pmu.

> 2) Can you get it to flip over to the get_msrs with the flag rather than the code change?

Only with the code change.

>
> Dave
>

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hi Matwey,
  1) Can you provide me with the output of the 'dmesg' command straight after boot on your host?
  2) If you look in target/i386/kvm.c in kvm_arch_init_vcpu around line 871 is some code like:

        if ((ver & 0xff) > 0) {
            has_msr_architectural_pmu = true;
            num_architectural_pmu_counters = (ver & 0xff00) >> 8;

            /* Shouldn't be more than 32, since that's the number of bits
             * available in EBX to tell us _which_ counters are available.
             * Play it safe.
             */
            if (num_architectural_pmu_counters > MAX_GP_COUNTERS) {
                num_architectural_pmu_counters = MAX_GP_COUNTERS;
            }

    Change the start of that to:
    fprintf(stderr, "kvm_arch_init_vcpu ver=%x\n", ver);
    if (0) {

    I think that might make it work, but please tell us what it prints as ver=

Dave

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :
Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-06 13:02 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
> Hi Matwey,
> 1) Can you provide me with the output of the 'dmesg' command straight after boot on your host.

I've attached the dmesg output. I should have done this from the beginning.

> 2) If you look in target/i386/kvm.c in kvm_arch_init_vcpu around line 871 is some code like:

kvm_arch_init_vcpu ver=7300402

Indeed, the guest kernel started.
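The reported `ver=7300402` is CPUID leaf 0xA EAX as seen inside the VMware guest. Decoding it with the field layout from the Intel SDM shows why QEMU set `has_msr_architectural_pmu`: the version byte is non-zero. The first two fields use exactly the mask and shift from the QEMU snippet quoted above; the struct and `decode_pmu_eax()` are hypothetical names for this sketch.

```c
#include <stdint.h>

/* Fields of CPUID leaf 0xA EAX (architectural performance monitoring). */
struct pmu_leaf_eax {
    unsigned version;      /* bits 7:0   version ID */
    unsigned num_counters; /* bits 15:8  GP counters per logical processor */
    unsigned width;        /* bits 23:16 bit width of the GP counters */
    unsigned ebx_length;   /* bits 31:24 length of the EBX event bit vector */
};

static struct pmu_leaf_eax decode_pmu_eax(uint32_t eax)
{
    struct pmu_leaf_eax d;
    d.version      = eax & 0xff;           /* QEMU tests (ver & 0xff) > 0 */
    d.num_counters = (eax & 0xff00) >> 8;  /* same as QEMU's counter count */
    d.width        = (eax >> 16) & 0xff;
    d.ebx_length   = (eax >> 24) & 0xff;
    return d;
}
```

For `0x7300402` this gives version 2, four 48-bit general-purpose counters and a 7-bit EBX vector, so CPUID advertises an architectural PMU even though the PMU MSR accesses are rejected.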

>
> if ((ver & 0xff) > 0) {
> has_msr_architectural_pmu = true;
> num_architectural_pmu_counters = (ver & 0xff00) >> 8;
>
> /* Shouldn't be more than 32, since that's the number of bits
> * available in EBX to tell us _which_ counters are available.
> * Play it safe.
> */
> if (num_architectural_pmu_counters > MAX_GP_COUNTERS) {
> num_architectural_pmu_counters = MAX_GP_COUNTERS;
> }
>
> change the start of that to :
> fprintf(stderr, "kvm_arch_init_vcpu ver=%x\n", ver);
> if (0) {
>
> I think that might make it work, but please tell us what it prints
> as ver=
>
> Dave
>

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Ahha!
[ 0.000000] DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 0.000000] Hypervisor detected: VMware

So you didn't mention this was running inside VMware; it looks to me as if that's rejecting the PMU MSR accesses.
For reference, which version of VMware are you using?

My colleague suggested that '-cpu host,pmu=off' might work instead of having to hack around with the source.

Revision history for this message
Paolo Bonzini (bonzini) wrote :

Seems like a VMware bug.

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-06 20:11 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
> Ahha!
> [ 0.000000] DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
> [ 0.000000] Hypervisor detected: VMware
>
> So you didn't mention this was running inside VMWare; it looks to me as if that's rejecting the PMU MSR accesses.
> For reference which version of VMWare are you using?

ESXi 6.0.0 Build 2494585

I also found that enabling performance counters in the VMware configuration helps.
But why did it just work before 48e1a45c3166 with perf counters disabled?

>
> My colleague suggested that '-cpu host,pmu=off' might work instead of
> having to hack around with the source.

Indeed, this also helps.

>

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

>> So you didn't mention this was running inside VMWare; it looks to me as if that's rejecting the PMU MSR accesses.
>> For reference which version of VMWare are you using?

>ESXi 6.0.0 Build 2494585

>I also find that enabling perf counters in VMWare configuration also helps.

OK, so that suggests the problem is that, with the PMU disabled in the VMware config, VMware isn't giving the guest the right information to know it's disabled.

>But why did it just work before 48e1a45c3166 with perf counters disabled?

Before that commit, QEMU ignored the failure to write/read the PMU MSRs, but it also lost all the MSRs after the PMU access (KVM stops at the first MSR it cannot handle, so the rest of the list was silently skipped), and we'd found that whenever that happened we'd get lots of weird bugs related to the other MSRs.

>>
>> My colleague suggested that '-cpu host,pmu=off' might work instead of
>> having to hack around with the source.

> Indeed, this also helps.

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-06 21:05 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
>>> So you didn't mention this was running inside VMWare; it looks to me as if that's rejecting the PMU MSR accesses.
>>> For reference which version of VMWare are you using?
>
>>ESXi 6.0.0 Build 2494585
>
>>I also find that enabling perf counters in VMWare configuration also
> helps.
>
> OK, so that suggests the problem is that with PMU disabled in VMWare
> config, it's not giving the right info to the guest to know it's
> disabled.

How should it provide that information? Can we check it?

>
>>But why did it just work before 48e1a45c3166 with perf counters
> disabled?
>
> Before that bug it ignored the failure to write/read the PMU MSRs - but
> also lost all the MSRs after the PMU access and we'd found that if we
> ever had that happen we'd get lots of weird bugs related to the other
> MSRs.
>
>>>
>>> My colleague suggested that '-cpu host,pmu=off' might work instead of
>>> having to hack around with the source.
>
>> Indeed, this also helps.
>

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-06 22:38 GMT+03:00 Matwey V. Kornilov <email address hidden>:
> 2017-02-06 21:05 GMT+03:00 Dr. David Alan Gilbert <email address hidden>:
>>>> So you didn't mention this was running inside VMWare; it looks to me as if that's rejecting the PMU MSR accesses.
>>>> For reference which version of VMWare are you using?
>>
>>>ESXi 6.0.0 Build 2494585
>>
>>>I also find that enabling perf counters in VMWare configuration also
>> helps.
>>
>> OK, so that suggests the problem is that with PMU disabled in VMWare
>> config, it's not giving the right info to the guest to know it's
>> disabled.
>
> How should it provide info? Can we check it?
>

Hi,

I've found the following doc:

https://software.intel.com/sites/default/files/m/5/2/c/f/1/30320-Nehalem-PMU-Programming-Guide-Core.pdf

I am not sure how up to date it is. Does QEMU follow the recommendations
from section 4.3?
I used the rdmsr tool from the msr-tools package to check some registers from table 26:
IA32_MISC_ENABLE = 0
IA32_PERF_CAPABILITIES = 0
in my case.
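The stricter check suggested here can be sketched as follows, assuming the Intel SDM definition in which bit 7 of IA32_MISC_ENABLE (0x1A0) reads as 1 when performance monitoring is available. `pmu_usable()` is a hypothetical helper that combines the CPUID leaf 0xA version test used by QEMU with that bit; it is not QEMU's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MISC_ENABLE bit 7: performance monitoring available (read-only). */
#define MISC_ENABLE_PERF_MON_AVAILABLE (1ULL << 7)

/* Hypothetical check: trust the architectural PMU only if CPUID.0AH:EAX
 * reports a non-zero version AND IA32_MISC_ENABLE says performance
 * monitoring is available. */
static bool pmu_usable(uint32_t cpuid_0a_eax, uint64_t misc_enable)
{
    bool has_version = (cpuid_0a_eax & 0xff) != 0;
    bool perfmon_available =
        (misc_enable & MISC_ENABLE_PERF_MON_AVAILABLE) != 0;
    return has_version && perfmon_available;
}
```

With the values reported in this thread (CPUID 0xA EAX `0x7300402`, IA32_MISC_ENABLE `0`), `pmu_usable()` returns false.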

>>
>>>But why did it just work before 48e1a45c3166 with perf counters
>> disabled?
>>
>> Before that bug it ignored the failure to write/read the PMU MSRs - but
>> also lost all the MSRs after the PMU access and we'd found that if we
>> ever had that happen we'd get lots of weird bugs related to the other
>> MSRs.
>>
>>>>
>>>> My colleague suggested that '-cpu host,pmu=off' might work instead of
>>>> having to hack around with the source.
>>
>>> Indeed, this also helps.
>>

Revision history for this message
Paolo Bonzini (bonzini) wrote :

> Does qemu follow recommendations from section 4.3?

All that QEMU does is initialize MSR values and QEMU is talking to KVM, not to the processor; KVM in turn talks to the host kernel's perf subsystem.

It's the host kernel's perf subsystem that needs to follow Intel's recommendation. In particular, QEMU is setting CPUID to the values retrieved by

    perf_get_x86_pmu_capability(&cap);

so perhaps it's perf_get_x86_pmu_capability that misreads the performance monitoring capabilities provided by ESX. Please attach dmesg logs from starting the host with loglevel=9, as well as "x86info -a" output from the host, to see if perf misses some problematic CPUID/MSR combination.

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :
Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :
Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

Hi,

I've attached the log files you requested. Could you comment on them?

x86info says that IA32_PERF is not enabled:

Performance MSRs:
  MSR_IA32_PERF_STATUS: 0x0
  MSR_IA32_MISC_ENABLE: 0x0 [Enabled: ]

Revision history for this message
Matwey V. Kornilov (matwey-kornilov) wrote :

2017-02-08 11:49 GMT+03:00 Paolo Bonzini <email address hidden>:
>> Does qemu follow recommendations from section 4.3?
>
> All that QEMU does is initialize MSR values and QEMU is talking to KVM,
> not to the processor; KVM in turn talks to the host kernel's perf
> subsystem.
>
> It's the host kernel's perf subsystem that needs to follow Intel's
> recommendation. In particular, QEMU is setting CPUID to the values
> retrieved by
>
> perf_get_x86_pmu_capability(&cap);

I cannot find this function in the QEMU master sources.

The only thing I see is that has_msr_architectural_pmu is set to
true in kvm_arch_init_vcpu() if CPUID leaf 0xA EAX reports a non-zero
version. According to the Intel specs, this is not enough.

>
> so perhaps it's perf_get_x86_pmu_capability that misreads the
> performance monitoring capabilities provided by ESX. Please attach
> dmesg logs from starting the host with loglevel=9, as well as "x86info
> -a" output from the host, to see if perf misses some problematic
> CPUID/MSR combination.
>

Matwey V. Kornilov (matwey-kornilov) wrote :

2017-07-23 12:54 GMT+03:00 Matwey V. Kornilov <email address hidden>:
> 2017-02-08 11:49 GMT+03:00 Paolo Bonzini <email address hidden>:
>>> Does qemu follow recommendations from section 4.3?
>>
>> All that QEMU does is initialize MSR values and QEMU is talking to KVM,
>> not to the processor; KVM in turn talks to the host kernel's perf
>> subsystem.
>>
>> It's the host kernel's perf subsystem that needs to follow Intel's
>> recommendation. In particular, QEMU is setting CPUID to the values
>> retrieved by
>>
>> perf_get_x86_pmu_capability(&cap);
>
> I can not find this function mentioned in qemu master sources.
>

OK, I found this place in the KVM kernel module, but it doesn't do what
you expect it to do. It just reassembles CPUID leaf 0xA EAX from
previously parsed data; IA32_MISC_ENABLE is not accessed anywhere there.
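Reassembling 0xA EAX means packing capability fields back into the leaf's bit layout. No MSR is read along the way. This is a simplified sketch, not the kernel's code. struct pmu_caps and assemble_leaf_0xa_eax() are hypothetical names, and the bit layout assumes the Intel SDM definition of CPUID leaf 0xA EAX.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the PMU capability data reported
 * by the host perf subsystem; field names are illustrative only. */
struct pmu_caps {
    uint8_t version;
    uint8_t num_counters_gp;
    uint8_t bit_width_gp;
    uint8_t events_mask_len;
};

/* Pack the fields into the CPUID 0xA EAX layout:
 * version | counters << 8 | width << 16 | mask length << 24.
 * Note that nothing here consults IA32_MISC_ENABLE. */
static uint32_t assemble_leaf_0xa_eax(const struct pmu_caps *cap)
{
    return (uint32_t)cap->version
         | ((uint32_t)cap->num_counters_gp << 8)
         | ((uint32_t)cap->bit_width_gp << 16)
         | ((uint32_t)cap->events_mask_len << 24);
}
```

Whatever the guest sees in leaf 0xA therefore depends entirely on how these fields were parsed earlier.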

> The only thing I see is that has_msr_architectural_pmu is set to be
> true in kvm_arch_init_vcpu() if 0xA EAX has non-zero version. This is
> not enough according to the Intel specs.
>
>>
>> so perhaps it's perf_get_x86_pmu_capability that misreads the
>> performance monitoring capabilities provided by ESX. Please attach
>> dmesg logs from starting the host with loglevel=9, as well as "x86info
>> -a" output from the host, to see if perf misses some problematic
>> CPUID/MSR combination.

Thomas Huth (th-huth) wrote :

A fix for this assertion failure involving the PMU registers was already committed in December 2017:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=0b368a10c71af96f6cf
Can you still reproduce this issue with the latest version of QEMU, or is the problem gone now?
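This is a minimal sketch of why the assertion can fire. It assumes the documented KVM_SET_MSRS behaviour: the ioctl returns the number of MSRs it managed to write and stops at the first one it rejects. The types and helpers below (msr_entry, msr_supported, set_msrs_model) are hypothetical stand-ins, not QEMU or kernel code. The MSR indices used in the usage are only examples.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MSR write request (not the real
 * struct kvm_msr_entry from <linux/kvm.h>). */
struct msr_entry {
    uint32_t index;
    uint64_t data;
};

/* Stand-in for "does the hypervisor accept writes to this MSR?":
 * only indices listed in `supported` are accepted. */
static int msr_supported(uint32_t index, const uint32_t *supported,
                         size_t nsupported)
{
    for (size_t i = 0; i < nsupported; i++)
        if (supported[i] == index)
            return 1;
    return 0;
}

/* Models the KVM_SET_MSRS return value: stop at the first MSR that
 * cannot be written and return how many were written before it. */
static size_t set_msrs_model(const struct msr_entry *entries, size_t nmsrs,
                             const uint32_t *supported, size_t nsupported)
{
    size_t i;
    for (i = 0; i < nmsrs; i++)
        if (!msr_supported(entries[i].index, supported, nsupported))
            break;
    return i;
}
```

kvm_put_msrs asserts that the returned count equals cpu->kvm_msr_buf->nmsrs. A single rejected PMU MSR in the batch is therefore enough to trip it. For example, with entries for 0x10 (IA32_TIME_STAMP_COUNTER) and 0x38F (IA32_PERF_GLOBAL_CTRL), the model returns 1 instead of 2 when only 0x10 is accepted.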

Changed in qemu:
status: New → Incomplete
Matwey V. Kornilov (matwey-kornilov) wrote :

Hi,

Thank you for your reply. I've confirmed that this commit fixes my issue.

On Tue, 11 Feb 2020 at 17:50, Thomas Huth <email address hidden> wrote:
>
> There was a fix for this assertion message wrt PMU registers in December 2017 already:
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=0b368a10c71af96f6cf
> Thus, can you still reproduce this issue with the latest version of QEMU, or is the problem gone now?
>
> ** Changed in: qemu
> Status: New => Incomplete

--
With best regards,
Matwey V. Kornilov

Thomas Huth (th-huth)
Changed in qemu:
status: Incomplete → Fix Released