[Feature] Memory Bandwidth Monitoring

Bug #1397880 reported by Yingying Zhao
This bug affects 1 person
Affects          | Status       | Importance | Assigned to | Milestone
intel            | Fix Released | Undecided  | Unassigned  |
linux (Ubuntu)   | Fix Released | Undecided  | Tim Gardner |
  Xenial         | Fix Released | Undecided  | Tim Gardner |
xen (Ubuntu)     | Fix Released | Undecided  | Unassigned  |
  Xenial         | Fix Released | Undecided  | Unassigned  |

Bug Description

Memory Bandwidth Monitoring (MBM) is a CPU feature in the Platform QoS family. It tracks memory bandwidth usage for a specific task or group of tasks.
Memory Bandwidth Monitoring extends the existing Cache QoS Monitoring (CQM) feature found in Haswell server processors. It uses the same mechanism: tasks are associated with a Resource Monitoring ID (RMID), which the CPU uses to track bandwidth usage.
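Readers new to the mechanism may find a sketch of how software addresses a monitoring counter useful. It assumes the MSR interface described in the Intel SDM. Software writes the RMID and an event ID to IA32_QM_EVTSEL (MSR 0xC8D), then reads the count back from IA32_QM_CTR (MSR 0xC8E). In that register, bits 63 and 62 flag an error or an unavailable sample. Event ID 0x02 selects total memory bandwidth and 0x03 selects local memory bandwidth. The sketch only builds and decodes the register values, because accessing the MSRs needs kernel privileges. The function names are illustrative and are not a kernel or library API.

```c
#include <stdint.h>

/* MSR addresses from the Intel SDM (resource monitoring); shown for
 * reference only, this sketch does not access MSRs. */
#define MSR_IA32_QM_EVTSEL 0xC8D
#define MSR_IA32_QM_CTR    0xC8E

/* Monitoring event IDs. */
#define QOS_L3_OCCUP_EVENT_ID  0x01
#define QOS_MBM_TOTAL_EVENT_ID 0x02
#define QOS_MBM_LOCAL_EVENT_ID 0x03

/* Status bits in IA32_QM_CTR. */
#define RMID_VAL_ERROR   (1ULL << 63)
#define RMID_VAL_UNAVAIL (1ULL << 62)

/* Hypothetical helper: value to write to IA32_QM_EVTSEL, with the event
 * ID in the low byte and the RMID starting at bit 32. */
uint64_t qm_evtsel(uint32_t rmid, uint8_t event_id)
{
    return ((uint64_t)rmid << 32) | event_id;
}

/* Hypothetical helper: decode a raw IA32_QM_CTR value. Returns 0 and
 * stores the count (bits 61:0) on success, or -1 if the hardware marked
 * the sample as an error or unavailable. */
int qm_ctr_decode(uint64_t raw, uint64_t *count)
{
    if (raw & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
        return -1;
    *count = raw & (RMID_VAL_UNAVAIL - 1);
    return 0;
}
```

The decoded count is in hardware units. Converting it to bytes requires the scaling factor reported in CPUID leaf 0xF, sub-leaf 1, EBX.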

Upstream status:
Kernel - 4.6
Xen - target 4.6

Tags: bdx vivid
description: updated
Revision history for this message
XiongZhang (xiong-y-zhang) wrote :

For Xen, the feature has been merged into the maintainer tree and will be part of the Xen 4.6 release.

description: updated
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Per discussion between Canonical and Intel, retarget to 15.10.

description: updated
Revision history for this message
XiongZhang (xiong-y-zhang) wrote :

Xen: merged into the maintainer tree; will be part of the Xen 4.6 release.

Revision history for this message
XiongZhang (xiong-y-zhang) wrote :

For the kernel part, the patch has been posted upstream and is under review:
https://lkml.org/lkml/2015/7/21/893

description: updated
description: updated
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

It looks like the kernel patch submitted upstream is still undergoing review. Additionally Xen is still at version 4.5 for Wily. Shall we retarget this feature to 16.04?

Revision history for this message
Keve Gabbert (keve-a-gabbert) wrote :

Yes, development is targeting the 4.4 kernel.

description: updated
description: updated
Revision history for this message
XiongZhang (xiong-y-zhang) wrote :

This is implemented in v4.6:
e7ee3e8 perf/x86/mbm: Add support for MBM counter overflow handling
2d4de83 perf/x86/mbm: Implement RMID recycling
87f01cc perf/x86/mbm: Add memory bandwidth monitoring event management
33c3cc7 perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
ada2f63 perf/x86/cqm: Fix CQM memory leak and notifier leak
a223c1c perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
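The overflow-handling commit (e7ee3e8) deals with the fact that the hardware MBM counters are narrow and wrap around. As far as I can tell, the v4.6 driver assumes a 24-bit counter and samples it periodically, so at most one wrap happens between two reads. It then computes the difference by shifting both readings to the top of a 64-bit word. Below is a minimal sketch of that delta computation. The width is treated as an assumption rather than a hardware guarantee, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed counter width (the v4.6 driver uses 24 bits, MBM_CNTR_WIDTH). */
#define MBM_CNTR_WIDTH 24

/* Difference between two raw readings of a counter that is `width` bits
 * wide (1..63), correct if it wrapped at most once between the reads.
 * Shifting both values to the top of a 64-bit word makes the unsigned
 * subtraction wrap at the counter's width instead of at 2^64. */
uint64_t mbm_counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
    unsigned int shift = 64 - width;
    uint64_t delta = (cur << shift) - (prev << shift);
    return delta >> shift;
}

/* Delta converted to bytes with the upscaling factor from CPUID leaf 0xF,
 * sub-leaf 1, EBX (passed in here as `scale`). */
uint64_t mbm_delta_bytes(uint64_t prev, uint64_t cur, uint64_t scale)
{
    return mbm_counter_delta(prev, cur, MBM_CNTR_WIDTH) * scale;
}
```

For example, going from 0xFFFFF0 to 0x10 on a 24-bit counter yields a delta of 0x20 rather than a huge unsigned value.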

Revision history for this message
XiongZhang (xiong-y-zhang) wrote :

Hi Tim,
please backport this into the 16.04 kernel; it is a very important feature for Intel servers.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Xen is at version 4.6 for Xenial. Marking the xen task Fix Released.

https://launchpad.net/ubuntu/+source/xen/4.6.0-1ubuntu4

information type: Proprietary → Public
Changed in xen (Ubuntu):
status: New → Fix Released
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1397880

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
importance: High → Undecided
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-18.34

---------------
linux (4.4.0-18.34) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1566868

  * [i915_bpo] Fix RC6 on SKL GT3 & GT4 (LP: #1564759)
    - SAUCE: i915_bpo: drm/i915/skl: Fix rc6 based gpu/system hang
    - SAUCE: i915_bpo: drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs

  * CONFIG_ARCH_ROCKCHIP not enabled in armhf generic kernel (LP: #1566283)
    - [Config] CONFIG_ARCH_ROCKCHIP=y

  * [Feature] Memory Bandwidth Monitoring (LP: #1397880)
    - perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
    - perf/x86/cqm: Fix CQM memory leak and notifier leak
    - x86/cpufeature: Carve out X86_FEATURE_*
    - Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    - x86/topology: Create logical package id
    - perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
    - perf/x86/mbm: Add memory bandwidth monitoring event management
    - perf/x86/mbm: Implement RMID recycling
    - perf/x86/mbm: Add support for MBM counter overflow handling

  * User namespace mount updates (LP: #1566505)
    - SAUCE: quota: Require that qids passed to dqget() be valid and map into s_user_ns
    - SAUCE: fs: Allow superblock owner to change ownership of inodes with unmappable ids
    - SAUCE: fuse: Don't initialize user_id or group_id in mount options
    - SAUCE: cgroup: Use a new super block when mounting in a cgroup namespace
    - SAUCE: fs: fix a posible leak of allocated superblock

  * [arm64] kernel BUG at /build/linux-StrpB2/linux-4.4.0/fs/ext4/inode.c:2394!
    (LP: #1566518)
    - arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
    - arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission

  * [Feature]USB core and xHCI tasks for USB 3.1 SuperSpeedPlus (SSP) support
    for Alpine Ridge on SKL (LP: #1519623)
    - usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices
    - usb: set USB 3.1 roothub device speed to USB_SPEED_SUPER_PLUS
    - usb: show speed "10000" in sysfs for USB 3.1 SuperSpeedPlus devices
    - usb: add device descriptor for usb 3.1 root hub
    - usb: Support USB 3.1 extended port status request
    - xhci: Make sure xhci handles USB_SPEED_SUPER_PLUS devices.
    - xhci: set roothub speed to USB_SPEED_SUPER_PLUS for USB3.1 capable controllers
    - xhci: USB 3.1 add default Speed Attributes to SuperSpeedPlus device capability
    - xhci: set slot context speed field to SuperSpeedPlus for USB 3.1 SSP devices
    - usb: Add USB3.1 SuperSpeedPlus Isoc Endpoint Companion descriptor
    - usb: Parse the new USB 3.1 SuperSpeedPlus Isoc endpoint companion descriptor
    - usb: Add USB 3.1 Precision time measurement capability descriptor support
    - xhci: refactor and cleanup endpoint initialization.
    - xhci: Add SuperSpeedPlus high bandwidth isoc support to xhci endpoints
    - xhci: cleanup isoc tranfers queuing code
    - xhci: Support extended burst isoc TRB structure used by xhci 1.1 for USB 3.1
    - SAUCE: (noup) usb: fix regression in SuperSpeed endpoint descriptor parsing

  * wrong/missing permissions for device f...


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Upstream commit 1f12e32f4cd5243ae46d8b933181be0d022c6793 (commit 31c2013e4 in Xenial) introduced a regression, reported in bug 1573231. That commit's message links to this bug.

I'll ping upstream about the regression and request feedback.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote : [v4.6-rc1 Regression] x86/topology: Create logical package id

Hi Thomas,

A kernel bug report was opened against Ubuntu [0]. After a kernel
bisect, it was found that reverting the following commit resolved this bug:

commit 1f12e32f4cd5243ae46d8b933181be0d022c6793
Author: Thomas Gleixner <email address hidden>
Date: Mon Feb 22 22:19:15 2016 +0000

    x86/topology: Create logical package id

To build successfully with this commit reverted, I also had to revert
commits: e7ee3e8,2d4de83,87f01cc and 33c3cc7.

The regression was introduced as of v4.6-rc1.

I was hoping to get your feedback, since you are the patch author. Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?

Thanks,

Joe

[0] http://pad.lv/1573231
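As I understand it, the bisected commit "x86/topology: Create logical package id" does two things. It bounds the number of CPU packages, and it maps each physical package ID (derived from the APIC ID) to a dense logical ID, so that per-package code can index arrays by it. This is presumably why the dependent MBM commits had to be reverted along with it. The sketch below illustrates only the first-come mapping idea. The fixed bound, the names and the linear search are simplifications for illustration, and the real kernel code differs in detail.

```c
/* Hypothetical fixed bound for this sketch; the kernel computes its bound
 * at boot from the CPU count and cores per package. */
#define MAX_LOGICAL_PACKAGES 8

/* phys_ids[i] holds the physical package ID that was given logical ID i. */
static int phys_ids[MAX_LOGICAL_PACKAGES];
static int nr_logical = 0;

/* Return the logical package ID for a physical package ID, assigning the
 * next free logical ID the first time a package is seen. Returns -1 when
 * every logical slot is taken, i.e. the bound was too small. */
int logical_package_id(int phys_id)
{
    for (int i = 0; i < nr_logical; i++)
        if (phys_ids[i] == phys_id)
            return i;
    if (nr_logical == MAX_LOGICAL_PACKAGES)
        return -1;
    phys_ids[nr_logical] = phys_id;
    return nr_logical++;
}
```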

Revision history for this message
tglx (tglx) wrote :

On Fri, 6 May 2016, Joseph Salisbury wrote:
> A kernel bug report was opened against Ubuntu [0]. After a kernel
> bisect, it was found that reverting the following commit resolved this bug:
>
> commit 1f12e32f4cd5243ae46d8b933181be0d022c6793
> Author: Thomas Gleixner <email address hidden>
> Date: Mon Feb 22 22:19:15 2016 +0000
>
> x86/topology: Create logical package id
>
> To build successfully with this commit reverted, I also had to revert
> commits: e7ee3e8,2d4de83,87f01cc and 33c3cc7.
>
> The regression was introduced as of v4.6-rc1.
>
> I was hoping to get your feedback, since you are the patch author. Do
> you think gathering any additional data will help diagnose this issue,
> or would it be best to submit a revert request?

Yuck. That dies with a divide error. And that looks like XEN is supplying crap
data in the CPUID.

Does the patch below cure the issue?

Thanks,

        tglx

8<---------------

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -332,6 +332,11 @@ static void __init smp_init_package_map(
 	 * primary cores.
 	 */
 	ncpus = boot_cpu_data.x86_max_cores;
+	if (!ncpus) {
+		pr_warn("x86_max_cores == zero !?!?");
+		ncpus = 1;
+	}
+
 	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 
 	/*
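The failure tglx describes reduces to a few lines. The bound on logical packages is total_cpus divided, rounding up, by the cores-per-package value. If CPUID yields zero cores, that division becomes a divide error at boot. The user-space sketch below mirrors the computation with the proposed guard. DIV_ROUND_UP is redefined here with the same rounding as the kernel macro. max_logical_packages is a hypothetical helper, not a kernel function.

```c
/* Integer division rounding up, same rounding as the kernel macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Upper bound on logical packages from the total CPU count and the
 * cores-per-package value reported by CPUID. A zero core count would
 * make the division trap, so fall back to one core per package, which
 * is what the proposed patch does. */
unsigned int max_logical_packages(unsigned int total_cpus, unsigned int max_cores)
{
    unsigned int ncpus = max_cores;

    if (ncpus == 0)
        ncpus = 1;
    return DIV_ROUND_UP(total_cpus, ncpus);
}
```

With the fallback, a guest reporting zero cores simply gets one logical package per CPU, which over-estimates the bound but avoids the crash.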

Revision history for this message
Boris Ostrovsky (boris-ostrovsky) wrote :

On 05/06/2016 02:48 PM, Thomas Gleixner wrote:
> Yuck. That dies with a divide error. And that looks like XEN is supplying crap
> data in the CPUID.

Joe, do you have

ed6069b xen/apic: Provide Xen-specific version of cpu_present_to_apicid
APIC op

-boris


Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

On 05/06/2016 03:13 PM, Boris Ostrovsky wrote:
> On 05/06/2016 02:48 PM, Thomas Gleixner wrote:
>> Yuck. That dies with a divide error. And that looks like XEN is supplying crap
>> data in the CPUID.
> Joe, do you have
>
> ed6069b xen/apic: Provide Xen-specific version of cpu_present_to_apicid
> APIC op
>
> -boris
Yes the commit is in the 4.4 based Ubuntu kernel. This bug also happens
with the vanilla 4.6-rc5 kernel, which also has that commit.


Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

On 05/06/2016 02:48 PM, Thomas Gleixner wrote:
> Yuck. That dies with a divide error. And that looks like XEN is supplying crap
> data in the CPUID.
>
> Does the patch below cure the issue?
>
> Thanks,
>
> tglx
>
> 8<---------------
>
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -332,6 +332,11 @@ static void __init smp_init_package_map(
> * primary cores.
> */
> ncpus = boot_cpu_data.x86_max_cores;
> + if (!ncpus) {
> + pr_warn("x86_max_cores == zero !?!?");
> + ncpus = 1;
> + }
> +
> __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> /*
I'll have this patch tested and report back.

Thanks,

Joe

Revision history for this message
Boris Ostrovsky (boris-ostrovsky) wrote :

On 05/06/2016 03:38 PM, Joseph Salisbury wrote:
> On 05/06/2016 03:13 PM, Boris Ostrovsky wrote:
>> On 05/06/2016 02:48 PM, Thomas Gleixner wrote:
>>>
>>> Yuck. That dies with a divide error. And that looks like XEN is supplying crap
>>> data in the CPUID.
>> Joe, do you have
>>
>> ed6069b xen/apic: Provide Xen-specific version of cpu_present_to_apicid
>> APIC op
>>
>> -boris
> Yes the commit is in the 4.4 based Ubuntu kernel. This bug also happens
> with the vanilla 4.6-rc5 kernel, which also has that commit.

Can you post guest's cpuid -1 -r ? (I guess after you verify Thomas' patch)

Thanks.
-boris

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

On 05/06/2016 03:38 PM, Joseph Salisbury wrote:
> I'll have this patch tested and report back.
>
> Thanks,
>
> Joe
Yes, your patch does in fact fix the bug. Would you like any additional
information regarding the bug?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

On 05/06/2016 04:46 PM, Boris Ostrovsky wrote:
> On 05/06/2016 03:38 PM, Joseph Salisbury wrote:
>> On 05/06/2016 03:13 PM, Boris Ostrovsky wrote:
>>> On 05/06/2016 02:48 PM, Thomas Gleixner wrote:
>>>> Yuck. That dies with a divide error. And that looks like XEN is supplying crap
>>>> data in the CPUID.
>>> Joe, do you have
>>>
>>> ed6069b xen/apic: Provide Xen-specific version of cpu_present_to_apicid
>>> APIC op
>>>
>>> -boris
>> Yes the commit is in the 4.4 based Ubuntu kernel. This bug also happens
>> with the vanilla 4.6-rc5 kernel, which also has that commit.
>
> Can you post guest's cpuid -1 -r ? (I guess after you verify Thomas' patch)
>
> Thanks.
> -boris
>
>
>
Thomas' patch does resolve the bug. The cpuid info can be seen here:
https://launchpadlibrarian.net/258234267/cpuid_full.txt

Thanks,

Joe

Revision history for this message
Boris Ostrovsky (boris-ostrovsky) wrote :

On 05/06/2016 04:51 PM, Joseph Salisbury wrote:
> Thomas' patch does resolve the bug. The cpuid info can be seen here:
> https://launchpadlibrarian.net/258234267/cpuid_full.txt

Any chance you could post it raw (cpuid -1 -r)?
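One relevant field in the raw dump, on Intel hardware, is CPUID leaf 4 (deterministic cache parameters), sub-leaf 0. In its EAX output, bits 4:0 give the cache type (0 means no more caches). Bits 31:26 give the maximum number of addressable core IDs in the package, minus one. As far as I know, the sketch below decodes that field roughly the way the kernel's Intel setup code derived the cores-per-package count at the time. Whether the Xen guest in this report went through that path is not established in the thread. cores_from_cpuid4 is a hypothetical name.

```c
#include <stdint.h>

/* Cores per package from CPUID leaf 4, sub-leaf 0, EAX. max_leaf is the
 * highest supported standard leaf (EAX of CPUID leaf 0). */
unsigned int cores_from_cpuid4(uint32_t max_leaf, uint32_t eax)
{
    if (max_leaf < 4)
        return 1;           /* leaf 4 not available: assume one core */
    if ((eax & 0x1f) == 0)  /* cache type 0: no valid data in this leaf */
        return 1;
    return (eax >> 26) + 1; /* EAX[31:26] = addressable core IDs - 1 */
}
```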

Thanks.
-boris

Changed in intel:
status: New → Fix Released