[Regression] Nova's 'enabled_perf_events' feature will be broken with Linux Kernel 4.14+

Bug #1751073 reported by Kashyap Chamarthy
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Kashyap Chamarthy

Bug Description

The upstream Linux kernel has removed[*] 'perf cqm' (Cache
Quality-of-Service Monitoring) support from the following version onwards:

    [linux]$> git tag --contains c39a0e2
    v4.14

Impact for OpenStack / Nova
---------------------------

Quoting the summary from Dan Berrangé from a downstream bug (with some
edits, references and formatting):

  - Libvirt supports enabling perf event reporting per guest using <perf
    ../> XML in guest XML
    https://libvirt.org/formatdomain.html#elementsPerf

  - OpenStack has the ability to enable this support via the
    "enabled_perf_events" setting in the [libvirt] section of
    /etc/nova/nova.conf

  - Although libvirt supports many events, OpenStack only supports the
    'cmt', 'mbmt' and 'mbml' perf events

  - Upstream Linux kernel decided the perf framework integration with
    'cmt', 'mbmt' and 'mbml' events was broken by design and entirely
    deleted it[*]

  - Upstream kernel has provided a new approach to 'cmt', 'mbmt' and
    'mbml' info reporting that is *not* using perf framework

  - There's unlikely to be any way for libvirt to make this
    functionality magically re-appear, given the kernel changes. The new
    approach is completely incompatible with what was done before.
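
For reference, the <perf> element mentioned in the first point can be sketched as follows. This is a minimal illustration in Python, assuming the `<event name='...' enabled='yes'/>` child form described in the libvirt documentation linked above; the helper name `build_perf_element` is hypothetical and is not how Nova itself generates guest XML.

```python
import xml.etree.ElementTree as ET


def build_perf_element(events):
    """Build a libvirt-style <perf> element that enables the given events.

    Hypothetical helper for illustration only.
    """
    perf = ET.Element("perf")
    for name in events:
        # One <event> child per perf event, each explicitly enabled.
        ET.SubElement(perf, "event", name=name, enabled="yes")
    return perf


# The three Intel CMT events discussed in this bug.
perf_xml = ET.tostring(
    build_perf_element(["cmt", "mbmt", "mbml"]), encoding="unicode")
print(perf_xml)
```

A guest definition carrying these three events is what stops working once the host kernel no longer provides them through perf.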

IOW, if someone has previously set "enabled_perf_events" in
/etc/nova/nova.conf, they will be unable to start any guest once they
upgrade to a kernel that includes (or has backported) the commit c39a0e2
("x86/perf/cqm: Wipe out perf based cqm")[*].

[*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c39a0e2
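
To check whether a given host is exposed to this, something like the following sketch could be used. It assumes that nova.conf can be read with Python's standard `configparser` (Nova itself uses oslo.config, so this is only an approximation) and that the option is a comma-separated list; the function name `affected_events` and the sample configuration are made up for illustration.

```python
import configparser

# Events whose perf integration was removed by kernel commit c39a0e2.
REMOVED_CMT_EVENTS = {"cmt", "mbmt", "mbml"}


def affected_events(nova_conf_text):
    """Return the configured perf events that are among the removed CMT events."""
    # strict=False tolerates repeated options; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    raw = parser.get("libvirt", "enabled_perf_events", fallback="")
    configured = [item.strip() for item in raw.split(",") if item.strip()]
    return [event for event in configured if event in REMOVED_CMT_EVENTS]


# Hypothetical configuration mixing two CMT events with one other event.
sample = """
[libvirt]
enabled_perf_events = cmt, mbml, cpu_cycles
"""
print(affected_events(sample))
```

A non-empty result means the setting should be cleaned up before moving the host to a kernel that contains the removal.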

Tags: libvirt
tags: added: libvirt
Kashyap Chamarthy (kashyapc) wrote :

A further comment from Dan (Berrangé), who writes that for upstream
Nova there are three possible directions:

   1. Extend Nova's support for perf events, so it can enable more than
      just the 'cmt', 'mbmt', 'mbml' features, to make it useful again.
      I'm unclear if there's any real benefit to this though - depends
      if there's any monitoring apps that actually care about collecting
      other perf data items

   2. Simply delete the perf events feature code from Nova entirely

   3. Change to support whatever new way of reporting cmt/mbmt/mbml info
      libvirt provides (if any)

   I'm leaning towards (2), but before doing that we should wait to see
   what, if anything, libvirt does WRT the new infrastructure for
   reporting cmt/mbmt/mbml information, so we can see if (3) is
   appropriate. It may take a while before this becomes clear.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Changed in nova:
importance: Low → High
Kashyap Chamarthy (kashyapc) wrote :

An update on the above point of "Although libvirt supports many events, OpenStack only supports the 'cmt', 'mbmt' and 'mbml' perf events" -- which is wrong.

After auditing the code, Nova supports more than just those three Intel Cache Monitoring Technology based events ('cmt', 'mbmt' and 'mbml'), as the `enabled_perf_events` config attribute takes a list of strings.

Details:

(Looking at Git/master; `git describe`: 17.0.0.0rc1-648-g8b081453c5)

In the nova/virt/libvirt/driver.py, we see:

[...]
PERF_EVENTS_CPU_FLAG_MAPPING = {'cmt': 'cmt',
                                'mbml': 'mbm_local',
                                'mbmt': 'mbm_total',
                               }
[...]

But when you look at the _supported_perf_event() method in libvirt/driver.py,

   4816     def _supported_perf_event(self, event, cpu_features):
   4817
   4818         libvirt_perf_event_name = LIBVIRT_PERF_EVENT_PREFIX + event.upper()
   4819
   4820         if not hasattr(libvirt, libvirt_perf_event_name):
   4821             LOG.warning("Libvirt doesn't support event type %s.", event)
   4822             return False
   4823
   4824         if (event in PERF_EVENTS_CPU_FLAG_MAPPING
   4825             and PERF_EVENTS_CPU_FLAG_MAPPING[event] not in cpu_features):
   4826             LOG.warning("Host does not support event type %s.", event)
   4827             return False
   4828
   4829         return True

The `not in cpu_features` check (line 4825) is skipped whenever an event is not a key in the PERF_EVENTS_CPU_FLAG_MAPPING dict, so any other event that libvirt knows about is accepted.

So maybe we can't delete this feature from Nova wholesale. But we can question if anyone uses what's left.
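
To make that behaviour concrete, here is a standalone sketch of the same logic that can be run without Nova or libvirt installed. The `fake_libvirt` namespace is a stand-in for the real libvirt Python module, and the value of LIBVIRT_PERF_EVENT_PREFIX is assumed here rather than taken from the excerpt above.

```python
import logging
import types

LOG = logging.getLogger(__name__)

# Assumed prefix value; the excerpt above does not show it.
LIBVIRT_PERF_EVENT_PREFIX = "VIR_PERF_PARAM_"

PERF_EVENTS_CPU_FLAG_MAPPING = {
    "cmt": "cmt",
    "mbml": "mbm_local",
    "mbmt": "mbm_total",
}

# Stand-in for the libvirt module, exposing one CMT and one non-CMT event.
fake_libvirt = types.SimpleNamespace(
    VIR_PERF_PARAM_CMT="cmt",
    VIR_PERF_PARAM_CPU_CYCLES="cpu_cycles",
)


def supported_perf_event(event, cpu_features, libvirt=fake_libvirt):
    """Mirror of the check above, written as a free function."""
    libvirt_perf_event_name = LIBVIRT_PERF_EVENT_PREFIX + event.upper()

    # Reject events for which the libvirt module has no constant.
    if not hasattr(libvirt, libvirt_perf_event_name):
        LOG.warning("Libvirt doesn't support event type %s.", event)
        return False

    # Only the three CMT events are tied to a host CPU flag.
    if (event in PERF_EVENTS_CPU_FLAG_MAPPING
            and PERF_EVENTS_CPU_FLAG_MAPPING[event] not in cpu_features):
        LOG.warning("Host does not support event type %s.", event)
        return False

    return True


# A non-CMT event passes even with no CPU features reported.
print(supported_perf_event("cpu_cycles", set()))
# A CMT event needs the matching CPU flag.
print(supported_perf_event("cmt", set()))
print(supported_perf_event("cmt", {"cmt"}))
```

In this sketch 'cpu_cycles' is accepted without consulting the CPU features at all, while 'cmt' depends on the host flag, which is why the config option as a whole cannot simply be dropped.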

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/565242

Changed in nova:
assignee: nobody → Kashyap Chamarthy (kashyapc)
status: Confirmed → In Progress
Changed in nova:
assignee: Kashyap Chamarthy (kashyapc) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Kashyap Chamarthy (kashyapc)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/565242
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fc4794acc6b13afade1bb72a1ae9f574707d2f0d
Submitter: Zuul
Branch: master

commit fc4794acc6b13afade1bb72a1ae9f574707d2f0d
Author: Kashyap Chamarthy <email address hidden>
Date: Tue May 8 10:52:17 2018 +0200

    libvirt: Deprecate support for monitoring Intel CMT `perf` events

    Upstream Linux kernel has deleted[*] the `perf` framework integration
    with Intel CMT (Cache Monitoring Technology; or "CQM" in Linux kernel
    parlance), because the feature was broken by design -- an
    incompatibility between Linux's `perf` infrastructure and Intel CMT
    hardware support. It was removed in upstream kernel version v4.14; but
    bear in mind that downstream Linux distributions with lower kernel
    versions than 4.14 have backported the said change.

    Nova supports monitoring of the above mentioned Intel CMT events
    (namely: 'cmt', 'mbm_local', and 'mbm_total') via the configuration
    attribute `[libvirt]/enabled_perf_events`. Given that the underlying
    Linux kernel infrastructure for Intel CMT is removed, we should remove
    support for it in Nova too. Otherwise enabling them in Nova, and
    updating to a Linux kernel 4.14 (or above) will result in instances
    failing to boot.

    To that end, deprecate support for the three Intel CMT events in "Rocky"
    release, with the intention to remove support for it in the upcoming
    "Stein" release. Note that we cannot deprecate / remove
    `enabled_perf_events` config attribute altogether -- since there are
    other[+] `perf` events besides Intel CMT. Whether anyone is using those
    other events with Nova is a good question to which we don't have an
    equally good answer for, if at all.

    [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c39a0e2
    [+] https://libvirt.org/formatdomain.html#elementsPerf

    Closes-Bug: #1751073
    Change-Id: I7e77f87650d966d605807c7be184e670259a81c1
    Signed-off-by: Kashyap Chamarthy <email address hidden>

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b2

This issue was fixed in the openstack/nova 18.0.0.0b2 development milestone.
