nova diagnostics command is not working with all interfaces

Bug #1821798 reported by François Palin
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      François Palin
  Queens                  Fix Committed  Medium      François Palin
  Rocky                   Fix Released   Medium      François Palin
  Stein                   Fix Released   Medium      Lee Yarwood

Bug Description

When running nova diagnostics on instances with SR-IOV interfaces, we get:

$ nova diagnostics iperf-server
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<type 'exceptions.IndexError'> (HTTP 500) (Request-ID: req-ae9445f6-558c-45c3-bdb2-b9fe6bbf186c)

François Palin (francois.palin) wrote :

The extract below was taken from the domain XML of an instance for which the diagnostics command failed.
It shows that "target dev" is present under the regular bridge interface, but is missing from the VFIO (hostdev) interface:

      <interface type='bridge'>
        <mac address='fa:16:3e:c1:e3:fe'/>
        <source bridge='qbr275587c1-50'/>
        <target dev='tap275587c1-50'/> <!-- target dev only present under this interface -->
        <model type='virtio'/>
        <mtu size='9000'/>
        <alias name='net0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </interface>
      <interface type='hostdev' managed='yes'>
        <mac address='fa:16:3e:6f:4d:bc'/>
        <driver name='vfio'/>
        <source>
          <address type='pci' domain='0x0000' bus='0x04' slot='0x17' function='0x4'/>
          <origstates>
            <unbind/>
          </origstates>
        </source>
        <vlan>
          <tag id='0'/>
        </vlan>
        <alias name='hostdev0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      </interface>

As a result, _get_io_devices() returns only the bridge interface when called by get_instance_diagnostics(), and
the following loop:
        for interface in dom_io["ifaces"]:
is executed only once.
Consequently, only diags.nic_details[0] is created.

The failing loop at the end of get_instance_diagnostics() searches the domain XML for ./devices/interface/mac
and finds both interface occurrences. It therefore exceeds the maximum index by 1 when it tries to write to diags.nic_details[index].mac_address.
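The mismatch described above can be reproduced outside of nova with a minimal sketch. This is not the nova driver code: find_devices(), assign_macs() and the plain dict entries are hypothetical stand-ins, and the XML is a trimmed copy of the extract above. The sketch assumes that NIC entries are created only for interfaces that carry a <target dev=...> element, while MAC addresses are looked up for every interface.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain XML extract; wrapper elements are assumed.
DOMAIN_XML = """
<domain>
  <devices>
    <interface type='bridge'>
      <mac address='fa:16:3e:c1:e3:fe'/>
      <source bridge='qbr275587c1-50'/>
      <target dev='tap275587c1-50'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='fa:16:3e:6f:4d:bc'/>
      <driver name='vfio'/>
    </interface>
  </devices>
</domain>
"""


def find_devices(xml_text):
    """Hypothetical helper: mimic the two lookups that disagree."""
    root = ET.fromstring(xml_text)
    # Only interfaces with a <target dev=...> element are discovered here.
    ifaces = [t.get("dev") for t in root.findall("./devices/interface/target")]
    # One placeholder NIC entry per discovered device.
    nic_details = [{"mac_address": None} for _ in ifaces]
    # Every interface has a <mac> element, including the hostdev one.
    macs = [m.get("address") for m in root.findall("./devices/interface/mac")]
    return ifaces, nic_details, macs


def assign_macs(nic_details, macs):
    """Hypothetical helper: write each MAC by index, as the failing loop does."""
    for index, mac in enumerate(macs):
        nic_details[index]["mac_address"] = mac


ifaces, nic_details, macs = find_devices(DOMAIN_XML)
print(len(ifaces), "interface(s) with target dev,", len(macs), "MAC address(es)")
try:
    assign_macs(nic_details, macs)
except IndexError as exc:
    print("IndexError:", exc)
```

With one NIC entry and two MAC addresses, the second assignment has no entry to write to, which matches the IndexError reported by the API.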

Changed in nova:
assignee: nobody → François Palin (francois.palin)

François Palin (francois.palin) wrote :

Here is an extract from the nova-compute log when the error occurs (RH OSP 13):

INFO nova.compute.manager [req-... ... ... - default default] [instance: ...] Retrieving diagnostics
ERROR oslo_messaging.rpc.server [req-... ... ... - default default] Exception during message handling: IndexError: list index out of range
ERROR oslo_messaging.rpc.server Traceback (most recent call last):
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR oslo_messaging.rpc.server self.force_reraise()
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 214, in decorated_function
ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR oslo_messaging.rpc.server self.force_reraise()
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 202, in decorated_function
ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4624, in get_instance_diagnostics
ERROR oslo_messaging.rpc.server return self.driver.get_instance_diagnostics(instance)
ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 8519, in get_instance_diagnostics
ERROR oslo_messaging.rpc.server diags.nic_details[index].mac_address = node.get('address')
ERROR oslo_messaging.rpc.server IndexError: list index out of range
ERROR oslo_messaging.rpc.server

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/648123

Changed in nova:
status: New → In Progress

sean mooney (sean-k-mooney) wrote :

This is a clone of the following downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1649688

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/648123
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ab7c968b6f66404c032f62a952e353f94d3be165
Submitter: Zuul
Branch: master

commit ab7c968b6f66404c032f62a952e353f94d3be165
Author: Francois Palin <email address hidden>
Date: Tue Mar 26 15:22:40 2019 -0400

    Include all network devices in nova diagnostics

    get_instance_diagnostics expected all interfaces
    to have a <target> element with a "dev" attribute in
    the instance XML. This is not the case for VFIO
    interfaces (<interface type="hostdev">).
    This caused an IndexError when looping over
    the interfaces.

    This patch fixes this issue by retrieving interfaces
    data directly from the guest XML and adding nics
    appropriately to the diagnostics object.

    Change-Id: I8ef852d449e9e637d45e4ac92ffc5d1abd8d31c5
    Closes-Bug: #1821798
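The approach named in the commit message, reading interface data directly from the guest XML, can be illustrated with a short sketch. This is not the merged nova patch: collect_nics() and the dict layout are hypothetical, and the XML is a trimmed copy of the extract from the bug description. It only shows the idea of building one NIC entry per <interface> element and treating <target dev=...> as optional.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain XML from the bug description; wrapper assumed.
GUEST_XML = """
<domain>
  <devices>
    <interface type='bridge'>
      <mac address='fa:16:3e:c1:e3:fe'/>
      <target dev='tap275587c1-50'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='fa:16:3e:6f:4d:bc'/>
      <driver name='vfio'/>
    </interface>
  </devices>
</domain>
"""


def collect_nics(xml_text):
    """Hypothetical helper: one entry per <interface>, target dev optional."""
    root = ET.fromstring(xml_text)
    nics = []
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        target = iface.find("target")
        nics.append({
            "mac_address": mac.get("address") if mac is not None else None,
            # hostdev (VFIO) interfaces have no <target>, so dev stays None.
            "dev": target.get("dev") if target is not None else None,
        })
    return nics


for nic in collect_nics(GUEST_XML):
    print(nic)
```

Because the list of NIC entries and the MAC addresses come from the same iteration, the two can no longer differ in length.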

Changed in nova:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/657125

Matt Riedemann (mriedem)
tags: added: libvirt

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/657125
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1d4f64b190afc60b0c2a56de718209869c41cfb3
Submitter: Zuul
Branch: stable/stein

commit 1d4f64b190afc60b0c2a56de718209869c41cfb3
Author: Francois Palin <email address hidden>
Date: Tue Mar 26 15:22:40 2019 -0400

    Include all network devices in nova diagnostics

    get_instance_diagnostics expected all interfaces
    to have a <target> element with a "dev" attribute in
    the instance XML. This is not the case for VFIO
    interfaces (<interface type="hostdev">).
    This caused an IndexError when looping over
    the interfaces.

    This patch fixes this issue by retrieving interfaces
    data directly from the guest XML and adding nics
    appropriately to the diagnostics object.

    Change-Id: I8ef852d449e9e637d45e4ac92ffc5d1abd8d31c5
    Closes-Bug: #1821798
    (cherry picked from commit ab7c968b6f66404c032f62a952e353f94d3be165)

tags: added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/661962

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/661962
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=19ca8bcc2232e1d81efc349948a21cc1c3fc811d
Submitter: Zuul
Branch: stable/rocky

commit 19ca8bcc2232e1d81efc349948a21cc1c3fc811d
Author: Francois Palin <email address hidden>
Date: Tue Mar 26 15:22:40 2019 -0400

    Include all network devices in nova diagnostics

    get_instance_diagnostics expected all interfaces
    to have a <target> element with a "dev" attribute in
    the instance XML. This is not the case for VFIO
    interfaces (<interface type="hostdev">).
    This caused an IndexError when looping over
    the interfaces.

    This patch fixes this issue by retrieving interfaces
    data directly from the guest XML and adding nics
    appropriately to the diagnostics object.

    The new functional test has been left out of this
    cherry-pick, since a lot of the test code that
    supports the test is missing and would have to be
    back-ported just for that one test, including a
    ramification of other commit dependencies.
    The functional code change itself is rather simple,
    and not having this functional test present in
    Rocky is considered to be low risk.

    Change-Id: I8ef852d449e9e637d45e4ac92ffc5d1abd8d31c5
    Closes-Bug: #1821798
    (cherry picked from commit 1d4f64b190afc60b0c2a56de718209869c41cfb3)

tags: added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/666152

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/666152
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2009859b5225541d8560e5ca1efb91de644d12be
Submitter: Zuul
Branch: stable/queens

commit 2009859b5225541d8560e5ca1efb91de644d12be
Author: Francois Palin <email address hidden>
Date: Tue Mar 26 15:22:40 2019 -0400

    Include all network devices in nova diagnostics

    get_instance_diagnostics expected all interfaces
    to have a <target> element with a "dev" attribute in
    the instance XML. This is not the case for VFIO
    interfaces (<interface type="hostdev">).
    This caused an IndexError when looping over
    the interfaces.

    This patch fixes this issue by retrieving interfaces
    data directly from the guest XML and adding nics
    appropriately to the diagnostics object.

    The new functional test has been left out of this
    cherry-pick, since a lot of the test code that
    supports the test is missing and would have to be
    back-ported just for that one test, including a
    ramification of other commit dependencies.
    The functional code change itself is rather simple,
    and not having this functional test present in
    Queens is considered to be low risk.

    Change-Id: I8ef852d449e9e637d45e4ac92ffc5d1abd8d31c5
    Closes-Bug: #1821798
    (cherry picked from commit ab7c968b6f66404c032f62a952e353f94d3be165)
    (cherry picked from commit 1d4f64b190afc60b0c2a56de718209869c41cfb3)
    (cherry picked from commit 19ca8bcc2232e1d81efc349948a21cc1c3fc811d)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
