No LLDP information available for Fortville i40e NIC

Bug #1923665 reported by Cole Walker
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Cole Walker

Bug Description

Brief Description
-----------------
No LLDP information is available for certain i40e network devices in the Fortville family.
Neighbour entries for the affected interfaces are missing from the output of the system host-lldp-neighbor-show command.

Severity
--------
Major

Steps to Reproduce
------------------
Run system host-lldp-neighbor-list 1
and note which LLDP neighbours are present and which interfaces have none.

Manually disable the firmware-level LLDP agent on the interfaces with missing neighbours
(e.g. sudo ethtool --set-priv-flags ens3f1 disable-fw-lldp on)

After a few minutes, system host-lldp-neighbor-list 1 will contain information for the neighbours on that interface.
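
To confirm whether the firmware agent is still enabled on an interface before and after the step above, the private flag can be read back with ethtool --show-priv-flags. The following is a minimal sketch and not part of the original report: the helper name fw_lldp_disabled_state is hypothetical, and it assumes the usual "name : on|off" layout of the ethtool output.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report).
# Reads `ethtool --show-priv-flags <iface>` output on stdin and prints the
# state of the disable-fw-lldp flag: "on", "off", or "unknown" if the flag
# is not listed for this device.
fw_lldp_disabled_state() {
    awk -F: '
        # Strip whitespace around the flag name on every line.
        { name = $1; gsub(/[ \t]/, "", name) }
        name == "disable-fw-lldp" {
            state = $2
            gsub(/[ \t]/, "", state)
            print state
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}
```

Usage: ethtool --show-priv-flags ens3f1 | fw_lldp_disabled_state
A result of "off" means the firmware LLDP agent has not been disabled on that port.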

Expected Behavior
------------------
LLDP information should be available by default

Actual Behavior
----------------
LLDP information is not present for some i40e interfaces

Reproducibility
---------------
100%

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Master

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Sanity

Workaround
----------
Running the command
ethtool --set-priv-flags <interface_name> disable-fw-lldp on
disables the firmware LLDP agent and allows LLDP information to show up.
The setting does not persist across reboots.
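
Because the setting is lost on reboot, the workaround has to be re-applied on every i40e port after each boot. The following is a rough sketch of how that could be scripted, not something taken from the report or the fix. The function names and the SYSFS_NET variable are hypothetical, and it assumes the bound driver can be read from the /sys/class/net/<iface>/device/driver symlink.

```shell
#!/bin/sh
# Hypothetical helpers (not from the bug report).
# SYSFS_NET can be overridden for testing; the default is the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the name of every network interface whose kernel driver is i40e.
list_i40e_interfaces() {
    for dev in "$SYSFS_NET"/*; do
        # Virtual interfaces such as lo have no device/driver symlink.
        [ -L "$dev/device/driver" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver")")
        [ "$driver" = "i40e" ] && basename "$dev"
    done
    return 0
}

# Apply the workaround from this report to each i40e interface.
disable_fw_lldp_all() {
    list_i40e_interfaces | while read -r iface; do
        ethtool --set-priv-flags "$iface" disable-fw-lldp on \
            || echo "failed to set disable-fw-lldp on $iface" >&2
    done
}
```

Calling disable_fw_lldp_all as root from a boot-time script would re-apply the flag; how best to hook that into a given system is left open here.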

Cole Walker (cwalops)
Changed in starlingx:
assignee: nobody → Cole Walker (cwalops)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/786127

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.networking
Ghada Khalil (gkhalil) wrote :

Marking as medium priority - would be nice to fix for stx.5.0, but will not hold up the release given this is a pre-existing issue from previous releases.

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0
Cole Walker (cwalops)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/786127
Committed: https://opendev.org/starlingx/integ/commit/de263f633ea359507357d3d4c53e98a71bff5afc
Submitter: "Zuul (22348)"
Branch: master

commit de263f633ea359507357d3d4c53e98a71bff5afc
Author: Cole Walker <email address hidden>
Date: Tue Apr 13 16:47:24 2021 -0400

    Add alternative command to disable lldp agent for i40e devices

    LLDP information is not available for certain i40e network devices when
    running system host-lldp-neighbor-show.

    This is caused by the firmware lldp agent on the devices not getting
    disabled by the i40e-lldp-configure.sh script which is invoked by lldpd.

    The command used to disable the firmware lldp agent in the script works
    for some firmware versions found on devices, but not others. This change
    adds an ethtool command to disable the lldp agent which works for these
    other firmware versions.

    From testing, the ethtool method is used for firmware versions 5.05 and
    8.10. The sysfs method is used for firmware version 7.10. In all cases,
    the driver version is 2.14.13

    Closes-Bug: 1923665

    Signed-off-by: Cole Walker <email address hidden>
    Change-Id: Ifac34091599bd4020bf55cc1b8ba3119edccb297
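
The two-method approach described in the commit message above can be pictured roughly as follows. This is only an illustrative sketch and is not the contents of i40e-lldp-configure.sh. It assumes the "sysfs method" means writing "lldp stop" to the i40e driver's per-device command file under debugfs, and that a failed write can be detected so that ethtool is tried next; the real script may choose between the methods differently. The function name and the I40E_DEBUGFS variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual i40e-lldp-configure.sh).
# Assumed location of the i40e command files; overridable for testing.
I40E_DEBUGFS="${I40E_DEBUGFS:-/sys/kernel/debug/i40e}"

# Try to stop the firmware LLDP agent on one port.
#   $1 = interface name (e.g. ens3f1)
#   $2 = PCI address of the port (e.g. 0000:18:00.1)
# Prints which method succeeded: "sysfs", "ethtool", or "none".
stop_fw_lldp() {
    iface=$1
    pci_addr=$2
    cmd_file="$I40E_DEBUGFS/$pci_addr/command"

    # Method 1: write the stop command to the driver's command file.
    if [ -w "$cmd_file" ] && { echo "lldp stop" > "$cmd_file"; } 2>/dev/null; then
        echo "sysfs"
        return 0
    fi

    # Method 2: the ethtool private flag named in this bug report.
    if ethtool --set-priv-flags "$iface" disable-fw-lldp on 2>/dev/null; then
        echo "ethtool"
        return 0
    fi

    echo "none"
    return 1
}
```

Usage: stop_fw_lldp ens3f1 0000:18:00.1
Note that a write which succeeds without taking effect on a given firmware version would not be caught by this sketch.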

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

@Cole Walker, please cherrypick this change to the r/stx.5.0 release branch asap and no later than April 30

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.5.0)

Fix proposed to branch: r/stx.5.0
Review: https://review.opendev.org/c/starlingx/integ/+/787614

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.5.0)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/787614
Committed: https://opendev.org/starlingx/integ/commit/118cf39d4ab05a268dce7019f1db28a26eb7f68f
Submitter: "Zuul (22348)"
Branch: r/stx.5.0

commit 118cf39d4ab05a268dce7019f1db28a26eb7f68f
Author: Cole Walker <email address hidden>
Date: Tue Apr 13 16:47:24 2021 -0400

    Add alternative command to disable lldp agent for i40e devices

    LLDP information is not available for certain i40e network devices when
    running system host-lldp-neighbor-show.

    This is caused by the firmware lldp agent on the devices not getting
    disabled by the i40e-lldp-configure.sh script which is invoked by lldpd.

    The command used to disable the firmware lldp agent in the script works
    for some firmware versions found on devices, but not others. This change
    adds an ethtool command to disable the lldp agent which works for these
    other firmware versions.

    From testing, the ethtool method is used for firmware versions 5.05 and
    8.10. The sysfs method is used for firmware version 7.10. In all cases,
    the driver version is 2.14.13

    Closes-Bug: 1923665

    Signed-off-by: Cole Walker <email address hidden>
    Change-Id: Ifac34091599bd4020bf55cc1b8ba3119edccb297
    (cherry picked from commit de263f633ea359507357d3d4c53e98a71bff5afc)

Ghada Khalil (gkhalil)
tags: added: in-r-stx50
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/integ/+/793754

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/793754
Committed: https://opendev.org/starlingx/integ/commit/a13966754d4e19423874ca31bf1533f057380c52
Submitter: "Zuul (22348)"
Branch: f/centos8

commit b310077093fd567944c6a46b7d0adcabe1f2b4b9
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 18:19:54 2021 +0300

    Fix resize of filesystems in puppet logical_volume

    After system reinstalls there is stale data on the disk
    and puppet fails when resizing, reporting some wrong filesystem
    types. In our case docker-lv was reported as drbd when
    it should have been xfs.

    This problem was solved in some cases e.g:
    when doing a live fs resize we wipe the last 10MB
    at the end of partition:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L146

    Our issue happened here:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L65
    Resize can happen at unlock when a bigger size is detected for the
    filesystem and the 'logical_volume' will resize it.
    To fix this we have to wipe the last 10MB of the partition after the
    'lvextend' cmd in the 'logical_volume' module.

    Tested the following scenarios:

    B&R on SX with default sizes of filesystems and cgts-vg.

    B&R on SX with docker-lv of size 50G, backup-lv also 50G and
    cgts-vg with additional physical volumes:

    - name: cgts-vg
      physicalVolumes:
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

    B&R on DX system with backup of size 70G and cgts-vg
    with additional physical volumes:

    physicalVolumes:
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
      size: 50
      type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
      size: 30
      type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
      type: disk

    Closes-Bug: 1926591
    Change-Id: I55ae6954d24ba32e40c2e5e276ec17015d9bba44
    Signed-off-by: Mihnea Saracin <email address hidden>

commit 3225570530458956fd642fa06b83360a7e4e2e61
Author: Mihnea Saracin <email address hidden>
Date: Thu May 20 14:33:58 2021 +0300

    Execute once the ceph services script on AIO

    The MTC client manages ceph services via ceph.sh which
    is installed on all node types in
    /etc/service.d/{controller,worker,storage}/ceph.sh

    Since the AIO controllers have both controller and worker
    personalities, the MTC client will execute the ceph script
    twice (/etc/service.d/worker/ceph.sh,
    /etc/service.d/controller/ceph.sh).
    This behavior will generate some issues.

    We fix this by exiting the ceph script if it is the one from
    /etc/services.d/worker on AIO systems.

    Closes-Bug: 1928934
    Change-Id: I3e4dc313cc3764f870b8f6c640a60338...

tags: added: in-f-centos8