sriovdp generates many log files when not configured properly

Bug #2002447 reported by Lucas Ratusznei Fonseca
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Lucas Ratusznei Fonseca
Milestone: (none)

Bug Description

*+Brief Description+*

When the label "sriovdp=enabled" is assigned to the host and no PCI-SRIOV interfaces are enabled, the sriov plugin pod remains in a crash loop and generates a large number of log files in /var/log/sriovdp.

*+Severity+*

Minor: no negative system impact other than excessive log generation, which causes collect to time out.

*+Steps to Reproduce+*

Step 1:
Assign the host label "sriovdp=enabled" to the host without enabling any PCI-SRIOV interface.
  ->  The sriov-device-plugin pod then fails with "CrashLoopBackOff", which creates a large number of files under /var/log/sriovdp/ on the target node.

kube-system                   kube-sriov-device-plugin-amd64-97tw8              0/1     CrashLoopBackOff   6983       24d
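
To spot this condition on a node, the pod listing above can be scanned for pods in CrashLoopBackOff with a high restart count. This is a minimal sketch, not part of the reported fix. The function name `find_crash_looping` and the `min_restarts` threshold are hypothetical, and the column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE) is assumed from the sample line in this report.

```python
def find_crash_looping(pods_output, min_restarts=100):
    """Return (namespace, pod, restarts) for pods stuck in CrashLoopBackOff.

    Assumes whitespace-separated columns in the order
    NAMESPACE NAME READY STATUS RESTARTS AGE, as in the sample line.
    """
    hits = []
    for line in pods_output.splitlines():
        fields = line.split()
        # Skip headers, short lines, and pods in any other state.
        if len(fields) < 6 or fields[3] != "CrashLoopBackOff":
            continue
        try:
            restarts = int(fields[4])
        except ValueError:
            continue
        if restarts >= min_restarts:
            hits.append((fields[0], fields[1], restarts))
    return hits


# Sample line copied from this bug report.
sample = (
    "kube-system                   kube-sriov-device-plugin-amd64-97tw8"
    "              0/1     CrashLoopBackOff   6983       24d"
)
print(find_crash_looping(sample))
```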

*+Expected Behavior+*

It should not be possible to unlock the host if the label "sriovdp=enabled" is assigned and no PCI-SRIOV interfaces are enabled.
The sriov plugin pod should not crash, even if the system is not configured properly.

*+Actual Behavior+*

Excessive log files are generated by sriovdp

*+Reproducibility+*

Reproducible

*+System Configuration+*

Any

*+Last Pass+*

N/A

*+Timestamp/Logs+*

Not Required

*+Alarms+*

No alarm

*+Test Activity+*

N/A

Changed in starlingx:
assignee: nobody → Lucas Ratusznei Fonseca (lratuszn)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/869774

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/869777

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tools (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/tools/+/869778

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/869852

OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by "Lucas Ratusznei Fonseca <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/869777
Reason: The solution is not effective, a better one was proposed here: https://review.opendev.org/c/starlingx/stx-puppet/+/869852

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tools (master)

Change abandoned by "Lucas Ratusznei Fonseca <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/tools/+/869778
Reason: The solution is not effective, a better one was proposed here: https://review.opendev.org/c/starlingx/stx-puppet/+/869852

Ghada Khalil (gkhalil)
tags: added: stx.networking
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/869774
Committed: https://opendev.org/starlingx/config/commit/16e800bcf9a9fc2bd97cc05ba0213091352c6130
Submitter: "Zuul (22348)"
Branch: master

commit 16e800bcf9a9fc2bd97cc05ba0213091352c6130
Author: Lucas Ratusznei Fonseca <email address hidden>
Date: Tue Jan 10 19:19:57 2023 -0300

    Semantic check for proper sriov plugin configuration

    This commit adds a semantic check to host-unlock to prevent unlocking
    if the label "sriovdp=enabled" is present and there are no PCI-SRIOV
    interfaces enabled.

    Test plan

    PASS Unlock system with sriovdp label absent and 0 PCI-SRIOV
         interfaces enabled
    PASS Unlock system with sriovdp label absent and 1 PCI-SRIOV
         interface enabled
    PASS Unlock system with sriovdp label present and 1 PCI-SRIOV
         interface enabled
    PASS Try to unlock system with sriovdp label present and 0 PCI-SRIOV
         interfaces enabled (command fails)

    Closes-Bug: #2002447

    Signed-off-by: Lucas Ratusznei Fonseca
                   <email address hidden>
    Change-Id: I8d9cba7354f066ad334af425fe26f57a93f58d3d
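
The rule this commit describes can be sketched as follows. This is an illustration only and not the code from the review: `SemanticCheckError`, `check_sriovdp_before_unlock`, and the dictionary shapes for labels and interfaces (including the `ifclass` key) are assumptions made for the example. As noted further down in this report, the check was later reverted.

```python
class SemanticCheckError(Exception):
    """Hypothetical error raised when a host must not be unlocked."""


def check_sriovdp_before_unlock(labels, interfaces):
    """Reject unlock when sriovdp=enabled is set but no PCI-SRIOV interface exists.

    labels: dict of host label key to value (assumed shape).
    interfaces: list of dicts with an 'ifclass' entry (assumed shape).
    """
    label_enabled = labels.get("sriovdp") == "enabled"
    has_sriov = any(i.get("ifclass") == "pci-sriov" for i in interfaces)
    if label_enabled and not has_sriov:
        raise SemanticCheckError(
            "Label sriovdp=enabled is assigned but no PCI-SRIOV "
            "interface is enabled on this host."
        )


# Label present with one PCI-SRIOV interface: the check passes silently.
check_sriovdp_before_unlock(
    {"sriovdp": "enabled"}, [{"ifname": "sriov0", "ifclass": "pci-sriov"}]
)

# Label present with no PCI-SRIOV interface: the unlock is rejected.
try:
    check_sriovdp_before_unlock({"sriovdp": "enabled"}, [])
except SemanticCheckError as err:
    print("unlock rejected:", err)
```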

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/869852
Committed: https://opendev.org/starlingx/stx-puppet/commit/c39047149a5ab9e5b26be32280f336f360ef6fd1
Submitter: "Zuul (22348)"
Branch: master

commit c39047149a5ab9e5b26be32280f336f360ef6fd1
Author: Lucas Ratusznei Fonseca <email address hidden>
Date: Wed Jan 11 12:12:46 2023 -0300

    Add minimal config for the sriov device plugin

    This commit adds a minimal configuration for the sriov device plugin.
    If the config file (/etc/pcidp/config.json) has invalid contents or
    is empty, the plugin remains in a crash loop, generating a large
    number of log files. This change ensures that the config file has
    valid content, thus preventing such an issue.

    Test plan

    PASS Run Puppet in host with 0 PCI-SRIOV interfaces configured
    PASS Run Puppet in host with 1 PCI-SRIOV interface configured

    Closes-Bug: #2002447

    Signed-off-by: Lucas Ratusznei Fonseca
                   <email address hidden>
    Change-Id: I0ec6e9ebaad7b8ed7981f2f1af7b15195a3bfd43
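
The idea behind this change, guaranteeing that the plugin always finds parseable content in its config file, can be sketched in Python. The actual fix is implemented in Puppet and its generated content is not shown in this report, so everything here is an assumption: the helper name `ensure_valid_config` is hypothetical, and the minimal content `{"resourceList": []}` is based on the upstream plugin's `resourceList` config key. Whether the plugin accepts an empty list was not verified here.

```python
import json
import os
import tempfile

# Assumed minimal content; the real Puppet-generated file may differ.
MINIMAL_CONFIG = {"resourceList": []}


def ensure_valid_config(path):
    """Write a minimal config if the file is missing, empty, or invalid.

    Returns True if the file was (re)written, False if it was already valid.
    """
    try:
        with open(path) as f:
            data = json.load(f)
        if isinstance(data, dict) and isinstance(data.get("resourceList"), list):
            return False
    except (OSError, ValueError):
        # Missing or unreadable file, empty file, or malformed JSON.
        pass
    with open(path, "w") as f:
        json.dump(MINIMAL_CONFIG, f)
    return True


# Demonstration on a temporary file standing in for /etc/pcidp/config.json.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "config.json")
open(demo_path, "w").close()           # start with an empty file
print(ensure_valid_config(demo_path))  # file is rewritten
print(ensure_valid_config(demo_path))  # file is already valid
```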

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.8.0
Ghada Khalil (gkhalil) wrote:

Re-opening to revert the semantic check introduced in https://review.opendev.org/c/starlingx/config/+/869774 as it's causing unlocks to fail for some configurations where the sriovdp=enabled label is defined, but the interfaces are not configured yet.

See review: https://review.opendev.org/c/starlingx/config/+/902888 for more details

tags: added: stx.9.0
Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/902888
Committed: https://opendev.org/starlingx/config/commit/ce0959e71c76d2e8e3d96a293f855f85efad4e14
Submitter: "Zuul (22348)"
Branch: master

commit ce0959e71c76d2e8e3d96a293f855f85efad4e14
Author: Andre Kantek <email address hidden>
Date: Thu Dec 7 16:12:32 2023 -0300

    Revert "Semantic check for proper sriov plugin configuration"

    This reverts commit 16e800bcf9a9fc2bd97cc05ba0213091352c6130.

    Reason for revert:
    This validation prevents a two-step approach to the configuration:
    first configuring labels, and later adding the interface configuration.
    With the change:
    https://review.opendev.org/c/starlingx/stx-puppet/+/869852
    there is no reason to keep this validation, as the sriov device
    plugin pod will no longer generate log spam: a dummy config file
    is created, which avoids the CrashLoopBackOff state.

    Closes-Bug: 2002447

    Change-Id: Ice4def2a23c5fe5fc0311bafff1a8ea5051e1609
    Signed-off-by: Andre Kantek <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released