Shared NIC: System doesn't retain the rate-limit config when a pod is deleted

Bug #1915951 reported by Steven Webster
This bug affects 1 person
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Steven Webster

Bug Description

Brief Description
-----------------
If an SR-IOV interface of type 'VF' has been configured with rate limiting, e.g.:

system host-if-modify <host> <sriov_vf_interface> -r <rate>

and a VLAN is present in the network attachment definition, the rate-limit setting on the VF is removed when the pod is deleted.

For example, ip link show after launching the pod:

...
vf 31 MAC 8a:1c:34:72:2a:df, vlan 2222, tx rate 200 (Mbps), max_tx_rate 200Mbps, spoof checking on, link-state auto, trust off
...

ip link show after deleting the pod:

...
vf 31 MAC 8a:1c:34:72:2a:df, spoof checking on, link-state auto, trust off
...
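
The before/after comparison above can also be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the text of ip link show and lists the VFs whose max_tx_rate disappeared or changed. The function names vf_max_tx_rates and lost_rate_limits are hypothetical, and the regular expressions assume the VF line format shown in this report; other iproute2 versions may print the fields differently.

```python
import re

# Assumption: VF lines look like the ones quoted in this report,
# e.g. "vf 31 MAC ..., max_tx_rate 200Mbps, spoof checking on, ...".
VF_LINE = re.compile(r"^\s*vf\s+(\d+)\b(.*)$")
MAX_TX_RATE = re.compile(r"max_tx_rate\s+(\d+)\s*Mbps")


def vf_max_tx_rates(ip_link_output):
    """Map VF index -> max_tx_rate in Mbps, or None when no limit is shown."""
    rates = {}
    for line in ip_link_output.splitlines():
        vf = VF_LINE.match(line)
        if not vf:
            continue  # not a VF line
        rate = MAX_TX_RATE.search(vf.group(2))
        rates[int(vf.group(1))] = int(rate.group(1)) if rate else None
    return rates


def lost_rate_limits(before, after):
    """Return VF indexes whose limit was set before and is missing or different after."""
    rates_before = vf_max_tx_rates(before)
    rates_after = vf_max_tx_rates(after)
    return sorted(
        vf
        for vf, rate in rates_before.items()
        if rate is not None and rates_after.get(vf) != rate
    )
```

Feeding the two snapshots captured before and after the pod deletion to lost_rate_limits should return an empty list once the fix is in place.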

Severity
--------
Major: System/Feature is usable but degraded

If a pod is subsequently launched and is allocated the same VF, its traffic is no longer rate-limited, even though the interface is still configured with a rate limit.

Steps to Reproduce
------------------
Configure rate limiting on an SR-IOV VF interface:

system host-if-modify <host> <sriov_vf_interface> -r <rate>

Create a network attachment definition with a VLAN configured:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/pci_sriov_net_group0_data1
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "sriov",
      "vlan": 2222
    }'

Create a pod which uses the network attachment definition:

apiVersion: v1
kind: Pod
metadata:
  name: pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
            { "name": "sriov1" },
            { "name": "sriov1" }
    ]'
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/pci_sriov_net_group0_data1: '2'
      limits:
        intel.com/pci_sriov_net_group0_data1: '2'

Delete the pod and check the VF entry in the ip link show output.

Expected Behavior
-----------------
The rate-limiting configuration on the VF should be retained after the pod is deleted.

Actual Behavior
---------------
The rate-limiting configuration is removed from the VF when the pod is deleted.

Reproducibility
---------------
100%

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
02/17/2021 master

Last Pass
---------
N/A. Note: the issue is not seen when no VLAN is applied to the VF.

Test Activity
-------------
Feature Testing

Workaround
----------
Lock and then unlock the host.

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - found during testing of this stx.5.0 feature: https://storyboard.openstack.org/#!/story/2008470

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0
Ghada Khalil (gkhalil) wrote :

Merged on 2021-02-22

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/root/+/792232

OpenStack Infra (hudson-openstack) wrote : Fix merged to root (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/root/+/792232
Committed: https://opendev.org/starlingx/root/commit/7cb55d28111dbf7458ed3f01a5eda5c4dfdb124b
Submitter: "Zuul (22348)"
Branch: f/centos8

commit f457bd15b9ce0be512ec96abcb9b858b0bab2ea8
Author: Scott Little <email address hidden>
Date: Fri Apr 16 01:39:17 2021 -0400

    Improved branching tools

    create_branches_and_tags.sh:
    - Update the .gitreview files in branched git repos.
    - When updating a manifest, add the ability to update and
      use the default revision field.
    - Create two levels of manifest lockdown, soft and hard.
      Soft lockdown only sets sha revisions on unbranched projects
      that lack a revision, or set the revision to master.
      Hard lockdown applies to all unbranched projects.

    push_branches_tags.sh:
    - opendev no longer accepts 'git push' for the delivery of
      new branches with updates. Instead we must now
      use separate commands to deliver the tag, the branch,
      and any updates.

    Closes-Bug: 1924762
    Signed-off-by: Scott Little <email address hidden>
    Change-Id: I6d669ddc80cc9b3cb9e72d65a64589dbccf43ae3

commit 0babd33b6d851dd11492c26e92c8a6ac2c2557de
Author: Rafael Jardim <email address hidden>
Date: Wed May 12 11:25:54 2021 -0300

    Update stx-platformclients tag to stx.5.0-v1.4.3

    This commit updates the image with the updated clients.

    Test:
    Some normal commands
    Commands related with https dcmanager that wasn't working
    System application-upload that wasn't working when executed
    from remote cli

    Closes-Bug: 1928233
    Closes-Bug: 1928231
    Signed-off-by: Rafael Jardim <email address hidden>
    Change-Id: I8a0d12f699336a4412be5ff3c73cfb8d59038780

commit a163d7723e659e89c37ec933d1b0f9aa638a6a73
Author: Cole Walker <email address hidden>
Date: Wed May 5 09:39:17 2021 -0400

    Update image tag for notificationservice-base

    Update image tag to stx.5.0-v1.0.4 for notificationservice-base

    Closes-Bug: 1924201
    Closes-Bug: 1924197

    Signed-off-by: Cole Walker <email address hidden>
    Change-Id: Id863c4f154cc0b39e30ee8986fc07f9856a22826

commit 84c45f5e3a241237887af9db89d8c0aa1f8923e0
Author: Charles Short <email address hidden>
Date: Sun May 2 12:04:04 2021 -0400

    Fix wheels tarball generation

    In d7c5a54ab94bce6635b83d91a807d28f97836a81, django was dropped in favor
    of rfc3986. However the wheel was not added to the build-wheels
    generation which breaks the docker images. Also add the migrate wheel
    since it was missing as well. Add the required wheels in order to
    build the docker image properly.

    Test:
    - Build new centos-stable-wheels tarball
    - Build stx-keystone-api-proxy container

    Closes-Bug: 1926795

    Signed-off-by: Charles Short <email address hidden>
    Change-Id: Ib6f0abfdcc82ca14f92ebc5b45fe8df961e804ee

commit 062353b6de7d4b0e017203c3e6086891bd6b9213
Author: Cole Walker <email address hidden>
Date: Mon May 3 09:48:54 2021 -0400

    Update image tag for notificationclient-base...

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/integ/+/793754

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794296

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/792195

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/794324
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/163ec9989cc7360dba4c572b2c43effd10306048
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 4e96b762f549aadb0291cc9bcf3352ae923e94eb
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 15:48:19 2021 +0000

    Revert "Restore host filesystems with collected sizes"

    This reverts commit 255488739efa4ac072424b19f2dbb7a3adb0254e.

    Reason for revert: Did a rework to fix https://bugs.launchpad.net/starlingx/+bug/1926591. The original problem was in puppet, and this fix in ansible was not good enough, it generated some other problems.

    Change-Id: Iea79701a874effecb7fe995ac468d50081d1a84f
    Depends-On: I55ae6954d24ba32e40c2e5e276ec17015d9bba44

commit c064aacc377c8bd5336ceab825d4bcbf5af0b5e8
Author: Angie Wang <email address hidden>
Date: Fri May 21 21:28:02 2021 -0400

    Ensure apiserver keys are present before extract from tarball

    This is to fix the upgrade playbook issue that happens during
    AIO-SX upgrade from stx4.0 to stx5.0 which introduced by
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/792093.
    The apiserver keys are not available in stx4.0 side so we need
    to ensure the keys under /etc/kubernetes/pki are present in the
    backed-up tarball before extracting, otherwise playbook fails
    because the keys are not found in the archive.

    Change-Id: I8602f07d1b1041a7fd3fff21e6f9a422b9784ab5
    Closes-Bug: 928925
    Signed-off-by: Angie Wang <email address hidden>

commit 0261f22ff7c23d2a8608fe3b51725c9f29931281
Author: Don Penney <email address hidden>
Date: Thu May 20 23:09:07 2021 -0400

    Update SX to DX migration to wait for coredns config

    This commit updates the SX to DX migration playbook to wait after
    modifying the system mode to duplex until the runtime manifest that
    updates coredns config has completed. The playbook will wait for up to
    20 minutes to allow for the possibility that sysinv has multiple
    runtime manifests queued up, each of which could take several minutes.

    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/792494
    Depends-On: https://review.opendev.org/c/starlingx/config/+/792496
    Change-Id: I3bf94d3493ae20eeb16b3fdcb27576ee18c0dc4d
    Closes-Bug: 1929148
    Signed-off-by: Don Penney <email address hidden>

commit 7c4f17bd0d92fc1122823211e1c9787829d206a9
Author: Daniel Safta <email address hidden>
Date: Wed May 19 09:08:16 2021 +0000

    Fixed missing apiserver-etcd-client certs

    When controller-1 is the active controller
    the backup archive does not contain
    /etc/etcd/apiserver-etcd-client.{crt, key}

    This change adds a new task which brings
    the certs from /etc/kubernetes/pki

    Closes-bug: 1928925
    Signed-off-by: Daniel Safta <email address hidden>
    Change-Id: I3c68377603e1af9a71d104e5b1108e9582497a09

commit e221ef8fbe51aa6ca229b584fb5632fe512ad5cb
Author: David Sullivan <email address hidden>
Date: Wed May 19 16:01:27 2021 -0500

    Support boo...

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/793754
Committed: https://opendev.org/starlingx/integ/commit/a13966754d4e19423874ca31bf1533f057380c52
Submitter: "Zuul (22348)"
Branch: f/centos8

commit b310077093fd567944c6a46b7d0adcabe1f2b4b9
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 18:19:54 2021 +0300

    Fix resize of filesystems in puppet logical_volume

    After system reinstalls there is stale data on the disk
    and puppet fails when resizing, reporting some wrong filesystem
    types. In our case docker-lv was reported as drbd when
    it should have been xfs.

    This problem was solved in some cases e.g:
    when doing a live fs resize we wipe the last 10MB
    at the end of partition:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L146

    Our issue happened here:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L65
    Resize can happen at unlock when a bigger size is detected for the
    filesystem and the 'logical_volume' will resize it.
    To fix this we have to wipe the last 10MB of the partition after the
    'lvextend' cmd in the 'logical_volume' module.

    Tested the following scenarios:

    B&R on SX with default sizes of filesystems and cgts-vg.

    B&R on SX with docker-lv of size 50G, backup-lv also 50G and
    cgts-vg with additional physical volumes:

    - name: cgts-vg
        physicalVolumes:
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

    B&R on DX system with backup of size 70G and cgts-vg
    with additional physical volumes:

    physicalVolumes:
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

    Closes-Bug: 1926591
    Change-Id: I55ae6954d24ba32e40c2e5e276ec17015d9bba44
    Signed-off-by: Mihnea Saracin <email address hidden>

commit 3225570530458956fd642fa06b83360a7e4e2e61
Author: Mihnea Saracin <email address hidden>
Date: Thu May 20 14:33:58 2021 +0300

    Execute once the ceph services script on AIO

    The MTC client manages ceph services via ceph.sh which
    is installed on all node types in
    /etc/service.d/{controller,worker,storage}/ceph.sh

    Since the AIO controllers have both controller and worker
    personalities, the MTC client will execute the ceph script
    twice (/etc/service.d/worker/ceph.sh,
    /etc/service.d/controller/ceph.sh).
    This behavior will generate some issues.

    We fix this by exiting the ceph script if it is the one from
    /etc/services.d/worker on AIO systems.

    Closes-Bug: 1928934
    Change-Id: I3e4dc313cc3764f870b8f6c640a60338...
