Two unlocks required when converting a single-nic system to enable SR-IOV on the underlying interface
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Low | Steven Webster |
Bug Description
Brief Description
-----------------
If a system is converted to a shared-nic configuration with SR-IOV enabled on the underlying physical interface, the system will undergo two reboots after the host is unlocked.
Severity
--------
Minor: The system recovers automatically, but it is rebooted twice after the unlock.
Steps to Reproduce
------------------
1. Consider a system with mgmt and oam VLAN interfaces on top of a physical Ethernet platform interface:
system host-if-modify controller-0 eth0 -c platform
system host-if-add -V 10 controller-0 oam0 vlan eth0
system interface-
system host-if-add -V 11 controller-0 mgmt0 vlan eth0
system interface-
2. The system is then unlocked:
system host-unlock controller-0
3. When the system comes back up, the ethernet platform interface is then converted to be of class pci-sriov with 16 VFs:
system host-lock controller-0
system host-if-modify controller-0 eth0 -c pci-sriov -N 16
system host-unlock controller-0
4. The system is then unlocked:
system host-unlock controller-0
5. When the controller manifest is applied, note that ceph-mon and pmond fail to bind to the management address, and the system is rebooted.
6. After the reboot, the system recovers.
Expected Behavior
------------------
The system should require only one unlock/reboot to apply the configuration.
Actual Behavior
----------------
The system goes through a second reboot because the controller manifest fails to apply after the first reboot.
Reproducibility
---------------
100%
System Configuration
-------
This should apply to all configurations (AIO/Standard). On an IPv6 system, the VLAN interfaces lose their IPv6 addresses as well as the default route, if any. On an IPv4 system, the default route associated with the management interface is lost.
Branch/Pull Time/Commit
-------
master 2021-01-27 or later
Last Pass
---------
N/A: support for a single NIC with SR-IOV is a recent feature.
Timestamp/Logs
--------------
Observe the puppet logs from the controller manifest application (on an AIO system) or the worker manifest application (on a Standard system).
Test Activity
-------------
Feature testing
Workaround
----------
The workaround is to configure the SR-IOV interface in Step 1 of 'Steps to Reproduce', before the first unlock.
Triage:
This issue is ultimately caused by the apply-network-config step of the controller/worker manifest. This step launches a script that compares the puppet view of what /etc/sysconfig/network-scripts should contain with the ifcfg files actually present on the system. If they differ, the puppet view of the interface configuration is copied to the system network-scripts directory and the interface is brought down and up to apply the config. If they match, the interface is left alone.
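The compare-and-apply behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual StarlingX script: the function names `changed_interfaces` and `apply_changes` and the two directory arguments are hypothetical, and the interface restart is replaced by an `echo` so the sketch is safe to run anywhere.

```shell
# Hypothetical sketch of the compare-and-apply logic; not the real script.

# Print the name of every interface whose puppet-generated ifcfg file
# differs from, or is missing in, the system network-scripts directory.
changed_interfaces() {
    local puppet_dir="$1"
    local system_dir="$2"
    local cfg name
    for cfg in "$puppet_dir"/ifcfg-*; do
        [ -e "$cfg" ] || continue      # glob matched nothing
        name="${cfg##*/ifcfg-}"
        # cmp -s is silent and returns non-zero when the files differ
        # or when the system copy does not exist.
        if ! cmp -s "$cfg" "$system_dir/ifcfg-$name" 2>/dev/null; then
            echo "$name"
        fi
    done
}

# Copy each changed file into place and report the interface that would
# be restarted. A real script would bring the interface down and up here.
apply_changes() {
    local puppet_dir="$1"
    local system_dir="$2"
    local name
    for name in $(changed_interfaces "$puppet_dir" "$system_dir"); do
        cp "$puppet_dir/ifcfg-$name" "$system_dir/ifcfg-$name"
        echo "restart $name"
    done
}
```

Under this sketch, any change to the generated file for the physical interface is enough to mark it for a restart, while interfaces whose files are byte-identical are skipped.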
When the underlying physical interface is configured for SR-IOV, commands that set the number of virtual functions are added to the pre-up option in the corresponding network script. Puppet detects this change, copies the config, and brings the interface down and up. This causes the upper VLAN interfaces to lose their IPv6 addresses and default route. On an IPv4 system, the default route would be lost, which could be an issue in a distributed cloud environment.