IP is not set after system host-addr-add

Bug #1832697 reported by ChenjieXu
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        marvin Yu

Bug Description

Brief Description
-----------------
After executing the command "system host-addr-add controller-1 eth1002 192.168.100.40 24" and unlocking controller-1, the interface eth1002 is not configured, so the IP 192.168.100.40 does not exist on it.

Severity
--------
Major

Steps to Reproduce
------------------
- virtual environment with StarlingX AIO DUPLEX
- system datanetwork-add net_vxlan vxlan --multicast_group 224.0.0.1 --ttl 255 --port_num 4789
  system host-if-list -a controller-1
  system host-if-modify -m 1500 -n data2 -d net_vxlan -c data controller-1 ${DATA2IFUUID}
  system host-if-modify --ipv4-mode static controller-1 ${DATA2IFUUID}
  system host-addr-add controller-1 eth1002 192.168.100.40 24
  system host-unlock controller-1
  system host-list
  After controller-1 unlocks successfully, check the interface eth1002
  ifconfig eth1002

Expected Behavior
------------------
- eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.100.40 netmask 255.255.255.0 broadcast 192.168.100.255
        inet6 fe80::a00:27ff:fe24:9637 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:24:96:37 txqueuelen 1000 (Ethernet)
        RX packets 13546 bytes 886709 (865.9 KiB)
        RX errors 0 dropped 23 overruns 0 frame 0
        TX packets 11856 bytes 594175 (580.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
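
The netmask and broadcast values in the expected output follow directly from the address and the prefix length 24 passed to host-addr-add. A minimal sketch using Python's standard ipaddress module (not part of StarlingX) shows the arithmetic:

```python
import ipaddress

# Address and prefix length taken from the host-addr-add command above.
iface = ipaddress.ip_interface("192.168.100.40/24")

netmask = str(iface.netmask)
broadcast = str(iface.network.broadcast_address)

print(iface.ip, netmask, broadcast)
```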

Actual Behavior
----------------
- eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 08:00:27:24:96:37 txqueuelen 1000 (Ethernet)
        RX packets 13752 bytes 897241 (876.2 KiB)
        RX errors 0 dropped 23 overruns 0 frame 0
        TX packets 12059 bytes 603502 (589.3 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
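
The difference between the expected and actual output is the missing "inet" line. As a rough sketch, a check like the following could automate that comparison; the function name ipv4_addresses is hypothetical, and the sample strings are abbreviated from the output in this report:

```python
import re


def ipv4_addresses(ifconfig_output):
    """Return the IPv4 addresses found on 'inet' lines of ifconfig output."""
    # 'inet6' lines are skipped because the pattern requires 'inet' + space.
    pattern = r"^\s*inet (\d+\.\d+\.\d+\.\d+)"
    return re.findall(pattern, ifconfig_output, flags=re.MULTILINE)


expected = """eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.100.40 netmask 255.255.255.0 broadcast 192.168.100.255
        inet6 fe80::a00:27ff:fe24:9637 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:24:96:37 txqueuelen 1000 (Ethernet)
"""

actual = """eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 08:00:27:24:96:37 txqueuelen 1000 (Ethernet)
"""

print(ipv4_addresses(expected))
print(ipv4_addresses(actual))
```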

Reproducibility
---------------
100%

System Configuration
--------------------
Duplex system

Lab-name: WCP_113-121

Branch/Pull Time/Commit
-----------------------
stx master as of 20190607T142331Z

Last Pass
---------
Unclear

Timestamp/Logs
--------------
eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
       ether 08:00:27:24:96:37 txqueuelen 1000 (Ethernet)
       RX packets 13752 bytes 897241 (876.2 KiB)
       RX errors 0 dropped 23 overruns 0 frame 0
       TX packets 12059 bytes 603502 (589.3 KiB)
       TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Test Activity
-------------
Developer Testing

Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 gating; seems to be an issue with vxlan configuration

tags: added: stx.networking
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Forrest Zhao (forrest.zhao)
tags: added: stx.2.0
Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → Le, Huifeng (hle2)
Le, Huifeng (hle2) wrote :

Through debugging and code analysis, the cause of this issue was found to be as follows:

The following command creates data2 as a data interface:
system host-if-modify -m 1500 -n data2 -d net_vxlan -c data controller-1 ${DATA2IFUUID}

However, a data interface is not configured by puppet in the current flow:
1. needs_interface_config bypasses this interface because it is is_dpdk_compatible but not is_a_mellanox_device, so the puppet configuration (platform::interfaces::network_config) is not generated for this interface.

def needs_interface_config(context, iface):
    """
    Determine whether an interface needs to be configured in the linux kernel.
    This is true if the interface is a platform interface, is required by a
    platform interface (i.e., an AE member, a VLAN lower interface), or is an
    unaccelerated data interface.
    """
    if is_platform_interface(context, iface):
        return True
    elif not is_worker_subfunction(context):
        return False
    elif is_data_interface(context, iface):
        if not is_dpdk_compatible(context, iface):
            # vswitch interfaces for devices that are not natively supported by
            # the DPDK are created as regular Linux devices and then bridged in
            # to vswitch in order for it to be able to use it indirectly.
            return True
        if is_a_mellanox_device(context, iface):
            # Check for Mellanox data interfaces. We must set the MTU sizes of
            # Mellanox data interfaces in case it is not the default. Normally
            # data interfaces are owned by DPDK, they are not managed through
            # Linux but in the Mellanox case, the interfaces are still visible
            # in Linux so in case one needs to set jumbo frames, it has to be
            # set in Linux as well. We only do this for combined nodes or
            # non-controller nodes.
            return True
    elif is_pci_interface(iface):
        return True
    return False

2. 'ipaddress' is not filled in even when the puppet configuration (platform::interfaces::network_config) is forcibly generated for the data interface:
def get_common_network_config(context, iface, config, network_id=None):
    ...

    method = get_interface_address_method(context, iface, network_id)
    if method == STATIC_METHOD:
        address = get_interface_primary_address(context, iface, network_id)
        if address:
            config['ipaddress'] = address['address']
            config['netmask'] = address['netmask']
        else:
            LOG.info("Interface %s has no primary address" % iface['ifname'])

    ...
    return config

def get_interface_address_method(context, iface, network_id=None):
    """
    Determine what type of interface to configure for each network type.
    """
    networktype = find_networktype_by_network_id(context, network_id)

    ...
    elif iface.ifclass == constants.INTERFACE_CLASS_DATA:
        # All data interfaces configured in the kernel because they are not
        # natively supported in vswitch or need to be shared with the kernel
        # because of a platform VLAN should be left as manual config
        return MANUAL_METHOD
    elif iface.ifclass in PCI_INTERFACE_CLASS...


Matt Peters (mpeters-wrs) wrote :

The logic for the interface configuration will need to take into account the vswitch type for the data interfaces. It will need to check if the vswitch type is set to VSWITCH_TYPE_NONE, and if so, permit it to generate a static configuration for interfaces with an ifclass of INTERFACE_CLASS_DATA.

Le, Huifeng (hle2) wrote :

Matt,
So do you mean if vswitch_type == VSWITCH_TYPE_NONE,
then
(1) get_interface_address_method should return STATIC_METHOD for ifclass = INTERFACE_CLASS_DATA
(2) needs_interface_config should return true (if is_dpdk_compatible is true)?

Changed in starlingx:
assignee: Le, Huifeng (hle2) → marvin Yu (marvin-yu)
Matt Peters (mpeters-wrs) wrote :

1) Yes
2) needs_interface_config should return true if vswitch_type == VSWITCH_TYPE_NONE and is_data_interface is True. I would just put the vswitch_type check before the other checks under the is_data_interface block.
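
The two points above can be sketched roughly as follows. This is a simplified illustration, not the merged patch: the helper names and signatures are hypothetical reductions of needs_interface_config and get_interface_address_method, the boolean arguments stand in for the is_dpdk_compatible and is_a_mellanox_device checks, and the constant values are assumed stand-ins for the real sysinv constants.

```python
# Stand-in constants; the real values are defined in sysinv and are
# assumed here for illustration only.
VSWITCH_TYPE_NONE = "none"
STATIC_METHOD = "static"
MANUAL_METHOD = "manual"


def data_interface_needs_config(vswitch_type, dpdk_compatible, mellanox):
    """Hypothetical reduction of the is_data_interface branch."""
    if vswitch_type == VSWITCH_TYPE_NONE:
        # Checked first: with no vswitch, nothing else would configure
        # the device, so the kernel configuration must be generated.
        return True
    if not dpdk_compatible:
        return True
    if mellanox:
        return True
    return False


def data_interface_address_method(vswitch_type):
    """Hypothetical reduction of the INTERFACE_CLASS_DATA branch."""
    if vswitch_type == VSWITCH_TYPE_NONE:
        return STATIC_METHOD
    return MANUAL_METHOD


# The case in this report: DPDK-compatible, non-Mellanox, vswitch type none.
print(data_interface_needs_config(VSWITCH_TYPE_NONE, True, False))
print(data_interface_address_method(VSWITCH_TYPE_NONE))
```

Placing the vswitch_type check ahead of the DPDK and Mellanox checks means the existing behaviour for accelerated vswitch types is left unchanged.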

marvin Yu (marvin-yu) wrote :

Hi Matt,
I tried the fix as suggested on the StarlingX iso-image-0611, and it seems to work. After the unlock, the interface address is set correctly.
=================================================================
controller-0:~$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:db:98:83:82 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe41:b8b prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:41:0b:8b txqueuelen 1000 (Ethernet)
        RX packets 150317 bytes 152918309 (145.8 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 73461 bytes 6907878 (6.5 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1000: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::a00:27ff:fe3f:979e prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:3f:97:9e txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 867 bytes 234502 (229.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1001: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::a00:27ff:fefa:ffe1 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:fa:ff:e1 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 846 bytes 233620 (228.1 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.100.40 netmask 255.255.255.0 broadcast 192.168.100.255
        inet6 fe80::a00:27ff:fee9:9f76 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:e9:9f:76 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 847 bytes 232718 (227.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
......
=================================================

But I found that some interfaces (e.g. eth1000, eth1001) that have no primary address have also been brought up. This is
probably related to the source code change. Is this normal?

Matt Peters (mpeters-wrs) wrote :

What configuration did you specify for the eth1000 and eth1001 interfaces that did not get configured? Can you provide the system commands you ran (e.g. system host-addr-add) and attach the resulting /opt/platform/puppet/XXXX/hieradata/<hostaddr>.yaml?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/670286

Changed in starlingx:
status: Triaged → In Progress
Ghada Khalil (gkhalil) wrote :

Raising the priority to high after review with networking TL (Matt Peters) as this bug results in the data interfaces not being configured properly for any system using vswitch=none. This is a basic configuration which should be working.

Changed in starlingx:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/670286
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=a90c0f52bbd8277374f76ac5b81c385da0896fe4
Submitter: Zuul
Branch: master

commit a90c0f52bbd8277374f76ac5b81c385da0896fe4
Author: marvin <email address hidden>
Date: Thu Jul 11 19:36:22 2019 +0800

    Enable data interface if vswitch is unaccelerated

    starlingx will by-pass data interfaces configuration by default,
    and this will cause issues such as fail to set the IP for data
    interfaces when vswitch type is VSWITCH_TYPE_NONE.
    this patch checks if the vswitch type is VSWITCH_TYPE_NONE,and if so,
    permit it to generate a static configuration for interfaces with an
    ifclass of INTERFACE_CLASS_DATA which will allow puppet to configure
    the data interface after host unlock.

    Change-Id: I66d8c7750a68b319bbfec2dcaa5d09aea41fa864
    Closes-bug: #1832697
    Signed-off-by: marvin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released