VXLAN not enabled on StarlingX with containers

Bug #1821135 reported by ChenjieXu on 2019-03-21
Affects: StarlingX | Status: Invalid | Importance: Low | Assigned to: ChenjieXu

Bug Description

Title
-----
VXLAN not enabled on StarlingX with containers

Brief Description
-----------------
After setting up a VXLAN data network, assigning an IP address to the data interface, creating a VXLAN tenant network, and creating VMs on that network, a VM cannot ping another VM on a different host. The tunneling_ip of all OVS agents is the same, "172.17.0.1", which is the IP of the docker0 interface; it should be the IP assigned to the data interface. The tunnel port on br-tun is not created, so VXLAN traffic cannot reach the other host.
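
A hedged way to observe the symptom (these appear to be the commands behind the output under Timestamp/Logs below; the agent ID is a placeholder, and ovs-vsctl may need to be run inside the OVS container depending on the deployment):
   # The agent's tunneling_ip should be the data interface IP, not docker0's 172.17.0.1
   neutron agent-list | grep "Open vSwitch agent"
   neutron agent-show <ovs-agent-id> | grep tunneling_ip
   # br-tun should carry a vxlan tunnel port in addition to patch-int; here it does not
   ovs-vsctl show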

Severity
--------
Critical

Steps to Reproduce
------------------
1. On the active controller:
   source /etc/platform/openrc
   system host-lock compute-0
   system host-lock compute-1
   system datanetwork-add tenant_vxlan vxlan --multicast_group 224.0.0.1 --ttl 255 --port_num 4789
   system host-if-list -a compute-0
   system host-if-list -a compute-1
   system host-if-modify -m 1500 -n data0 -d tenant_vxlan -c data compute-0 ${DATA0IFUUID}
   system host-if-modify -m 1500 -n data0 -d tenant_vxlan -c data compute-1 ${DATA0IFUUID}
   system host-if-modify --ipv4-mode static compute-0 ${DATA0IFUUID}
   system host-if-modify --ipv4-mode static compute-1 ${DATA0IFUUID}
   system host-addr-add compute-0 ${DATA0IFUUID} 192.168.100.30 24
   system host-addr-add compute-1 ${DATA0IFUUID} 192.168.100.40 24
   system host-unlock compute-0
   system host-unlock compute-1

2. After compute-0 and compute-1 have rebooted, on the active controller:
   export OS_CLOUD=openstack_helm
   ADMINID=`openstack project list | grep admin | awk '{print $2}'`
   openstack network segment range create tenant-vxlan-range --network-type vxlan --minimum 400 --maximum 499 --private --project ${ADMINID}
   neutron net-create --tenant-id ${ADMINID} --provider:network_type=vxlan net1
   neutron subnet-create --tenant-id ${ADMINID} --name subnet1 net1 192.168.101.0/24
   openstack server create --image cirros --flavor m1.tiny --network net1 vm1
   openstack server create --image cirros --flavor m1.tiny --network net1 vm2
   Ensure vm1 and vm2 are scheduled on different hosts (see the verification sketch after these steps).
   From vm1, ping vm2.
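
   A hedged way to verify the preconditions before the ping (standard OpenStack CLI fields; admin credentials assumed, as above):
      # Confirm the VMs were scheduled on different compute hosts
      openstack server show vm1 -f value -c OS-EXT-SRV-ATTR:host
      openstack server show vm2 -f value -c OS-EXT-SRV-ATTR:host
      # Note each VM's fixed IP on net1, then ping it from the other VM's console
      openstack server list -c Name -c Networks
      # On each compute node, the tunnel endpoint address assigned in step 1 should be present
      ip addr | grep 192.168.100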

Expected Behavior
------------------
vm1 can ping vm2 successfully.

Actual Behavior
----------------
vm1 cannot ping vm2.

System Configuration
--------------------
System mode: Standard 2+2 on bare metal

Reproducibility
---------------
100%

Branch/Pull Time/Commit
-----------------------
0306 ISO image built for the OVS-DPDK upgrade

Timestamp/Logs
--------------
(The output below appears to be "neutron agent-show" for the compute-0 OVS agent, note tunneling_ip = 172.17.0.1, followed by the br-tun section of "ovs-vsctl show", note the missing vxlan tunnel port.)
+---------------------+-------------------------------------------------------+
| Field | Value |
+---------------------+-------------------------------------------------------+
| admin_state_up | True |
| agent_type | Open vSwitch agent |
| alive | True |
| availability_zone | |
| binary | neutron-openvswitch-agent |
| configurations | { |
| | "integration_bridge": "br-int", |
| | "ovs_hybrid_plug": false, |
| | "in_distributed_mode": false, |
| | "datapath_type": "netdev", |
| | "arp_responder_enabled": true, |
| | "resource_provider_inventory_defaults": { |
| | "min_unit": 1, |
| | "allocation_ratio": 1.0, |
| | "step_size": 1, |
| | "reserved": 0 |
| | }, |
| | "vhostuser_socket_dir": "/var/run/openvswitch", |
| | "resource_provider_bandwidths": {}, |
| | "devices": 5, |
| | "ovs_capabilities": { |
| | "datapath_types": [ |
| | "netdev", |
| | "system" |
| | ], |
| | "iface_types": [ |
| | "dpdk", |
| | "dpdkr", |
| | "dpdkvhostuser", |
| | "dpdkvhostuserclient", |
| | "erspan", |
| | "geneve", |
| | "gre", |
| | "internal", |
| | "ip6erspan", |
| | "ip6gre", |
| | "lisp", |
| | "patch", |
| | "stt", |
| | "system", |
| | "tap", |
| | "vxlan" |
| | ] |
| | }, |
| | "extensions": [], |
| | "l2_population": true, |
| | "tunnel_types": [ |
| | "vxlan" |
| | ], |
| | "log_agent_heartbeats": false, |
| | "enable_distributed_routing": false, |
| | "bridge_mappings": { |
| | "physnet0": "br-phy0" |
| | }, |
| | "tunneling_ip": "172.17.0.1" |
| | } |
| created_at | 2019-03-15 21:13:41 |
| description | |
| heartbeat_timestamp | 2019-03-20 21:25:16 |
| host | compute-0 |
| id | ec3192c9-224a-4c23-8c55-d30ad871a2d3 |
| started_at | 2019-03-20 16:15:27 |
| topic | N/A |
+---------------------+-------------------------------------------------------+

 Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal

Last time install passed
------------------------
n/a

ChenjieXu (midone) on 2019-03-21
description: updated
ChenjieXu (midone) wrote :

The tunnel interface is hardcoded as docker0. You can find the code with the following command:
   On active controller:
      export OS_CLOUD=openstack_helm
      kubectl -n openstack edit cm neutron-bin

The code is listed below:
    tunnel_interface="docker0"
    if [ -z "${tunnel_interface}" ] ; then
        # search for interface with default routing
        # If there is not default gateway, exit
        tunnel_interface=$(ip -4 route list 0/0 | awk -F 'dev' '{ print $2; exit }' | awk '{ print $1 }') || exit 1
    fi

ChenjieXu (midone) wrote :

By changing docker0 to br-phy0, VXLAN can be enabled. But this workaround requires that the "br-phyX" bridges on the different compute nodes share the same name, and that the named bridge exists and has an IP address on every compute node. For example:
   compute-0: br-phy0, compute-1: br-phy0 -> works
   compute-0: br-phy0, compute-1: br-phy1 -> does not work (br-phy0 on compute-1 may not exist or may not have an IP address)
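
A hedged sketch of that workaround (the configmap name comes from the earlier comment; the pod label selector is an assumption based on the openstack-helm neutron chart, and the bridge name must be the one that actually carries the host's tunnel IP):
   # Replace the hardcoded interface in the neutron-bin init script
   kubectl -n openstack edit cm neutron-bin
   #    change: tunnel_interface="docker0"  ->  tunnel_interface="br-phy0"
   # Restart the OVS agent pods so they recompute local_ip
   kubectl -n openstack delete pod -l application=neutron,component=neutron-ovs-agent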

Matt Peters (mpeters-wrs) wrote :

Do you see the local_ip set correctly within the openstack helm overrides?

source /etc/platform/openrc
system helm-override-show neutron openstack
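
For reference, a hedged way to cross-check what the agent actually picked up (the pod label selector and in-pod config path are assumptions, not confirmed from this report):
   kubectl -n openstack get pods -l application=neutron,component=neutron-ovs-agent -o wide
   kubectl -n openstack exec <ovs-agent-pod-on-compute-0> -- grep -r local_ip /etc/neutron/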

Matt Peters (mpeters-wrs) wrote :

Sample neutron helm overrides from a VxLAN system at Wind River are attached.

Ghada Khalil (gkhalil) on 2019-03-21
tags: added: stx.networking
ChenjieXu (midone) wrote :

Hi Matt,

I tested VXLAN tenant network again and this time everything works fine.
   The VMs on different hosts can ping each other.
   The tunnel port on br-tun has been created.
   local_ip has been set in ovs agent's openvswitch_agent.ini
   tunneling_ip showed by command "neutron agent-show $ovsagent" is correct.
   local_ip has been set correctly within the openstack helm overrides

It seems this bug doesn't occur 100% of the time, or maybe I missed some steps previously. I think we can wait and see whether Elio can reproduce this bug.

Ricardo Perez (richomx) wrote :

This issue is reproducible using 2+2 and Duplex bare metal configurations.

We also observed the following behavior:

* Although Horizon reports the VMs were created properly and shows an assigned IP, the VM console can't be opened via Horizon.

* Once we were able to open the console, using two different methods (virsh console and port forwarding / tunneling), we found that the VMs don't have an IP assigned to the eth0 interface. I believe this issue is already described here: https://bugs.launchpad.net/starlingx/+bug/1820378

* We manually assigned the IP to each of the VMs and tried the ping, without success.

Ghada Khalil (gkhalil) wrote :

Ricardo/Elio, please re-test the vxlan config using the proper VM flavor ("hw:mem_page_size=large") as described in https://bugs.launchpad.net/starlingx/+bug/1820378
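
A hedged example of creating a flavor that carries that property (name and sizing are placeholders):
   openstack flavor create m1.tiny.hugepage --ram 512 --disk 1 --vcpus 1 --property hw:mem_page_size=large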

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) wrote :

Marking as Invalid. Issue was not reproduced by Ricardo/Elio in a month. They have concluded their ovs-dpdk testing and did not report this issue in their final report.

Changed in starlingx:
importance: Undecided → Low
assignee: nobody → ChenjieXu (midone)
status: Incomplete → Invalid