Duplicate declaration of br-ex in tripleo_network_config

Bug #1905640 reported by Brendan Shephard
This bug affects 1 person
Affects: tripleo
Status: In Progress
Importance: Undecided
Assigned to: Brendan Shephard

Bug Description

Summary:
We seem to be declaring br-ex twice in the multiple_nics_dvr.j2 template:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics_dvr.j2#L24-L26

And obviously after the tenant network is configured, we configure br-ex:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics_dvr.j2#L43-L44

But the second `name:` key under the tenant network bridge seems to be breaking the deployment. See the details below.

Environment:
[root@tripleo-director overcloud]# rpm -qa | grep tripleo-ansible
tripleo-ansible-2.1.0-0.20201102184033.62a7fa9.el8.noarch

Details:
This part of the network config:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics_dvr.j2#L24-L26

Results in two br-NAME bridges in OvS:
```
- type: ovs_bridge
  name: br-tenant
  name: br-ex
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.16.0.120/24
  routes: []
  members:
  - type: interface
    name: nic5
    mtu: 9000
    use_dhcp: false
    primary: true
- type: ovs_bridge
  name: br-ex
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.20.10.23/16
  routes: [{'default': True, 'next_hop': '172.20.0.254'}]
```

This doesn't create the br-tenant bridge, and the deployment fails. We can see the bridge missing in OvS:
```
[root@overcloud-controller-0 ~]# ovs-vsctl show
e970fec3-95a4-48e2-abcf-06b018392c52
    Bridge br-ex
        fail_mode: standalone
        Port enp6s0
            Interface enp6s0
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.13.0"
```
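
A check for repeated keys in the rendered config could catch this before os-net-config runs. The following is a minimal sketch, not part of tripleo-ansible or os-net-config: `find_duplicate_keys` and `SAMPLE` are hypothetical names, and the scanner only understands simple block-style YAML like the fragment above. It assumes the config is loaded by a parser that silently keeps the last value of a duplicated key (PyYAML's default loaders behave this way), which would explain why the br-tenant entry ends up named br-ex.

```python
import re

# Matches an optional list marker followed by a plain "key:" at line start.
KEY_RE = re.compile(r"^(\s*)(- )?([A-Za-z_][\w-]*):")


def find_duplicate_keys(text):
    """Return (line_number, key) pairs for keys repeated in one mapping.

    Naive sketch: handles only block-style mappings and lists, not flow
    style, multi-line scalars, anchors, or quoted keys.
    """
    duplicates = []
    stack = []  # open mappings as (key_column, keys_seen)
    for number, line in enumerate(text.splitlines(), start=1):
        match = KEY_RE.match(line)
        if not match:
            continue
        column = len(match.group(1))
        starts_item = match.group(2) is not None
        if starts_item:
            column += 2  # the key sits two columns after the "- " marker
        # Leaving a nested mapping: drop everything indented deeper.
        while stack and stack[-1][0] > column:
            stack.pop()
        if starts_item and stack and stack[-1][0] == column:
            stack.pop()  # a new list item replaces its sibling mapping
        if not stack or stack[-1][0] < column:
            stack.append((column, set()))
        seen = stack[-1][1]
        key = match.group(3)
        if key in seen:
            duplicates.append((number, key))
        seen.add(key)
    return duplicates


# Shortened version of the broken config shown above (illustrative only).
SAMPLE = """\
- type: ovs_bridge
  name: br-tenant
  name: br-ex
  mtu: 9000
  members:
  - type: interface
    name: nic5
- type: ovs_bridge
  name: br-ex
"""

print(find_duplicate_keys(SAMPLE))  # [(3, 'name')]
```

The `name: nic5` under `members` is not reported because it belongs to a different mapping; only the second `name:` inside the first bridge entry is flagged.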

If I remove the unnecessary br-ex declaration, it all works as expected:
```
- type: ovs_bridge
  name: br-tenant
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.16.0.120/24
  routes: []
  members:
  - type: interface
    name: nic5
    mtu: 9000
    use_dhcp: false
    primary: true
- type: ovs_bridge
  name: br-ex
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.20.10.23/16
  routes: [{'default': True, 'next_hop': '172.20.0.254'}]
```

Re-run os-net-config:
```
[root@overcloud-controller-0 ~]# os-net-config -c /etc/os-net-config/config.yaml -vvv
[2020/11/26 01:02:20 AM] [INFO] Using config file at: /etc/os-net-config/config.yaml
[2020/11/26 01:02:20 AM] [INFO] Ifcfg net config provider created.
[2020/11/26 01:02:20 AM] [INFO] Not using any mapping file.
[2020/11/26 01:02:21 AM] [INFO] Finding active nics
[2020/11/26 01:02:21 AM] [INFO] enp3s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] enp6s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] enp2s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] enp5s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] lo is not an active nic
[2020/11/26 01:02:21 AM] [INFO] enp1s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] enp4s0 is an active nic
[2020/11/26 01:02:21 AM] [INFO] ovs-system is not an active nic
[2020/11/26 01:02:21 AM] [INFO] br-ex is not an active nic
[2020/11/26 01:02:21 AM] [INFO] No DPDK mapping available in path (/var/lib/os-net-config/dpdk_mapping.yaml)
[2020/11/26 01:02:21 AM] [INFO] Active nics are ['enp1s0', 'enp2s0', 'enp3s0', 'enp4s0', 'enp5s0', 'enp6s0']
[2020/11/26 01:02:21 AM] [INFO] nic6 mapped to: enp6s0
[2020/11/26 01:02:21 AM] [INFO] nic1 mapped to: enp1s0
[2020/11/26 01:02:21 AM] [INFO] nic5 mapped to: enp5s0
[2020/11/26 01:02:21 AM] [INFO] nic2 mapped to: enp2s0
[2020/11/26 01:02:21 AM] [INFO] nic4 mapped to: enp4s0
[2020/11/26 01:02:21 AM] [INFO] nic3 mapped to: enp3s0
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp1s0
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp2s0
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp3s0
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp4s0
[2020/11/26 01:02:21 AM] [INFO] adding bridge: br-tenant
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp5s0
[2020/11/26 01:02:21 AM] [INFO] adding bridge: br-ex
[2020/11/26 01:02:21 AM] [INFO] adding custom route for interface: br-ex
[2020/11/26 01:02:21 AM] [INFO] adding interface: enp6s0
[2020/11/26 01:02:21 AM] [INFO] applying network configs...
[2020/11/26 01:02:21 AM] [INFO] No changes required for interface: enp1s0
[2020/11/26 01:02:21 AM] [INFO] No changes required for interface: enp2s0
[2020/11/26 01:02:21 AM] [INFO] No changes required for interface: enp3s0
[2020/11/26 01:02:21 AM] [INFO] No changes required for interface: enp4s0
[2020/11/26 01:02:21 AM] [INFO] No changes required for interface: enp6s0
[2020/11/26 01:02:21 AM] [INFO] No changes required for bridge: br-ex
[2020/11/26 01:02:21 AM] [INFO] running ifdown on interface: enp5s0
[2020/11/26 01:02:21 AM] [INFO] running ifdown on bridge: br-tenant
[2020/11/26 01:02:21 AM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-enp5s0
[2020/11/26 01:02:21 AM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-tenant
[2020/11/26 01:02:21 AM] [INFO] running ifup on bridge: br-tenant
[2020/11/26 01:02:25 AM] [INFO] running ifup on interface: enp5s0
```

And we can now see the bridge and IP address that were previously missing:
```
[root@overcloud-controller-0 ~]# ovs-vsctl show
e970fec3-95a4-48e2-abcf-06b018392c52
    Bridge br-tenant
        fail_mode: standalone
        Port enp5s0
            Interface enp5s0
        Port br-tenant
            Interface br-tenant
                type: internal
    Bridge br-ex
        fail_mode: standalone
        Port enp6s0
            Interface enp6s0
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.13.0"
[root@overcloud-controller-0 ~]# ip -o a
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: enp1s0 inet 192.168.24.16/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever
2: enp1s0 inet6 fe80::546f:14ff:fef3:7/64 scope link \ valid_lft forever preferred_lft forever
3: enp2s0 inet 172.16.1.179/24 scope global enp2s0\ valid_lft forever preferred_lft forever
3: enp2s0 inet6 fe80::546f:14ff:fef3:8/64 scope link \ valid_lft forever preferred_lft forever
4: enp3s0 inet 172.16.3.202/24 scope global enp3s0\ valid_lft forever preferred_lft forever
4: enp3s0 inet6 fe80::546f:14ff:fef3:9/64 scope link \ valid_lft forever preferred_lft forever
5: enp4s0 inet 172.16.2.6/24 scope global enp4s0\ valid_lft forever preferred_lft forever
5: enp4s0 inet6 fe80::546f:14ff:fef3:a/64 scope link \ valid_lft forever preferred_lft forever
6: enp5s0 inet6 fe80::546f:14ff:fef3:b/64 scope link \ valid_lft forever preferred_lft forever
7: enp6s0 inet6 fe80::546f:14ff:fef3:c/64 scope link \ valid_lft forever preferred_lft forever
11: br-ex inet 172.20.10.23/16 brd 172.20.255.255 scope global br-ex\ valid_lft forever preferred_lft forever
11: br-ex inet6 fe80::546f:14ff:fef3:c/64 scope link \ valid_lft forever preferred_lft forever
12: br-tenant inet 172.16.0.120/24 brd 172.16.0.255 scope global br-tenant\ valid_lft forever preferred_lft forever
12: br-tenant inet6 fe80::546f:14ff:fef3:b/64 scope link \ valid_lft forever preferred_lft forever
```
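
The before/after comparison of `ovs-vsctl show` can also be done programmatically. This is a rough sketch under the assumption that bridges appear as `Bridge <name>` lines, as in the output above; `list_bridges`, `missing_bridges`, and `BEFORE` are hypothetical names introduced for illustration.

```python
import re


def list_bridges(ovs_show_output):
    """Return bridge names from the text printed by `ovs-vsctl show`."""
    pattern = r'^[ \t]*Bridge[ \t]+"?([^"\s]+)"?[ \t]*$'
    return re.findall(pattern, ovs_show_output, flags=re.MULTILINE)


def missing_bridges(ovs_show_output, expected):
    """Return the expected bridges that are absent, in sorted order."""
    return sorted(set(expected) - set(list_bridges(ovs_show_output)))


# Shortened copy of the output captured before the fix.
BEFORE = """\
e970fec3-95a4-48e2-abcf-06b018392c52
    Bridge br-ex
        fail_mode: standalone
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.13.0"
"""

print(missing_bridges(BEFORE, ["br-tenant", "br-ex"]))  # ['br-tenant']
```

Run against the output captured after the fix, the same call would return an empty list, since both br-tenant and br-ex are present.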

Steps to reproduce:
1. Follow the guide for deploying with network isolation:
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/network_isolation.html
2. Run the deployment. It fails for me with this error:
```
2020-11-26 10:44:06,395 p=700865 u=stack n=ansible | 2020-11-26 10:44:06.394124 | 566f14f3-0016-a095-f7d9-000000000be2 | FATAL | Check Controllers availability | overcloud-controller-0 | item=172.16.0.120 | error={"ansible_loop_var": "controller", "changed": false, "cmd": ["ping", "-w", "10", "-c", "1", "172.16.0.120"], "controller": "172.16.0.120", "delta": "0:00:10.007624", "end": "2020-11-26 00:44:06.340714", "msg": "non-zero return code", "rc": 1, "start": "2020-11-26 00:43:56.333090", "stderr": "", "stderr_lines": [], "stdout": "PING 172.16.0.120 (172.16.0.120) 56(84) bytes of data.\n\n--- 172.16.0.120 ping statistics ---\n10 packets transmitted, 0 received, 100% packet loss, time 218ms", "stdout_lines": ["PING 172.16.0.120 (172.16.0.120) 56(84) bytes of data.", "", "--- 172.16.0.120 ping statistics ---", "10 packets transmitted, 0 received, 100% packet loss, time 218ms"]}
```

The details section explains the reason for this.
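
To make the failure easier to spot in a long Ansible error, the ping summary inside `stdout` can be extracted. The snippet below is only a sketch: `parse_ping_summary` is a hypothetical helper, and the regular expression assumes the iputils-style summary line seen in the error above.

```python
import re

# Matches e.g. "10 packets transmitted, 0 received, 100% packet loss".
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received.*?(\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(stdout):
    """Return transmitted/received counts and loss percent, or None."""
    match = SUMMARY_RE.search(stdout)
    if match is None:
        return None
    sent, received, loss = match.groups()
    return {
        "transmitted": int(sent),
        "received": int(received),
        "loss_percent": float(loss),
    }


# The stdout value from the error above.
STDOUT = (
    "PING 172.16.0.120 (172.16.0.120) 56(84) bytes of data.\n\n"
    "--- 172.16.0.120 ping statistics ---\n"
    "10 packets transmitted, 0 received, 100% packet loss, time 218ms"
)

print(parse_ping_summary(STDOUT))
# {'transmitted': 10, 'received': 0, 'loss_percent': 100.0}
```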

Brendan Shephard (bshephar) wrote :

Sorry, correction. It results in two `name: br-NAME` keys defined one after the other in `/etc/os-net-config/config.yaml`, not two bridges in OvS. The problem is that the br-tenant bridge doesn't get created in OvS.

So this output is from my config.yaml file:
```
- type: ovs_bridge
  name: br-tenant
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.16.0.120/24
  routes: []
  members:
  - type: interface
    name: nic5
    mtu: 9000
    use_dhcp: false
    primary: true
- type: ovs_bridge
  name: br-ex
  mtu: 9000
  dns_servers: ['8.8.8.8']
  use_dhcp: false
  addresses:
  - ip_netmask:
      172.20.10.23/16
  routes: [{'default': True, 'next_hop': '172.20.0.254'}]
```

Brendan Shephard (bshephar) wrote :

Doesn't seem to be an issue in the multiple_nics.j2, just multiple_nics_dvr.j2
https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics.j2#L22-L28

That one seems fine since we wrap each in the if statements. Just dvr needs to be fixed. I'll check the rest of the templates and send through a patch for review.

Brendan Shephard (bshephar) wrote :
Changed in tripleo:
assignee: nobody → Brendan Shephard (bshephar)
Brendan Shephard (bshephar) wrote :

Maybe a better solution here would be just to make multiple_nics_dvr.j2 conform with the other templates and use the if statements for tenant and external. More like this:
https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_network_config/templates/multiple_nics/multiple_nics.j2#L22-L28

I can test that too. At least in my case, removing the extra `name:` line allowed the deployment to proceed.
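
The idea behind wrapping each name in an if statement is that every bridge entry gets exactly one `name:` key, chosen by network. The Python below only illustrates that selection logic; it is not the Jinja2 from the templates, and `bridge_name`, `render_bridge`, the network labels, and the default names are all assumptions made for the example.

```python
def bridge_name(network, external_bridge="br-ex"):
    """Return the single bridge name for a network, or None if it has none.

    Hypothetical helper: the network labels and default names are assumptions
    for illustration, not values read from the tripleo-ansible templates.
    """
    if network == "External":
        return external_bridge
    if network == "Tenant":
        return "br-tenant"
    return None


def render_bridge(network, mtu=9000):
    """Render the first lines of an ovs_bridge entry with one `name:` key."""
    name = bridge_name(network)
    if name is None:
        raise ValueError(f"no bridge defined for network {network!r}")
    return "\n".join(["- type: ovs_bridge", f"  name: {name}", f"  mtu: {mtu}"])


for network in ("Tenant", "External"):
    print(render_bridge(network))
```

Because each branch returns one name, an entry can never carry both br-tenant and br-ex, which is the failure described in this report.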

Rabi Mishra (rabi) wrote :
Changed in tripleo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 2.1.0

This issue was fixed in the openstack/tripleo-ansible 2.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 3.0.0

This issue was fixed in the openstack/tripleo-ansible 3.0.0 release.
