Change of neutron_external_network and neutron_bridge_name in globals.yml does not cause change in Ansible

Bug #1786712 reported by Eric Miller
Affects          Status         Importance   Assigned to   Milestone
kolla-ansible    Fix Released   High         Unassigned
Rocky            Fix Released   Undecided    Unassigned

Bug Description

This may be expected behaviour, and perhaps I simply don't know how to make this change properly, but I thought I'd see if someone has any recommendations, or whether this should be treated as a bug or feature request.

Kolla-Ansible version: 6.1.0
Built from: source
Distro: CentOS 7.5

I need to add a bridge to provide a physical connection for Octavia's management network, and the only way I could find to do this is to add another external bridge and interface.

So, for example, I added this to the globals.yml file:
neutron_bridge_name: "br-ex,br-mgmt"
neutron_external_interface: "team0.1000,team0.1001"

where team0.1000 is the interface for the existing external bridge (br-ex) and team0.1001 is the interface for the new bridge (br-mgmt) on which Octavia amphorae will talk to the Octavia control plane services.

However, after making this change and running a kolla-ansible "reconfigure", the ml2_conf.ini file used by the neutron-openvswitch-agent container was not changed on any host.

I was expecting that the task "Copying over ml2_conf.ini" in the kolla-ansible/roles/neutron/tasks/config.yml file would process the ml2_conf.ini.j2 file, resulting in a "change", but Ansible reports "change": false on all hosts that are in the group (where "hosts_in_group": true and "key": "neutron-openvswitch-agent").

I even "touched" the ml2_conf.ini.j2 file, in case Ansible didn't see a file change, but that also did not trigger Ansible to change the file.

Also - if there is a better way to add the management network bridge for Octavia, I would love to know. :)

Thanks!

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

So it appears that the ml2_conf.ini file "is" getting updated on the network nodes in this directory: /etc/kolla/neutron-openvswitch-agent

And thus the files appear in the Docker volume attached to the neutron_openvswitch_agent container, mounted at /var/lib/kolla/config_files/ (the host directory and the mount point refer to the same files).

However, the ml2_conf.ini file is not being copied to the /etc/neutron/plugins/ml2 directory inside the container, which I "think" is done when the container is restarted.
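
(A quick way to confirm the mismatch is to compare checksums of the two copies inside the running container; this assumes md5sum is available in the image, which it normally is:

docker exec neutron_openvswitch_agent md5sum /var/lib/kolla/config_files/ml2_conf.ini /etc/neutron/plugins/ml2/ml2_conf.ini
)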

So, I restarted the container manually with:
docker container restart neutron_openvswitch_agent

but the container then kept restarting over and over, crashing again within 10 seconds or less each time.
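
(The reason for a crash loop like this should show up in the agent's log, either via Docker or on the host under the kolla_logs Docker volume; for example:

docker logs --tail 50 neutron_openvswitch_agent
)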

I reverted the changes that were made, i.e. from:
[ml2_type_flat]
flat_networks = physnet1,physnet2
[ovs]
bridge_mappings = physnet1:br-ex,physnet2:br-mgmt

back to:

[ml2_type_flat]
flat_networks = physnet1
[ovs]
bridge_mappings = physnet1:br-ex

and the container started again. I also verified that these changes were copied to the /etc/neutron/plugins/ml2/ml2_conf.ini file during the container start, so the copy process appears to be working correctly.

This indicates that the config.yml file is not restarting the container - which may or may not be a bug?

However, I guess I'm back to the original goal of creating a provider network for the Octavia amphorae management. Any suggestions how to do this in Kolla-Ansible?

Thanks!

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

In case it helps someone else (I didn't know this until just now): there is a config.json file on each host that defines what each container copies from the host at startup. For example, /etc/kolla/neutron-openvswitch-agent/config.json lists the copy operations that copy files like ml2_conf.ini from the container's attached volume to the container's local filesystem:

        {
            "source": "/var/lib/kolla/config_files/ml2_conf.ini",
            "dest": "/etc/neutron/plugins/ml2/ml2_conf.ini",
            "owner": "neutron",
            "perm": "0600"
        },
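
(To see the full copy map for a service on a host, the file can be pretty-printed with Python, which a Kolla-Ansible host already has; sudo may be needed because of the file permissions:

sudo python -m json.tool /etc/kolla/neutron-openvswitch-agent/config.json
)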

Eric

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Hi,

I see two main issues here:

1- Not restarting ovs-agent during reconfiguration:

Config.yml is copying the file here:

https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/neutron/tasks/config.yml#L127

Not sure why Ansible doesn't report changed=true for this task; maybe you have custom configs at /etc/kolla/neutron which override the bridge values?

If the task reports a change, the handlers for the ovs-agent will be notified and it will be restarted here: https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/neutron/handlers/main.yml#L35

2- Bridges not created

Bridges are created in the openvswitch role, and only when the handlers are called:

https://github.com/openstack/kolla-ansible/blob/cd03876e7d646fe7bc1842888ef79729f52a744b/ansible/roles/openvswitch/handlers/main.yml#L35

At this point we will need to think about how to add a check for whether the OVS bridges differ, so that a notify can be raised to the handlers.
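
(Conceptually, such a check could compare the bridges that already exist, for example the output of the command below, against what neutron_bridge_name requests, and notify the handler only when they differ. Just a sketch of the idea:

docker exec openvswitch_db ovs-vsctl list-br
)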

Regards

Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :

Issue 2 can be worked around by creating the bridges manually on each network host (and on computes too, if using provider networks or DVR):

docker exec openvswitch_db /usr/local/bin/kolla_ensure_openvswitch_configured ${neutron_bridge_name} ${neutron_external_interface}

After the bridges are created, the ovs-agent container can be restarted with the proper config file changes.
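
(For a multi-node environment, something along these lines should work; the host names and the bridge/interface values below are placeholders to substitute with real ones:

for host in network001 compute001 compute002; do
    ssh $host docker exec openvswitch_db \
        /usr/local/bin/kolla_ensure_openvswitch_configured br-mgmt team0.1001
done
)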

Regards

Revision history for this message
Eric Miller (erickmiller) wrote :

Thanks for the response Eduardo! I will try to manually create the extra bridge and see if a restart of the container is successful.

Note that if #1 is fixed, the ovs-agent container will fail to restart whenever the bridge hasn't been configured first, so these two problems should be corrected together.

I'll be back with results shortly.

Revision history for this message
Eric Miller (erickmiller) wrote :

I finally had a chance to get back to this, and am still working on it. After adding another bridge to one of the compute nodes' OVS config using this, for example:

ssh compute001 docker exec openvswitch_db /usr/local/bin/kolla_ensure_openvswitch_configured br-mgmt ens160

where ens160 is the interface that also carries the OpenStack control plane APIs (including the Octavia APIs). After this, network connectivity to the compute node ceased.

I can open a console to the compute node VM (this is a test environment running in VMs on ESXi), and everything is responsive.

I can run: ovs-vsctl list-br
in the openvswitch_db container and get:

br-ex
br-int
br-mgmt
br-tun

as expected. And: ovs-vsctl list-ports br-mgmt
outputs:

ens160

so that is also as expected. I'm still working on this at the moment. I thought I might have created a loop somewhere, but Open vSwitch does not show high CPU utilization, and "ovs-dpctl -s show" indicates there are some dropped RX packets on br-mgmt, but nothing necessarily unusual on the ens160 interface.

Anyways, I'll report back if/when I figure it out.

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

I should also have mentioned that if I delete the bridge using: ovs-vsctl del-br br-mgmt

I can now pass traffic to this compute node.

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

If I create another vNIC on the VM, attached to the same VLAN, and add a bridge and port manually with: ovs-vsctl add-port br-mgmt ens224

there are no issues with traffic to the compute node.

After deleting this port and adding the management vNIC instead using: ovs-vsctl add-port br-mgmt ens160

all traffic to the compute node ceases. So, it must be something internal to the compute node.

Revision history for this message
Eric Miller (erickmiller) wrote :

I found the problem pretty quickly. :) In this FAQ:
http://docs.openvswitch.org/en/latest/faq/issues/

The first question/answer indicates:

A physical Ethernet device that is part of an Open vSwitch bridge should not have an IP address. If one does, then that IP address will not be fully functional.

So, this makes sense since the network interface outside of openvswitch has an IP address (the IP of the compute node).

Thus, it looks like I need to use a 3rd NIC, preferably with a different VLAN, for Octavia traffic. I'll need to look at what it takes to get the APIs to bind to this VLAN instead of the management VLAN, though.
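
(For completeness, the other approach the OVS FAQ describes is to move the host's IP address from the physical NIC to the bridge's internal interface rather than adding another NIC. Roughly, with placeholder addresses, and noting that doing this over SSH on that same interface will drop the session:

ip addr del 192.0.2.11/24 dev ens160     # placeholder address
ip addr add 192.0.2.11/24 dev br-mgmt
ip link set br-mgmt up
# any routes that pointed at ens160 (e.g. the default route) need to be re-added via br-mgmt
)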

Revision history for this message
Eric Miller (erickmiller) wrote :

I'm still trying to figure out a solution to this, but it appears that the Octavia configuration in Kolla-Ansible is a bit flawed since the configuration file here:
/kolla-ansible/ansible/roles/octavia/templates/octavia.conf.j2

places the Octavia API on {{ api_interface_address }}, which is an IP on the same interface that the compute nodes also use, per the configuration file:
/kolla-ansible/ansible/roles/nova/templates/nova.conf.j2

so that interface has a host IP address bound to it and thus won't work with an OVS bridge attached, as mentioned in the previous comment.

Maybe there should be an {{ octavia_api_interface_address }} on an interface defined by {{ octavia_interface }}? Similar to what can be specified in globals.yml for other services here:

#api_interface: "{{ network_interface }}"
#storage_interface: "{{ network_interface }}"
#cluster_interface: "{{ network_interface }}"
#tunnel_interface: "{{ network_interface }}"
#dns_interface: "{{ network_interface }}"

except Octavia would "require" a different interface than {{ network_interface }}.

Or is there something I have missed that is much simpler?

Revision history for this message
Eric Miller (erickmiller) wrote :

Hi Eduardo,

The OVS issue I had, where host traffic was blocked when an OVS bridge was created with a port on the host's management network interface, made me wonder whether it was also the cause of the second external bridge not being created, which I mentioned in my second comment on this ticket.

So, I added a 3rd NIC to each node VM in our lab, using freshly created VMs (bare CentOS 7.5), and assigned the NICs to their respective bridges using this in the globals.yml file:

neutron_bridge_name: "br-ex,br-prov001"
neutron_external_interface: "ens192,ens224"

And surprisingly, the second bridge was created properly. Note that this was during an "install" with Kolla-Ansible, not a "reconfigure", and thus not an "addition" of a bridge to an existing deployment. I will test that soon with yet another NIC and see if the bridge is created and the ovs-agent container is restarted.

I will see if I can get Octavia APIs and Amphorae to talk on this new interface (ens224 connected to bridge br-prov001, short for provider network 001). If it works, I will provide the respective changes to the globals.yml and octavia.conf.j2 files that I have made.

I likely won't have an answer until Monday.

Eric

Revision history for this message
Mark Goddard (mgoddard) wrote :

I think this was fixed by https://review.openstack.org/#/c/606033/, and backported to rocky. Please reopen if not.

Changed in kolla-ansible:
status: Triaged → Fix Released