kolla-ansible

Low number of forks results in slow Neutron L3 agent restarts

Bug #2047045 reported by Adam Oswick on 2023-12-20

This bug affects 1 person

Affects        Status       Importance  Assigned to  Milestone
kolla-ansible  In Progress  Undecided   Adam Oswick

Bug Description

What happened
-------------

When running a Kolla Ansible deployment with more compute nodes than forks, the Neutron L3 agent container restarts took much longer than n*neutron_l3_agent_failover_delay (where n is the number of compute nodes).

What you expected to happen
---------------------------

The time taken to perform the Neutron L3 agent container restarts should be approximately n*neutron_l3_agent_failover_delay.

How to reproduce it
-------------------

- Provision an environment with Kolla Ansible that has more than 1 compute node
- Set neutron_l3_agent_failover_delay to 30s
- Modify the Neutron L3 agent neutron.conf files to ensure the containers are restarted on the next run
- Run another Kolla Ansible deployment with the number of forks set to 1

The container restarts then take approximately n*n*neutron_l3_agent_failover_delay rather than n*neutron_l3_agent_failover_delay.

Environment
-----------
Kolla-Ansible version -> Antelope

Revision history for this message

Adam Oswick (adamoswick) wrote on 2023-12-20:

This appears to be due to https://opendev.org/openstack/kolla-ansible/commit/391aa4677f394f1581df17fe74da968f19981e9d . As the handler loops through every host on every host, there are hosts*hosts tasks to run, with a delay of neutron_l3_agent_failover_delay between each.

That gives the expected result when the number of forks equals the number of hosts. If the number of forks is lower than the number of hosts, the process takes much longer even though no additional changes are actually happening, because the skipped tasks still occupy the available forks.
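
As a rough illustration of the timing described above, here is a minimal Python sketch. It is a simplified model, not Kolla Ansible code: the function name is hypothetical, and it assumes that each host spends one delay per loop item (skipped or not) and that hosts are processed in batches of at most `forks` at a time. The bug report only states the two end points (forks == hosts and forks == 1); the in-between values are an extrapolation.

```python
import math


def estimated_restart_seconds(hosts: int, forks: int, delay_s: float) -> float:
    """Estimate the handler duration under the simplified model above."""
    if hosts < 1 or forks < 1:
        raise ValueError("hosts and forks must both be at least 1")
    # Assumption: each host walks the full host list, one delay per item.
    per_host_loop = hosts * delay_s
    # Assumption: only `forks` hosts can run their loop at the same time.
    batches = math.ceil(hosts / forks)
    return batches * per_host_loop


if __name__ == "__main__":
    # Three compute nodes with neutron_l3_agent_failover_delay set to 30s.
    for forks in (1, 2, 3):
        seconds = estimated_restart_seconds(hosts=3, forks=forks, delay_s=30)
        print(f"forks={forks}: ~{seconds}s")
```

With 3 hosts and a 30s delay, the model gives roughly n*n*delay (270s) for 1 fork and n*delay (90s) once forks reaches the number of hosts, matching the behaviour reported here.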

Adam Oswick (adamoswick) on 2023-12-20

Changed in kolla-ansible:
assignee:	nobody → Adam Oswick (adamoswick)


OpenStack Infra (hudson-openstack) wrote on 2023-12-20: Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/904134

Changed in kolla-ansible:
status:	New → In Progress


Adam Oswick (adamoswick) wrote on 2023-12-20:

Right now, the Ansible output looks like the following, with tasks on all hosts running simultaneously (if the number of forks allows it).
------------------

RUNNING HANDLER [neutron : Restart running neutron-l3-agent container] *********
skipping: [compute02] => (item=compute01)
skipping: [compute03] => (item=compute01)
changed: [compute01] => (item=compute01)
skipping: [compute03] => (item=compute02)
skipping: [compute01] => (item=compute02)
changed: [compute02] => (item=compute02)
skipping: [compute01] => (item=compute03)
skipping: [compute02] => (item=compute03)
changed: [compute03] => (item=compute03)
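
To make the hosts*hosts count in the output above concrete, the following Python sketch enumerates every host/item pair for the same three hosts. It only mimics the shape of the output, not the handler itself: `loop_results` is a hypothetical helper, and the ordering of lines in a real run depends on Ansible scheduling, which is not modelled here.

```python
def loop_results(hosts):
    """Return (host, item, status) for every host/item pair in the loop."""
    results = []
    for item in hosts:
        for host in hosts:
            # Only the host whose name matches the loop item actually restarts.
            status = "changed" if host == item else "skipping"
            results.append((host, item, status))
    return results


if __name__ == "__main__":
    results = loop_results(["compute01", "compute02", "compute03"])
    for host, item, status in results:
        print(f"{status}: [{host}] => (item={item})")
    changed = sum(1 for _, _, status in results if status == "changed")
    print(f"{len(results)} loop iterations, {changed} actual restarts")
```

For three hosts this yields nine loop iterations but only three restarts; the remaining six are skips that still take up a fork.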

The proposed fix should hopefully make it look like this instead, running just one task at a time.
-----------------------------------

RUNNING HANDLER [neutron : Restart running neutron-l3-agent container] *********
changed: [compute01] => (item=restart)
skipping: [compute01] => (item=pause)
changed: [compute02] => (item=restart)
skipping: [compute02] => (item=pause)
changed: [compute03] => (item=restart)
skipping: [compute03] => (item=pause)
