Low number of forks results in slow Neutron L3 agent restarts

Bug #2047045 reported by Adam Oswick
This bug affects 1 person
Affects        Status       Importance  Assigned to  Milestone
kolla-ansible  In Progress  Undecided   Adam Oswick

Bug Description

What happened
-------------

When running a Kolla Ansible deployment with more compute nodes than forks, the Neutron L3 agent container restarts took much longer than n*neutron_l3_agent_failover_delay (where n is the number of compute nodes).

What you expected to happen
---------------------------

The time taken to perform Neutron L3 agent container restarts is approximately equal to n*neutron_l3_agent_failover_delay.

How to reproduce it
-------------------

- Provision an environment with Kolla Ansible that has more than 1 compute node
- Set neutron_l3_agent_failover_delay to 30s
- Modify the Neutron L3 agent neutron.conf files to ensure the containers are restarted on the next run
- Run another Kolla Ansible deployment with the number of forks set to 1

The container restarts will take approximately n*n*neutron_l3_agent_failover_delay rather than n*neutron_l3_agent_failover_delay

Environment
-----------
Kolla-Ansible version -> Antelope

Adam Oswick (adamoswick) wrote :

This appears to be due to https://opendev.org/openstack/kolla-ansible/commit/391aa4677f394f1581df17fe74da968f19981e9d . Because this loops through every host on every host, there are hosts*hosts task iterations to run, with a delay of neutron_l3_agent_failover_delay between each.

That has the expected result when the number of forks is at least the number of hosts. If the number of forks is lower, the process takes much longer than expected even though no changes are actually happening in most iterations, because the skipped tasks still occupy the available forks.
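
As a rough illustration of the scaling described above, here is a minimal Python sketch of the timing. The function name and the batching model are assumptions made for illustration only, not taken from Kolla Ansible or Ansible: it assumes every host walks the full host list with one delay per item (executed or skipped), that hosts are processed in batches of at most `forks`, and that the restart itself takes negligible time.

```python
import math


def estimate_restart_seconds(hosts: int, forks: int, delay: float) -> float:
    """Rough wall-clock estimate for the looped restart handler.

    Simplifying assumptions (hypothetical model, not Ansible's scheduler):
    - each host iterates over all `hosts` items and waits `delay` per item,
      whether that item is executed or skipped;
    - hosts are worked through in batches of at most `forks`;
    - the container restart itself takes no time.
    """
    if hosts < 1 or forks < 1:
        raise ValueError("hosts and forks must be positive")
    batches = math.ceil(hosts / forks)  # hosts handled `forks` at a time
    per_host = hosts * delay            # one delay per loop item
    return batches * per_host


if __name__ == "__main__":
    # Three compute nodes with neutron_l3_agent_failover_delay set to 30s.
    for forks in (3, 2, 1):
        print(forks, estimate_restart_seconds(hosts=3, forks=forks, delay=30))
```

Under these assumptions, 3 hosts with a 30s delay come out at 90s when forks is 3 (n*delay) and 270s when forks is 1 (n*n*delay), which matches the behaviour reported here.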

Adam Oswick (adamoswick)
Changed in kolla-ansible:
assignee: nobody → Adam Oswick (adamoswick)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)
Changed in kolla-ansible:
status: New → In Progress
Adam Oswick (adamoswick) wrote :

Right now, the Ansible output looks like the following, with tasks on all hosts running simultaneously (if the number of forks allows for it).
------------------

RUNNING HANDLER [neutron : Restart running neutron-l3-agent container] *********
skipping: [compute02] => (item=compute01)
skipping: [compute03] => (item=compute01)
changed: [compute01] => (item=compute01)
skipping: [compute03] => (item=compute02)
skipping: [compute01] => (item=compute02)
changed: [compute02] => (item=compute02)
skipping: [compute01] => (item=compute03)
skipping: [compute02] => (item=compute03)
changed: [compute03] => (item=compute03)

With the proposed fix, the output should hopefully look like this instead, running just one task at a time.
-----------------------------------

RUNNING HANDLER [neutron : Restart running neutron-l3-agent container] *********
changed: [compute01] => (item=restart)
skipping: [compute01] => (item=pause)
changed: [compute02] => (item=restart)
skipping: [compute02] => (item=pause)
changed: [compute03] => (item=restart)
skipping: [compute03] => (item=pause)
