Steps to reproduce with devstack, on devstack master commit 9be4ceeaa10f6ed92291e77ec52794acfb67c147
The `AggregateInstanceExtraSpecsFilter` is only added to trigger a log message and/or scheduling failures from the stale aggregate info; the extra debug logging in `_update_aggregate` will show the inconsistent state even without the added filter.
### Adding logging to the host_manager helps to see what's going on:
```
diff --git a/nova/scheduler/host_manager.py b/nova/scheduler/host_manager.py
index 8cb775a923..c9894c79fa 100644
--- a/nova/scheduler/host_manager.py
+++ b/nova/scheduler/host_manager.py
@@ -392,6 +392,8 @@ class HostManager(object):
 
     def _update_aggregate(self, aggregate):
         self.aggs_by_id[aggregate.id] = aggregate
+
+        LOG.debug(f"update for {aggregate.id} called with {aggregate.hosts}")
         for host in aggregate.hosts:
             self.host_aggregates_map[host].add(aggregate.id)
         # Refreshing the mapping dict to remove all hosts that are no longer
```
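Note that `_update_aggregate` replaces the cached aggregate wholesale on every RPC update, with no ordering or version check, so whichever fanout message each scheduler worker applies last wins. A toy sketch of that last-writer-wins hazard (illustrative only, not nova's actual code):
```python
# Toy illustration of the race: each update carries the full host list the
# API worker saw, and applying them out of order leaves stale membership.
aggs_by_id = {}

def update_aggregate(agg_id, hosts):
    # Full-state replacement with no ordering check: a stale message that
    # is applied late simply overwrites a newer one.
    aggs_by_id[agg_id] = list(hosts)

# Two concurrent "add host" API calls each broadcast the membership they saw:
update_aggregate(1, ['devstack8', 'devstack1', 'devstack2'])  # newer state
update_aggregate(1, ['devstack8', 'devstack3'])               # stale, applied last
print(aggs_by_id[1])  # ['devstack8', 'devstack3'] -- devstack1/2 are lost
```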
### Local.conf:
```
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
VIRT_DRIVER=fake
NUMBER_FAKE_NOVA_COMPUTE=10

[[post-config|$NOVA_CONF]]
# just addition of AggregateInstanceExtraSpecsFilter to exercise the issue
[filter_scheduler]
enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,SameHostFilter,DifferentHostFilter,AggregateInstanceExtraSpecsFilter
```
### aggregate and flavor setup for AggregateInstanceExtraSpecsFilter
```
openstack aggregate create test_agg
openstack aggregate set --property "test=true" test_agg
openstack flavor create --ram 512 --disk 1 --vcpus 1 test_flavor
openstack flavor set --property "aggregate_instance_extra_specs:test=true" test_flavor
```
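Before running the parallel add, it can help to sanity-check that the aggregate metadata and the flavor extra spec line up (standard python-openstackclient options; exact output shape may vary by release):
```
openstack aggregate show test_agg -c properties -f value
openstack flavor show test_flavor -c properties -f value
```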
### add hosts to aggregate in parallel
It is not guaranteed to trigger the issue, so several attempts may be needed.
The host manager debug logs will show whether the last applied RPC update carried an incomplete list of hosts for the aggregate.
The issue seems easier to trigger the more closely the requests are spaced in time, e.g. by issuing them via openstacksdk with a reused session to avoid per-command Python startup time (a sketch follows the command below).
```
openstack hypervisor list -c "Hypervisor Hostname" -f value \
| xargs -I {} -P 10 -n 1 \
openstack aggregate add host test_agg -c hosts -f value {}
```
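The same parallel add can be done via openstacksdk, reusing one authenticated session so the API requests land as close together in time as possible. A minimal sketch; the cloud name "devstack" is an assumption about the local clouds.yaml:
```python
# Fire the aggregate-add calls from one process over a shared connection,
# instead of paying CLI/Python startup cost per host.
from concurrent.futures import ThreadPoolExecutor

import openstack

conn = openstack.connect(cloud="devstack")  # assumed clouds.yaml entry name
agg = next(a for a in conn.compute.aggregates() if a.name == "test_agg")
hosts = [hv.name for hv in conn.compute.hypervisors()]

with ThreadPoolExecutor(max_workers=10) as pool:
    for host in hosts:
        # One API call per host, all in flight at roughly the same time.
        pool.submit(conn.compute.add_host_to_aggregate, agg, host)
```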
This will show responses like the following:
```
['devstack8']
['devstack8', 'devstack1']
['devstack8', 'devstack1', 'devstack2']
['devstack8', 'devstack3']
['devstack8', 'devstack1', 'devstack7']
['devstack8', 'devstack4']
['devstack8', 'devstack6']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack10']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5']
```
At this point, viewing the aggregate info directly does show the correct membership, since the API reads from the database rather than from the schedulers' in-memory caches:
```
$ openstack aggregate show test_agg --max-width=80
+-------------------+----------------------------------------------------------+
| Field | Value |
+-------------------+----------------------------------------------------------+
| availability_zone | None |
| created_at | 2024-04-25T15:43:45.000000 |
| deleted_at | None |
| hosts | devstack1, devstack10, devstack2, devstack3, devstack4, |
| | devstack5, devstack6, devstack7, devstack8, devstack9 |
| id | 1 |
| is_deleted | False |
| name | test_agg |
| properties | test='true' |
| updated_at | None |
| uuid | 6700b896-34fb-4e49-9057-e1d40ce185ec |
+-------------------+----------------------------------------------------------+
```
If the extra logging was applied, we will now see the following in the nova-scheduler debug logs, with each scheduler worker (pid) applying the updates in its own order:
```
...
Apr 25 15:48:01 devstack nova-scheduler[172360]: DEBUG nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172360) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172320) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172316]: DEBUG nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172316) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172326]: DEBUG nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack10'] {{(pid=172326) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack10'] {{(pid=172273) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172360]: DEBUG nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172360) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172320) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172273) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172326]: DEBUG nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172326) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0 admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack10'] {{(pid=172320) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172273) _update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
```
If we now schedule some instances, we'll see log entries indicating that the scheduler's host state is still inconsistent:
```
openstack server create \
  --image cirros-0.6.2-x86_64-disk \
  --network private \
  --min=10 --max=10 \
  --flavor test_flavor \
  instance1
```
```
Apr 25 15:50:56 devstack nova-scheduler[172268]: DEBUG nova.filters [None req-4afd37c0-ec15-4aae-8f2b-a5ae7920aac8 admin admin] Filter AggregateInstanceExtraSpecsFilter returned 7 host(s) {{(pid=172268) get_filtered_objects /opt/stack/nova/nova/filters.py:102}}
```
There's still something I'm not understanding about the fake virt driver and/or the AggregateInstanceExtraSpecsFilter, as all 10 instances still become "active", even though the filter is excluding 3 of the hosts.
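To check where the instances actually landed, the admin-visible Host column can be compared against the hosts the filter excluded; one possible check (standard python-openstackclient options, exact column names may vary by release):
```
openstack server list --long -c Name -c Host
```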