1. "Failed to bind port on host..." in neutron server logs are red herrings - that is the same also in the Centos 8 jobs and later that port is bound properly with baremetal mech_driver,
Node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 reached failure state \"deploy failed\"; the last error is Timeout reached while waiting for callback for node 635a9c6c-f912-4d71-a1ac-bfd2999595d7
3. So I checked neutron server log looking for that ID and I found port which was bound on that node:
4. So from neutron perspective all seems ok for me here, I took quick look at ironic-conductor log and for that node I saw:
2022-01-26 11:22:56.909 2 INFO ironic.conductor.utils [req-07fc14b2-3dcd-4bce-83fe-047c74a90e11 admin - - - -] Successfully set node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 power state to power on by rebooting.
2022-01-26 11:22:56.920 2 INFO ironic.conductor.deployments [req-07fc14b2-3dcd-4bce-83fe-047c74a90e11 admin - - - -] Deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 being executed asynchronously, waiting for driver.
which is around the time when port was bound in neutron-server. Later there are only logs like:
2022-01-26 11:23:42.787 2 DEBUG ironic.conductor.task_manager [req-e518b3a7-dca1-43b1-8017-58cf8a497fcc - - - - -] Attempting to get shared lock on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 (for power state sync) __init__ /usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py:235
2022-01-26 11:23:42.788 2 DEBUG ironic.conductor.task_manager [req-e518b3a7-dca1-43b1-8017-58cf8a497fcc - - - - -] Successfully released shared lock for power state sync on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 (lock was held 0.00 sec) release_resources /usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py:448
And finally it is switched to deploy failed.
I don't know how Ironic works and how Neutron should works with Ironic so maybe I'm missing something there but for me currently it looks like someone from Ironic team should take a look into that issue.
I took a look again on the results of https:/ /logserver. rdoproject. org/82/ 38582/2/ check/periodic- tripleo- ci-centos- 9-ovb-1ctlr_ 1comp-featurese t002-master/ 3506780/ job-output. txt
Here are my findings:
1. "Failed to bind port on host..." in neutron server logs are red herrings - that is the same also in the Centos 8 jobs and later that port is bound properly with baremetal mech_driver,
2. In the log https:/ /logserver. rdoproject. org/82/ 38582/2/ check/periodic- tripleo- ci-centos- 9-ovb-1ctlr_ 1comp-featurese t002-master/ 3506780/ logs/undercloud /home/zuul/ overcloud_ node_provision. log.txt. gz I found error:
Node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7 reached failure state \"deploy failed\"; the last error is Timeout reached while waiting for callback for node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7
3. So I checked neutron server log looking for that ID and I found port which was bound on that node:
2022-01-26 11:21:27.169 15 DEBUG neutron. plugins. ml2.managers [req-eb2fe8f4- f2ab-4b7c- ae5b-0fd0e47297 11 - fake_project_id - - -] Bound port: 1baac803- 390e-4479- 833f-a03906167b 93, host: 635a9c6c- f912-4d71- a1ac-bfd2999595 d7, vif_type: other, vif_details: {"connectivity": "legacy"}, binding_levels: [{'bound_driver': 'baremetal', 'bound_segment': {'id': '40635e99- cbc6-444f- a2ba-a94b3f2ba0 f2', 'network_type': 'flat', 'physical_network': 'ctlplane', 'segmentation_id': None, 'network_id': '55fd99bc- 2c12-483d- 9b61-a45ae5714b af'}}] _bind_port_level /usr/lib/ python3. 9/site- packages/ neutron/ plugins/ ml2/managers. py:932
4. So from neutron perspective all seems ok for me here, I took quick look at ironic-conductor log and for that node I saw:
2022-01-26 11:22:56.909 2 INFO ironic. conductor. utils [req-07fc14b2- 3dcd-4bce- 83fe-047c74a90e 11 admin - - - -] Successfully set node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7 power state to power on by rebooting. conductor. deployments [req-07fc14b2- 3dcd-4bce- 83fe-047c74a90e 11 admin - - - -] Deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7 being executed asynchronously, waiting for driver.
2022-01-26 11:22:56.920 2 INFO ironic.
which is around the time when port was bound in neutron-server. Later there are only logs like:
2022-01-26 11:23:42.787 2 DEBUG ironic. conductor. task_manager [req-e518b3a7- dca1-43b1- 8017-58cf8a497f cc - - - - -] Attempting to get shared lock on node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7 (for power state sync) __init__ /usr/lib/ python3. 9/site- packages/ ironic/ conductor/ task_manager. py:235 conductor. task_manager [req-e518b3a7- dca1-43b1- 8017-58cf8a497f cc - - - - -] Successfully released shared lock for power state sync on node 635a9c6c- f912-4d71- a1ac-bfd2999595 d7 (lock was held 0.00 sec) release_resources /usr/lib/ python3. 9/site- packages/ ironic/ conductor/ task_manager. py:448
2022-01-26 11:23:42.788 2 DEBUG ironic.
And finally it is switched to deploy failed.
I don't know how Ironic works and how Neutron should works with Ironic so maybe I'm missing something there but for me currently it looks like someone from Ironic team should take a look into that issue.