Comment 8 for bug 1958955

Slawek Kaplonski (slaweq) wrote :

I took another look at the results of https://logserver.rdoproject.org/82/38582/2/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master/3506780/job-output.txt

Here are my findings:

1. The "Failed to bind port on host..." messages in the neutron server logs are a red herring: the same messages appear in the CentOS 8 jobs too, and the port is later bound properly by the baremetal mech_driver.

2. In the log https://logserver.rdoproject.org/82/38582/2/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master/3506780/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz I found this error:

Node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 reached failure state "deploy failed"; the last error is Timeout reached while waiting for callback for node 635a9c6c-f912-4d71-a1ac-bfd2999595d7

3. So I checked the neutron server log for that node ID and found the port which was bound on that node:

2022-01-26 11:21:27.169 15 DEBUG neutron.plugins.ml2.managers [req-eb2fe8f4-f2ab-4b7c-ae5b-0fd0e4729711 - fake_project_id - - -] Bound port: 1baac803-390e-4479-833f-a03906167b93, host: 635a9c6c-f912-4d71-a1ac-bfd2999595d7, vif_type: other, vif_details: {"connectivity": "legacy"}, binding_levels: [{'bound_driver': 'baremetal', 'bound_segment': {'id': '40635e99-cbc6-444f-a2ba-a94b3f2ba0f2', 'network_type': 'flat', 'physical_network': 'ctlplane', 'segmentation_id': None, 'network_id': '55fd99bc-2c12-483d-9b61-a45ae5714baf'}}] _bind_port_level /usr/lib/python3.9/site-packages/neutron/plugins/ml2/managers.py:932
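For reference, the search above is just a filter on the node UUID. A minimal sketch (the sample lines are abbreviated copies of the excerpts quoted in this comment, not the real log files):

```python
# Hypothetical illustration of the log search: filter lines by the node UUID.
NODE_ID = "635a9c6c-f912-4d71-a1ac-bfd2999595d7"

# Abbreviated copies of lines quoted above; real logs are on the logserver.
sample_log = [
    "2022-01-26 11:21:27.169 15 DEBUG neutron.plugins.ml2.managers "
    "[req-eb2fe8f4-f2ab-4b7c-ae5b-0fd0e4729711 - fake_project_id - - -] "
    "Bound port: 1baac803-390e-4479-833f-a03906167b93, "
    "host: 635a9c6c-f912-4d71-a1ac-bfd2999595d7, vif_type: other",
    "2022-01-26 11:21:00.000 15 DEBUG neutron.plugins.ml2.managers "
    "[req-aaaa - - - - -] some unrelated line",
]

matches = [line for line in sample_log if NODE_ID in line]
for line in matches:
    print(line)
```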

4. So from the neutron perspective everything seems OK to me here. I took a quick look at the ironic-conductor log, and for that node I saw:

2022-01-26 11:22:56.909 2 INFO ironic.conductor.utils [req-07fc14b2-3dcd-4bce-83fe-047c74a90e11 admin - - - -] Successfully set node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 power state to power on by rebooting.
2022-01-26 11:22:56.920 2 INFO ironic.conductor.deployments [req-07fc14b2-3dcd-4bce-83fe-047c74a90e11 admin - - - -] Deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 being executed asynchronously, waiting for driver.

which is around the time when the port was bound in neutron-server. Later there are only logs like:

2022-01-26 11:23:42.787 2 DEBUG ironic.conductor.task_manager [req-e518b3a7-dca1-43b1-8017-58cf8a497fcc - - - - -] Attempting to get shared lock on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 (for power state sync) __init__ /usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py:235
2022-01-26 11:23:42.788 2 DEBUG ironic.conductor.task_manager [req-e518b3a7-dca1-43b1-8017-58cf8a497fcc - - - - -] Successfully released shared lock for power state sync on node 635a9c6c-f912-4d71-a1ac-bfd2999595d7 (lock was held 0.00 sec) release_resources /usr/lib/python3.9/site-packages/ironic/conductor/task_manager.py:448

And finally the node is switched to the "deploy failed" state.
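To double-check the ordering of events, the quoted timestamps can simply be diffed (values copied from the neutron-server and ironic-conductor excerpts above):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
# Timestamps copied from the log excerpts quoted above.
port_bound = datetime.strptime("2022-01-26 11:21:27.169", FMT)   # neutron bound the port
deploy_step = datetime.strptime("2022-01-26 11:22:56.920", FMT)  # ironic deploy step dispatched

delta = deploy_step - port_bound
print(delta)  # the port was bound roughly 90 seconds before the deploy step
assert port_bound < deploy_step
```

So the binding clearly happened before Ironic started waiting for the agent callback, which supports the reading that the neutron side did its part.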

I don't know Ironic's internals or how Neutron is supposed to interact with Ironic, so maybe I'm missing something here, but currently it looks to me like someone from the Ironic team should take a look at this issue.