I am trying to deploy OpenStack 2023.2 with Kayobe/Kolla 2023.2.
This is a fresh install, wiping everything.
The node PXE boots successfully, but the deploy then fails quite quickly. I didn't catch the errors from the original deployment failure, but when I run `baremetal node rebuild` the node reboots, PXE boots, and then promptly shuts down.
This is in the ironic log:
2024-08-23 21:22:31.508 4370 ERROR ironic.conductor.utils [-] Failed to load in-band deploy steps: Connection to agent failed: Failed to connect to the agent running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)'))): ironic.common.exception.AgentConnectionFailed: Connection to agent failed: Failed to connect to the agent running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)')))
2024-08-23 21:22:31.537 4370 DEBUG ironic.common.states [-] Exiting old state 'wait call-back' in response to event 'fail' on_exit /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/states.py:360
2024-08-23 21:22:31.537 4370 DEBUG ironic.common.states [-] Entering new state 'deploy failed' in response to event 'fail' on_enter /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/states.py:366
2024-08-23 21:22:31.547 4370 ERROR ironic.conductor.task_manager [-] Node 2f3520d5-0d63-411b-87b4-1e44c87c326c moved to provision state "deploy failed" from state "wait call-back"; target provision state is "active": ironic.common.exception.AgentConnectionFailed: Connection to agent failed: Failed to connect to the agent running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)')))
2024-08-23 21:22:31.549 4370 DEBUG oslo_concurrency.processutils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Running cmd (subprocess): ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status execute /var/lib/kolla/venv/lib/python3.10/site-packages/oslo_concurrency/processutils.py:384
2024-08-23 21:22:32.101 4370 INFO eventlet.wsgi.server [None req-8c7113c5-ca10-4db5-9d4b-b335d6feae0d admin - - - - -] 10.2.85.19,unix "GET /v1/nodes?fields=uuid HTTP/1.0" status: 200 len: 11428 time: 0.2877827
2024-08-23 21:22:34.997 4370 DEBUG oslo_concurrency.processutils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] CMD "ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status" returned: 0 in 3.448s execute /var/lib/kolla/venv/lib/python3.10/site-packages/oslo_concurrency/processutils.py:422
2024-08-23 21:22:34.997 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Execution completed, command line is "ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status" execute /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/utils.py:90
2024-08-23 21:22:34.998 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Command stdout is: "Chassis Power is on
2024-08-23 21:22:34.998 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Command stderr is: "" execute /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/utils.py:93
2024-08-23 21:22:34.999 4370 INFO ironic.conductor.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Node 2f3520d5-0d63-411b-87b4-1e44c87c326c current power state is 'power on', requested state is 'power off'.
10.2.89.120 is the iLO IP, and the node in question has UUID 2f3520d5-0d63-411b-87b4-1e44c87c326c. It seems that just a minute or two after the `rebuild` command starts, Ironic decides the agent's certificate is from the future and promptly shuts the server down before it has even had a chance to be deployed.
I did a fresh pull of the Kolla images earlier today, so it's using the most recent bifrost-deploy:2023.2-ubuntu-jammy container.
I'm not sure whether this should be filed under Kolla-Ansible or Ironic; please advise. Thanks!
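Update: I realized that the node's hardware clock was off, which would explain why the agent's certificate looked "not yet valid" to the conductor. It would be nice if the clock could be checked, or even adjusted, during build/rebuild.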
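For anyone hitting the same error, here is a minimal sketch of the check that fails. It compares a certificate's validity window against the verifier's clock and reports how far off the two are. The function names and the certificate timestamps are hypothetical and only for illustration. I am also assuming the log timestamps are UTC. This is not Ironic's actual code, just the same comparison written out.

```python
import ssl
from datetime import datetime, timedelta, timezone


def cert_time(text):
    """Parse an OpenSSL-style timestamp such as 'Aug 23 21:22:31 2024 GMT'."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(text), tz=timezone.utc)


def check_validity(not_before, not_after, now):
    """Return (status, gap) for a certificate window as seen at time `now`.

    `gap` is how far `now` lies outside the window (zero when valid).
    """
    if now < not_before:
        return "not yet valid", not_before - now
    if now > not_after:
        return "expired", now - not_after
    return "valid", timedelta(0)


if __name__ == "__main__":
    # Conductor time taken from the log line above (assumed to be UTC).
    conductor_now = datetime(2024, 8, 23, 21, 22, 31, tzinfo=timezone.utc)

    # Hypothetical certificate created by a node whose clock runs 6 hours fast.
    not_before = cert_time("Aug 24 03:22:31 2024 GMT")
    not_after = cert_time("Aug 24 03:22:31 2025 GMT")

    status, gap = check_validity(not_before, not_after, conductor_now)
    print(f"certificate is {status}; clocks disagree by at least {gap}")
```

If the status comes back as "not yet valid", the gap is a lower bound on how far ahead the node's clock is running, so it is worth checking the hardware clock before looking at the TLS setup itself.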