Fresh deployment, rebuild fails: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid

Bug #2077764 reported by Martin Ananda Boeker
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ironic
New
Undecided
Unassigned

Bug Description

I am trying to deploy OpenStack 2023.2 with Kayobe/Kolla 2023.2

This is a fresh install, wiping everything.

The node PXE boots successfully, but then during deploy it fails quite quickly. I didn't catch errors during the deployment failure, but when I do `baremetal node rebuild` it reboots, PXE boots, and then promptly shuts down.

This is in the ironic log:

2024-08-23 21:22:31.508 4370 ERROR ironic.conductor.utils [-] Failed to load in-band deploy steps: Connection to agent failed: Failed to connect to the agent
running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999):
Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)'))): ironic.common.exception.AgentConnectionFailed: Connection to agent failed: Failed to connect to the agent running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)')))
2024-08-23 21:22:31.537 4370 DEBUG ironic.common.states [-] Exiting old state 'wait call-back' in response to event 'fail' on_exit /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/states.py:360
2024-08-23 21:22:31.537 4370 DEBUG ironic.common.states [-] Entering new state 'deploy failed' in response to event 'fail' on_enter /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/states.py:366
2024-08-23 21:22:31.547 4370 ERROR ironic.conductor.task_manager [-] Node 2f3520d5-0d63-411b-87b4-1e44c87c326c moved to provision state "deploy failed" from state "wait call-back"; target provision state is "active": ironic.common.exception.AgentConnectionFailed: Connection to agent failed: Failed to connect to the agent running on node 2f3520d5-0d63-411b-87b4-1e44c87c326c for invoking command deploy.get_deploy_steps. Error: HTTPSConnectionPool(host='10.2.85.205', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=GFRcmsKi3REaZdK2lSys1v4WtZvA8bZ3sXZUSWiG2nw (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1007)')))
2024-08-23 21:22:31.549 4370 DEBUG oslo_concurrency.processutils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - -
-] Running cmd (subprocess): ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status execute /var/lib/kolla/venv/lib/python3.10/site-packages/oslo_concurrency/processutils.py:384
2024-08-23 21:22:32.101 4370 INFO eventlet.wsgi.server [None req-8c7113c5-ca10-4db5-9d4b-b335d6feae0d admin - - - - -] 10.2.85.19,unix "GET /v1/nodes?fields=uuid HTTP/1.0" status: 200 len: 11428 time: 0.2877827
2024-08-23 21:22:34.997 4370 DEBUG oslo_concurrency.processutils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - -
-] CMD "ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status" returned: 0 in 3.448s execute /var/lib/kolla/venv/lib/python3.10/site-packages/oslo_concurrency/processutils.py:422
2024-08-23 21:22:34.997 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Execution completed, command line is "ipmitool -I lanplus -H 10.2.89.120 -L ADMINISTRATOR -U admin -R 1 -N 5 -f /tmp/tmp9e2u_hbc power status" execute /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/utils.py:90
2024-08-23 21:22:34.998 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Command stdout is: "Chassis Power is on
2024-08-23 21:22:34.998 4370 DEBUG ironic.common.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Command stderr is: "" execute /var/lib/kolla/venv/lib/python3.10/site-packages/ironic/common/utils.py:93
2024-08-23 21:22:34.999 4370 INFO ironic.conductor.utils [req-1ee1e044-465c-4ca0-8547-1906b18d87c6 req-8744094e-b9ad-445e-90f0-4c9fa86aaad4 - - - - - -] Node
2f3520d5-0d63-411b-87b4-1e44c87c326c current power state is 'power on', requested state is 'power off'.

10.2.89.120 is the ILO IP, and the node in question has UUID 2f3520d5-0d63-411b-87b4-1e44c87c326c. It seems that just a minute or two after the `rebuild` command starts, it decides the certificate is from the future, and promptly shuts the server down before it's even had a chance to be deployed.

I did a fresh pull of the Kolla images earlier today, so it's using the most recent bifrost-deploy:2023.2-ubuntu-jammy container.

I'm not sure if this should be under Kolla-Ansible or Ironic, please advise. Thanks!

Revision history for this message
Martin Ananda Boeker (mboeker) wrote :

Realized that the hardware clock was off. Would be nice if that could be checked or even adjusted during build/rebuild.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.