OpenStack Compute (nova)

Bug #2006467
Comment #1

Sylvain Bauza (sylvain-bauza) wrote on 2023-02-15:

Okay, I did a bit of digging today into another CI failure I saw on a different change, and eventually found that it was related to this bug.

So, lemme explain the issue here. First, I was looking at https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f9/868236/5/gate/nova-next/6f9f3d0/ and I was wondering why the SSH connection wasn't working.

When I looked at the nova logs, I found that the instance was spawned at 18:18:56:
Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e tempest-AttachVolumeNegativeTest-1605485622 tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance.

Then, Tempest tried to SSH into the instance at 18:18:59:
2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication

And eventually, about 3 min 32 s after that (18:22:31), it gave up:
2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client
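For reference, the gap between the two Tempest timestamps quoted above can be checked with a few lines of Python. This is only a rough sketch: the helper name `parse_tempest_ts` is made up for illustration, and it assumes the inner `YYYY-MM-DD HH:MM:SS,mmm` timestamp from the log lines is the one that matters.

```python
from datetime import datetime


def parse_tempest_ts(text):
    """Parse a Tempest-style timestamp such as '2023-02-14 18:18:59,630'."""
    # %f accepts 1 to 6 fractional digits, so ',630' is read as 630 ms.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


# Timestamps copied from the two tempest.lib.common.ssh log lines.
ssh_start = parse_tempest_ts("2023-02-14 18:18:59,630")
ssh_failed = parse_tempest_ts("2023-02-14 18:22:31,398")

elapsed = ssh_failed - ssh_start
minutes, seconds = divmod(round(elapsed.total_seconds()), 60)
print(f"SSH retries lasted {minutes} min {seconds} s")
```

So the 16 SSH attempts spanned a bit more than three and a half minutes, which is long enough to cover several one-minute DHCP retry cycles in the guest.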

So, I was wondering how long the DHCP discovery took, and eventually I found that the cirros DHCP client actually hangs for 1 min before requesting again.

So, now I'm wondering why it takes so long to get a DHCP address, and why the second DHCP request doesn't get the IP address.

Adding the Neutron team to this bug report because there may be an issue with our DHCP controller.
