When I looked at the nova logs, I found that the instance was spawned at 18:18:56 :
Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e tempest-AttachVolumeNegativeTest-1605485622 tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance.
Then, Tempest tried to ssh the instance at 18:18:59 :
2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication
And eventually, 2mins32sec after that (18:22:31), it stopped :
2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client
Then, I tried to look at the guest console, and I saw that udhcpc tried 3 times :
2023-02-14 18:22:39.129636 | controller | [ 12.638156] sr 0:0:0:0: Attached scsi generic sg0 type 5
[...]
2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3
2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover
So, I was wondering how long the DHCP discovery was done and eventually, I found that cirros dhcp client actually hangs for 1 min before requesting again.
So, now I'm wondering why it takes so much time to get a DHCP address and why the 2nd DHCP call doesn't get the IP address.
Adding Neutron team to this bug report because maybe we have something about our DHCP controller.
Okay, I did a bit of digging today for some other CI failure I saw on another change and eventually, I found this was related.
So, lemme explain the issue here. First, I was looking at https:/ /storage. bhs.cloud. ovh.net/ v1/AUTH_ dcaab5e32b234d5 6b626f72581e364 4c/zuul_ opendev_ logs_6f9/ 868236/ 5/gate/ nova-next/ 6f9f3d0/ and I was wondering why the SSH connection wasn't working.
When I looked at the nova logs, I found that the instance was spawned at 18:18:56 : 83239]: INFO nova.compute. manager [None req-053318ab- 09ad-4a3a- 8ddb-633cc0002c 3e tempest- AttachVolumeNeg ativeTest- 1605485622 tempest- AttachVolumeNeg ativeTest- 1605485622- project] [instance: 6a265379- ebfd-4aea- a081-8b271f32c0 ea] Took 8.58 seconds to build instance.
Feb 14 18:18:56.514945 np0033093378 nova-compute[
Then, Tempest tried to ssh the instance at 18:18:59 : lib.common. ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication
2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.
And eventually, 2mins32sec after that (18:22:31), it stopped : lib.common. ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client
2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.
Then, I tried to look at the guest console, and I saw that udhcpc tried 3 times :
2023-02-14 18:22:39.129636 | controller | [ 12.638156] sr 0:0:0:0: Attached scsi generic sg0 type 5
[...]
2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3
2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover
So, I was wondering how long the DHCP discovery was done and eventually, I found that cirros dhcp client actually hangs for 1 min before requesting again.
So, now I'm wondering why it takes so much time to get a DHCP address and why the 2nd DHCP call doesn't get the IP address.
Adding Neutron team to this bug report because maybe we have something about our DHCP controller.