I'm also taking a look at this one. I couldn't reproduce it (tried 11 times with the same az-cli command line provided by gjolly).
I found 2 interesting things after Gauthier gave me access to one of his failing instances:
(a) Regarding cloud-init, I see the following in the logs (comparing a GOOD and BAD instance):
GOOD:
2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys
2021-04-13 19:19:09,341 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF
2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieved keys from OVF
2021-04-13 19:19:09,342 - handlers.py[DEBUG]: finish: azure-ds/get_public_ssh_keys: SUCCESS: get_public_ssh_keys
2021-04-13 19:19:09,342 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh to 1000:1000
2021-04-13 19:19:09,343 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-13 19:19:09,343 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-13 19:19:09,343 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 381 bytes
2021-04-13 19:19:09,343 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh/authorized_keys to 1000:1000
2021-04-13 19:19:09,344 - util.py[DEBUG]: Changing the ownership of /root/.ssh to 0:0
2021-04-13 19:19:09,344 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-13 19:19:09,344 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-13 19:19:09,344 - util.py[DEBUG]: Writing to /root/.ssh/authorized_keys - wb: [600] 545 bytes
BAD:
2021-04-12 08:25:07,412 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys
2021-04-12 08:25:07,412 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF
2021-04-12 08:25:07,412 - azure.py[DEBUG]: No keys available from OVF
2021-04-12 08:25:07,412 - handlers.py[DEBUG]: finish: azure-ds/get_public_ssh_keys: SUCCESS: get_public_ssh_keys
2021-04-12 08:25:07,413 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh to 1000:1000
2021-04-12 08:25:07,413 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-12 08:25:07,413 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-12 08:25:07,414 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 0 bytes
2021-04-12 08:25:07,414 - util.py[DEBUG]: Changing the ownership of /home/ubuntu/.ssh/authorized_keys to 1000:1000
2021-04-12 08:25:07,414 - util.py[DEBUG]: Changing the ownership of /root/.ssh to 0:0
2021-04-12 08:25:07,414 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
2021-04-12 08:25:07,415 - util.py[DEBUG]: Read 3287 bytes from /etc/ssh/sshd_config
2021-04-12 08:25:07,415 - util.py[DEBUG]: Writing to /root/.ssh/authorized_keys - wb: [600] 0 bytes
So, the main difference here is:
2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieved keys from OVF
vs
2021-04-12 08:25:07,412 - azure.py[DEBUG]: No keys available from OVF
Why does one method execute from DataSourceAzure.py whereas the other executes from azure.py?
I'm far from an expert in cloud-init, so I'll defer that question to the cloud-init folks.
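In case anyone wants to repeat this comparison on their own instances, here is a minimal sketch of how the two excerpts can be compared automatically. It is not part of cloud-init; the names `strip_timestamp`, `diff_logs`, `GOOD` and `BAD` are made up for illustration, and it assumes every log line starts with the `YYYY-MM-DD HH:MM:SS,mmm - ` prefix seen above. The sample lines are a subset of the logs quoted in this comment.

```python
import difflib
import re

# Timestamp prefix as it appears in the excerpts above,
# e.g. "2021-04-13 19:19:09,341 - " (assumed format).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - ")


def strip_timestamp(line):
    """Drop the leading timestamp so lines from different boots compare equal."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def diff_logs(good_lines, bad_lines):
    """Return (only_in_good, only_in_bad) after ignoring timestamps."""
    good = [strip_timestamp(line) for line in good_lines]
    bad = [strip_timestamp(line) for line in bad_lines]
    only_good, only_bad = [], []
    matcher = difflib.SequenceMatcher(a=good, b=bad, autojunk=False)
    for tag, g1, g2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # Collect every line that is not common to both logs.
            only_good.extend(good[g1:g2])
            only_bad.extend(bad[b1:b2])
    return only_good, only_bad


# Hypothetical sample input: a few lines copied from the logs above.
GOOD = [
    "2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys",
    "2021-04-13 19:19:09,341 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF",
    "2021-04-13 19:19:09,341 - DataSourceAzure.py[DEBUG]: Retrieved keys from OVF",
    "2021-04-13 19:19:09,343 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 381 bytes",
]
BAD = [
    "2021-04-12 08:25:07,412 - DataSourceAzure.py[DEBUG]: Retrieving public SSH keys",
    "2021-04-12 08:25:07,412 - azure.py[DEBUG]: Unable to get keys from IMDS, falling back to OVF",
    "2021-04-12 08:25:07,412 - azure.py[DEBUG]: No keys available from OVF",
    "2021-04-12 08:25:07,414 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [600] 0 bytes",
]

if __name__ == "__main__":
    only_good, only_bad = diff_logs(GOOD, BAD)
    for line in only_good:
        print("GOOD only:", line)
    for line in only_bad:
        print("BAD only: ", line)
```

On these sample lines it reports the "Retrieved keys from OVF" / "No keys available from OVF" pair, plus the authorized_keys write that is 381 bytes on the GOOD instance and 0 bytes on the BAD one.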
Will continue in next comment.