Comment 7 for bug 1808010

Dr. Jens Harbott (j-harbott) wrote : Re: Tempest cirros boots fail due to lack of disk space

OK, so I did some further local testing with cirros 0.3.5 and a config drive. If I add a keypair that contains 4 pubkeys, /run ends up 100% full. If I use a keypair with 8 pubkeys, I get the same "write error" sequence as above, _but_ cirros seems to manage to clean up after that and successfully falls back to the network-based metadata. /run is then only 32% full, so the "/run full" issue seems to affect only instances with a config drive attached. I'll open a separate cirros bug for that.

So the real issue seems to be flaky metadata responses, which may have different causes on the network and/or API side. However, we could also try to make our testing more robust against this by teaching tempest to log in with the well-known default password if ssh with the key or password that was supposed to have been set via metadata has failed. That way we might be able to gather some more data about the failure scenarios.
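
To make the fallback idea a bit more concrete, here is a rough Python sketch of the control flow only. It is not tempest code: `login_with_fallback`, the `connect` callable and the credential labels are all hypothetical names made up for illustration, and a real implementation would plug in tempest's own ssh client instead.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (kind, secret) pair; both fields are hypothetical placeholders.
Credential = Tuple[str, str]


def login_with_fallback(
    connect: Callable[[str, str], bool],
    credentials: Iterable[Credential],
) -> Tuple[Optional[str], List[str]]:
    """Try each credential in order.

    Returns the kind of credential that worked (or None if none did)
    together with the list of kinds that failed before it, so the
    caller can report which metadata-provided credentials were missing.
    """
    failed: List[str] = []
    for kind, secret in credentials:
        try:
            if connect(kind, secret):
                return kind, failed
        except Exception:
            # Assumption: any exception from the ssh layer counts as a
            # failed login attempt for this credential.
            pass
        failed.append(kind)
    return None, failed
```

The credentials set via metadata would come first in the list and the default password last. A non-empty list of failed kinds then shows that the guest was reachable but never received its metadata, which is the data we are missing today.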

Another approach that I'll try to take a look at is to make cirros more robust against these failures by retrying failed attempts and also logging more details about the errors.
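
The retry-and-log idea looks roughly like the sketch below. Note that cirros itself does this in shell scripts, so the Python here only illustrates the logic; `fetch_with_retry`, its parameters and the logger name are hypothetical and not taken from cirros.

```python
import logging
import time
from typing import Callable, Optional

LOG = logging.getLogger("metadata-retry")  # hypothetical logger name


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() up to `attempts` times, logging every failure.

    Returns the first successful result, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            # Record which attempt failed and why, so the console log
            # contains enough detail to diagnose flaky responses.
            LOG.warning(
                "metadata fetch attempt %d/%d failed: %r",
                attempt, attempts, exc,
            )
            if attempt < attempts:
                sleep(delay)
    return None
```

The attempt count and the fixed delay are arbitrary choices for the sketch. The point is just that every failed attempt leaves a trace in the log before the next try.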