Comment 3 for bug 1526855

Ed Bak (ed-bak2) wrote :

I reproduced this issue in a smaller ( 10 compute node ) environment than where this was first observed.

Using 10 simultaneous clients with a 20 second sleep between nova boot commands, 14/4000 instances failed to retrieve their metadata and as a result were not accessible via ssh. This was with a DVR router.

Slowing down the rate of VM creation to 5 simultaneous clients with a 30 second sleep between each nova boot did not improve the situation. Once again this was with a DVR router. Small numbers of metadata retries were observed in some of the vm console logs with less than 1000 vms in the environment. Retries first exceeded the 20 retry limit at around 2300 vms.

Using a non-distributed/centralized router, I did not observe the issue. All vms booted successfully and obtained their metadata on the first attempt.

Based on the log files, the instances seem to be unsuccessful reaching the metadata proxy for extended periods of time. Once the connection to the metadata proxy is accepted, the metadata is always successfully returned.