Metadata responses are very slow sometimes

Bug #1836642 reported by Slawek Kaplonski on 2019-07-15
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Slawek Kaplonski

Bug Description

It happens from time to time in CI that VM spawned in test don't get public-keys from metadata service. Due to that SSH to instance using ssh-key is not possible thus test fails.

Example of such failures:

In each of those cases it looks that neutron metadata agent was waiting more than 10 seconds for response from n-api-meta service: ~ 16 seconds, ~ 13 seconds, ~ 17 seconds.

I have no idea why nova is responding so slow but it would be worth if someone from Nova team would take a look into that.

Logstash query which can help to find another examples:

It is possible that in some cases of failed tests, reason of failure may be different but problem described above is quite common in those failed tests IMO.

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
importance: Undecided → High
sean mooney (sean-k-mooney) wrote :
Download full text (3.9 KiB)

As part of building the instnace metadata nova calls neutron to retirve the security groups for an instnace.
that function first looks up all the ports associated with an instance then looks up the security groups for each port

in the case of the metadata request that took 16s to respond

if we look 16 second earlier in the log we fine this request that was dispatched to the api worker with pid 9232
which is the same as the request that took 16 seconds.

from that request we can extrtract the instance uuid which is 'f123a13b-f41f-4b32-a117-dd9ba2114832'

this is the first request that was recived by that api worker and it took about 1.5 seconds for it to setup the db connect which completeed at

if we look at the neutron log we can see the request form nova to look up the instace ports here

we know this is the request because the port device_id matches the instace uuid
then imeditely after we have request for the security groups of that port

that request does not complete for 12.166260 seconds

so neutron completes the security group request at 23:43:16.660751 and nova completes
rendering all the metadata 0.044228 seconds later at 23:43:16.704979

in the case of the instance uuid was 1a88f13c-5609-4a4e-9182-33654a7d4188

i was not able to find a similar directly related issue however between starting the metadata request and finsihing it there was a 12.5s network delete request interleaved

we get the same waring for the other case i was not able to root cause at about the right time.

im not sure there is much more that we can see from the logs so we would need a reliable way to repoduce this to debug it futher.

for the ...


Changed in nova:
status: New → Incomplete
tags: added: gate-failure loadim
tags: added: loadimpact
removed: loadim
Hongbin Lu ( on 2019-07-17
Changed in neutron:
status: New → Confirmed
Slawek Kaplonski (slaweq) wrote :

Hi Sean,

I was looking at another example of similar failure (13 seconds respond pointed out by LIU in his last comment).
It is here:

In n-meta-api logs it's here:

And in neutron, related calls to get e.g. security groups are quite fast: and

So it's not neutron issue this time and nova is still responding very slow.

Also I can't understand one thing: why nova needs to get security groups of instance when there is request GET /2009-04-04/meta-data/public-keys done from VM. It doesn't require security groups at all. Isn't that a bug in nova?

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master

sean mooney (sean-k-mooney) wrote :

nova build the full set of metadata then caches it rather then building the minimal set for a given endpoint. this is required for config drive so no i dont belive its a bug that nova is getting the secuirty groups in this instance as it only does so when there is no metadata object for that instnace in the cache.

Slawek Kaplonski (slaweq) wrote :

@Sean: so, assuming that this data is cached in nova than it's even more strange for me.

Let's get an example from my previous comment. In neutron metadata agent's log we see request for /public-keys which took 13 seconds:

but just before this call there is also call for instance-id:

which took about 2 seconds. So my understanding is that nova should get and cache all data related to this instance when first call was done (for instance-id), right or am I miss understood somethibg here? And if my understanding is correct, what could took for nova 13 seconds to process second call than?

Slawek Kaplonski (slaweq) wrote :

After some talk with Sean on IRC, here is another idea to fix this in gate:

Matt Riedemann (mriedem) wrote :

This is probably a duplicate of bug 1808010.

This issue was fixed in the openstack/tempest 22.0.0 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers