Comment 4 for bug 1763009

Revision history for this message
Dan Smith (danms) wrote :

Nothing in the metadata-api has changed since January that I can see (and even then it was a trivial thing). I assume this is something that has just started?

I don't see any lazy-loads in the logs, which would be the usual culprit for performance issues, but still not enough to take 12 seconds. Instance metadata is cached (using whatever oslo_cache is configured for) which should take the database out of the mix for whatever expiration_time is set to. Indeed the next hit (and subsequent ones) from the same instance take very little time:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/1336714/overcloud-controller-0/var/log/containers/nova/nova-api-metadata.log.txt.gz#_2018-04-11_10_34_53_971

So I'd assume maybe they're coming out of the cache.

I see the other controllers had just a very few requests, but also took ~7 seconds to answer their first queries.

My thought would be to look at what the database is doing while this is going on, and any changes to that config that may have occurred recently. Here's a random api metadata log I pulled from another (non-tripleo) gate job; first request for an instance in 0.6 seconds:

http://logs.openstack.org/78/554078/11/check/tempest-full-py3/f3ccb99/controller/logs/screen-n-api-meta.txt.gz#_Apr_11_11_13_01_449164

I'd point out that the instance from your logs _does_ get what it needs out of metadata, configures its network and is happy. The tempest test is able to connect to it over SSH, but it's failing to authenticate because of key reasons. This goes on for three minutes or so before the test gives up.

2018-04-11 12:53:32,262 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)

...

2018-04-11 12:55:40,096 23696 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.108 (Authentication failed.). Number attempts: 23. Retry after 24 seconds.
2018-04-11 12:56:04,623 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2018-04-11 12:56:04,752 23696 INFO [paramiko.transport] Authentication (publickey) failed.
2018-04-11 12:56:04,868 23696 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.108 after 23 attempts

I see something in the cirros output about failing to generate a host key and non-executable user data. So that could be related. However, I think I can say with reasonable confidence that the time metadata-api takes to reply to the instance is not related to your test failure.