periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master failing due to metadata timeout

Bug #1763009 reported by Arx Cruz
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | yatin |
yatin (yatinkarel) wrote :
Changed in tripleo:
importance: High → Critical
wes hayutin (weshayutin) wrote :
Quique Llorente (quiquell) wrote :
Dan Smith (danms) wrote :

Nothing in the metadata-api has changed since January that I can see (and even then it was a trivial thing). I assume this is something that has just started?

I don't see any lazy-loads in the logs, which would be the usual culprit for performance issues, but even those would not be enough to account for 12 seconds. Instance metadata is cached (using whatever oslo_cache is configured for), which should take the database out of the mix for whatever expiration_time is set to. Indeed, the next hit (and subsequent ones) from the same instance takes very little time:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/1336714/overcloud-controller-0/var/log/containers/nova/nova-api-metadata.log.txt.gz#_2018-04-11_10_34_53_971

So I'd assume those later requests are coming out of the cache.
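
To illustrate the caching behaviour described above, here is a minimal sketch of an expiring cache. It is not oslo_cache and not nova's code: the `ExpiringCache` class, the `build_metadata` helper, and the 15-second value are all hypothetical and only show why a first request can be slow while later ones within `expiration_time` skip the database.

```python
import time


class ExpiringCache:
    """Toy cache with an expiration_time (an assumption, not oslo_cache itself)."""

    def __init__(self, expiration_time, clock=time.monotonic):
        self.expiration_time = expiration_time
        self._clock = clock
        self._store = {}  # key -> (value, time it was stored)

    def get_or_create(self, key, creator):
        """Return (value, was_cache_hit); call creator() only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self.expiration_time:
            return entry[0], True
        value = creator()  # the slow path, e.g. a database query
        self._store[key] = (value, now)
        return value, False


def build_metadata(instance_id, db_calls):
    """Hypothetical expensive lookup; records each simulated database hit."""
    db_calls.append(instance_id)
    return {"instance_id": instance_id, "hostname": "vm-" + instance_id}


if __name__ == "__main__":
    calls = []
    cache = ExpiringCache(expiration_time=15)
    for _ in range(3):
        _, hit = cache.get_or_create("i-1", lambda: build_metadata("i-1", calls))
        print("cache hit" if hit else "cache miss")
    print("database lookups:", len(calls))
```

In this sketch only the first request pays for the lookup, which would be consistent with the fast follow-up requests in the log, assuming the real cache is working.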

I see the other controllers had only a few requests, but they also took ~7 seconds to answer their first queries.

My thought would be to look at what the database is doing while this is going on, and any changes to that config that may have occurred recently. Here's a random api metadata log I pulled from another (non-tripleo) gate job; first request for an instance in 0.6 seconds:

http://logs.openstack.org/78/554078/11/check/tempest-full-py3/f3ccb99/controller/logs/screen-n-api-meta.txt.gz#_Apr_11_11_13_01_449164

I'd point out that the instance from your logs _does_ get what it needs out of metadata, configures its network and is happy. The tempest test is able to connect to it over SSH, but it's failing to authenticate because of a key problem. This goes on for three minutes or so before the test gives up.

2018-04-11 12:53:32,262 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)

...

2018-04-11 12:55:40,096 23696 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.108 (Authentication failed.). Number attempts: 23. Retry after 24 seconds.
2018-04-11 12:56:04,623 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2018-04-11 12:56:04,752 23696 INFO [paramiko.transport] Authentication (publickey) failed.
2018-04-11 12:56:04,868 23696 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.108 after 23 attempts

I see something in the cirros output about failing to generate a host key and non-executable user data. So that could be related. However, I think I can say with reasonable confidence that the time metadata-api takes to reply to the instance is not related to your test failure.
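
For anyone who wants to check the timing in a tempest log like the excerpt above, here is a rough sketch that pulls the first SSH connection and the final give-up out of such lines. The regular expression is inferred only from the lines quoted in this comment, and `parse_line`, `summarize_ssh`, and `SAMPLE` are hypothetical names, so treat it as a starting point rather than a tempest tool.

```python
import re
from datetime import datetime

# Line layout assumed from the excerpt: "<date> <time>,<ms> <pid> <LEVEL> [<logger>] <msg>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, match.group("level"), match.group("logger"), match.group("msg")


def summarize_ssh(lines):
    """Count publickey failures and measure first connect to final ERROR."""
    first_connect = None
    gave_up = None
    auth_failures = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip elided or unrelated lines
        ts, level, logger, msg = parsed
        if logger == "paramiko.transport" and msg.startswith("Connected"):
            if first_connect is None:
                first_connect = ts
        elif logger == "paramiko.transport" and "Authentication (publickey) failed" in msg:
            auth_failures += 1
        elif logger == "tempest.lib.common.ssh" and level == "ERROR":
            gave_up = ts
    elapsed = None
    if first_connect is not None and gave_up is not None:
        elapsed = (gave_up - first_connect).total_seconds()
    return {"auth_failures": auth_failures, "elapsed_seconds": elapsed}


# Lines copied from the excerpt above; the elided part ("...") is left out.
SAMPLE = [
    "2018-04-11 12:53:32,262 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)",
    "2018-04-11 12:56:04,623 23696 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)",
    "2018-04-11 12:56:04,752 23696 INFO [paramiko.transport] Authentication (publickey) failed.",
    "2018-04-11 12:56:04,868 23696 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.108 after 23 attempts",
]

if __name__ == "__main__":
    print(summarize_ssh(SAMPLE))
```

Run against a full log, the failure count and elapsed time make it easy to see that the connection itself succeeds and only authentication fails.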

Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Quique Llorente (quiquell) wrote :
Arx Cruz (arxcruz) wrote :

No, we haven't found the root cause...

Matt Young (halcyondude)
tags: removed: quickstart
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/561498

Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → yatin (yatinkarel)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/561498
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d3d27d7ea84fe01320dee3e4fa2e2fad11640eb7
Submitter: Zuul
Branch: master

commit d3d27d7ea84fe01320dee3e4fa2e2fad11640eb7
Author: yatin <email address hidden>
Date: Mon Apr 16 08:12:48 2018 +0530

    Use hiera interpolation for memcached_network

    After [1] iptables rules are not set for memcached service
    thus services relying on memcached were not functioning well.
    With [2] it's required to use hiera interpolation for service
    configs, this patch fixes it for memcached_network.

    [1] https://review.openstack.org/#/c/551292
    [2] https://review.openstack.org/#/c/526692

    Related-Bug: #1757556
    Closes-Bug: #1763009
    Change-Id: If9b274192ea4738f455a6106ff1a62eb4e7a5c91

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/561637

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/queens)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/561637
Reason: not needed in fact.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.opendev.org/561637
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ada9bc36732649e34721e64a4a92de54fd4d8842
Submitter: Zuul
Branch: stable/queens

commit ada9bc36732649e34721e64a4a92de54fd4d8842
Author: yatin <email address hidden>
Date: Mon Apr 16 08:12:48 2018 +0530

    Use hiera interpolation for memcached_network

    After [1] iptables rules are not set for memcached service
    thus services relying on memcached were not functioning well.
    With [2] it's required to use hiera interpolation for service
    configs, this patch fixes it for memcached_network.

    [1] https://review.openstack.org/#/c/551292
    [2] https://review.openstack.org/#/c/526692

    Related-Bug: #1757556
    Closes-Bug: #1763009
    Change-Id: If9b274192ea4738f455a6106ff1a62eb4e7a5c91
    (cherry picked from commit d3d27d7ea84fe01320dee3e4fa2e2fad11640eb7)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates queens-eol

This issue was fixed in the openstack/tripleo-heat-templates queens-eol release.
