periodic centos-7-ovb-1ctlr_1comp-featureset020-queens failing tempest "Endpoint not found"

Bug #1822080 reported by Marios Andreou on 2019-03-28
Affects: tripleo
Importance: Critical
Assigned to: Martin Kopec

Bug Description

In several queens periodic jobs (examples [1][2][3]) tempest is failing with the trace below. This blocks queens promotion, as all three examples are in the promotion criteria [4].

  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest Traceback (most recent call last):
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/bin/discover-tempest-config", line 10, in <module>
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest sys.exit(main())
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/main.py", line 494, in main
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest verbose=args.verbose
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/main.py", line 458, in config_tempest
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest services.set_service_availability()
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/services/services.py", line 194, in set_service_availability
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest self.is_service("volumev3"))
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/services/volume.py", line 75, in check_volume_backup_service
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest is_backup = volume_client.list_services(**params)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/services_client.py", line 31, in list_services
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest resp, body = self.get(url)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 294, in get
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest return self.request('GET', url, extra_headers, headers)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 653, in request
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest body=body, chunked=chunked)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 542, in _request
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest method, url, headers, body, self.filters)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/auth.py", line 188, in auth_request
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest filters, method, url, headers, body)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/auth.py", line 279, in _decorate_request
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest base_url = self.base_url(filters=filters, auth_data=auth_data)
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/auth.py", line 571, in base_url
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest endpoint_type, catalog))
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest EndpointNotFound: Endpoint not found
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest Details: No matching service found in the catalog.
  2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest Scope: project, Credentials: {'username': 'admin', 'project_name': 'admin', 'project_domain_id': u'default', 'user_domain_id': u'default', 'tenant_id': u'695f9b5019ee4a1d8003eaff3c458c2b', 'user_domain_name': 'Default', 'domain_name': 'Default', 'tenant_name': 'admin', 'user_id': u'a5c83f1c88d949538693447b6eea955e', 'project_id': u'695f9b5019ee4a1d8003eaff3c458c2b', 'domain_id': None, 'project_domain_name': 'Default'}

[1] http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-queens/7cba8f8/logs/undercloud/home/zuul/tempest.log.txt.gz
[2] http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-queens/946a4f8/logs/undercloud/home/zuul/tempest.log.txt.gz
[3] http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/7f712d9/logs/undercloud/home/zuul/tempest.log.txt.gz
[4] https://github.com/rdo-infra/ci-config/blob/7f079b2ef012a686a548ead371e61ba6be1d8e9b/ci-scripts/dlrnapi_promoter/config/CentOS-7/queens.ini#L23
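
For readers less familiar with how the last frames of the trace behave, here is a minimal sketch of a catalog lookup by service type. It is a simplified stand-in, not tempest's actual base_url implementation: the EndpointNotFound class, the base_url helper and the sample catalog are all hypothetical and only mirror the shape of the catalog entries in the logs. The catalog is assumed to contain a volumev3 entry and no volume entry.

```python
class EndpointNotFound(Exception):
    """Hypothetical stand-in for the exception seen in the trace."""


def base_url(catalog, service_type, interface="public"):
    """Return the URL of the first endpoint matching type and interface."""
    for service in catalog:
        if service["type"] != service_type:
            continue
        for endpoint in service["endpoints"]:
            if endpoint["interface"] == interface:
                return endpoint["url"]
    # Nothing matched: this is the situation the trace ends in.
    raise EndpointNotFound("No matching service found in the catalog.")


# Assumed sample catalog: a volumev3 service only (project id shortened).
catalog = [
    {
        "type": "volumev3",
        "name": "cinderv3",
        "endpoints": [
            {"interface": "public",
             "url": "http://192.168.24.23:8776/v3/PROJECT_ID"},
        ],
    },
]

print(base_url(catalog, "volumev3"))
try:
    base_url(catalog, "volume")
except EndpointNotFound as exc:
    print("lookup for 'volume' failed:", exc)
```

In this sketch the lookup for volumev3 succeeds, while a client asking for a service type that is absent from the catalog gets the exception rather than an empty result.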

Changed in tripleo:
importance: Undecided → Critical
Changed in tripleo:
milestone: none → stein-rc1
Marios Andreou (marios-b) wrote :

Just spent some time staring at the logs trying to make sense of them and failing; I know nothing about tempest :/

Brief chat with kopecmartin just now in #oooq: is it related to the cinder-backup endpoint?

15:34 < marios_|ruck> kopecmartin: is that bit relevant {u'endpoints': [], u'type': u'metering', u'id': u'3a36d5e0ea564a21a74d7998f8b5c21e', u'name': u'ceilometer'} i.e. empty endpoint for ceilo?
15:34 < marios_|ruck> kopecmartin: (me staring at http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-queens/7cba8f8/logs/undercloud/home/zuul/tempest.log.txt.gz )
15:38 < kopecmartin> marios_|ruck, no, it's related to cinder, not ceilometer, python-tempestconf fails when it's trying to discover if cinder-backup service is on , that's all i can say from the logs

Martin Kopec (mkopec) wrote :

python-tempestconf fails when it tries to discover whether the cinder-backup service is on.

The problem is probably not on the python-tempestconf side. Looking at its code where tempest's volume_client.list_services method is called, tempest should either return the cinder-backup service information or an empty result (None, [] or {}); instead, it raised an exception.

According to the documentation, list_services shouldn't raise such an exception:
https://developer.openstack.org/api-ref/block-storage/v3/?expanded=list-all-cinder-services-detail#list-all-cinder-services

Looking at the traceback, we can see that tempest failed to list the services, maybe because the endpoint for listing them was wrongly deployed. I would look for recent changes to the cinder service (it's a queens job, so maybe backports).

Traceback (most recent call last):
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/bin/discover-tempest-config", line 10, in <module>
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest sys.exit(main())
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/main.py", line 494, in main
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest verbose=args.verbose
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/main.py", line 458, in config_tempest
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest services.set_service_availability()
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/services/services.py", line 194, in set_service_availability
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest self.is_service("volumev3"))
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/config_tempest/services/volume.py", line 75, in check_volume_backup_service
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest is_backup = volume_client.list_services(**params)
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/services/volume/v2/services_client.py", line 31, in list_services
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest resp, body = self.get(url)
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 294, in get
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest return self.request('GET', url, extra_headers, headers)
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 653, in request
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest body=body, chunked=chunked)
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest File "/usr/lib/python2.7/site-packages/tempest/lib/common/rest_client.py", line 542, in _request
2019-03-28 07:25:24 | 2019-03-28 07:25:24.678 80402 ERROR tempest m...

Marios Andreou (marios-b) wrote :

ykarel just pointed at https://review.openstack.org/#/c/644550/ which might be when this started.

But given Martin's comments, is it still an issue with tempestconf? I.e. even if the endpoint isn't there, why is it exploding?

Martin Kopec (mkopec) wrote :

python-tempestconf always searches for the cinder-backup service when the volumev3 service is enabled ... and still, searching for a service shouldn't (at least in this case) raise an exception.
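
To make the point concrete, here is one possible way a caller could treat a missing endpoint as "service not available" instead of crashing. This is a sketch under assumptions, not the python-tempestconf code: EndpointNotFound is a local stand-in class, has_backup_service is a hypothetical helper, and the binary keyword and the "services" key in the response are assumed for illustration.

```python
class EndpointNotFound(Exception):
    """Hypothetical stand-in for the exception raised by the client."""


def has_backup_service(list_services):
    """Return True if a cinder-backup service is reported, else False.

    `list_services` is any callable accepting a `binary` keyword and
    returning a dict with a "services" list (assumed response shape).
    """
    try:
        body = list_services(binary="cinder-backup")
    except EndpointNotFound:
        # No usable endpoint in the catalog: report the service as absent.
        return False
    return bool(body.get("services"))


# Two fake clients to exercise both paths.
def missing_endpoint(**params):
    raise EndpointNotFound("No matching service found in the catalog.")


def backup_running(**params):
    return {"services": [{"binary": "cinder-backup", "state": "up"}]}


print(has_backup_service(missing_endpoint))  # False
print(has_backup_service(backup_running))    # True
```

Whether swallowing the exception is the right behaviour is a design choice: it keeps discovery going, but it can also hide a badly deployed catalog.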

Marios Andreou (marios-b) wrote :

Looks to me like the v1 keystone endpoint is still being created. From the logs, e.g. [1], you can see this entry for the v1 endpoint:

  the cinderv1 entry looks like this:
  u'endpoints': [{u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'public', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'5b88dd5311f6479f9293a3569b70a1ac'}, {u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'admin', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'e72288908ce0472bbf65e9367f7688dc'}, {u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'internal', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'f46cc8de58044525bdd9d1328ad7f135'}], u'type': u'volumev3', u'id': u'afe393ef8e8d4eb68b178650525dfd2c', u'name': u'cinderv3'}

  the actual v3 entry looks like this:
  {u'endpoints': [{u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'public', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'5b88dd5311f6479f9293a3569b70a1ac'}, {u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'admin', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'e72288908ce0472bbf65e9367f7688dc'}, {u'url': u'http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd', u'interface': u'internal', u'region': u'regionOne', u'region_id': u'regionOne', u'id': u'f46cc8de58044525bdd9d1328ad7f135'}], u'type': u'volumev3', u'id': u'afe393ef8e8d4eb68b178650525dfd2c', u'name': u'cinderv3'},

So why are we still getting the v1 endpoint created if we are setting cinder::keystone::auth::configure_endpoint: false at https://review.openstack.org/#/c/644550/1/puppet/services/cinder-api.yaml ?

I think this would be solved if we cherry-pick https://review.openstack.org/#/c/636456/ back to queens in puppet-cinder, so we just don't create v1? Queens is still like this: https://github.com/openstack/puppet-cinder/blob/100c6533fec066525504ae7c8940a873e2468456/manifests/keystone/auth.pp#L199 and doesn't have the change in https://review.openstack.org/#/c/636456/3/manifests/keystone/auth.pp which AFAICS means there is just no v1 created?

[1] http://logs.rdoproject.org/openstack-periodic-24hr/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-queens/7cba8f8/logs/undercloud/home/zuul/tempest.log.txt.gz
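
When reading a catalog dump like the one in the log, it can help to reduce it to service type, name and endpoint count. The snippet below is a small sketch of that; summarize_catalog is a hypothetical helper, and the sample data is trimmed from the entries quoted in this report (ids omitted).

```python
def summarize_catalog(catalog):
    """Map each service type to a (name, endpoint count) tuple."""
    return {
        service["type"]: (service["name"], len(service["endpoints"]))
        for service in catalog
    }


CINDER_URL = "http://192.168.24.23:8776/v3/5553fc5af6cf45c094b522ea43d19acd"

# Trimmed sample based on the catalog entries quoted in this report.
catalog = [
    {"endpoints": [], "type": "metering", "name": "ceilometer"},
    {
        "endpoints": [
            {"url": CINDER_URL, "interface": "public"},
            {"url": CINDER_URL, "interface": "admin"},
            {"url": CINDER_URL, "interface": "internal"},
        ],
        "type": "volumev3",
        "name": "cinderv3",
    },
]

summary = summarize_catalog(catalog)
for service_type, (name, count) in sorted(summary.items()):
    print(service_type, name, count)

# A client that asks for the "volume" type finds nothing in this sample.
print("volume" in summary)
```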

Marios Andreou (marios-b) wrote :

I just posted the puppet-cinder backports for rocky https://review.openstack.org/#/c/648663 and queens https://review.openstack.org/#/c/648664

However, if we merge those we also need to change https://github.com/openstack/tripleo-heat-templates/blob/0555f57e1257b3762a075edfc27081408a5f75d3/puppet/services/cinder-api.yaml#L167 right?

Will try to catch up with abishop about this today.

Marios Andreou (marios-b) wrote :

Via irc #tripleo just now with abishop and kopecmartin... NO BACKPORTS, I will abandon those (comment #6 above).

The fix is needed on the tempest side, and it comes in v19, whereas in queens we are using v18.

15:46 < marios_|ruck> abishop: i see, thanks didn't know you were already talking/on it with kopecmartin. so the fix is needed tempest side then?
15:46 < abishop> marios_|ruck: fix is already in tempest, but it's in 19 and queens is pinned to 18
15:46 < marios_|ruck> abishop: i see kopecmartin is that something you're working on? ^
15:47 < kopecmartin> marios_|ruck, yes, I'm checking that
15:47 < abishop> marios_|ruck: proposed plan (at least in short term) is revert my stable/queens patch to unblock, then ponder if something else is warranted downstream

Marios Andreou (marios-b) wrote :

So, just discussed some more with abishop:

We can revert https://review.openstack.org/#/c/644550/
OR
we can bump tempest in the belief that this fixes the bug.

Waiting for feedback from the tempest folks about the tempest version bump. Worst case, when we really need to unblock the queens promotion, we revert.

Martin Kopec (mkopec) wrote :

No bumping. What we can do is find the change (which is available in tempest 19) and backport just that one change to tempest 18 in order to fix the issue - hopefully it's just one short change which will not cause any other problems :)

wes hayutin (weshayutin) wrote :

Hrm.. not quite enough information for folks w/o full context.

My questions:
1. Which patch from tempest 19 is needed in 18 to solve the issue?
2. Why are multinode scen01/02 non-voting in the queens branch?
tripleo-ci-centos-7-scenario001-multinode-oooq-container FAILURE in 2h 48m 28s (non-voting)
tripleo-ci-centos-7-scenario002-multinode-oooq-container FAILURE in 2h 32m 35s (non-voting)

I'll put in a revert to test whether this makes scen01/02 pass.

Follow up on Monday.

Arx Cruz (arxcruz) wrote :

This is a problem with the endpoint being populated for cinder.
Tempest and python-tempestconf only consume the API called by openstack endpoint list. If the cinder endpoint is wrong there, it will fail because tempestconf/tempest will get the wrong endpoint.

Martin Kopec (mkopec) wrote :

So based on comment #12, the main question is: if https://review.openstack.org/#/c/644550/ is supposed to delete the v1 endpoint from puppet, why is it still deployed?

Marios Andreou (marios-b) wrote :

Just discussed on the phone with the tempest folks, thanks kopecmartin & arxcruz. I think we agree we don't like the revert, but then see comments #12/#13.

Options:
1. Merge the revert https://review.openstack.org/#/c/648749/
2. Disable tempest
3. Merge the cherry-pick at https://review.openstack.org/#/c/648664/ (and rocky too, in order to get to queens).

I will try to catch up with Alan about this again today. I know I said 'sure lets disable tempest', but thinking about it some more I am leaning towards option 1, even though I don't like it. Unless we get a clear answer to comment #13, it looks like it isn't even doing what we want.

If the cherry-picks are wrong, as commented by abishop, maybe we can fix those and merge?

By my EOD today, if there is no better plan, I would vote for option 1.

Marios Andreou (marios-b) wrote :

Discussing again with abishop now, it seems we all at least agree we need to revert that now to unblock queens - no promotion for 7 days now.

Let's merge https://review.openstack.org/#/c/648749/.

In terms of next steps, I would propose to tidy up https://review.openstack.org/#/c/648664/ with respect to @abishop's comments there. If the cherry-pick is wrong (there were many conflicts, so this is entirely possible), let's fix it and merge that.

Otherwise, what will our next steps be?

Marios Andreou (marios-b) wrote :

So abishop wants to try https://review.openstack.org/#/c/649084/ instead of the revert (via irc #tripleo just now).

Changed in tripleo:
status: Triaged → Fix Committed
status: Fix Committed → In Progress

Reviewed: https://review.openstack.org/649084
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3bc041b47da5f23c57f97c49a66349b81ccc0740
Submitter: Zuul
Branch: stable/queens

commit 3bc041b47da5f23c57f97c49a66349b81ccc0740
Author: Alan Bishop <email address hidden>
Date: Mon Apr 1 11:51:29 2019 -0400

    Fix tempest volume tests on queens

    With [1], keystone catalog entries are no longer created for cinder's
    v1 API. This works fine with the latest version of tempest, but older
    versions of tempest assume a service named "volume" is present in the
    catalog. This patch ensures the service exists, but reuses the endpoints
    from cinder's v3 API.

    [1] https://review.openstack.org/644550

    Closes-Bug: #1822080
    Change-Id: If1ef8b1ad60151c0dfd0a7804ba7e697fc4ede28
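
The effect described in the commit message can be pictured as follows. This is only a Python sketch of the resulting catalog, not the actual tripleo-heat-templates change: ensure_volume_alias, the service name "cinder" and the sample data are assumptions for illustration.

```python
import copy


def ensure_volume_alias(catalog):
    """Return a copy of the catalog with a "volume" service present.

    If no "volume" service exists, one is added that reuses the
    endpoints of the "volumev3" service.
    """
    result = copy.deepcopy(catalog)
    if any(service["type"] == "volume" for service in result):
        return result
    for service in catalog:
        if service["type"] == "volumev3":
            result.append({
                "type": "volume",
                "name": "cinder",  # assumed name, for illustration only
                "endpoints": copy.deepcopy(service["endpoints"]),
            })
            break
    return result


catalog = [
    {
        "type": "volumev3",
        "name": "cinderv3",
        "endpoints": [
            {"interface": "public",
             "url": "http://192.168.24.23:8776/v3/PROJECT_ID"},
        ],
    },
]

patched = ensure_volume_alias(catalog)
print(sorted(service["type"] for service in patched))
```

With the alias in place, a client that still looks up the "volume" type finds a service whose endpoints point at the v3 API, while the original catalog is left untouched.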

tags: added: in-stable-queens
Changed in tripleo:
status: In Progress → Fix Released

Change abandoned by wes hayutin (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/648749

This issue was fixed in the openstack/tripleo-heat-templates 8.4.0 release.
