[HEAT][KEYSTONE] Cached Token seems expired

Bug #1458532 reported by Leontii Istomin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
In Progress
High
Roman Podoliaka
6.1.x
In Progress
High
Roman Podoliaka
7.0.x
In Progress
High
Kairat Kushaev

Bug Description

api: '1.0'
astute_sha: b09729c64b695b2e6fcc88c31843321759ec45d5
auth_required: true
build_id: 2015-05-16_20-55-26
build_number: '425'
feature_groups:
- mirantis
fuel-library_sha: 1645fe45f226cdd6d2829bea9912d0baa3be5033
fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
release: '6.1'

Successfully deployed the following configuration:
Baremetal,Ubuntu,IBP,HA, Neutron-gre,Ceph-all,Nova-debug,Nova-quotas, 6.1-425
Controllers:3 Computes:47

But during create-and-delete-stack_with_volume rally scenario we faced with the error:
rally.log:
http://paste.openstack.org/show/235638/
messages(haproxy):
http://paste.openstack.org/show/235639/
heat-all.log:
http://paste.openstack.org/show/235640/

Diagnostic snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-05-24_15-50-21.tar.xz

Tags: scale
Changed in mos:
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

here is what heat-engine log says about an error:
http://paste.openstack.org/show/235673/.
Proceeding analysis of the error.

Changed in mos:
status: Confirmed → In Progress
assignee: MOS Heat (mos-heat) → Kairat Kushaev (kkushaev)
Revision history for this message
Leontii Istomin (listomin) wrote :

It can be duplicate of https://bugs.launchpad.net/fuel/+bug/1443913
Need to try to reproduce, then deisable keystonemidlware cach and try to reproduce again.
in /etc/heat/heat.conf and other OpenStack services:
token_cache_time=-1

Revision history for this message
Leontii Istomin (listomin) wrote :

I've performed the following on each controller node:
http://paste.openstack.org/show/236392/
And on one of them:
crm resource restart clone_p_heat-engine

After that I performed create-and-delete-stack_with_volume rally scenario with increased values. And chack logs:
haproxy public_vip:
http://paste.openstack.org/show/236395/
haproxy management_vip:
http://paste.openstack.org/show/236396/

some tokensare missing. For example fromnova-all.log:
http://paste.openstack.org/show/236397/
For neutron:
http://paste.openstack.org/show/236398/

Changed in mos:
assignee: Kairat Kushaev (kkushaev) → MOS Keystone (mos-keystone)
Revision history for this message
Boris Bobrov (bbobrov) wrote :

I see this in admin.log regarding the token from neutron:

./admin.log:2015-05-25 20:08:14.126 10589 DEBUG keystone.token.persistence.backends.kvs [-] Token `ee8ee5713be8423abb940a5ada152b76` is expired, removing from `usertokens-08b05359ffc9435f94859442d5582cc0`. _update_user_token_list /usr/lib/python2.7/dist-packages/keystone/token/persistence/backends/kvs.py:173

The line says that the token got expired 40+ minutes before it was used. Thus it failed correctly.

The question is why neutron has such an old token. I don't know how to find out who called neutron with it.

Revision history for this message
Alexander Makarov (amakarov) wrote :

The correct solution is to make client components to re-authenticate in such cases.
It is to be done through community so it's an opportunity for our component developers to file some valuable patches.

Revision history for this message
Boris Bobrov (bbobrov) wrote :
Revision history for this message
Leontii Istomin (listomin) wrote :

rally gives both tokens and username/password to OpenStack clients:
https://github.com/openstack/rally/blob/master/rally/osclients.py#L204-L219
If token is expired, openstack client should request a new one using username/password. Rally tests take more then 1h and tokens should be expired. It's normal behavior.

As we can see nova requests and uses new tokens correctly:
http://paste.openstack.org/show/237893/
But there is known issue with cinder:
https://bugs.launchpad.net/cinder/+bug/1298135

So, we need to fix cinder and check another OpenStack services. All of them should request new tokens if get 401.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Leontiy, let me clarify how authentication works here. I will use that picture as an illustration:
https://docs.google.com/drawings/d/1Sr-ZC45-6GUSjLL0NKUPASEBwScTC9orYLJAlF2Wvck/edit?usp=sharing

Indeed, Rally creates Heat client with token, username and password, so that the client can later regenerate the token if it expires. But, when Heat client makes an API call to Heat (1st api call), it passes token only. When later Heat needs to do api call to an OpenStack component, e.g. Cinder (2nd api call), the call might fail because the token already expired. Cinder client can not regenerate the token as it does not have username/password. Hence the whole stack creation fails.

IMO, the right way to fix this is to ensure that Rally sends requests to Heat with 'fresh' tokens (not older than 10-20 minutes) and rely on Heat that it can finish stack creation in the remaining time (40-50 minutes).

I am assuming that Heat Rally tests are Heat team area and hence assigning this bug to them. Fixing tests is not release related, so I moving this bug to 7.0.

Changed in mos:
assignee: MOS Keystone (mos-keystone) → MOS Heat (mos-heat)
milestone: 6.1 → 7.0
status: In Progress → Triaged
tags: added: non-release
Changed in mos:
assignee: MOS Heat (mos-heat) → Kairat Kushaev (kkushaev)
Revision history for this message
Dan Hata (dhata) wrote :

This is an important bug towards the scale team's certification. This bug affects 5 scenarios which would be a 7% improvement in our tests. Success can go up from 67% to 74% in build 425 when we fix this issue.

Changed in mos:
status: Triaged → In Progress
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Dmitry,
I consulted with rally guys about the issue. They said that rally creates a client before every iteration.
So the token should not expire in this case.
They also found another potential rootcause of the problem: keystone scenarios can run in parallel with heat scenarios and they might delete some of the tokens. I am going to investigate the logs about it.

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Ok, some updates related to issue.
It turns out that creat_and_delete_stack with volume is executing more than 1 hour.
You can see it in run_rally-debug.log:
http://paste.openstack.org/show/262935/.
That's why heat failed with authorization failed and after that cinder failed with authorization error.
It seems that heat have some problems with volume deletion because stack is hanging when trying to delete volume:
http://paste.openstack.org/show/235673/.
I asked Ivan Kolodyazny to analyze the root cause.
Waiting for feedback from him.

tags: removed: non-release
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.