Cinder should handle token expiration for long ops

Bug #1298135 reported by Andrew Kerr
This bug affects 6 people
Affects: Cinder
Status: In Progress
Importance: Medium
Assigned to: Unassigned

Bug Description

Certain operations, such as upload-to-image and backup-restore, can run for a long time. The auth token issued by Keystone may expire before such an operation completes, and the operation then fails simply because the API calls it makes are now sending an expired token.

Cinder should attempt to reauthenticate during long-running operations if it receives a 401 (Unauthorized) response.
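A minimal sketch of the retry-on-401 behaviour requested above. `send_request` and `reauthenticate` are hypothetical callables standing in for whatever HTTP client and token source Cinder would use; the sketch retries exactly once, so a genuinely revoked credential still fails fast.

```python
from typing import Callable, Tuple

# Hypothetical signature: send_request(token) -> (status_code, body)
Request = Callable[[str], Tuple[int, str]]


def call_with_reauth(send_request: Request,
                     token: str,
                     reauthenticate: Callable[[], str]) -> Tuple[int, str]:
    """Send a request; on HTTP 401, obtain a fresh token and retry once."""
    status, body = send_request(token)
    if status == 401:
        # The token may have expired mid-operation: get a new one from the
        # hypothetical reauthenticate() hook and retry a single time.
        new_token = reauthenticate()
        status, body = send_request(new_token)
    return status, body
```

Note that Cinder normally holds only the user's token, not the user's credentials, so a real `reauthenticate` is the hard part; that is why the discussion below turns to trusts and service tokens.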

Tags: bugsmash
Changed in cinder:
status: New → Confirmed
importance: Undecided → Medium
wanghong (w-wanghong)
Changed in cinder:
assignee: nobody → wanghong (w-wanghong)
Mike Perez (thingee)
Changed in cinder:
status: Confirmed → Triaged
milestone: none → next
Duncan Thomas (duncan-thomas) wrote :

Rescope of a token does not extend its expiry time, so I'm not sure there's an existing keystone API we can use.

There's a second problem for backups initiated from Horizon, in that Horizon invalidates tokens on log-out or project switch, so in its case the problem is even worse.

You might be able to use keystone trusts to extend a token, but that is outside of the intended use-case for trusts.

francis moorehead (francis-moorehead) wrote :

Can be reproduced in devstack by reducing the keystone token expiration to 20 seconds, restarting keystone and kicking off a backup of a 1GB volume. The token expires before the backup can complete.
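The reproduction above hinges on Keystone's `[token] expiration` option (in seconds). A small sketch for setting it programmatically, assuming a throwaway devstack config file (typically `/etc/keystone/keystone.conf`); `set_token_expiration` is a hypothetical helper name. `configparser` discards comments when it rewrites a file, so editing by hand is just as good on a real deployment.

```python
import configparser


def set_token_expiration(conf_path: str, seconds: int) -> None:
    """Set [token] expiration in a keystone.conf-style INI file."""
    # interpolation=None so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)
    if not parser.has_section("token"):
        parser.add_section("token")
    parser.set("token", "expiration", str(seconds))
    # Rewrites the whole file; existing comments are not preserved.
    with open(conf_path, "w") as fh:
        parser.write(fh)
```

Usage: `set_token_expiration("/etc/keystone/keystone.conf", 20)`, then restart Keystone and start the 1GB backup as described.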

Duncan Thomas (duncan-thomas) wrote :

https://review.openstack.org/#/c/96648/ is a blueprint for a token renewal interface in Keystone; this might be usable for backups.

Alan Hassett (ahassett)
Changed in cinder:
status: Triaged → Fix Released
Andrew Kerr (andrew-kerr) wrote :

Why was this set to Fix Released?

Thierry Carrez (ttx)
Changed in cinder:
status: Fix Released → Triaged
Alan Hassett (ahassett) wrote :

Sorry, I realised I accidentally set it to Fix Released when looking at the bug. My apologies.

Mike Perez (thingee)
Changed in cinder:
status: Triaged → Confirmed
assignee: wanghong (w-wanghong) → nobody
Changed in cinder:
assignee: nobody → j_king (james-agentultra)
Sean McGinnis (sean-mcginnis) wrote :

Automatically unassigning due to inactivity.

Changed in cinder:
assignee: j_king (james-agentultra) → nobody
tags: added: bugsmash
wangxiyuan (wangxiyuan) wrote :

I guess that trusts or service tokens could handle this problem. Maybe we need a blueprint (bp) to do this?

Changed in cinder:
assignee: nobody → Serhii Rusin (serhii-rusin)
Pavlo Shchelokovskyy (pshchelo) wrote :

Service tokens are much better and are already implemented by some other services, such as Nova.

IMO any OpenStack service that talks to another OpenStack service using the user's token must implement ServiceToken auth support.

AFAIU Keystone is trying to move from trusts to app creds anyway.
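At the wire level, ServiceToken support means the calling service sends its own token next to the user's. A minimal sketch using the `X-Auth-Token` and `X-Service-Token` headers named later in this thread; `build_auth_headers` is a hypothetical helper, and a real service would obtain `service_token` from its configured service-user auth plugin.

```python
from typing import Dict, Optional


def build_auth_headers(user_token: str,
                       service_token: Optional[str] = None) -> Dict[str, str]:
    """Headers for a service-to-service call made on a user's behalf."""
    headers = {"X-Auth-Token": user_token}
    if service_token:
        # The calling service proves its own identity alongside the user's
        # token, which lets the receiver tolerate a recently expired user token.
        headers["X-Service-Token"] = service_token
    return headers
```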

Tejaswini Grandhe (grantejaswini) wrote :

I am trying to understand this bug. This issue seems to have turned into a feature enhancement for Cinder. Here are the blueprint links for implementing Cinder service tokens for both interactions:

Add service_token for cinder-nova interaction
https://review.opendev.org/#/c/524497/

Add service_token for cinder-glance interaction
https://review.opendev.org/#/c/526611/

Are we looking at the same implementations or something else? Can someone clarify whether this issue persists in recent OpenStack releases?

Please let me know if I am looking in wrong direction.

Changed in cinder:
assignee: Serhii Rusin (serhii-rusin) → nobody
Ben O'Hara (bohara) wrote :

I see issues with cinder-backup when configured to push the backup to swift.

It looks like it doesn't send the service token, so if you log out of the portal, or if the volume is large and takes longer to upload than the token's lifetime, the volume backup fails with an error.

Dmitry Galkin (galkindmitrii) wrote :

We are facing the same issue as Ben O'Hara mentioned.
Long running Cinder backup with Swift backend will fail when user token expires.

However, Swift service tokens seem to be different.
This is also why Glance->Swift communication uses Keystone Trusts and not Swift Service Tokens: https://github.com/openstack/glance/blob/master/glance/common/trust_auth.py

Changed in cinder:
status: Confirmed → In Progress
Tobias Urdin (tobias-urdin) wrote :

This is a major pain point; nice to see somebody proposing a patch for it.

kay (kay-diam) wrote :

Any updates on the patch review?

Brian Rosmaita (brian-rosmaita) wrote :

I don't think it's a good idea to try to fix this by using trusts. It would be better to fix the service token issue (maybe on the swift side) rather than use trusts as a workaround.

Tobias Urdin (tobias-urdin) wrote :

While it is true that Swift could be made aware of service tokens, one should also consider that other implementations of the Swift API, for example Ceph RadosGW, probably do not have such a mechanism yet, and it may not be easy to add there.

Other than that, a good first step would be to configure Cinder's service_auth, set the config option for sending the service token to true, verify that Cinder actually sends the service token (for the cinder-backup service, by default), and then check the behaviour against native OpenStack Swift.

Tobias Urdin (tobias-urdin) wrote :

Thinking about it, I'm actually inclined to agree that service token support is the best way forward instead of using trusts.

Tobias Urdin (tobias-urdin) wrote :

Hello, a little update on my part.

This bug is very spread out since it covers multiple issues, so I'll try to summarize my current view. You can already enable the service user for Cinder's communication with Glance and Nova by filling in the [service_user] config section and enabling send_service_user_token.
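A sketch of what such a `[service_user]` section can look like, with a helper that checks whether the token is actually enabled. The auth values (URL, user, password, project) are hypothetical placeholders, not defaults; `auth_type`, `auth_url`, `username` and the domain/project options are the usual keystoneauth password-plugin settings. `service_token_enabled` is a hypothetical helper name.

```python
import configparser

# Hypothetical cinder.conf fragment; every value below is a placeholder.
SAMPLE = """
[service_user]
send_service_user_token = true
auth_type = password
auth_url = http://keystone.example:5000/v3
username = cinder
password = secret
user_domain_name = Default
project_name = service
project_domain_name = Default
"""


def service_token_enabled(conf_text: str) -> bool:
    """Return True if [service_user] send_service_user_token is enabled."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # getboolean accepts true/false, yes/no, on/off and 1/0; a missing
    # section or option falls back to False.
    return parser.getboolean("service_user", "send_service_user_token",
                             fallback=False)
```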

Then there is the Swift backup driver in per_user mode (meaning we use the user token to talk from the cinder-backup service to the Swift endpoint). There is a proposed fix [1] that involves creating a Keystone trust and then using it. I consider this a workaround that can solve the issue in the short term, but it probably has a lot of edge cases that could leave trusts behind if cleanup fails.
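The leftover-trust risk is essentially a resource-cleanup problem. A minimal sketch of scoping a trust to one operation, where `create_trust` and `delete_trust` are hypothetical callables (not the API of the proposed patch or of any Keystone client):

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_trust(create_trust: Callable[[], str],
                    delete_trust: Callable[[str], None]) -> Iterator[str]:
    """Yield a trust ID and always try to delete the trust afterwards."""
    trust_id = create_trust()
    try:
        yield trust_id
    finally:
        # Runs on success, on exceptions and on KeyboardInterrupt, but not
        # if the process is killed outright: that case still leaks the trust.
        delete_trust(trust_id)
```

Even with this structure, a crashed or killed cinder-backup process would skip the `finally` block, which is one of the edge cases that makes trusts a workaround rather than a fix.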

I have proposed a patch [2] that adds support for sending a service user token (X-Service-Token header) from cinder-backup to the Swift endpoint. This lets the Swift endpoint accept the X-Auth-Token (the user token we pass along) even after it has expired, within the [token]/allow_expired_window that Keystone allows.
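A simplified model of the acceptance rule described above, as seen by the receiving endpoint. It is an illustration under stated assumptions, not keystonemiddleware's actual logic: real validation also checks the service token's roles and other properties, which are collapsed here into the single `has_valid_service_token` flag.

```python
from datetime import datetime, timedelta


def user_token_accepted(expires_at: datetime,
                        now: datetime,
                        has_valid_service_token: bool,
                        allow_expired_window: timedelta) -> bool:
    """Simplified model of accepting a user token on a service call."""
    if now < expires_at:
        return True  # still valid on its own
    # Expired: tolerated only alongside a valid service token, and only
    # within the extra window Keystone allows beyond the normal expiry.
    return has_valid_service_token and now <= expires_at + allow_expired_window
```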

The patch [2] should hopefully solve the issue for anybody running OpenStack Swift.

For those of us running Ceph RadosGW to provide a Swift-compatible API, it's harder because X-Service-Token support is not there yet. We are running Ceph RadosGW, and I'm working on adding this support in [3].

Our goal is to solve the issue with [2] and [3], but potentially use [1] until those two patches have been released. (If [1] is merged, and I think it could be, it might be as a workaround in the same way Nova has a [workarounds] section.)

[1] https://review.opendev.org/c/openstack/cinder/+/785362
[2] https://review.opendev.org/c/openstack/cinder/+/840289
[3] https://github.com/ceph/ceph/pull/45395

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/840289
Committed: https://opendev.org/openstack/cinder/commit/77c886ab18ba241eaa7418f1e0d095fe6639ae19
Submitter: "Zuul (22348)"
Branch: master

commit 77c886ab18ba241eaa7418f1e0d095fe6639ae19
Author: Tobias Urdin <email address hidden>
Date: Tue May 3 13:27:15 2022 +0000

    backup/swift: Add support sending service user token

    This adds support to the Swift backup driver to send
    a service user token in the X-Service-Token header when
    talking to Swift which will support long running processes
    to continue functioning when the user token is expired if
    the target supports it. [1] [2]

    In the patch I'm favoring passing the X-Service-Token from
    Cinder as a header instead of passing the service user credentials
    down to the python-swiftclient, it makes more sense to not hand
    it off. We already have a auth plugin for the service user which
    ensures that the token is always valid, an invalid token would
    disrupt the process and cause the long running process to fail.

    The new config option to enable the service auth in the Swift
    driver serves the purpose of not enabling the feature by default
    for deployments already enabling service user for Nova and Glance.

    I'm working on implementing the X-Service-Token support
    in Ceph RadosGW's Swift API implementation [3], OpenStack Swift
    already supports service token.

    [1] https://specs.openstack.org/openstack/keystone-specs/specs/keystonemiddleware/juno/service-tokens.html
    [2] https://docs.openstack.org/cinder/latest/configuration/block-storage/service-token.html
    [3] https://github.com/ceph/ceph/pull/45395

    Related-Bug: #1298135
    Change-Id: I69a478dc18c18e6d67be83d61c9643afab72c118
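The commit relies on the existing service-user auth plugin to keep the service token valid for the whole run. A simplified model of that "refresh before expiry" idea, with a hypothetical `fetch` callable returning a token and its expiry time; this is not keystoneauth's actual code.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetch function: returns (token, expires_at_epoch_seconds).
Fetch = Callable[[], Tuple[str, float]]


class RefreshingToken:
    """Cache a service token and re-fetch it shortly before it expires."""

    def __init__(self, fetch: Fetch, margin: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._margin = margin  # refresh this many seconds early
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Re-fetch when there is no token yet or it is about to expire,
        # so each outgoing request carries a currently valid token.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Each chunk upload would call `get()` for its `X-Service-Token` header, while the user's original token is passed through unchanged.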
