Nova volume-update leaves volumes stuck in attaching/detaching

Bug #1413610 reported by Ollie Leahy on 2015-01-22
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Undecided
Unassigned

Bug Description

There is a problem with the nova command 'volume-update' that leaves cinder volumes in the states 'attaching' and 'deleting'.

If the nova command 'volume-update' is used by a non admin user the command fails and the volumes referenced in the command are left in the states 'attaching' and 'deleting'.

For example, if a non admin user runs the command
 $ nova volume-update d39dc7f2-929d-49bb-b22f-56adb3f378c7 f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b 59b0cf66-67c8-4041-a505-78000b9c71f6

 Will result in the two volumes stuck like this:

 $ cinder list
 +--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
 | ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
 +--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
 | 59b0cf66-67c8-4041-a505-78000b9c71f6 | attaching | vol2 | 1 | None | false | |
 | f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b | detaching | vol1 | 1 | None | false | d39dc7f2-929d-49bb-b22f-56adb3f378c7 |
 +--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+

And the following in the cinder-api log:

2015-01-21 11:00:03.969 13588 DEBUG keystonemiddleware.auth_token [-] Received request from user: user_id None, project_id None, roles None service: user_id None, project_id None, roles None __call__ /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/keystonemiddleware/auth_token.py:746
2015-01-21 11:00:03.970 13588 DEBUG routes.middleware [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] Matched POST /d40e3207e34a4b558bf2d58bd3fe268a/volumes/f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b/action __call__ /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/routes/middleware.py:100
2015-01-21 11:00:03.971 13588 DEBUG routes.middleware [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] Route path: '/{project_id}/volumes/:(id)/action', defaults: {'action': u'action', 'controller': <cinder.api.openstack.wsgi.Resource object at 0x7febe7a89850>} __call__ /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/routes/middleware.py:102
2015-01-21 11:00:03.971 13588 DEBUG routes.middleware [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] Match dict: {'action': u'action', 'controller': <cinder.api.openstack.wsgi.Resource object at 0x7febe7a89850>, 'project_id': u'd40e3207e34a4b558bf2d58bd3fe268a', 'id': u'f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b'} __call__ /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/routes/middleware.py:103
2015-01-21 11:00:03.972 13588 INFO cinder.api.openstack.wsgi [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] POST http://192.0.2.24:8776/v1/d40e3207e34a4b558bf2d58bd3fe268a/volumes/f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b/action
2015-01-21 11:00:03.972 13588 DEBUG cinder.api.openstack.wsgi [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] Action body: {"os-migrate_volume_completion": {"new_volume": "59b0cf66-67c8-4041-a505-78000b9c71f6", "error": false}} get_method /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/api/openstack/wsgi.py:1010
2015-01-21 11:00:03.973 13588 INFO cinder.api.openstack.wsgi [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] http://192.0.2.24:8776/v1/d40e3207e34a4b558bf2d58bd3fe268a/volumes/f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b/action returned with HTTP 403
2015-01-21 11:00:03.975 13588 INFO eventlet.wsgi.server [req-d08bc456-14e9-4f97-9f4e-a9f87e7d6add a00b77059c8c4bbbb00a4ff9a8496b2d d40e3207e34a4b558bf2d58bd3fe268a - - -] 127.0.0.1 - - [21/Jan/2015 11:00:03] "POST /v1/d40e3207e34a4b558bf2d58bd3fe268a/volumes/f0c3ea8f-c00f-4db8-aa20-e2dc6a1ddc9b/action HTTP/1.1" 403 429 0.123613

The problem is that the nova policy.json file allows a non admin user to run the command 'volume-update', but the cinder policy.json file requires the admin role to run the action os-migrate_volume_completion, which is called by nova as part of the 'update-volume' process.

The operation will complete successfully if it is performed by an admin user, or if it is run by a non admin user after the cinder policy.json file is updated.

The simplest fix would be to remove the "rule:admin_api" requirement from the "os-migrate_volume_completion" action in cinder policy.json

Duncan Thomas (duncan-thomas) wrote :

Should nova do some tidying up if it gets a permission denied?

Currently 403 doesn't seem to be an expected code?

    @wsgi.response(202)
    @extensions.expected_errors((400, 404, 409))
    @validation.schema(volumes_schema.update_volume_attachment)
    def update(self, req, server_id, id, body):

Mike Perez (thingee) wrote :

Duncan, Nova does not clean up if this call fails:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4783

While I agree Nova shouldn't assume this call will succeed as it does today, what I think Loganathan is explaining is we have nova expecting volume migration completion to be ran by a normal end user, not the administrator.

Volume migrate was never intended to be used by an end user, which is why by default it's required for the user running it to be within the admin role. In an OpenStack environment, the end user wouldn't know or care about the storage backends, they would just want to consume the resources based on requirements advertised by volume types, that's it.

With that in mind, the administrator cares/knows about the backends, so we expose this feature to an admin role only. However, I don't think that's the problem either.

Currently in the Nova code it does elevate the context which it talks to Cinder, so I'm not sure why this is happening:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4796

Ollie Leahy (oliver-leahy-l) wrote :

In this case migrate_volume_completion is not being called as part of 'Volume migrate', it's being called by the python-novaclient operation 'volume-update' which can be called by a non-admin user. The command 'volume-update takes as parameters an instance, an attached volume and a new volume; and it replaces the attached volume with the new volume. That is it does a copy - detach oldvol - attach newvol

If an admin user runs 'nova volume-update ' the operation completes successfully. But if a user that does not have admin privs runs it, the we end up in the state described above.

So I think that what is happening is that nova performs all it's operations as the necessary, elevating context as required. Then nova makes a call to the cinder api migrate_volume_completion, with the non-admin user creds. The cinder api server runs authorize() on the non admin-creds and refuses to continue.

So I think there are three questions here

1. Should the python novaclient command 'volume-update' be a non-admin operation. The nova devs I've spoken to believe that it should be available to non-admin users

2. If volume-admin is available to non-admins then should nova call the os-migrate_volume_completion api, or should it call os-detach and os-attach seperately on the old and new volumes

3. Should nova clean up volume states in the case where an error occurs towards the end of a volume operation.

For the moment I'm assuming that cinder should support nova in allowing non-admin users to run update-volume, and I'm assuming cinder should allow the non-admin user to run the os-migrate_volume_completion command.

I think the issue of nova cleaning up is an independent bug which could be raised on nova.

So, I'm preparing a patch that makes os-migrate_volume_completion a non-admin operation,

Please let me know if you think this is the wrong approach.

Changed in cinder:
assignee: nobody → Ollie Leahy (oliver-leahy-l)
status: New → In Progress
Mike Perez (thingee) wrote :

Ollie, this is nothing new with cinder client. There are plenty of things that can be called that are admin only. See this search for 'admin only' in the client code:

http://paste.openstack.org/show/162316/

As explained in my last comment, Nova is elevating the context when making this call, so time should be spent on figuring out why that's not working as expeected:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4796

Mike: I'm aware nova /doesn't/ do any clean-up, I'm suggesting that it
should.

Context.elevated, like cinder, does not use an admin context for
inter-project rest calls, only RPC/db calls. There was a new version of
elevate added to cinder recently to use a service (usually admin) user for
specific calls, I think nova has a similar concept but does not yet use it
for this call.

Ollie Leahy (oliver-leahy-l) wrote :

Mike: Duncan is correct, elevating the nova context does not mean that nova will use different credentials when it connects to the cinder API. That's what I was saying in my last post and I've confirmed it with nova developers.

Given that, it seems that 'nova volume-update' is working as expected. The user credentials passed with the os-migrate_volume_completion do not have admin rights, so cinder returns the status 403.

Do you agree with that? or is there something else you had in mind?

Pranali Deore (pranali-deore) wrote :

Hi,

I have tested this issue on latest master and it is reproducible by admin user as well i.e., volumes are remaining in 'attaching' and 'detaching' states by admin user and normal user after running nova volume-update command.

I have found one launchpad issue related to migrate attach volume which is already fixed,
https://bugs.launchpad.net/cinder/+bug/1316079

Two different patches were submitted for this issue, but the cinder patch ( https://review.openstack.org/#/c/101932/1 ) was merged in Juno and nova patch ( https://review.openstack.org/#/c/101933/3 ) was merged in Kilo1.
So, volume update works properly on stable/juno because nova patch was not merged.

In nova patch, attach and detach calls are moved from swap_volume() method of nova.compute.manager to migrate_volume_completion() method of cinder.volume.manager.

IMO, while fixing this cinder migrate attach volume issue, nova volume-update was not tested properly.

When we migrate volume which is attached to the instance, cinder calls update_server_volume() method of nova from cinder.volume.manager, which internally calls swap_volume() method of nova.manager(which also gets called in case of nova volume-update.)
https://github.com/openstack/cinder/blob/master/cinder/volume/manager.py#L1078

In swap_volume() of nova.compute, migrate_volume_completion() method of cinder.volume.manager is called which deletes the old volume after detaching it in case of cinder migrate attach volume, but in case of nova volume-update the new_volume_id gets returned from migrate_volume_completion() method of cinder.volume.api itself (It does not give call to manager because migration status of new volume and old volume are None).

https://github.com/openstack/cinder/blob/master/cinder/volume/api.py#L1136

Hence volumes are remaining in 'attaching' and 'detaching' states.

Pranali Deore (pranali-deore) wrote :

FYI, I have added the missing tempest test for nova os-volume-attachment-update api.
https://review.openstack.org/#/c/156515/1

Sean Dague (sdague) wrote :

Putting in incomplete for nova until we sort out what is actually needed on the nova side. It seems unclear if there is an issue on that side.

Changed in nova:
status: New → Incomplete
importance: Undecided → Medium
importance: Medium → Undecided
tags: added: vo
tags: added: volumes
removed: vo

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/150086
Reason: Over a month with no update.

Mike Perez (thingee) on 2015-08-16
Changed in cinder:
status: In Progress → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
no longer affects: cinder
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers