Instance created by demo user(non-admin), shelved by admin and unshelved by demo user --> ends up in error state

Bug #1675791 reported by Nikolai Korablin
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann
Pike
Won't Fix
Medium
Unassigned
Queens
Won't Fix
Medium
Seyeong Kim
Rocky
Fix Committed
Medium
Seyeong Kim

Bug Description

Steps to reproduce
===========
1) Login as demo user and create an instance.
2) Login as a admin user navigate to admin panel and shelve the instance (as admin user is able shelve any instance).
3) Login as demo user and try to unshelve the instance shelved by admin user.

Expected : instance should be unshelved
Actual : instance is not shelved but is went to error state.

Concerns
===========
There are two conditions here 1.If this scenarios is not valid admin user should not have an option to shelve the instance , this option should be removed .
2.If this is a valid flow , instance should be unsheleved by the demo user. During the shelve process a snap shot will be created in the instance panel and it will be removed automatically when instance is unshelved. But when admin user is trying to shelve instance a snapshot is created under admin projects instead of demo project . This may be the reason for unshelve failure
Admin user is able to unsheleve an instance shelved by demo user as he is seeing both snapshots.

Environment
===========
Reproduced it with pure stable/Newton in devstack environment
Also reproduced it with Liberty.

Revision history for this message
Nikolai Korablin (nkorabli) wrote :

Please, say what you think about this? What is the valid workflow for this scenario?

Revision history for this message
Nikolai Korablin (nkorabli) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

The fact the instance went into error state on unshelve is not likely related to the tenant doing the unshelve, but rather the instance probably failed to spawn on the compute node during unshelve. Check your nova-conductor, nova-scheduler and/or nova-compute logs for errors.

Changed in nova:
status: New → Invalid
Revision history for this message
Nikolai Korablin (nkorabli) wrote :

Here are logs that I see from nova conductor, during the unshelving

Revision history for this message
Nikolai Korablin (nkorabli) wrote :

These are commands which produced the errors (logs attached above)

Revision history for this message
Nikolai Korablin (nkorabli) wrote :

And these

Revision history for this message
Nikolai Korablin (nkorabli) wrote :
Download full text (7.0 KiB)

-bash-4.2$ openstack server create --image 6b77c526-b408-43c9-8f90-d9fecaba55f3 --flavor m1.tiny test1
+--------------------------------------+----------------------------------------------------------------+
| Field | Value |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | N25cs5PC6Cm3 |
| config_drive | |
| created | 2017-03-24T17:41:01Z |
| flavor | m1.tiny (1) |
| hostId | |
| id | c5e2b76d-46f3-40db-80d3-b0771f7609f9 |
| image | cirros-0.3.4-x86_64-uec (6b77c526-b408-43c9-8f90-d9fecaba55f3) |
| key_name | None |
| name | test1 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| project_id | 967dab08b95048f9bbc1a42579245c39 |
| properties | |
| security_groups | [{u'name': u'default'}] |
| status | BUILD |
| updated | 2017-03-24T17:41:02Z |
| user_id ...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah so I think the problem is what you described in the bug description. The instance is owned by the demo project, but when the admin shelves the instances, the resulting snapshot image is owned by the admin project. The shelve_image_id is stored on the instance, and when we go to unshelve we try to build the instance using that image, which is owned by the admin, so the demo user that is trying to unshelve does not have access to the snapshot image and we get the 404 from Glance, and we fail.

A way to workaround this is for the admin to add image membership access to the demo user on the snapshot image, but that's not great.

I'm not sure if this is intentional or not. I wanted to look at the snapshot flow to see which project ends up owning the snapshot, the context performing the snapshot or the owner of the instance. The snapshot case matters less because the instance owner doesn't necessarily need access to the snapshot image if the admin created the snapshot, but for unshelve the instance owner does actually need access to the snapshot image.

Revision history for this message
Nikolai Korablin (nkorabli) wrote :

Maybe demo user just shouldn't have permission to unshelve it once admin shelved it? I guess it make sense

Revision history for this message
Matt Riedemann (mriedem) wrote :

When Nova creates a snapshot image it passes is_public=False to the nova.image.glance client wrapper, which translates that to visibility=private in glance v2 parlance. This is legacy behavior from when nova only worked with glance v1.

Looking at the image v2 API, it looks like by default the visibility on an image is 'shared':

https://developer.openstack.org/api-ref/image/v2/index.html#images

Reading up on image sharing:

https://developer.openstack.org/api-ref/image/v2/index.html#sharing

That sounds like what we'd want to do here, is if the context.project_id != instance.project_id, then the owner of the image is the context.project_id (admin in this case), and we'd add a shared member to the image which would be the instance.project_id. That way on unshelve, the instance user (demo) has access to the image.

What I'm not sure about is if we have to do anything else besides create the image member, because reading further the docs say:

"When an image is shared, the member is given immediate access to the image. In order to prevent spamming other users’ image lists, a shared image does not appear in a member’s image list until the member “accepts” the image."

So does the demo user have to explicitly accept the image membership to be able to GET it?

I think this answers that question though:

http://specs.openstack.org/openstack/glance-specs/specs/api/v2/sharing-image-api-v2.html#one-one-sharing

"Let the “producer” be the owner of image 71c675ab-d94f-49cd-a114-e12490b328d9, and let the “consumer” be a user who would like to boot an instance from that image.

The producer can share the image with the consumer by making the consumer a member of that image.

To prevent spamming, the consumer must accept the image before it will be included in the consumer’s image list.

The consumer can still boot from the image, however, if the consumer knows the image ID."

In this case, the consumer (demo user) knows the image ID, it's in the instance system_metadata via the shelved_image_id key used in conductor when unshelving the instance:

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L656

So I think that's the answer to this bug. When we shelve, if the user doing the shelving and owning the snapshot is not the owner of the instance, then we need to add the owner of the instance to have shared membership with the image so they can unshelve it.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
tags: added: shelve
removed: multitenancy nova
Changed in nova:
status: Invalid → Triaged
Changed in nova:
assignee: nobody → Nikolai Korablin (nkorabli)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Nikolai Korablin (nkorabli) wrote :

Hey! I just thought - do we even need to share an image ? Basically, demo user doesn't need to manage image itself, right? All he needs is to boot an instance from that image, as I understand.
Or is it better to make it more explicit and share it among "consumer" and "producer"?

Revision history for this message
Matt Riedemann (mriedem) wrote :

The producer in this case is the admin that created the snapshot via the shelve operation. The consumer is the user (owner of the instance) that wants to unshelve the instance and needs access to the image snapshot to unshelve the instance.

As discussed in IRC, I think a simple first step here is writing a test in Tempest that will recreate the bug. I can help with that if needed.

Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Triaged
assignee: Nikolai Korablin (nkorabli) → nobody
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version newton in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.newton
tags: added: openstack-version.liberty
Damini Chopra (damini)
Changed in nova:
assignee: nobody → Damini Chopra (damini)
Damini Chopra (damini)
Changed in nova:
assignee: Damini Chopra (damini) → nobody
Damini Chopra (damini)
Changed in nova:
assignee: nobody → Damini Chopra (damini)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Damini Chopra (damini) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/630769

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/630769
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=35cc0f5e943642dd8d9dacbf0dac6e260f708d7d
Submitter: Zuul
Branch: master

commit 35cc0f5e943642dd8d9dacbf0dac6e260f708d7d
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 14 15:29:37 2019 -0500

    Share snapshot image membership with instance owner

    When an admin creates a snapshot of another project owners
    instance, either via the createImage API directly, or via the
    shelve or createBackup APIs, the admin project is the owner
    of the image and the owner of the instance (in another project)
    cannot "see" the image. This is a problem, for example, if an
    admin shelves a tenant user's server and then the user tries to
    unshelve the server because the user will not have access to
    get the shelved snapshot image.

    This change fixes the problem by leveraging the sharing feature [1]
    in the v2 image API. When a snapshot is created where the request
    context project_id does not match the owner of the instance project_id,
    the instance owner project_id is granted sharing access to the image.
    By default, this means the instance owner (tenant user) can get the
    image directly via the image ID if they know it, but otherwise the image
    is not listed for the user to avoid spamming their image listing. In the
    case of unshelve, the end user does not need to know the image ID since
    it is stored in the instance system_metadata. Regardless, the user could
    accept the pending image membership if they want to see the snapshot
    show up when listing available images.

    Note that while the non-admin project has access to the snapshot
    image, they cannot delete it. For example, if the user tries to
    delete or unshelve a shelved offloaded server, nova will try to
    delete the snapshot image which will fail and log a warning since
    the user does not own the image (the admin does). However, the
    delete/unshelve operations will not fail because the image cannot
    be deleted, which is an acceptable trade-off.

    Due to some very old legacy virt driver code which started in the
    libvirt driver and was copied to several other drivers, several virt
    drivers had to be modified to not overwrite the "visibility=shared"
    image property by passing "is_public=False" when uploading the image
    data. There was no point in the virt drivers setting is_public=False
    since the API already controls that. It does mean, however, that
    the bug fix is not really in effect until both the API and compute
    service code has this fix.

    A functional test is added which depends on tracking the owner/member
    values in the _FakeImageService fixture. Impacted unit tests are
    updated accordingly.

    [1] https://developer.openstack.org/api-ref/image/v2/index.html#sharing

    Change-Id: If53bc8fa8ab4a8a9072061af7afed53fc12c97a5
    Closes-Bug: #1675791

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/643853

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova-powervm 8.0.0.0rc2

This issue was fixed in the openstack/nova-powervm 8.0.0.0rc2 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/661667

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)
Download full text (3.4 KiB)

Reviewed: https://review.opendev.org/643853
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6ca6f6fce691863901b01d37b8f6e3eadb2bcec4
Submitter: Zuul
Branch: stable/rocky

commit 6ca6f6fce691863901b01d37b8f6e3eadb2bcec4
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 14 15:29:37 2019 -0500

    Share snapshot image membership with instance owner

    When an admin creates a snapshot of another project owners
    instance, either via the createImage API directly, or via the
    shelve or createBackup APIs, the admin project is the owner
    of the image and the owner of the instance (in another project)
    cannot "see" the image. This is a problem, for example, if an
    admin shelves a tenant user's server and then the user tries to
    unshelve the server because the user will not have access to
    get the shelved snapshot image.

    This change fixes the problem by leveraging the sharing feature [1]
    in the v2 image API. When a snapshot is created where the request
    context project_id does not match the owner of the instance project_id,
    the instance owner project_id is granted sharing access to the image.
    By default, this means the instance owner (tenant user) can get the
    image directly via the image ID if they know it, but otherwise the image
    is not listed for the user to avoid spamming their image listing. In the
    case of unshelve, the end user does not need to know the image ID since
    it is stored in the instance system_metadata. Regardless, the user could
    accept the pending image membership if they want to see the snapshot
    show up when listing available images.

    Note that while the non-admin project has access to the snapshot
    image, they cannot delete it. For example, if the user tries to
    delete or unshelve a shelved offloaded server, nova will try to
    delete the snapshot image which will fail and log a warning since
    the user does not own the image (the admin does). However, the
    delete/unshelve operations will not fail because the image cannot
    be deleted, which is an acceptable trade-off.

    Due to some very old legacy virt driver code which started in the
    libvirt driver and was copied to several other drivers, several virt
    drivers had to be modified to not overwrite the "visibility=shared"
    image property by passing "is_public=False" when uploading the image
    data. There was no point in the virt drivers setting is_public=False
    since the API already controls that. It does mean, however, that
    the bug fix is not really in effect until both the API and compute
    service code has this fix.

    A functional test is added which depends on tracking the owner/member
    values in the _FakeImageService fixture. Impacted unit tests are
    updated accordingly.

    [1] https://developer.openstack.org/api-ref/image/v2/index.html#sharing

    Conflicts:
            nova/compute/api.py
            nova/compute/utils.py

    NOTE(seyeongkim): The conflict is due to not having change
    7e229ba40df60b963e0e9450af276c62d4b6bf60 in Rocky.

            nova/tests/functional/test_images.py
 ...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.2

This issue was fixed in the openstack/nova 18.2.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova-powervm 9.0.0.0rc1

This issue was fixed in the openstack/nova-powervm 9.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/661667
Reason: Queens is now in extended maintenance mode upstream [1] so I don't think this is something we need to keep around in the review dashboard, especially because it looks abandoned at this point. If someone wants it downstream they can cherry-pick it from here.

[1] https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.