Shared & public images not working with multi-tenant swift backend

Bug #1625075 reported by Andrew Battye
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
Critical
Dharini Chandrasekar
Newton
New
Undecided
Unassigned

Bug Description

Hi,

We are seeing issues when trying to use public and shared images when Glance is configured to use a multi-tenant Swift backend.

Here's what we see :

1. Create a public image in project cf8fc081a9954cef81befb67b4002ce8
2. Attempt to create instance from image in project 67e22ed6876d432d9e48f9bd2a20a527
3. The instance creation fails, with the following log line in the Glance API

Object GET failed: https://objectstore.domain.corp:443/v1/AUTH_67e22ed6876d432d9e48f9bd2a20a527/glance_6e84cb8d-7f09-4f78-8363-a6005e0c51d2/6e84cb8d-7f09-4f78-8363-a6005e0c51d2 404 Not Found

The issue appears to be that the storage URL in the swift store driver is determined from the catalog in the context of the current request (which is scoped to the project we are creating the instance in), not the project where the image was created.

Looking at the changes introduced here https://git.openstack.org/cgit/openstack/glance_store/commit/?id=68762058cc5d063f3a846b495af03150e648224f it seems to us that storage_url can only contain the account AUTH_[current_context_project_id], and in this case it's not clear how a public or shared image from another project can be retrieved from Swift.
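The mismatch can be illustrated with a minimal sketch. It assumes the object URL is simply the Swift endpoint, the account AUTH_<project_id>, the container, and the object name joined together, as in the log line above. The helper object_url is hypothetical and is not glance_store code; the project and image IDs are the ones from this report.

```python
# Hypothetical helper for illustration only; not part of glance_store.
def object_url(endpoint, project_id, container, obj):
    """Build a Swift object URL whose account is AUTH_<project_id>."""
    return f"{endpoint}/v1/AUTH_{project_id}/{container}/{obj}"


ENDPOINT = "https://objectstore.domain.corp:443"
IMAGE_ID = "6e84cb8d-7f09-4f78-8363-a6005e0c51d2"
OWNER = "cf8fc081a9954cef81befb67b4002ce8"      # project that created the image
REQUESTER = "67e22ed6876d432d9e48f9bd2a20a527"  # project creating the instance

# Where the object would live if it is stored in the owner's account.
stored_at = object_url(ENDPOINT, OWNER, f"glance_{IMAGE_ID}", IMAGE_ID)
# What gets requested when the account comes from the requester's context.
requested = object_url(ENDPOINT, REQUESTER, f"glance_{IMAGE_ID}", IMAGE_ID)

print(stored_at)
print(requested)
print("same object URL:", stored_at == requested)
```

Under these assumptions the requested URL is the one that appears in the 404 log line, while the object sits under the owner's account.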

Since this is pretty fundamental for the use case we can only assume we are missing some configuration option. The direct URL in Glance is stored as direct_url='swift+config://swift-global/glance_c7396e07-484c-4ef3-b54c-9b6ea0cb367e/c7396e07-484c-4ef3-b54c-9b6ea0cb367e', and since the driver and location seem to have no information on the image other than the image ID, it's not clear how it could make the distinction between public/shared images and private ones, or determine the project ID of the shared image.

The only way we can get this to work is to first create an instance on each hypervisor in the project of the shared image. When we do this, creating instances in a second project works because the image is cached on the hypervisor. Obviously this is not a viable workaround.

Any information on how to get this scenario working would be much appreciated.

Thanks

Andrew

Stuart McLaren (stuart-mclaren) wrote :

If anyone has cycles to see if they can reproduce this in devstack that would be great.

If this is a real issue it would be great to not have a broken multi-tenant swift store in Newton.

(And to update our glance_store swift functional tests to exercise this.)

Stuart McLaren (stuart-mclaren) wrote :

From the referenced commit message:

"Multi-tenant store doesn't need to fetch swift storage_url from store location because it can fetch it from service catalog in user context."

I'm not sure this is the case -- as the bug reporter notes, the tenant id is part of the swift URL.

Also, I'm not sure that the store the image was saved in is guaranteed to be the same swift store that's in the current keystone catalog.

Changed in glance:
importance: Undecided → Critical
Changed in glance:
assignee: nobody → Dharini Chandrasekar (dharini-chandrasekar)
Nikhil Komawar (nikhil-komawar) wrote :

dharinic has offered to help out with this bug. I've assigned it to her and will be available for further input as required. Unfortunately, I don't have much time to look into it first hand.

NOTES:

1. Please keep updating the status as we progress and more information is obtained.
2. We are seriously considering it for Newton. But we need some answer/clarity on the situation by EOD today to see if we can ask for an exception in requirements for the bump.
3. Otherwise, we will have to backport it to stable/newton and create a release out of it. This won't reflect the requirements but a good release note on the patch should give indication to the operators to keep their store updated in case of multi-tenant.
4. I've marked it as critical as it affects a foundational/primary functionality, but TBH this is narrowly scoped to only one type of store driver, so I'm considering it for the Newton requirements bump only based on Stuart's comment, in good faith. The ideal scenario is still a backport, which may take its own time :|

Dharini Chandrasekar (dharini-chandrasekar) wrote :

I tried to reproduce this bug on devstack.
I had swift enabled with multi tenancy on my devstack environment.

Reproduction: http://paste.openstack.org/show/581468/

Andrew Battye (3-andres-s) wrote :

Hi Dharini,

Is it possible to share your glance store configuration with us, and also the value stored in the direct URL property of the image? It appears to be missing from the image you used to reproduce. An example of 'image show' output for a problem image in our environment is here: http://paste.openstack.org/show/581713/ (we have show_image_direct_url = True in glance-api.conf).

Could you also possibly share the swift endpoints you have in the Keystone catalog? We have http://paste.openstack.org/show/581720/

We can consistently reproduce in our environment and can only fix it with a very hacked patch where we introduce awareness of the image project into the connection manager: http://paste.openstack.org/show/581717/ - with this patch it works, but obviously it's no long-term solution.

I can only see it working if the image project ID is somehow persisted in the direct_url or some other property passed to the store/connection manager. Since it's working for you I can only assume we have something wrong in our configuration.

Thanks

Andrew

Stuart McLaren (stuart-mclaren) wrote :

@Dharini

Also, could you post the glance-api/glance-registry/swift logs? If the server is in debug mode, it may be possible to figure out the API requests from glance to swift.

Ideally, if you're running with http (not https), you can install ngrep and dump the API request traffic as it goes over the wire and see exactly what's going on, e.g. to dump glance requests:

 ngrep -d eth0 -W byline port 9292

and to dump swift proxy requests:

 ngrep -d eth0 -W byline port 8080

see here for some examples: http://wiki.christophchamp.com/index.php?title=Ngrep

(It might be easier to use a one byte test image and just do 'glance image-download <id>' rather than doing a full nova boot)

Stuart McLaren (stuart-mclaren) wrote :

@Andrew/Dharini

Which version of glance server are you using?

Andrew Battye (3-andres-s) wrote :

We are using Glance in a Kolla container image built from source with Glance stable/mitaka commit ab4519f9bc9a093ba0ac3b1d8c6d77a9312f6319

Dharini Chandrasekar (dharini-chandrasekar) wrote :

@Andrew,

My devstack is picking swift store with scheme swift+config. So when I create an image, I do not get a direct url even if I set the conf opt to enable that.
So here is my log from the Glance DB: http://paste.openstack.org/show/582320/
The image_location table has the location "value" (lines 41 to 62).

And here is my swift endpoint on keystone catalog: http://paste.openstack.org/show/582321/

Apart from this I only have the relevant configuration options set in my glance-api.conf.

Please let me know if there is something I am missing that prevents me from reproducing this bug, because in my devstack environment it seems to be working fine for multiple tenants.
http://paste.openstack.org/show/582322/

Nikhil Komawar (nikhil-komawar) wrote :

@Dharini: Are you using latest (master) on devstack?

Also, please enable show_multiple_locations to see the url with v2 API.

Nikhil Komawar (nikhil-komawar) wrote :

@Dharini: I checked the location string from both Andrew's and your paste file. Looks like there's some misconfiguration that is resulting in you using the single-tenant store? Please check.

Andrew Battye (3-andres-s) wrote :

Hi,

Something seems fundamentally different to our environment. We see a direct URL in the form swift+config://[SWIFT_CONFIG_KEY]/[CONTAINER]_[IMAGE_ID]/[IMAGE_ID]. The values you see are swift+config://[SWIFT_CONFIG_KEY]/[CONTAINER]/[IMAGE_ID].

Based on my read of the code here https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/swift/store.py#L1338 it implies that, for whatever reason, your devstack is not using the multi-tenant store implementation. That, I suspect, might mean your images are actually going into the same tenant/project (which in turn explains why you can't reproduce).
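As a rough way to tell the two location shapes apart, here is a minimal sketch. It is a heuristic based only on the two forms quoted above (container named [CONTAINER]_[IMAGE_ID] versus plain [CONTAINER]); looks_multi_tenant is a hypothetical helper, not a glance_store function, and the second example URL is made up from the pattern rather than taken from a real deployment.

```python
# Heuristic sketch; not glance_store code.
def looks_multi_tenant(location):
    """Return True if the container segment ends with '_<image_id>'."""
    _scheme, _sep, rest = location.partition("://")
    parts = rest.split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected location shape: {location!r}")
    _config_key, container, image_id = parts
    return container.endswith("_" + image_id)


IMAGE_ID = "c7396e07-484c-4ef3-b54c-9b6ea0cb367e"

# Shape seen in our environment: one container per image.
per_image = f"swift+config://swift-global/glance_{IMAGE_ID}/{IMAGE_ID}"
# Shape seen in the devstack paste: one shared container (illustrative value).
shared = f"swift+config://swift-global/glance/{IMAGE_ID}"

print(looks_multi_tenant(per_image))
print(looks_multi_tenant(shared))
```

This only inspects the URL string; checking which Swift account the containers actually live in is still the definitive test.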

Can you verify if the images in your case are actually going into containers in the same or different tenant(s)?

Thanks

Andrew

Dharini Chandrasekar (dharini-chandrasekar) wrote :

Thanks Andrew and Nikhil. I will see what is causing this issue and get back.
These are the opts that I enabled for glance_store in glance-api.conf:
[glance_store]
stores=swift,file,http
default_store=swift+http
swift_store_create_container_on_put=True
swift_store_multi_tenant=True
swift_store_auth_address=http://192.168.0.69:35357/v3

Andrew, can you confirm?

Stuart and Nikhil: I am using latest master on my devstack. There seems to be some misconfiguration that is not enabling multi tenancy on my devstack environment.

Thanks,
Dharini.

Dharini Chandrasekar (dharini-chandrasekar) wrote :

The problem was with my configuration option for swift multi tenancy not being identified.

So after resolving that issue, I could reproduce this bug as here:
http://paste.openstack.org/show/582455/

With multi tenancy enabled, the image is not available to another tenant, and the request returns a 404.

My image details with the swift direct URL and location are below:
http://paste.openstack.org/show/582456/

Dharini Chandrasekar (dharini-chandrasekar) wrote :

An update:
An instance created from an image by a user belonging to another project goes to ERROR state.
http://paste.openstack.org/show/582461/

The information from the nova database: http://paste.openstack.org/show/582463/

These are my n-api logs when a demo user from a demo project tries to create an instance with the image:
http://paste.openstack.org/show/582465/

Andrew Battye (3-andres-s) wrote :

Hi,

That exactly reproduces our problem. If you check the Glance logs I think you will see a 404 from Swift showing the current project in the Swift account. As posted earlier, the patch to the connection manager fixes the issue: we determine and swap in the owner project for the image before retrieving from Swift. However, we are not sure of the best way to properly introduce the owner information to the driver-specific code without some significant changes to the driver API. I guess one way would be to rescope the context for public/shared images to the owner project before getting the catalog for the image storage URL?

Thanks

Andrew

Dharini Chandrasekar (dharini-chandrasekar) wrote :

Andrew, Yeah the logs have a 404 returned.

Will look into this and discuss with the team and see what they think can be a good solution and work on it.

Thanks for your patience.

Dharini.

Nikhil Komawar (nikhil-komawar) wrote :

@Dharini: please bring this up in the glance meeting tomorrow (we've a few slots open).

Changed in glance:
status: New → Triaged
Nikhil Komawar (nikhil-komawar) wrote :

Based on Dharini's confirmation, I've updated the status.

Dharini Chandrasekar (dharini-chandrasekar) wrote :

So after some digging and discussions with hemanthm with respect to this bug:

When multi-tenancy is enabled, it is required that the user's context be used to fetch the storage URL of the shared/public image. So the reason for this bug is definitely not the change introduced in https://git.openstack.org/cgit/openstack/glance_store/commit/?id=68762058cc5d063f3a846b495af03150e648224f, because that's how it should be working (unlike single-tenant, where images are stored in Glance's account).

Also the logs I shared after trying to reproduce this bug throw a 404 cos of the flavor being inaccessible and not cos the image could not be accessed/downloaded. So at this point, not sure if this bug is reproducible the exact way as reported.

Andrew,
Your hack patch changes the project owner in the url fetched. But I am not sure if that should be the case cos if the image is public or even shared, the second user's context should be good enough to fetch the image according to the way public/shared images in Glance work.

Will work more on this and try to find the exact issue here. Until then, any progress and findings if shared, would help a lot.

Thanks,
Dharini.

Andrew Battye (3-andres-s) wrote :

Hi,

I'm still pretty sure that the problem (at least in our environment) is caused by the fact that the catalog is based on the context of the calling user and hence uses the project ID of the calling user's context, rather than the one the shared image was created in.

Since the multi-tenant store driver implementation has no knowledge of the image source project I cannot see any way this could possibly work. The reason my 'hack' works is that the Swift account (i.e. the URI segment "AUTH_[project_id]") has to reflect the project in which the image binary was created in Swift - I retrieve and change it in the driver so it can access the image content.

If this is not done, as I understand Swift, it's not possible to retrieve the image content from Swift. As long as the calling user's context is passed, the Swift URL will always be wrong since the account used is the project of the calling user, not the image project.

So the only way I can see this working is if the store driver generates a Swift URL that reflects the account in swift where the image was created.
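To make the idea concrete, here is a minimal sketch of swapping the account segment. It is not the patch from the paste above; with_owner_account is a hypothetical helper, and it assumes the storage URL contains exactly the "AUTH_[project_id]" segment described here.

```python
import re

# Matches the account segment of a Swift storage URL, e.g. "/AUTH_<project_id>".
_ACCOUNT = re.compile(r"/AUTH_[^/]+")


def with_owner_account(storage_url, owner_project_id):
    """Replace the AUTH_<project> account segment with the image owner's.

    Hypothetical helper for illustration; assumes a single AUTH_ segment.
    """
    new_url, count = _ACCOUNT.subn(
        f"/AUTH_{owner_project_id}", storage_url, count=1
    )
    if count == 0:
        raise ValueError("no AUTH_<project> segment found in storage URL")
    return new_url


# Storage URL as derived from the calling user's catalog (requester project).
from_catalog = (
    "https://objectstore.domain.corp:443/v1/"
    "AUTH_67e22ed6876d432d9e48f9bd2a20a527"
)
# Rewritten to point at the project that owns the image.
print(with_owner_account(from_catalog, "cf8fc081a9954cef81befb67b4002ce8"))
```

Rewriting the account only helps if the owner project is known to the driver, which is exactly the information the store currently lacks.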

We recently saw a similar issue in Barbican, where access to secrets was not possible for users in a context other than the project the secret was created in (here's the fix: https://github.com/openstack/barbican/commit/14ae36d3e57cce35ae6320d6e53b708e1a83cb7a). It's an analogous problem, I think.

If there is anything we can do to help clarify the issue further please let me know.

Thanks

Andrew

Hemanth Makkapati (hemanth-makkapati) wrote :

Dharini and I spent some time on this and it is indeed breaking the expected behavior around public and shared images, as Andrew reported here.

From what we've seen, a couple of things are at play here.

#1. Swift multi-tenant store and the swift+config scheme don't work well together. The swift+config scheme is primarily meant for the single-tenant store. If it is configured with the multi-tenant store, the default reference configured is always used to access swift objects, which is orthogonal to the idea of a multi-tenant store. There are a couple of ways this could be handled.

  A) Add validation to glance-api such that it fails to start up when both multi-tenant store and swift+config are enabled. While this prevents a certain issue, it may mean that an operator using single-tenant store with swift+config now cannot switch to multi-tenant store. Not sure if anyone intends to do such a thing but it is certainly possible. This brings us to option B

  B) To address the issue mentioned with option A, it is probably best to teach multi-tenant store to ignore the swift config file unless it comes across an image that was previously created using single-tenant store and swift+config. Any images that are created after the switch to multi-tenant store should remain unaffected whether or not a swift config file is provided.

I think Option A is the easiest fix for now, before we figure out how to do option B.
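A rough sketch of what the option A check could look like, outside of Glance itself. It assumes that swift+config is in effect whenever swift_store_config_file is set in the [glance_store] section; that mapping is an assumption for this example, and check_store_options is a hypothetical helper, not how glance-api actually validates its configuration.

```python
import configparser


def check_store_options(conf_text):
    """Raise ValueError if multi-tenant store and a swift config file are both set.

    Hypothetical validation sketch; treats a non-empty swift_store_config_file
    as "swift+config is enabled", which is an assumption.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("glance_store"):
        return
    store = parser["glance_store"]
    multi_tenant = store.getboolean("swift_store_multi_tenant", fallback=False)
    config_file = store.get("swift_store_config_file", fallback="").strip()
    if multi_tenant and config_file:
        raise ValueError(
            "swift_store_multi_tenant and swift_store_config_file "
            "should not be enabled together"
        )


CONFLICTING = """
[glance_store]
swift_store_multi_tenant = True
swift_store_config_file = /etc/glance/glance-swift.conf
"""

try:
    check_store_options(CONFLICTING)
except ValueError as exc:
    print("refusing to start:", exc)
```

Failing fast at startup is the simpler choice, at the cost described in option A: an operator moving from single-tenant with swift+config to multi-tenant would have to drop the config file first.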

#2. The change that Andrew linked [0] doesn't directly cause the behavior reported in this bug. However, a similar change that went into the connection manager is responsible. Essentially, pulling the storage_url out of the catalog won't work if the image one is trying to access was not created by that user. So, we should always refer to the location mentioned in the database via location.swift_url.

One may ask, then why did we ever shift to using the storage_url from catalog in the first place?
Well, my theory is that it was an inadvertent behaviour introduced while trying to fix [1]. The bug reported in [1] is exactly what we discussed above in issue #1. Both multi-tenant store and swift+config were set, and it is because of this that the keystone URL appears in the image location. To fix this, I think we went ahead and switched to using storage_url from the catalog. But [1] really didn't need a code fix. It is just a weird behavior resulting from two orthogonal features at play.

To summarize, I think we need two patches here:
One to switch back to using location.swift_url instead of using storage_url from the catalog. (Do this in the connection manager too.) And two, maybe add validation around configuration when both multi-tenant and swift+config are enabled. (Or figure out how to make the multi-tenant store ignore swift+config.)

Hope this made some sense. This was a tricky one to make sense of.
Let me know if my assessment is off here.

[0] https://git.openstack.org/cgit/openstack/glance_store/commit/?id=68762058cc5d063f3a846b495af03150e648224f
[1] https://bugs.launchpad.net/swift/+bug/1511025

Dharini Chandrasekar (dharini-chandrasekar) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 0.19.0

This issue was fixed in the openstack/glance_store 0.19.0 release.

Changed in glance:
status: Triaged → Fix Released
no longer affects: glance (Ubuntu)