Bug #1321906 “[EDP] Swift credentials passed in plain text” : Bugs : Sahara

Trevor McKay (tmckay) on 2014-05-21

description:

updated

Trevor McKay (tmckay) on 2014-05-22

tags:

added: security

Sergey Lukjanov (slukjanov) on 2014-05-22

Changed in sahara:
milestone:	none → juno-1
milestone:	juno-1 → juno-3

Revision history for this message

John Dickinson (notmyname) wrote on 2014-05-22:

#1

It sounds like Swift's tempurl feature would work perfectly for this. It's based on a shared secret stored in the Swift account metadata.

http://docs.openstack.org/developer/swift/middleware.html#tempurl

Revision history for this message

Robert Clark (robert-clark) wrote on 2014-05-23:

#2

+1 I think Tempurl could be a solution for most of what you've described :)

Revision history for this message

Thierry Carrez (ttx) wrote on 2014-05-26:

#3

not covered by OpenStack VMT yet (no stable release yet)

Changed in ossa:
status:	New → Won't Fix

Revision history for this message

Sergey Lukjanov (slukjanov) wrote on 2014-05-27:

#4

Yup, it sounds like tempurl could be used instead of credentials in our case.

Revision history for this message

Trevor McKay (tmckay) wrote on 2014-05-27:

#5

Sergey,

I have a few concerns about it in our case...

1) We have data objects stored for a long time in Sahara -- could be weeks, months, years, who knows. That's okay, though, because we could generate the tempurl on the fly when a job is submitted and put the updated url in the job

2) How do we set the expiration date? Really hard to know when a job will run, or how long it will take. Could be a huge job, and a busy cluster. Or a tiny job on a quiet cluster.

3) Is there a way to invalidate the url when the job is finished?

4) Although we make the problem a lot better, we still have the issue that Oozie/Hadoop logs would expose an authorized URL, so that someone could read (or write in the case of an output url, authorized for PUT) the data sources. True, it's only the particular data source, and not the whole swift instance, but it's still an exposure.

I had an idea about the above -- what if we used tempurls, and then encrypted them using public/private keys created for the hadoop user? Hadoop could decrypt the tempurl, and then access it. Then log entries wouldn't matter. This would require the Hadoop swift patch to recognize an encrypted url, look up the key file for decryption (probably a configured path) and decrypt. Too crazy?

Changed in sahara:
importance:	Undecided → High
assignee:	nobody → Trevor McKay (tmckay)
status:	New → Triaged

Sergey Lukjanov (slukjanov) on 2014-05-27

Changed in sahara:
importance:	High → Critical

Revision history for this message

Michael McCune (mimccune) wrote on 2014-06-05:

#6

After doing some digging around I'm curious if we can't solve this problem using either trusts and delegation(if we are able to with respect to backward compatibility) or just using Keystone projects.

I can imagine a workflow where Sahara would create a new project on cluster creation. The project would contain the access rights for the Swift data sources. Then as instances get deployed, Sahara would gain a token for each instance and place it on the instance during provisioning. During job execution the instances would use the token to authenticate with Swift for access to the data objects. At job completion, cluster destruction, or when the user is finished, the project and tokens could be revoked from Keystone.

I think this would solve some of the issues surrounding passing a cleartext credential around, and it would give Sahara greater control over the duration of access to the data.

Some difficulties I can see with this method surround the orchestration of the Keystone projects. Especially with regards to the Swift object stores. The Swift objects would need to be included in the project during cluster creation and this might create a disjointed workflow for users if they wish to reuse data objects. Also Hadoop would likely need to be patched to use a Keystone token instead of username/password for communicating with Swift. All of these changes would have an impact on the UI for an entire job process.

Additionally we would still be storing a credential file on the instance. Sahara would just be avoiding passing it through the workflow.xml in favor of using ssh during provisioning to place the credential.

If we are able to use the newer trust/delegation mechanisms then this task could be made easier by allowing Sahara to perform one-time delegation of tokens to the instances.

Another point to consider for the future is multi-tenancy hierarchies, wherein Sahara could create more complex structures of projects to help deliniate which clusters and jobs have access to specific data objects.

After doing some digging around I'm curious if we can't solve this problem using either trusts and delegation(if we are able to with respect to backward compatibility) or just using Keystone projects.

I can imagine a workflow where Sahara would create a new project on cluster creation. The project would contain the access rights for the Swift data sources. Then as instances get deployed, Sahara would gain a token for each instance and place it on the instance during provisioning. During job execution the instances would use the token to authenticate with Swift for access to the data objects. At job completion, cluster destruction, or when the user is finished, the project and tokens could be revoked from Keystone.

I think this would solve some of the issues surrounding passing a cleartext credential around, and it would give Sahara greater control over the duration of access to the data.

Some difficulties I can see with this method surround the orchestration of the Keystone projects. Especially with regards to the Swift object stores. The Swift objects would need to be included in the project during cluster creation and this might create a disjointed workflow for users if they wish to reuse data objects. Also Hadoop would likely need to be patched to use a Keystone token instead of username/password for communicating with Swift. All of these changes would have an impact on the UI for an entire job process.

Additionally we would still be storing a credential file on the instance. Sahara would just be avoiding passing it through the workflow.xml in favor of using ssh during provisioning to place the credential.

If we are able to use the newer trust/delegation mechanisms then this task could be made easier by allowing Sahara to perform one-time delegation of tokens to the instances.

Another point to consider for the future is multi-tenancy hierarchies, wherein Sahara could create more complex structures of projects to help deliniate which clusters and jobs have access to specific data objects.

Revision history for this message

Trevor McKay (tmckay) wrote on 2014-06-09:

#7

mimccune,

reusability of data objects is definitely something we want to preserve, in my mind it's the main reason they exist as separate objects and are persisted in the database. Otherwise we would just specify paths on job launch.

I like the idea of "newer trust/delegation" mechanisms and one time delegation to the instances. That sounds the cleanest to me.

What do you think of my earlier suggestion, to encrypt credentials using public/private keys set up during cluster generation? The Hadoop patch could decrypt username and password stored in the hadoop config using the private key, and lookup the credentials as it does now. Sahara could even store them encrypted in the database. The rest of the current mechanism stays the same.

It may not be great, but if something stops us from using one-time delegation it would be a bridge which is at least superior to plaintext.

Thoughts, everyone?

Revision history for this message

Michael McCune (mimccune) wrote on 2014-06-09:

#8

Trevor,

I agree that hashing the credentials, even if it's just with the known public/private keys is better than leaving them as cleartext. I'm wondering if we can reach a solution where the credentials to the Swift obejct are not needed and the security gating is performed solely by Keysotne. I'm not sure if that's possible but I think it would simplify things.

I also agree that if we can't use the trust/delegation methodology it will be beneficial to hide the credentials. Is it possible we could load an instance with a file based credential that is outside the workflow.xml and change the Hadoop portion to look for credentials outside the xml?

Revision history for this message

Andrew Lazarev (alazarev) wrote on 2014-06-09:

#9

>Sahara could even store them encrypted in the database.
Sahara stores hadoop private key in the same database. So, sahara side will have everything to decrypt and hadoop side will have everything... what is the point? just hashing?

Revision history for this message

Trevor McKay (tmckay) wrote on 2014-06-09:

#10

Why would we store the private key in the Sahara database?

If we generate the private key on the instances during cluster launch, and store only the public key(s) on the node running Sahara, we can encrypt using the public key, and only the instances can decrypt it, right? Isn't that the whole point of public/private keys?

I'm imagining that Sahara encrypts the username/password before writing to the database. It never needs to access data source objects directly.

For job binaries, Sahara still needs a way to get the binaries without exposing credentials

Revision history for this message

Michael McCune (mimccune) wrote on 2014-06-09:

#11

Andrew,

I think the larger point here is to not have cleartext credentials in the workflow.xml. That would be the main point to encrypting the credentials for the instances, then they could perform the decryption at the time of Swift interaction. Thus eliminating the cleartext credentials.

Trevor,

To do what you are suggesting we will need to get the public keys from the instances when they are spawned, probably during provisioning, and then generate the ciphertext credentials for the database/distribution. We will also need to keep track of which credentials belong to each specific instance, unless the instances share a private key(which I don't think they do).

I'm still of the feeling that if we can use Keystone projects to limit access to the data sources, that will give us a clean to way gate all these transactions. Even if we don't use the trust/delegation model we can still create projects that will contain all the instances which need access(including the root Sahara node). What are the limitations to using Keystone in this manner?

Revision history for this message

Trevor McKay (tmckay) wrote on 2014-06-09:

#12

Mike,

> To do what you are suggesting we will need to get the public keys from the instances when they are spawned, probably during > provisioning, and then generate the ciphertext credentials for the database/distribution. We will also need to keep track of > which credentials belong to each specific instance, unless the instances share a private key(which I don't think they do).

Ack, this is what I was thinking. Again, not great but maybe a fallback if we are waiting for something not quite consumable from keystone.

Revision history for this message

Michael McCune (mimccune) wrote on 2014-06-25:

#13

I've been doing some testing with the Keystone trust delegation mechanism and I think there is merit to using it for this solution. Here is a general overview of how we might solve this issue:

1. When a new job that involves Swift objects is created, a trust for the context user’s role is delegated from the context user to the Sahara admin user.

2. A periodic task is started that will acquire a token associated with the delegated trust. This task will place the token in a key/value store on all instances in the cluster. It will update the token based on the timeout value dictated by Keystone.

3. When an instance needs to access a Swift object it uses the token in it’s local key/value store to access that object.

4. When a job has ended, the delegated trust will be revoked from the Sahara admin user, the stored token on all cluster instances will be removed, and the periodic task associated with the job will be terminated.

There are some options we could excercise regarding this solution.

Step 2 could be changed to distribute the trust id instead of an authentication token. In this manner, each instance would need to acquire an authentication token from the trust id. This might be simpler from the perspective of not needing a periodic task in Sahara, but would create much more traffic to the Keystone controller, and would require the Swift-Hadoop plugin to become more aware of when a token needs to be updated.

The key/value store on each instance has not been defined. I was thinking we could use a system wherein the authentication token for a job is stored relative to it's job id, assuming this is unique. In this respect the Swift-Hadoop plugin on each instance would only need to pull the id from it's workflow, look up the token, and then access the Swift objects.

Some issues to consider:

I can imagine a case where a user is part of multiple Keystone projects. It may become difficult to determine what project to delegate trust from. In these cases we might need to end up delegating more trust than is need, or figure out a way to determine which project the Swift endpoint belongs to. I'm not sure there is a clean way to solve this currently.

This also assumes that delegating a "Member" role to the Sahara user will be sufficient to access the Swift objects. There could be configurations where this is not strictly true. In these cases we would need to make sure that there is documentation to support the assumptions we make about a user's access to Swift objects.

This solution will also require that we use Keystone v3 to gain access to the trust/delegation methods.

Another point to consider is how Sahara will keep hold of the trust id. It might be worthwhile to ensure that Sahara only keeps a trust id in memory instead of committing to a database. Because the trust can always be regenerated it might not be necessary for Sahara to store the trust id for long periods of time. In the case of a catastrophic failure of the Sahara controller, it could always regenerate the trust and distribute the tokens as usual.