Improve Password Management and Authentication

Bug #1468422 reported by Bill Erickson on 2015-06-24
258
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Undecided
Unassigned

Bug Description

Evergreen 2.8 -- affects all versions.

Creating this as a public security bug as discussed in the Evergreen security list.

Evergreen stores passwords in the database as MD5 hashes. This is a vast improvement over storing plain text passwords, but it could be even better. There are a number of details to work out, but the basic plan discussed in the security list is something like:

1. Store passwords in a separate table which is only indirectly accessible to the application. (I.e. not exposed to the IDL).
2. Use salted passwords
3. Use stronger encryption (e.g. bcrypt)
4. Devise a password migration scheme

Bill Erickson (berick) wrote :

For the sake of discussion, I've pushed a branch which includes the password table, password type table (as discussed via security list), a function to migrate existing passwords, and a function to verify a new-style password. It assumes no external salt will be used.

The XXXX script performs a batch password migration and includes a small set of inline tests, which assume the presence of concerto data. In its current form, the whole script rolls back. Nothing is committed.

There's much to discuss, but the basic form works as expected.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/berick/lp1468422-passwd-storage

Bill Erickson (berick) wrote :

Additional experiments suggests a mass password migration is not feasible. With the current iter_count (work factor) of 10, hashing takes .1 seconds. In a database with 1 million patrons, this would take ~27 hours to complete. Plus, we probably want a larger iter_count, one that takes about a second to calculate (iter_count "14" on my server) for increased security. That would take about 11 days.

I'll plan to rework the code to support real-time migration. With this, we still have the option of performing batches of password migrations, e.g. via cron after hours, to ensure that all passwords are eventually migrated.

Bill Erickson (berick) wrote :

Changes pushed. From the commit:

Don't rely on mass password migration, since it would take a very long time. Instead, migrate users on demand.

Raise work factor (iteration count) from 10 to 14.

Current flow:

1. Application requests a salt to use as CHAP-style seed
2. If new-style password exists, salt is returned.
3. Else, old password is migrated and the new salt is returned.
4. App finalizes login by checking verify_passwd.

--

With this, batch migrations could still be performed by looping over actor.migrate_passwd(user_id) as a separate process.

Simple tests still pass.

Bill Erickson (berick) wrote :

Pushed some more stuff. Tests are now pgtap tests. (Script will fail of pgtap extension is not installed). Added a lot more comments. Added support for storing more crypt info (iter_count) in the password type (so it can be raised in the future) and support for storing un-encrypted (or externally crypted) passwords, i.e. passwords we don't crypt() directly.

--

Looking at the API, I've run into a snag with backwards compatibility. If we're to use the main password salt as the seed value returned by open-ils.auth.authenticate.init, we need to know which user is attempting to login during init. However, the API only requires the caller to specify if it's using a username or barcode in the secondary authenticate.complete call. The init call has no knowledge of which patron is trying to login.

To avoid modifying the client API's to specify username or barcode in authenticate.init, we need to add logic to authenticate.init to determine for ourselves if the value provided is a username or barcode, so we can lookup the patron and find their salt. This may be a blessing in disguise, since we should probably be doing this on the server-side anyway. Sound sane? Am I missing anything?

Thomas Berezansky (tsbere) wrote :

If we are going this way I think we should modify authenticate.init to support *optional* specifying of username vs barcode (in a new type param, perhaps?) so that an interface that should only ever be accepting one of the two (say, the staff client only wanting usernames) doesn't get tripped up by the backend trying to guess and getting it wrong. If you haven't specified which on the init call then attempt auto-detect. Then we can change the complete call to make that information optional there as well or ignore it and always use what the init call actually used (stored in memcache?).

Mike Rylander (mrylander) wrote :

Bill, I don't see any problem with recording a bit more info with the "seed" blob in memcache resulting from the .init call.

Thomas, I think having an API (or, 2, really) for "strictly either username or barcode" login is fine, but I'd suggest that refactoring the existing code to be strict based on the method name by which it is called (or not be strict, if called with today's name) is better than adding an optional parameter. We can just add ".init.barcode" and ".init.username" variants, and be strict based on what was called.

Thoughts?

Bill Erickson (berick) wrote :

Sounds good. To recap:

Create new API calls .init.barcode and .init.username. Both are strict about treating a barcode as a barcode / username as a username. Barcode/username info is stored in the cache along with the auth seed, nonce, etc. (Should also toss the user ID in to avoid another search in .complete).

Modify the existing .init API to perform barcode vs. username detection, using the same algorithm in place w/ the TPAC. (Check org settings, etc. Later, we could remove this logic from the TPAC to avoid multiple variants of the same logic). Once determined, call the appropriate .init.barcode or .init.username handler.

Modify the .complete API to read the new barcode/username data from the cache, and honor that data over anything provided by the caller.

Bill Erickson (berick) wrote :

API changes pushed. SQL upgrade now commits. Tests moved into Pg/live_t.

This add libpcre to the build process, so an autoreconf -i / configure is probably in order.

One caveat with the API changes is that the .init calls have no concept of a context org unit, so the barcode_regex org unit setting is based on the root org unit. We could add a call parameter -- and probably should -- but unmodified clients will be beholden to the consortium-level barcode_regex setting when calling the existing .init API.

Basic tests work. Can log in with username and barcode from catalog. The passwords are migrated. The auth check takes about a second on my dev VM. The code needs lots of edge case and memory leak testing.

Dan Wells (dbw2) wrote :

I've loaded and tested this code on a development server, and so far everything has worked as expected. I had some build issues (even with autoreconf -i / configure), but I'm fairly sure they were related to a recent OS upgrade and not this branch. I still need to test on a "fresh" machine to be 100% confident, though.

My next goal will be to load this into production. We use LDAP (via auth_proxy) primarily for authentication, so it won't be a huge risk for us, but we have enough "standard" login traffic that it should still be a far better test than I can do privately.

Thanks, Bill!

Bill Erickson (berick) wrote :

Recapping IRC conversation:

We've hit a snag with AuthProxy. When a user authenticates externally, AuthProxy performs a login internally via open-ils.auth to create a login session for the externally-verified user. It does this by pulling the password from the DB and using it to log the user in. In the new world order, the middle layer code cannot access the password and, even if it could, it would not be able to use it, since open-ils.auth needs the real password w/ some md5 hashing, not a bcrypted password.

Proposed solution was to create a new private service which implements the contents of oils_auth.c:oilsAuthHandleLoginOK(). It would take a user (probably by id) and add the user to the auth cache. With this, services besides open-ils.auth can "log in" a user without knowing the user's password. oils_auth.c would be modified to call this new service instead of continuing to use its own implementation to avoid duplication.

In essence, open-ils.auth would be broken up into open-ils.auth and open-ils.auth-internal. AuthProxy would call open-ils.auth-internal to log in externally-verified users.

Bill Erickson (berick) wrote :

Here's a parallel branch implementing a new open-ils.auth_internal service. It's only job is to put users into the auth cache. open-ils.auth calls this service after login has succeeded. Posting here for comment / review while I prepare to merge this into the main branch above.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1468422-auth-internal

Mike Rylander (mrylander) wrote :

Bill,

The new service looks good. In scanning the patch set, you use auth_internal for the service name, but register the method with a name containing auth-internal (_ vs -). I personally prefer - but I know XML::RPC is not a fan of that spelling. And, of course, it'll work just fine as it is right now, but may confuse devs calling the service.

Bill Erickson (berick) wrote :

Oops, thanks, Mike! I started with "-", before remembering other services use "_", presumably because of the xml::rpc restriction. Fix pushed to resolve the dangling bits. I'll start merging these changes into the main branch soon.

Bill Erickson (berick) wrote :

Rebased original branch (collab/berick/lp1468422-passwd-storage) to master, pushed a few more fixes, and merged the auth_internal code.

Known TODO items:

1. Teach AuthProxy to use the new open-ils.auth_internal API for caching authenticated users.
2. implement password updates in open-ils.actor.patron.update and open-ils.actor.user.password.update
3. Look for any other code that directly references actor.passwd

Bill Erickson (berick) wrote :

Code pushed to resolve #2 above.

WRT Perl code, #3 above is also resolved. New code implements the standalone password update API, the password reset API, and SIP password verification.

To share code (and for various other reasons), I migrated the user update code from open-ils.storage to open-ils.cstore. Testing looks good so far, but we'll want to do a good bit of user update testing while testing this branch.

Bill Erickson (berick) wrote :

Correction: #3 is resolved, minus the pending work on #1 (AuthProxy)

Dan Wells (dbw2) wrote :

We've got this code up and running (with AuthProxy modifications) in production, and it's working well so far. Excellent work, Bill!

Having put it through it's paces for this type of use, I do have a few questions.

First, the easier ones. The current auth API uses 'type' and 'org' for argument names, but the auth_internal code expects 'login_type' and 'org_unit' for the same values. Should we consider keeping the names the same instead? It's slightly more predictable, but if we are instead trying to purposefully improve the names, I can see that side as well. Just wanted to point it out and see what others felt.

Second, we may want to consider splitting apart or branching some parts of oilsAuthComplete() a little more. When AuthProxy.pm was a straight-up wrapper for the real authentication, we got a lot of value for free. In short, we got all of what I would call user "validation" (card is active, user not barred, user active, user has login perm). In the current setup, we would have to redo all that in the proxy layer (not a huge deal, but clumsy).

One possibility is to shove more of these checks (and probably stats, etc.) down into the auth_internal layer (either in the session.create call, or (maybe better) in a separate call or some kind). We probably want to avoid doing any kind of password-less validation in the public side, since anyone could potentially learn (admittedly unexciting) private information about any user account.

Thoughts? I am willing to take a stab at it if needed, but might be barking up the wrong tree.

On Wed, Dec 2, 2015 at 5:58 PM, Dan Wells <email address hidden> wrote:

> We've got this code up and running (with AuthProxy modifications) in
> production, and it's working well so far. Excellent work, Bill!
>

Thanks Dan!

>
> Having put it through it's paces for this type of use, I do have a few
> questions.
>
> First, the easier ones. The current auth API uses 'type' and 'org' for
> argument names, but the auth_internal code expects 'login_type' and
> 'org_unit' for the same values. Should we consider keeping the names
> the same instead? It's slightly more predictable, but if we are instead
> trying to purposefully improve the names, I can see that side as well.
> Just wanted to point it out and see what others felt.
>

My goal was to make the parameter names more explicit. Perhaps consistency
is more clear, though. No strong opinions from me on that.

>
> Second, we may want to consider splitting apart or branching some parts
> of oilsAuthComplete() a little more. When AuthProxy.pm was a straight-
> up wrapper for the real authentication, we got a lot of value for free.
> In short, we got all of what I would call user "validation" (card is
> active, user not barred, user active, user has login perm). In the
> current setup, we would have to redo all that in the proxy layer (not a
> huge deal, but clumsy).

> One possibility is to shove more of these checks (and probably stats,
> etc.) down into the auth_internal layer (either in the session.create
> call, or (maybe better) in a separate call or some kind). We probably
> want to avoid doing any kind of password-less validation in the public
> side, since anyone could potentially learn (admittedly unexciting)
> private information about any user account.
>

+1 to moving that into auth_internal. I also like making that a separate
API call for modularity and in case there is need in the future to bypass
the validation checks for certain types of logins.

>
> Thoughts? I am willing to take a stab at it if needed, but might be
> barking up the wrong tree.
>

That would be much appreciated.

-b

Mike Rylander (mrylander) wrote :

On Thu, Dec 3, 2015 at 9:59 AM, Bill Erickson <email address hidden> wrote:
> On Wed, Dec 2, 2015 at 5:58 PM, Dan Wells <email address hidden> wrote:
[snip]
>>
>> First, the easier ones. The current auth API uses 'type' and 'org' for
>> argument names, but the auth_internal code expects 'login_type' and
>> 'org_unit' for the same values. Should we consider keeping the names
>> the same instead? It's slightly more predictable, but if we are instead
>> trying to purposefully improve the names, I can see that side as well.
>> Just wanted to point it out and see what others felt.
>>
>
> My goal was to make the parameter names more explicit. Perhaps consistency
> is more clear, though. No strong opinions from me on that.
>

My preference would be to keep one set of names, and to use the
existing names. Parity seems more important than name-based
documentation, as I think future developers using those API calls will
likely cargo-cult the existing public-side code.

my $0.02...

-miker

Galen Charlton (gmc) wrote :

Cross-referencing bug 1526558, which has implications for any tweaks that might get made to AuthProxy and auth_internal. In particular, it would be nice to have a way to *only* authenticate users without automatically creating a new session.

Dan Wells (dbw2) wrote :

I've gone ahead and pushed to the collab branch the fairly simple changes needed to make AuthProxy.pm not completely fall over. It is now at least functional, but these commits do not resolve the bigger "validation" questions raised above.

Bill Erickson (berick) wrote :

Dan, are you still planning to work on the "validation" steps? If not, I'll take a look.

Dan Wells (dbw2) wrote :

Bill, I've started moving pieces around, but frankly I have small enough confidence in this area that I am approaching it more as exercise, and would not object in any way to you forging ahead. If nothing else I'll be more prepared to test whatever you come up with, and if we're lucky, comment intelligently on it :)

Bill Erickson (berick) wrote :

Thanks for the update, Dan. I'll see if I can make some headway here.

Bill Erickson (berick) wrote :

Pushed commit to implement open-ils.auth_internal.user.validate, including changes to open-ils.auth to use the new code. This does not include auth_proxy changes. I'll leave that to Dan for now.

Latest master commits are also merged into this branch.

Dan Wells (dbw2) wrote :

Hey Bill, looks good. My cobbled version wasn't so far off after all, so maybe I'll be more confident next time :)

On first pass, the only concern I had when I was working on it (which is still present in your version) is what to do about the "PATRON_CARD_INACTIVE" event. Your code replicates the current behavior properly (it sends it along to the client), but I'm wondering why we currently hide "PATRON_INACTIVE" behind a password check but not "PATRON_CARD_INACTIVE". I'm not a good judge of what should be private or not, but it does seem at least a little odd to hide one and not the other. Certainly no big deal, so just bringing it up while we're in here thinking about things.

I'll also definitely jump on the auth_proxy bits very soon while everything is still fresh in my mind.

Bill Erickson (berick) wrote :

Hi Dan, I had a similar moment of confusion over that. As you noted, I opted simply to replicate the existing behavior.

However, I think it would make sense to mimic the PATRON_CARD_INACTIVE behavior and only return a PATRON_CARD_INACTIVE event if the password check succeeds. Otherwise, return the generic login-failed event. I'll take a look at that...

Bill Erickson (berick) wrote :

Consistent PATRON_CARD_INACTIVE handling pushed, along w/ some additional code duplication cleanup in that general area of oils_auth.

Dan Wells (dbw2) wrote :

After some discussion on IRC, Bill has pushed a few fixes for non-existent usernames and globalizing login permission checks, and I have pushed a small typo fix plus support for the new validate API in AuthProxy.pm.

Everything is still in the original collab branch.

Dan Wells (dbw2) wrote :

Okay, still struggling a bit with org_unit values in this code. As far as I can tell, a normal staff client login doesn't send an org_unit value at all. However, this fact is masked for typical logins, I think, because this line in oilsAuthComplete():

jsonObjectSetKey(params,"org_unit", jsonNewNumberObject(orgloc));

ends up setting it to "0" via a side-effect of jsonNewNumberObject() (the null argument passes through snprintf() and ends up as a 0).

Since the org no longer matters for the permission check, I think the only thing it does for an average login is look up the timeout values. However, both auth_internal functions fail outright if you don't pass an org_unit value in. Because of this, auth_proxy staff-client logins currently fail, since the staff-client doesn't send an org, and the auth_proxy code doesn't convert the undefined org_unit argument to "0".

It would be easy enough to have auth_proxy convert an undefined org_unit argument to "0" (to match the current internal auth behavior), but I am wondering if instead we should relax the org_unit requirements on the auth_internal side. It seems maybe a little better or more predictable than passing in a "fake" value. Thoughts?

Bill Erickson (berick) wrote :

Sticking with the theme of backwards compatibility, relaxing the requirement in auth_internal makes sense to me. I'll take a look at that.

Bill Erickson (berick) wrote :

Fix pushed to collab branch. Both API calls in auth_internal now confirm the org_unit passed in has value > 0. If not, it defaults to the root org unit.

Bill Erickson (berick) wrote :

* Pushed some updated release comments about manual account migration.
* Modified the migration code to avoid migrating already-migrated passwords.
* Rebased to master w/ some light squashing and pushed:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/berick/lp1468422-passwd-storage-rebase1

tags: added: privacy pullrequest
Changed in evergreen:
milestone: none → 2.10-beta
Bill Erickson (berick) wrote :

Included in the above branch is a change in the work factor (iteration count) for CRYPT from 14 to 10. This reduces the duration of the CRYPT call from ~1 second to ~0.1 seconds. Comparing Evergreen master to this branch using work factor 10 shows an overall addition of about 0.1 seconds to the login process. (With 14, the process was almost a second longer).

Dan Wells (dbw2) wrote :

In thinking over this some more, a couple things worth noting before this gets pushed.

First, we didn't really follow through with the parameter renaming discussion in comments #17 - #19. I don't really have a preference, and am happy with how they are now, but didn't just want to gloss over it. One side effect benefit of making them different is that it discourages passing the whole parameter blob to auth_internal, which in some cases contains the the plaintext password, and we're better off not passing it around when we don't need to.

Second, I think we still need to shore up the log redaction config to cover some of these new methods, both in the sample config and in the release notes. That should be an easy task, and I'll plan on taking care of that late today if nobody beats me to it.

Dan Wells (dbw2) wrote :

Regarding work factor, I assume we'll be able to bump it up incrementally as time passes without a lot of hullabaloo? My understanding is that we'd need to simply hash the existing hashes some more, but want to make sure there's nothing in our staged algorithm to make things more difficult than that for us.

Bill Erickson (berick) wrote :

Regarding raising the work factor...

If you simply modify actor.passwd_type.iter_count, any passwords created or modified after that point will use the new work factor. This will not affect existing passwords, because they are verified using the work factor encoded within the salt.

Example salt: $2a$10$dkfdm0JgfZtfWPisZZu1se (work factor "$10$").

If we wanted the ability to force all passwords to use a different work factor, we'd have to write code to re-hash the existing passwords.

A good reference: http://crypto.stackexchange.com/questions/3003/do-i-have-to-recompute-all-hashes-if-i-change-the-work-factor-in-bcrypt

Dan Wells (dbw2) wrote :

Thanks for confirmation, Bill. Also, I think I take back the redaction stuff. On second look I am not seeing new methods/services that require any private information. In some of my early testing of auth_internal, I saw some private stuff in the logs, but I think that was just on account of passing around arguments unnecessarily. Caveat programmator!

Bill Erickson (berick) wrote :

Reworked my comments above about work factor and added them to the release notes.

As a reminder to anyone testing, the current branch is:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/berick/lp1468422-passwd-storage-rebase1

Dan Wells (dbw2) wrote :

Okay, I have signed off on all of Bill's work, and moved my four meager commits to the end. Hoping this helps get this in!

New branch is here:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dbwells/lp1468422-passwd-storage-rebase1-signoff

working/collab/dbwells/lp1468422-passwd-storage-rebase1-signoff

Bill Erickson (berick) wrote :

Thanks for all the help, Dan! I've merged the code to master.

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public Security information  Edit
Everyone can see this security related information.

Other bug subscribers