tokens in memcache have no/improper expiration

Bug #1578401 reported by Matt Fischer on 2016-05-04
This bug affects 2 people
Affects: OpenStack Identity (keystone)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

Tokens stored in memcache have no (or improper) expiration data when they are set.

Found on stable/mitaka and stable/liberty, using both the pooled (cachepool) backend and the non-pooled backend.

When you store a value in memcache you can optionally pass in a time at which the value is no longer good, a TTL. Keystone should be doing this for its local token cache, but it doesn't look like it is.
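For illustration, here is a minimal sketch of the difference, assuming the python-memcached client (the server address and key names are just placeholders):

import memcache

client = memcache.Client(['127.0.0.1:11211'])

# time=0 means no TTL: memcached keeps the item until it needs the memory.
client.set('token-no-ttl', 'payload', time=0)

# time=7200 gives memcached a real 2-hour TTL, matching a 2-hour token lifetime.
client.set('token-with-ttl', 'payload', time=7200)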

Here is a dump of some slabs that have tokens in them; the last field is the expiration time:

stats cachedump 11 10
ITEM 8302fc81f2ffb552d5ba8d3e5f0e182ee285786a [724 b; 1460583765 s]
ITEM 2ffe5d0821302a8501068a8411ce1749cea0645b [776 b; 1460583765 s]
ITEM eb6e6f7e9118133a9a98944da874ac1b59c5675b [724 b; 1460583765 s]
ITEM ee076b853dd5e5956366854abf6c49dbdf5ee4c2 [723 b; 1460583765 s]

Let's see if these are really tokens:

get 8302fc81f2ffb552d5ba8d3e5f0e182ee285786a
VALUE 8302fc81f2ffb552d5ba8d3e5f0e182ee285786a 1 724
cdogpile.cache.api
CachedValue
p0
((dp1
S'access'
p2
(dp3
S'token'
p4
(dp5
S'issued_at'
p6
S'2016-05-04T21:20:27.000000Z'
p7
sS'expires'
p8
S'2016-05-04T23:20:27.146911Z'
p9
sS'id'
p10
V<fernet token value here>

Yep, that's a Fernet token.
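For context, the cdogpile.cache.api / CachedValue header in that pickle is dogpile.cache's wrapper around the payload. A rough sketch of what that pickle represents, assuming dogpile's CachedValue tuple of (payload, metadata):

import time
from dogpile.cache.api import CachedValue

# dogpile stores the payload plus its own metadata; the "ct" creation
# timestamp is what dogpile compares against its expiration_time on get.
# memcached itself never sees an expiration for the item.
cached = CachedValue(
    {'access': {'token': {'id': '<fernet token value here>'}}},
    {'ct': time.time(), 'v': 1},
)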

Dumping older and older entries, I can find cached tokens in here that are 24 hours old, 22 hours past our token validity deadline.

So let's compare that to some tokens that the keystone_authtoken middleware is caching for the control services:

stats cachedump 21 100
ITEM tokens/418b2c5a0e67d022b0578fbc4c96abf4a4509e94aca4ca1595167f8f01448957 [8463 b; 1462397763 s]
ITEM tokens/2b5a26e3bdf4dec0caae141353297f0316b55daf683b4bc0fcd1ab7bf4ba8f9b [8312 b; 1462397539 s]
ITEM tokens/778329eb53545cbd664fa67e6429f48692679f428077b48baa4991f13cc1817c [8312 b; 1462397538 s]
ITEM tokens/b80b06cf688c37f8688c368a983c2fd533c662b7b3063c6a2665c59def958cdd [8312 b; 1462397537 s]
ITEM tokens/61cd52b0654641a21c62831f6e5be9f0328898d05026d6bb91c787d79cb8b460 [8463 b; 1462397536 s]

They all have valid, distinct expiration times, so it's respecting my settings.

So what's that timestamp in the earlier list? It's 4/13/2016, 3:42:45 PM GMT-6:00 DST, which happens to be the last time memcache restarted, so I assume it's just filler.

What's the impact?

I've not figured out for sure if there is one yet. I have a token validity of 2 hours and I had set the cache time to the same. I did try to dig out an old token, but it would not validate, so I don't think there's a security issue. I suspect the main issue is that my keystone memcache always runs completely 100% full. We cap memcache at 20% of the RAM on a box, and that's a lot (20% of 256G). I suspect that with no valid expiration time, memcached lazily evicts old tokens when it runs out of RAM, rather than replacing expired ones and avoiding new allocations.
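One way to check that theory is to watch the eviction counters. A sketch, assuming python-memcached and a local memcached (server address is a placeholder):

import memcache

client = memcache.Client(['127.0.0.1:11211'])

for server, stats in client.get_stats():
    # 'evictions' counts items pushed out while still live to make room;
    # 'expired_unfetched' counts items that simply aged out. A high
    # eviction count when nothing has a TTL suggests memcached is doing
    # the cleanup work instead of simple expiry.
    print(server, stats.get('evictions'), stats.get('expired_unfetched'))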

[PROD] mfisch@east-keystone-001:~$ cat /proc/3937/status
Name: memcached
State: S (sleeping)
Tgid: 3937
Ngid: 3937
Pid: 3937
PPid: 1
TracerPid: 0
Uid: 65534 65534 65534 65534
Gid: 65534 65534 65534 65534
FDSize: 1024
Groups: 0
VmPeak: 54500552 kB
VmSize: 54500552 kB <-- that's a lot of twinkies

I feel this merits deeper investigation.

Dolph Mathews (dolph) on 2016-07-06
tags: added: fernet
David Stanek (dstanek) wrote :

I'll look into this a little deeper, but from the information provided it doesn't sound like there is an issue. The 'stats cachedump' command will often, and maybe even likely, show things that are expired. memcached won't actually delete the item from memory until you access it again and it decides that it's expired, or until it's trying to make room for new entries. In the new-entries case I'm not sure if you'd see it as an eviction, because the data wasn't forced out.

You can tell pretty easily if an item has no expiration: its expiration will be equal to 'time - uptime', where time and uptime can be found in the stats output.
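A sketch of that check, assuming python-memcached; 'time' and 'uptime' come straight from the stats output, and the item's expiration value is the one shown in the 'stats cachedump' output above:

import memcache

client = memcache.Client(['127.0.0.1:11211'])

for server, stats in client.get_stats():
    start_time = int(stats['time']) - int(stats['uptime'])
    # An item whose cachedump expiration equals the server start time was
    # stored with no TTL; its "expiration" field is just filler.
    item_exp = 1460583765  # expiration field from 'stats cachedump' above
    print(server, 'no TTL' if item_exp == start_time else 'has TTL')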

David Stanek (dstanek) wrote :

Also note that memcached is greedy and will use up all of the memory you give it (or at least appear to, from the perspective of Unix tools).

David Stanek (dstanek) wrote :

Matt and I talked offline about this and we'll work together next week on it.

Changed in keystone:
status: New → Incomplete
David Stanek (dstanek) on 2016-07-11
Changed in keystone:
assignee: nobody → David Stanek (dstanek)
Changed in keystone:
assignee: David Stanek (dstanek) → nobody
David Stanek (dstanek) on 2016-08-02
Changed in keystone:
assignee: nobody → David Stanek (dstanek)
David Stanek (dstanek) wrote :

This behavior is actually expected based on our current implementation. keystonemiddleware uses the memcached library directly and sets timeouts. keystone uses dogpile and sets a dogpile expiration, not a memcached expiration. This means that the data will expire as expected, but it won't be expired as far as memcached is concerned.

Other than memcached appearing to be fuller than it actually is, I don't think there is too much impact. Even when a memcached timeout exists on an entry, it will not actually be deleted from memory by memcached until it is accessed again or memcached needs the memory.

I've created a POC change in oslo.cache that "fixes" this problem.
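To make the distinction concrete, here is a hedged sketch of a dogpile.cache region configured both ways; this is an illustration, not the actual oslo.cache change. The memcached_expire_time backend argument is what pushes a real TTL down to memcached; without it, only dogpile's own expiration_time applies:

from dogpile.cache import make_region

# dogpile-only expiration: dogpile considers the value stale after 2 hours,
# but the item is written to memcached with no TTL at all.
region_dogpile_only = make_region().configure(
    'dogpile.cache.memcached',
    expiration_time=7200,
    arguments={'url': ['127.0.0.1:11211']},
)

# Backend expiration as well: memcached itself expires the item, so the
# slab entry ages out instead of lingering until eviction.
region_with_memcached_ttl = make_region().configure(
    'dogpile.cache.memcached',
    expiration_time=7200,
    arguments={
        'url': ['127.0.0.1:11211'],
        'memcached_expire_time': 7200,
    },
)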

Changed in keystone:
status: Incomplete → Confirmed
importance: Undecided → Low
tags: added: caching
removed: fernet
David Stanek (dstanek) wrote :

If this is worth having, we could bring back this review: https://review.openstack.org/#/c/350942

I don't really think this is a bug, but rather a misunderstanding, since we do caching differently between components.

Lance Bragstad (lbragstad) wrote :

Automatically unassigning due to inactivity.

Changed in keystone:
assignee: David Stanek (dstanek) → nobody