OpenStack Identity (keystone)

cms token_id's are not URL safe nor RFC compliant

Bug #1233838 reported by John Dennis on 2013-10-01

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Identity (keystone)	Fix Released	Undecided	John Dennis

Bug Description

<pre>
The token id for a cms signed token is generated via
cms.cms_to_token() which extracts the base64 cms data to use as the
token id. The '/' character in the base64 text was
replaced with the '-' character in an attempt to make the
resulting token id URL safe. The token id is used in both URL's and in
HTTP header values.

There are a few problems with this approach.

1) The result is still not url safe due the presence of the '+'
character and the pad character '='. Both of these characters are
reserved for the query component and thus would need to be escaped
further disrupting the base64 alphabet. See RFC-2396 "Uniform Resource
Identifiers (URI): Generic Syntax"

2) RFC-4648 "The Base16, Base32, and Base64 Data Encodings" defines a
URL safe encoding for base64 data. It maps '+' to '-' and '/' to '_'
and either strips the padding or demands it be %-encoded as per
RFC-2396. The result is both URL and file name safe and is referred to
as base64url.

3) The Python base64 module has direct support for base64url (we should
be using it).

4) The current mapping of '/' to '-' is unfortunate because it
directly conflicts with the RFC-4648 base64url mapping. The alphabet
character '-' is supposed to represent index 62 not index 63, thus one
cannot augment the current mapping to comply with RFC-4648. Plus the
current mapping still isn't URL safe.

In OpenStack we should adhere to standards when they exist and not
invent a non-standard incomplete solution. If we use the RFC-4648
compliant mechanism we can then also call standard Python libraries to
perform base64url encode/decode operations.

Note, base64url is also safe as a value in HTTP headers.

The cms.cms_to_token() and cms.token_to_cms() should be re-implemented
to produce token id's which can be safely used in HTTP contexts as
well as using RFC defined base64 alphabets.

Since token lifetimes are quite short there shouldn't be backward
compatibility issues with previously issued tokens. A new token
utilizing the new token id format will issued.

Note:

base64url continues to use the '=' pad character which is NOT URL
safe. RFC-4648 suggests two alternate methods to deal with this.

percent-encode
    percent-encode the pad character (e.g. '=' becomes
    '%3D'). This makes the base64url text fully safe. But
    percent-enconding has the downside of requiring
    percent-decoding prior to feeding the base64url text into a
    base64url decoder since most base64url decoders do not
    recognize %3D as a pad character and most decoders require
    correct padding.

no-padding
    padding is not strictly necessary to decode base64 or
    base64url text, the pad can be computed from the input text
    length. However many decoders demand padding and will consider
    non-padded text to be malformed. If one wants to omit the
    trailing pad character(s) for use in URL's it can be added back.

For for token id use it we prefer strip the padding rather than
percent-encode the padding. This makes the token id slightly shorter
and cleaner.
</pre>