Fernet rotate doesn't prevent rotation when disk is full

Bug #1642457 reported by John Lin
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
Fix Released
Low
John Lin

Bug Description

When the root partition on any control node is full, the Fernet key on all control nodes will be empty. This will cause Keystone cannot auth anyone (500 Internal Server Error). Is that caused by Fernet key rotation?

When I check the files in /etc/keystone/fernet-keys:

root@control1:/etc/keystone/fernet-keys# ll
total 40
drwxr-s--- 2 keystone keystone 4096 Nov 17 00:00 ./
drwxr-xr-x 5 keystone keystone 4096 Nov 10 11:24 ../
-rw------- 1 keystone keystone 0 Nov 17 00:00 0
-rw------- 1 keystone keystone 44 Nov 16 13:57 10
-rw------- 1 keystone keystone 44 Nov 9 00:00 3
-rw------- 1 keystone keystone 44 Nov 10 00:00 4
-rw------- 1 keystone keystone 44 Nov 11 00:00 5
-rw------- 1 keystone keystone 44 Nov 12 00:00 6
-rw------- 1 keystone keystone 44 Nov 13 00:00 7
-rw------- 1 keystone keystone 44 Nov 14 00:00 8
-rw------- 1 keystone keystone 44 Nov 15 00:00 9

Here is some of the Keystone logs when the master Fernet token is empty.

[req-37cfe30f-5ff0-4d28-a187-066bf8031ad4 - - - - -] Fernet key must be 32 url-safe base64-encoded bytes.
Traceback (most recent call last):
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/common/wsgi.py", line 249, in __call__
    result = method(context, **params)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/auth/controllers.py", line 416, in authenticate_for_token
    parent_audit_id=token_audit_id)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/common/manager.py", line 124, in wrapped
    __ret_val = __f(*args, **kwargs)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/provider.py", line 388, in issue_v3_token
    parent_audit_id)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/fernet/core.py", line 44, in issue_v3_token
    *args, **kwargs)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/common.py", line 623, in issue_v3_token
    token_id = self._get_token_id(token_data)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/fernet/core.py", line 201, in _get_token_id
    access_token_id=access_token_id
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/fernet/token_formatters.py", line 165, in create_token
    token = self.pack(serialized_payload)
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/fernet/token_formatters.py", line 75, in pack
    return self.crypto.encrypt(payload).rstrip(b'=').decode('utf-8')
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/keystone/token/providers/fernet/token_formatters.py", line 64, in crypto
    fernet_instances = [fernet.Fernet(key) for key in keys]
  File "/openstack/venvs/keystone-mitaka/lib/python2.7/site-packages/cryptography/fernet.py", line 37, in __init__
    "Fernet key must be 32 url-safe base64-encoded bytes."
ValueError: Fernet key must be 32 url-safe base64-encoded bytes.

Tags: fernet
Revision history for this message
Lance Bragstad (lbragstad) wrote :

Hi John,

Would you be able to check the contents of the key? It looks like the staged key, or the zero key (0), doesn't contain *anything*. As a result, the MultiFernet object is unable to initialize itself. The error being reported by keystone is actually coming from the fernet library [0].

How many max_active_keys are you using? More than 11? Are there any keys in your key repository that you know are stale and you can remove?

[0] https://github.com/pyca/cryptography/blob/7ff4c8fe2400244a2e6e2a86bc015b2025907b5a/src/cryptography/fernet.py#L35-L38

tags: added: fernet
summary: - Root partition full on any control node makes master Fernet keys empty
- on all nodes
+ Fernet rotate doesn't prevent rotation when disk is full
Revision history for this message
Lance Bragstad (lbragstad) wrote :

I'm did a little more digging on this and wanted to document a possible improvement we could make.

During the rotation process, we rename the current staged key to be the next primary [0]. Immediately afterwords, we create a new staged key named `0` [1]. I'm assuming this is the step that writes nothing to the key when the disk is full. One possible improvement we could make would be to validate the key that was written to the file matches a given length. If it does then we assume it was written to disk successfully. If not, then we assume something bad happened on the disk and we do a rollback of the previous primary key promotion (renaming the now primary key back to `0`, making it the staged key, then ensure the key that was the primary becomes the primary again - completing the rollback).

The advantage would be that keystone would fail in the rotation process, but still allow tokens to be issued and validated, instead of allowing the rotation to complete (unsuccessfully) and fail at token creation or validation time because the MutliFernet object can't be initialized. Regardless, there would need to be some sort of operator intervention in order to fix a full disk and do future rotations.

[0] https://github.com/openstack/keystone/blob/613c048b6f4bda91de1c0e9618abd0bda78ccc50/keystone/common/fernet_utils.py#L199-L203
[1] https://github.com/openstack/keystone/blob/613c048b6f4bda91de1c0e9618abd0bda78ccc50/keystone/common/fernet_utils.py#L211

Changed in keystone:
status: New → Triaged
importance: Undecided → Wishlist
Revision history for this message
John Lin (johnlinp) wrote :

Hi Lance,

How about creating `0.tmp` first, and then check if it pass the validation (key length)? If yes, then start doing the rotation (i.e. rename current staged key to the next primary, and then rename `0.tmp` to `0`). If not, then delete `0.tmp` and leave some error messages.

John Lin (johnlinp)
Changed in keystone:
assignee: nobody → John Lin (johnlinp)
Revision history for this message
Lance Bragstad (lbragstad) wrote :

John, that would be a possibility. If you have any progress locally, feel free to push it for review and we'll see what the rest of the team thinks about the approach.

Thanks!

John Lin (johnlinp)
Changed in keystone:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/413495

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/413495
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=5b7c9a66f0aed860ea0776d4c5b42710d88fcb5f
Submitter: Jenkins
Branch: master

commit 5b7c9a66f0aed860ea0776d4c5b42710d88fcb5f
Author: johnlinp <email address hidden>
Date: Wed Dec 21 15:17:01 2016 +0800

    Handle disk write failure when doing Fernet key rotation

    _create_new_key() is broke down into 2 parts:

    1. _create_tmp_new_key()
    2. _become_valid_new_key()

    This can avoid empty Fernet keys when the write to the
    staged key fails. The _become_valid_new_key() is called
    only after a successful call to _create_tmp_new_key().

    Change-Id: Iaf33e2b291f13b9eb9464ef345a8664a634121ff
    Closes-Bug: #1642457
    Signed-off-by: John Lin <email address hidden>

Changed in keystone:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystone 11.0.0.0b3

This issue was fixed in the openstack/keystone 11.0.0.0b3 development milestone.

Changed in keystone:
milestone: none → ocata-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/439400

Revision history for this message
Dolph Mathews (dolph) wrote :

Proposed a revert for the above fix in https://review.openstack.org/#/c/442513/ as it does not appear to address the buggy behavior of writing empty key files.

I also don't think the impact of this should be Wishlist, as there are no new features being requested. Instead, this is an edge case with Low impact -- or Medium if the effect can cascade through multiple rotation attempts.

Revision history for this message
John Lin (johnlinp) wrote :

Hi Dolph,

I don't understand why you think the fix doesn't solve the problem. I've tested it in the unit test and a real full disk environment. They worked as expected.

Can you explain why you think there can still be empty keys? Thanks!

Revision history for this message
Dolph Mathews (dolph) wrote :

After poking at the test in the original patch, I've abandoned the revert because I forgot about the behavior in the key loader that skips non-numeric file names (so 0.tmp would be skipped, even if it was empty, thus avoiding the issue).

Dolph Mathews (dolph)
Changed in keystone:
importance: Wishlist → Low
Revision history for this message
John Lin (johnlinp) wrote :

Since that after multiple rotation attempts will produce multiple empty key files, I suggest that the importance should be "Medium" according to Dolph's previous comment.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/443158

Revision history for this message
Dolph Mathews (dolph) wrote :

@John: That's exactly what I was testing this morning :) I'd like to see the fix backported, but want to ensure I understand the impact. To do so, I've been attempting to reproduce the original bug. However, using the test approach in the patch merged to master against stable/newton produces a totally different behavior than the one described in the bug. Specifically, the first rotation returns successfully but fails to rotate anything, and all the keys that appear on disk are fine. Subsequent rotations result in OSErrors that prevent any further rotation, which would be a totally different bug. In between each rotation attempt, I can validate that all key files on disk contain valid keys, each are non-empty, and more importantly, the keys don't appear to change. No empty key files are ever created when flush() raises an IOError. Any suggestions?

See the patchset in #13 for an illustration.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (stable/newton)

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/443158
Reason: This branch (stable/newton) is at End Of Life

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/439400
Reason: This branch (stable/newton) is at End Of Life

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.