keystone_fernet incorrectly calculates rotation schedule

Bug #1809469 reported by Doug Szumski
This bug affects 2 people
Affects        Status  Importance  Assigned to   Milestone
kolla-ansible          High        Unassigned
Pike                   High        Unassigned
Queens                 High        Mark Goddard
Rocky                  High        Mark Goddard
Stein                  High        Mark Goddard

Bug Description

On a deployment with multiple instances of Keystone using Fernet tokens, there will be multiple instances of the keystone_fernet container. Each instance will call the script /usr/bin/fernet-rotate.sh at 8am. As the Keystone docs explain, this can cause failed validations:

`
Fernet keys need to be rotated at periodic intervals, and the keys need to be synchronised to each of the other keystone units. Keys should only be rotated on the master keystone unit, and must be synchronised before they are rotated again. “Over rotation” occurs if a unit rotates its keys such that there is no suitable decoding key on another unit that can decode a token that has been generated on the master. This happens if two key rotations are done on the master before a synchronisation has been successfully performed. This should be avoided. Over rotations can also cause validation keys to be removed before a token’s expiration which would result in failed validations.
` - https://specs.openstack.org/openstack/charm-specs/specs/rocky/approved/keystone-fernet-tokens.html

We need to limit the rotation of Fernet keys to a single instance.

This bug affects the FluentD Monasca plugin, which only retrieves a new token once the expiration date has passed. It may also affect other services, but some services may attempt to re-authenticate after over-rotation, which can mask the bug.

John Garbutt (johngarbutt) wrote :

So I believe the tokens Keystone hands out last 1 hour (not sure on that), and with three controllers the default behaviour is to rotate every 8 hours:

ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
0 0 * * * /usr/bin/fernet-rotate.sh
ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
0 8 * * * /usr/bin/fernet-rotate.sh
ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
0 16 * * * /usr/bin/fernet-rotate.sh

For each of these, fernet-rotate is giving you roughly the behaviour noted here:
https://docs.openstack.org/keystone/pike/admin/identity-fernet-token-faq.html#how-should-i-approach-key-distribution
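The staggered schedule shown in the crontabs above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the kolla-ansible cron generator itself; the function name `staggered_crontab` and its parameters are made up for illustration, and it assumes one rotation per host per day.

```python
def staggered_crontab(num_hosts, command="/usr/bin/fernet-rotate.sh"):
    """Return one daily cron line per host, spread evenly over 24 hours.

    Hypothetical helper for illustration; not taken from kolla-ansible.
    """
    lines = []
    for index in range(num_hosts):
        # Offset of this host's job from midnight, in minutes.
        offset = index * 24 * 60 // num_hosts
        hour, minute = divmod(offset, 60)
        lines.append(f"{minute} {hour} * * * {command}")
    return lines


lines = staggered_crontab(3)
print("\n".join(lines))
```

With three hosts this yields jobs at 00:00, 08:00 and 16:00, so the cluster as a whole rotates every 8 hours even though each host only rotates once a day.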

The logs show the correct things happening:
May 9th 2019, 09:00:02.000 INFO ctrl2 keystone Excess key to purge: /etc/keystone/fernet-keys/139
 May 9th 2019, 01:00:02.000 INFO ctrl1 keystone Excess key to purge: /etc/keystone/fernet-keys/138
 May 8th 2019, 17:00:03.000 INFO ctrl3 keystone Excess key to purge: /etc/keystone/fernet-keys/137

However, we still see these logs from keystone:
May 9th 2019, 09:06:34.000 WARNING ctrl1 keystone
 This is not a recognized Fernet token <snip> TokenNotFound

Which suggests some clients think they have a valid token, but they don't, after the above rotation.

Possibly we need to set keystone CONF.fernet_tokens.max_active_keys?

cfg.IntOpt(
    'max_active_keys',
    default=3,
    min=1,
    help=utils.fmt("""
This controls how many keys are held in rotation by `keystone-manage
fernet_rotate` before they are discarded. The default value of 3 means that
keystone will maintain one staged key (always index 0), one primary key (the
highest numerical index), and one secondary key (every other index). Increasing
this value means that additional secondary keys will be kept in the rotation.
"""))
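
To see why a small max_active_keys can invalidate tokens, here is a toy model of the rotation described in the help text above (staged key at index 0, primary key at the highest index, everything else secondary). It only tracks key indices and is an assumption-laden sketch, not keystone's implementation; the starting state of one staged key plus one primary key is also an assumption.

```python
def rotate(keys, max_active_keys=3):
    """Apply one rotation to a set of key indices; return (new_keys, purged)."""
    new_primary = max(keys) + 1
    # The staged key (index 0) is promoted to primary under the next index,
    # and a fresh staged key takes over index 0.
    new_keys = (set(keys) - {0}) | {new_primary, 0}
    purged = []
    # Discard the oldest non-staged keys until the limit is respected.
    while len(new_keys) > max_active_keys:
        oldest = min(k for k in new_keys if k != 0)
        new_keys.remove(oldest)
        purged.append(oldest)
    return new_keys, purged


keys = {0, 1}  # assumed starting point: one staged key, one primary key
history = []
for _ in range(3):
    keys, purged = rotate(keys)
    history.append((sorted(keys), purged))

for state, purged in history:
    print(state, "purged:", purged)
```

In this model, key 2 becomes primary on the first rotation and is purged on the third. Any token encrypted with it can no longer be decrypted after that point, which would be consistent with the TokenNotFound warnings above.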

John Garbutt (johngarbutt) wrote :

so token timeout is 1 day...

[token]
revoke_by_id = False
provider = fernet
expiration = 86400

but we have already rotated out too many keys by then...

we need to update max_active_keys to match the number of controllers.

John Garbutt (johngarbutt) wrote :

Actually it is more complicated, due to:

# This controls the number of seconds that a token can be retrieved for beyond
# the built-in expiry time. This allows long running operations to succeed.
# Defaults to two days. (integer value)
#allow_expired_window = 172800

So we have three days of needing to read the tokens.

In that time we have 9 key rotations with three controllers, plus we want a staging key out there, plus one for wiggle room.
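
The count above can be written out as a short calculation. The values come from the config and defaults quoted in this thread; the variable names are just for illustration, and rounding up with `math.ceil` is my assumption for intervals that do not divide the window evenly.

```python
import math

token_expiration = 86400       # [token] expiration, from the config above
allow_expired_window = 172800  # keystone default quoted above
rotation_interval = 8 * 3600   # three controllers, one rotation every 8 hours

# Total time during which a token may still need to be decrypted.
read_window = token_expiration + allow_expired_window

# Number of rotations that happen while such a token is still readable.
rotations = math.ceil(read_window / rotation_interval)

# Plus one staged key, plus one spare.
max_active_keys = rotations + 2

print(rotations, max_active_keys)
```

This gives 9 rotations in the three-day window and a max_active_keys of 11 for this deployment.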

Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https://review.opendev.org/657967

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/659619

Mark Goddard (mgoddard)
summary: - keystone_fernet container runs token rotate on multiple hosts
+ keystone_fernet incorrectly calculates rotation schedule
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/659619
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=25ac955a4e2645da29f8c7b807f0bac5afb43838
Submitter: Zuul
Branch: master

commit 25ac955a4e2645da29f8c7b807f0bac5afb43838
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

    Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469

Changed in kolla-ansible:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/659293
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=6c1442c385450004dd253f3f464fe4336194be99
Submitter: Zuul
Branch: master

commit 6c1442c385450004dd253f3f464fe4336194be99
Author: Mark Goddard <email address hidden>
Date: Thu May 16 17:26:45 2019 +0100

    Fix keystone fernet key rotation scheduling

    Right now every controller rotates fernet keys. This is nice because
    should any controller die, we know the remaining ones will rotate the
    keys. However, we are currently over-rotating the keys.

    When we over rotate keys, we get logs like this:

     This is not a recognized Fernet token <token> TokenNotFound

    Most clients can recover and get a new token, but some clients (like
    Nova passing tokens to other services) can't do that because they don't
    have the password to generate a new token.

    With three controllers, the keystone-fernet crontab shows the once-a-day
    rotation correctly staggered across the three controllers:

    ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
    0 0 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
    0 8 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
    0 16 * * * /usr/bin/fernet-rotate.sh

    Currently with three controllers we have this keystone config:

    [token]
    expiration = 86400 (although, keystone default is one hour)
    allow_expired_window = 172800 (this is the keystone default)

    [fernet_tokens]
    max_active_keys = 4

    Currently, kolla-ansible configures key rotation according to the following:

       rotation_interval = token_expiration / num_hosts

    This means we rotate keys more quickly the more hosts we have, which doesn't
    make much sense.

    Keystone docs state:

       max_active_keys =
         ((token_expiration + allow_expired_window) / rotation_interval) + 2

    For details see:
    https://docs.openstack.org/keystone/stein/admin/fernet-token-faq.html

    Rotation is based on pushing out a staging key, so should any server
    start using that key, other servers will consider that valid. Then each
    server in turn starts using the staging key, each in turn demoting the
    existing primary key to a secondary key. Eventually you prune the
    secondary keys when there is no token in the wild that would need to be
    decrypted using that key. So this all makes sense.

    This change adds new variables for fernet_token_allow_expired_window and
    fernet_key_rotation_interval, so that we can calculate the correct
    number of active keys. We now set the default rotation interval
    so as to minimise the number of active keys to 3 - one primary, one
    secondary, one buffer.

    This change also fixes the fernet cron job generator, which was broken
    in the following cases:

    * requesting an interval of more than 1 day resulted in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, resulted in no jobs

    It should now b...

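
The two formulas quoted in the commit message above can be compared numerically. The following is a sketch only: the function names are hypothetical, the numbers are the ones quoted in the commit message, and rounding the division up is an assumption on my part.

```python
def old_rotation_interval(token_expiration, num_hosts):
    """Interval under the previous rule: token_expiration / num_hosts."""
    return token_expiration // num_hosts


def required_max_active_keys(token_expiration, allow_expired_window,
                             rotation_interval):
    """Keys needed per the formula quoted from the Keystone docs."""
    total = token_expiration + allow_expired_window
    rotations = -(-total // rotation_interval)  # integer ceiling division
    return rotations + 2


TOKEN_EXPIRATION = 86400
ALLOW_EXPIRED_WINDOW = 172800

# Under the old rule, adding hosts shortens the interval and raises the
# number of keys that must be kept.
needed = {}
for num_hosts in (1, 3, 5):
    interval = old_rotation_interval(TOKEN_EXPIRATION, num_hosts)
    needed[num_hosts] = required_max_active_keys(
        TOKEN_EXPIRATION, ALLOW_EXPIRED_WINDOW, interval)
    print(num_hosts, "hosts:", interval, "s interval,", needed[num_hosts], "keys")

# Rotating once per (expiration + window) brings the requirement down to 3.
minimal = required_max_active_keys(
    TOKEN_EXPIRATION, ALLOW_EXPIRED_WINDOW,
    TOKEN_EXPIRATION + ALLOW_EXPIRED_WINDOW)
print("keys at a 3-day interval:", minimal)
```

Under these assumptions, three controllers would need 11 active keys where only 4 were configured. With the same formula, an interval equal to token_expiration plus allow_expired_window gives the 3 keys mentioned in the commit message.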

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/666086

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/666087

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/666088

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/666090

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/666093

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/666095

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/666086
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=1a6e9f7e927ebb3c2f021befc3630f4279dbceb1
Submitter: Zuul
Branch: stable/stein

commit 1a6e9f7e927ebb3c2f021befc3630f4279dbceb1
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

    Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/666090
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8e627c1eef5e7ef047cd3860d162a0e2a800e5ab
Submitter: Zuul
Branch: stable/stein

commit 8e627c1eef5e7ef047cd3860d162a0e2a800e5ab
Author: Mark Goddard <email address hidden>
Date: Thu May 16 17:26:45 2019 +0100

    Fix keystone fernet key rotation scheduling


OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/666087
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=c3e5ab0dc3b87c6ddae78a8f29d268ebe840638d
Submitter: Zuul
Branch: stable/rocky

commit c3e5ab0dc3b87c6ddae78a8f29d268ebe840638d
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

    Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/666093
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d66e95d1d96c9f5aee52740df53d6e784c7b8194
Submitter: Zuul
Branch: stable/rocky

commit d66e95d1d96c9f5aee52740df53d6e784c7b8194
Author: Mark Goddard <email address hidden>
Date: Thu May 16 17:26:45 2019 +0100

    Fix keystone fernet key rotation scheduling


tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/queens)

Reviewed: https://review.opendev.org/666088
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=ec2aa48c1713187dcb4ebfc836e45b8cfe5329c4
Submitter: Zuul
Branch: stable/queens

commit ec2aa48c1713187dcb4ebfc836e45b8cfe5329c4
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

    Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/queens)

Reviewed: https://review.opendev.org/666095
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d5cef35a7fbb8349ed6cbd5862b24caed52ff7a4
Submitter: Zuul
Branch: stable/queens

commit d5cef35a7fbb8349ed6cbd5862b24caed52ff7a4
Author: Mark Goddard <email address hidden>
Date: Thu May 16 17:26:45 2019 +0100

    Fix keystone fernet key rotation scheduling


OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.0.0.0rc2

This issue was fixed in the openstack/kolla-ansible 8.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 6.2.2

This issue was fixed in the openstack/kolla-ansible 6.2.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.1.2

This issue was fixed in the openstack/kolla-ansible 7.1.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.
