failure to generate cert: cannot satisfy request, as TTL would result in notAfter 2030-12-26T02:40:00.829770137Z that is beyond the expiration of the CA certificate at 2030-10-15T09:14:03Z

Bug #1909425 reported by Mike Wilson
This bug affects 1 person
Affects: vault-charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

EDIT(lourot on 2021-03-09): the real issue here is that the charm properly logs this valid error but then doesn't put itself into an error state, so the system doesn't work properly later on even though everything looks green. See comment #8

I just added a new node to my cluster and it is stuck waiting for kube-proxy to start. Unlike my other nodes, the snap isn't configured with any options, so it fails to start:

kubernetes-worker/15 waiting idle 21 10.0.4.38 Waiting for kube-proxy to start.

$ sudo snap get kube-proxy
error: snap "kube-proxy" has no configuration

No real information in the logs about any errors. I'm running k8s 1.20.1 and kubernetes-worker charm version 718.

Mike Wilson (knobby) wrote :

Looks like I'm missing tls_client.certs.saved and worker.auth.bootstrapped.

Mike Wilson (knobby) wrote :

unit-vault-0: 03:40:00 ERROR unit.vault/0.juju-log certificates:10: cannot satisfy request, as TTL would result in notAfter 2030-12-26T02:40:00.829770137Z that is beyond the expiration of the CA certificate at 2030-10-15T09:14:03Z
unit-vault-0: 03:40:00 INFO unit.vault/0.juju-log certificates:10: Processing certificate request from kubernetes-worker_15 for system:kubelet

Mike Wilson (knobby) wrote :

sad_panda.jpg

Should there be a warning in the vault config about this? It resulted in a very non-obvious failure. Maybe surface something better than "Waiting for kube-proxy to start" to get people on the right path? Leaving this bug open for that kind of discussion.

Had to nuke the app and add it again, but it started up fine after that.

George Kraft (cynerva) wrote :

Hey Mike, thanks for the investigation and details.

> Should there be a warning in the vault config about this?

I'm guessing you had changed the vault charm's TTL configs and that's what led to the error you pasted above. Is that right?

If changing TTL config can lead to unexpected (and silent?) failures to generate certs, then yeah, at a minimum the config description should be updated to warn users about it. Ideally the charm would also set a status if it's failing to generate certs.

> Maybe surface something better than "Waiting for kube-proxy to start" to get people on the right path?

The kubernetes-master and kubernetes-worker charms are both pretty bad about this. We do have an issue open about it here: https://bugs.launchpad.net/charm-kubernetes-master/+bug/1868541

I've removed kubernetes-worker from this issue - we'll track fixing those in the issue we already have open.

Changed in charm-kubernetes-worker:
status: New → Invalid
no longer affects: charm-kubernetes-worker
George Kraft (cynerva)
summary: - new node waiting on kube-proxy to start
+ failure to generate cert: cannot satisfy request, as TTL would result in
+ notAfter 2030-12-26T02:40:00.829770137Z that is beyond the expiration of
+ the CA certificate at 2030-10-15T09:14:03Z
Mike Wilson (knobby) wrote :

> I'm guessing you had changed the vault charm's TTL configs and that's what led to the error you pasted above. Is that right?

Yes, I was going for maximum security by pushing these values to their max. That made the CA's TTL the same as the certificate TTL, so Vault couldn't generate a certificate because it would expire after the CA.
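
For illustration, here is a minimal sketch of the constraint the error message describes (this is not Vault's actual code, just the idea): a certificate's notAfter, i.e. issuance time plus the requested TTL, must not land beyond the issuing CA's own notAfter.

----------
from datetime import datetime, timedelta, timezone

def check_ttl_against_ca(requested_ttl, ca_not_after):
    """Illustrative only: reject a cert whose lifetime would outlive its CA."""
    not_after = datetime.now(timezone.utc) + requested_ttl
    if not_after > ca_not_after:
        raise ValueError(
            "cannot satisfy request, as TTL would result in notAfter {} that is "
            "beyond the expiration of the CA certificate at {}".format(
                not_after.isoformat(), ca_not_after.isoformat()))

# Example: requesting a ~10-year leaf cert TTL against a CA expiring 2030-10-15
ca_expiry = datetime(2030, 10, 15, 9, 14, 3, tzinfo=timezone.utc)
check_ttl_against_ca(timedelta(days=3650), ca_expiry)  # raises ValueError
----------

So giving leaf certificates the same TTL as the CA is bound to fail, since by the time a certificate is requested the CA has already used up part of its own lifetime.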

Aurelien Lourot (aurelien-lourot) wrote :

Thanks for reporting!

> If changing TTL config can lead to unexpected (and silent?) failures to generate certs

This bug is now assigned to the vault-charm, and that failure isn't silent since the charm ends up in an error state. IMHO the error message is good enough, as it points to a TTL issue. What do you think? Also note that recent work has been done to make it easier to leave this error state:

https://bugs.launchpad.net/vault-charm/+bug/1886907
https://bugs.launchpad.net/vault-charm/+bug/1885576

Is there anything left to be done here on the vault-charm itself or can we safely close this bug? Thanks a lot!

Mike Wilson (knobby) wrote :

I never hit an error state in my status though. Everything was happy, it was just failing to generate certificates. Are you saying that it now produces an error on the vault charm?

Aurelien Lourot (aurelien-lourot) wrote :

No, I misread, my mistake, I understand now. You're seeing the following error message in the logs:

unit-vault-0: 03:40:00 ERROR unit.vault/0.juju-log certificates:10: cannot satisfy request, as TTL would result in notAfter 2030-12-26T02:40:00.829770137Z that is beyond the expiration of the CA certificate at 2030-10-15T09:14:03Z

But this error isn't fatal, which makes it hard later on to understand why things aren't working. I believe this is the culprit: https://opendev.org/openstack/charm-vault/src/branch/master/src/reactive/vault_handlers.py#L938

----------
        except vault.VaultInvalidRequest as e:
            log(str(e), level=ERROR)
            continue # TODO: report failure back to client
----------
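
For what it's worth, a rough sketch of how that except clause could surface the failure instead of only logging it. This is an illustration, not a proposed patch; it assumes status_set from charmhelpers.core.hookenv is usable in this module, and the wording of the status message is made up:

----------
        except vault.VaultInvalidRequest as e:
            log(str(e), level=ERROR)
            # Hypothetical addition: make the failure visible instead of only
            # logging it, e.g. by moving the unit into a blocked workload status
            # (status_set comes from charmhelpers.core.hookenv).
            status_set('blocked',
                       'Failed to issue certificate: {}'.format(e))
            continue  # ideally also report the failure back to the client
----------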

Changed in vault-charm:
status: New → Triaged
importance: Undecided → Medium
description: updated
description: updated