Certificates are not created

Bug #1893847 reported by David Ames on 2020-09-01
This bug affects 3 people
Affects                                  Importance  Assigned to
Charm Helpers                            Critical    David Ames
OpenStack Placement Charm                Critical    David Ames
OpenStack ceilometer charm               Critical    David Ames
OpenStack ceph-radosgw charm             Critical    David Ames
OpenStack cinder charm                   Critical    David Ames
OpenStack glance charm                   Critical    David Ames
OpenStack heat charm                     Critical    David Ames
OpenStack keystone charm                 Critical    David Ames
OpenStack neutron-api charm              Critical    David Ames
OpenStack nova-cloud-controller charm    Critical    David Ames
OpenStack openstack-dashboard charm      Critical    David Ames
vault-charm                              Undecided   Unassigned

Bug Description

Discourse post [0] shows a bug in nova-cloud-controller that is similar to, but not exactly the same as, private bug LP#1886077 [1]. In this case the SSL directory is empty on 2/3 ncc nodes:

"""
ubuntu@juju-e9be94-1-lxd-11:~$ sudo systemctl status apache2.service
● apache2.service - The Apache HTTP Server
     Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2020-09-01 16:33:18 UTC; 1min 19s ago
       Docs: https://httpd.apache.org/docs/2.4/

Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: Starting The Apache HTTP Server...
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3577]: AH00526: Syntax error on line 14 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3577]: SSLCertificateFile: file '/etc/apache2/ssl/nova/cert_10.80.20.205' does not exist or is empty
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3574]: Action 'start' failed.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 apachectl[3574]: The Apache error log may have more information.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: apache2.service: Failed with result 'exit-code'.
Sep 01 16:33:18 juju-e9be94-1-lxd-11 systemd[1]: Failed to start The Apache HTTP Server.
ubuntu@juju-e9be94-1-lxd-11:~$ ls -lha /etc/apache2/ssl/nova
total 8.0K
dr-xr-xr-x 2 root root 4.0K Aug 31 22:33 .
drwxr-xr-x 3 root root 4.0K Aug 31 22:33 ..
ubuntu@juju-e9be94-1-lxd-11:~$

    EDIT: I should mention that running juju run-action --wait vault/leader reissue-certificates did not work for me :upside_down_face: I even removed a single unit and added it back. the new unit also has an empty /etc/apache2/ssl/nova certs directory
"""

A similar fix to [2][3] may help but this needs testing.

[0] https://discourse.juju.is/t/bug-openstack-hacluster-apache2-service-not-running-wrong-ssl-cert-name/3372/5
[1] https://bugs.launchpad.net/charm-openstack-dashboard/+bug/1886077
[2] https://review.opendev.org/#/c/747115/
[3] https://review.opendev.org/#/c/740188/
[4] https://review.opendev.org/#/c/749393/
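
The symptom above (Apache refusing to start because an SSLCertificateFile target is missing or empty) can be checked for on a unit before restarting the service. The following is a minimal sketch, not part of any charm: the function name `missing_ssl_files` is hypothetical, and it assumes each directive and its path sit on one line with no whitespace in the path.

```python
import re
from pathlib import Path

# Matches lines such as:
#   SSLCertificateFile /etc/apache2/ssl/nova/cert_10.80.20.205
DIRECTIVE = re.compile(
    r"^\s*(SSLCertificateFile|SSLCertificateKeyFile)\s+(\S+)",
    re.MULTILINE,
)


def missing_ssl_files(conf_path):
    """Return (directive, path) pairs whose target is absent or empty.

    Hypothetical helper for illustration only. Dangling symlinks count
    as missing because Path.is_file() follows symlinks.
    """
    text = Path(conf_path).read_text()
    problems = []
    for directive, target in DIRECTIVE.findall(text):
        target = target.strip('"')
        path = Path(target)
        if not path.is_file() or path.stat().st_size == 0:
            problems.append((directive, target))
    return problems
```

On an affected unit, calling `missing_ssl_files("/etc/apache2/sites-enabled/openstack_https_frontend.conf")` would be expected to list the same path that apachectl complains about in the log above.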

David Ames (thedac) on 2020-09-02
summary: - Certerifictes are not created
+ Certificates are not created
Mirek (mirek186) wrote :

FYI, bundle attached. Also, I don't think this is only a nova-cloud-controller issue; it seems to come down to how vault creates certs and restarts services, as most of the time I hit this with the placement service. Another issue is that reissuing certs does not always restart Apache on every unit, so I ended up with HTTPS talking to HTTP. Unfortunately, I don't have logs and can't re-create the issue, as I've already resolved it locally.

Here is an easy reproducer for something that is very likely to share the same root cause: https://discourse.juju.is/t/how-do-you-use-hacluster/3659/3?u=aurelien-lourot

When deploying heat+hacluster and vault using the auto-unlock feature, certificates seem to be randomly linked to either the unit IP or the VIP.

David Ames (thedac) on 2020-10-16
Changed in charm-helpers:
status: New → Triaged
status: Triaged → Won't Fix
status: Won't Fix → In Progress
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
David Ames (thedac) wrote :

The simplest description of the root problem is that the algorithm used to request certificates [0] differs from the one used to create symlinks [1].

The change [2] attempts to fix this. In testing with the heat bundle provided above the issue is resolved.

Unfortunately, we are currently in charm freeze for the 20.10 release. So the charm-helpers change or at least the syncs into the charms will have to occur after the 20.10 release.

TODO: check charms.openstack and determine if changes need to occur there. Preferably utilize the charm-helpers change [2] directly.

[0] https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/openstack/cert_utils.py#L123
[1] https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/openstack/cert_utils.py#L162
[2] https://github.com/juju/charm-helpers/pull/520
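
To illustrate the kind of mismatch described in this comment, here is a toy model. It is not the charm-helpers code in cert_utils.py: both naming rules, all function names, and the addresses in the usage note are assumptions made up for the example. It only shows how a file name can be expected on disk without any certificate request ever producing it when the two code paths pick addresses differently.

```python
# Toy model only: these functions are hypothetical and are NOT the
# charm-helpers implementation in cert_utils.py.


def requested_cert_names(unit_ip, vip=None):
    """File names produced by certificate requests (assumed rule)."""
    # Assumption: a certificate is requested for the unit address only.
    return {f"cert_{unit_ip}"}


def expected_link_names(unit_ip, vip=None):
    """File names the web server config expects on disk (assumed rule)."""
    # Assumption: the VIP is preferred whenever one is configured.
    return {f"cert_{vip or unit_ip}"}


def unsatisfied_links(unit_ip, vip=None):
    """Expected file names that no certificate request would produce."""
    return expected_link_names(unit_ip, vip) - requested_cert_names(unit_ip, vip)
```

With the made-up addresses `unsatisfied_links("10.0.0.5")` the result is an empty set, while `unsatisfied_links("10.0.0.5", vip="10.0.0.100")` returns `{"cert_10.0.0.100"}`: the two rules agree without a VIP and diverge with one. Deriving both sets from a single shared rule removes that divergence in this toy model.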

David Ames (thedac) wrote :

One more thing to make clear: the problem only occurs with auto-generate-root-ca-cert set to true (and/or the deprecated setting, totally-unsecure-auto-unlock, set to true).

The workaround is to set the above to False at deploy time and run the post-deployment actions on vault as documented in [0] and [1].

[0] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-vault.html
[1] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-certificate-management.html

Brad Marshall (brad-marshall) wrote :

FWIW I have seen this occur multiple times without using the auto-generate-root-ca-cert or totally-unsecure-auto-unlock, so there's definitely other conditions it occurs in.

FWIW from previous offline discussions with thedac and Hybrid512 I understood that totally-unsecure-auto-unlock was drastically increasing the likelihood of hitting the bug but not necessary in order to hit the bug.

Hybrid512 (walid-moghrabi) wrote :

I confirm that this is *NOT* related to totally-unsecure-auto-unlock. I encounter this bug very frequently (and randomly) on many charms, even with a proper manual vault unsealing.

In my deployment, I have these charms in HA mode with hacluster (3 nodes each in lxd containers spread on 3 different bare metal machines) :

- ceilometer (rev. 278) ==> never
- ceph-radosgw (rev. 291) ==> sometimes
- cinder (rev. 306) ==> never
- glance (rev. 301) ==> sometimes
- gnocchi (rev. 42) ==> never
- heat (rev. 279) ==> very often
- keystone (rev. 319) ==> never
- masakari (rev. 4) ==> never
- neutron-api (rev. 290) ==> very often
- nova-cloud-controller (rev. 350) ==> very often
- openstack-dashboard (rev. 309) ==> very often
- placement (rev. 15) ==> very often
- vault (rev. 41) ==> never

Hacluster is rev. 72.

Next to each name is the frequency of certificate failures. The charms do not behave the same way: some never fail, while others fail at nearly every deployment, sometimes on only 1 unit, sometimes on all of them.

Just some comments. I don't know if that helps in any way; I'm available for more testing if needed.

David Ames (thedac) wrote :

Hybrid512,

Hi, I need to do some bug hygiene here. The charm-helper fix should be in all of these charms in the "next" version of the charm:

cs:~openstack-charmers-next/<CHARM>

Would it be possible for you to test with the charms in next?

I will confirm the fix is in each and mark them fix committed once confirmed.

Changed in charm-helpers:
status: In Progress → Fix Released
Changed in charm-openstack-dashboard:
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
Changed in charm-nova-cloud-controller:
status: New → Fix Committed
status: Fix Committed → New
Changed in charm-openstack-dashboard:
status: New → Fix Committed
David Ames (thedac) on 2021-01-05
Changed in charm-ceilometer:
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
Changed in charm-heat:
status: New → Fix Committed
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
Changed in charm-neutron-api:
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
David Ames (thedac) on 2021-01-05
Changed in charm-nova-cloud-controller:
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
Changed in vault-charm:
status: New → Invalid
Changed in charm-placement:
status: New → Fix Committed
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
David Ames (thedac) on 2021-01-05
Changed in charm-glance:
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 21.01
Changed in charm-ceph-radosgw:
assignee: nobody → David Ames (thedac)
importance: Undecided → Critical
milestone: none → 21.01
status: New → Fix Committed
David Ames (thedac) wrote :

"The charm-helper fix should be in all of these charms in the "next" version of the charm"

This was a smidge optimistic.

If the charm is marked Fix Committed (or eventually Fix Released), it has the fix. If it is marked In Progress, it should show up below:

https://review.opendev.org/q/topic:%22bug%252F1893847%22+(status:open%20OR%20status:merged)

Changed in charm-ceilometer:
status: New → In Progress
Changed in charm-glance:
status: New → In Progress
Changed in charm-neutron-api:
status: New → In Progress
Changed in charm-nova-cloud-controller:
status: New → In Progress
Changed in charm-cinder:
assignee: nobody → David Ames (thedac)
importance: Undecided → Critical
milestone: none → 21.01
status: New → In Progress
Changed in charm-keystone:
importance: Undecided → Critical
milestone: none → 21.01
status: New → In Progress
assignee: nobody → David Ames (thedac)
David Ames (thedac) wrote :

Update: Since the last post on this bug we discovered a bug in the first fix. A subsequent fix was landed in charm-helpers [0].

We are in the middle of doing syncs and rebuilds for the 21.01 charm release. Once these [1] have landed the ~next charms will have all the required fixes. The 21.01 [2] charms will also have the fixes.

[0] https://github.com/juju/charm-helpers/commit/27b3f59ddeaefff6f1b4a269959e522cb42c1639
[1] https://review.opendev.org/q/topic:%22sync-for-21-01%22+status:open
[2] https://docs.openstack.org/charm-guide/latest/release-timeline-2101.html

Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
David Ames (thedac) wrote :
Changed in charm-ceilometer:
status: In Progress → Fix Committed
Changed in charm-cinder:
status: In Progress → Fix Committed
Changed in charm-glance:
status: In Progress → Fix Committed
Changed in charm-keystone:
status: In Progress → Fix Committed
Changed in charm-neutron-api:
status: In Progress → Fix Committed
David Ames (thedac) on 2021-02-10
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released
Changed in charm-heat:
status: Fix Committed → Fix Released
Changed in charm-openstack-dashboard:
status: Fix Committed → Fix Released
Changed in charm-neutron-api:
status: Fix Committed → Fix Released
Changed in charm-placement:
status: Fix Committed → Fix Released
Changed in charm-ceilometer:
status: Fix Committed → Fix Released
Changed in charm-glance:
status: Fix Committed → Fix Released
Changed in charm-ceph-radosgw:
status: Fix Committed → Fix Released
Changed in charm-cinder:
status: Fix Committed → Fix Released
Changed in charm-keystone:
status: Fix Committed → Fix Released