Charm blocked due to: Ports which should be open, but are not: 8991

Bug #2051569 reported by Konstantinos Kaskavelis
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Designate Charm
Triaged
Undecided
Unassigned

Bug Description

Solutions QA has a failed run, during the deployment of Openstack Yoga layer, when designate stayed on a blocked state with the message: Ports which should be open, but are not: 8991

The output of Juju status can be seen here:
https://oil-jenkins.canonical.com/job/fce_build_layer/9158//console

From the logs, we notice several connection errors, but we are not 100% sure how they relate to the failure.

Failed Run:

https://solutions.qa.canonical.com/testruns/0572f30c-6c2f-41fa-97b8-16e6c6b7deb4

Logs (available for two weeks):
https://oil-jenkins.canonical.com/artifacts/0572f30c-6c2f-41fa-97b8-16e6c6b7deb4/index.html

Tags: cdo-qa
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

This is the reason why the port was open: the apache2 unit (which serves port 8991) was down:

Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Stopping The Apache HTTP Server...
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83060]: AH00526: Syntax error on line 33 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83060]: SSLCertificateFile: file '/etc/apache2/ssl/designate/cert_designate.silo5.lab1.solutionsqa' does not exist or is empty
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83058]: Action 'graceful-stop' failed.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83058]: The Apache error log may have more information.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Stopped The Apache HTTP Server.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Starting The Apache HTTP Server...
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83067]: AH00526: Syntax error on line 33 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83067]: SSLCertificateFile: file '/etc/apache2/ssl/designate/cert_designate.silo5.lab1.solutionsqa' does not exist or is empty
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83064]: Action 'start' failed.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 apachectl[83064]: The Apache error log may have more information.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Failed to start The Apache HTTP Server.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

And in the designate_2 log:

2024-01-28 01:35:35 DEBUG unit.designate/2.juju-log server.go:325 certificates:126: Writing file /etc/apache2/sites-available/openstack_https_frontend.conf root:designate 640
2024-01-28 01:35:35 WARNING unit.designate/2.certificates-relation-changed logger.go:60 Job for apache2.service failed because the control process exited with error code.
2024-01-28 01:35:35 WARNING unit.designate/2.certificates-relation-changed logger.go:60 See "systemctl status apache2.service" and "journalctl -xeu apache2.service" for details.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Also in the designate_2 log:

The timestamps for the cert_designate.silo5:

2024-01-28 01:35:59 DEBUG unit.designate/2.juju-log server.go:325 certificates:126: Writing file /etc/apache2/ssl/designate/cert_designate.silo5.lab1.solutionsqa root:designate 640
2024-01-28 01:35:59 DEBUG unit.designate/2.juju-log server.go:325 certificates:126: Writing file /etc/apache2/ssl/designate/key_designate.silo5.lab1.solutionsqa root:designate 640

vs when the apache unit tried to restart:

Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Failed to start The Apache HTTP Server.

e.g. the certs were written after that attempt to start the apache2 server. Then it was marked inactive: in the sysd journal:

Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jan 28 01:35:35 juju-f7587a-5-lxd-3 systemd[1]: Failed to start The Apache HTTP Server.
...

Jan 28 01:36:08 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Unit cannot be reloaded because it is inactive.
...
Jan 28 01:38:27 juju-f7587a-5-lxd-3 systemd[1]: apache2.service: Unit cannot be reloaded because it is inactive.

I'm guessing there is a logic error between when the apache conf files are written with the cert files, and when the files are written, with an apache2 restart in-between them.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I *think* the bug is a race in the config rendering in designate in:

    def render_full_config(self, interfaces_list):
        """Render all config for Designate service

        @returns None
        """
        # Render base config first to ensure Designate API is responding as
        # sink configs rely on it.
        self.render_base_config(interfaces_list)
        self.render_with_interfaces(interfaces_list)

It calls render twice, which potentially involves two restarts.

However, it looks like `configure_tls(...)` function in charms.openstack does a `service_reload('apache2')` as part of configuring certs to try a restart apache2 when the certs have been written.

So the issue appears to be:

The apache2 config is written with the cert info and then apache2 is restarted BEFORE the cert file is written to disk, and this is some kind of race or logic error as it doesn't happen EVERY time.

This might be due to the ordering of charms.reactive calls to handlers that occur and when a file gets written vs when certificates get configured.

Changed in charm-designate:
status: New → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.