Services not running that should be: apache2

Bug #1952414 reported by Aurelien Lourot
24
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Ceph RADOS Gateway Charm
New
Undecided
Unassigned
Gnocchi Charm
New
Undecided
Unassigned
OpenStack Cinder Charm
New
Undecided
Unassigned
OpenStack Dashboard Charm
New
Undecided
Unassigned
OpenStack Glance Charm
New
Undecided
Unassigned
OpenStack Heat Charm
New
Undecided
Unassigned
OpenStack Neutron API Charm
New
Undecided
Unassigned
OpenStack Nova Cloud Controller Charm
New
Undecided
Unassigned
OpenStack Swift Proxy Charm
New
Undecided
Unassigned

Bug Description

We have been seeing this in various charm gates since the beginning of November 2021.

For example on Focal-Victoria in a 21.10 backport on the neutron-api charm in the octavia gate:
https://review.opendev.org/c/openstack/charm-octavia/+/819246
https://openstack-ci-reports.ubuntu.com/artifacts/d39/819246/2/check/focal-victoria-no-amphora/d39cb11/

For example on Bionic-Queens on `master` on the neutron-api charm in a trilio gate:
https://review.opendev.org/c/openstack/charm-trilio-horizon-plugin/+/816064
https://openstack-ci-reports.ubuntu.com/artifacts/524/816064/11/check/bionic-queens-41_trilio-horizon/524a9a9/

For example on Bionic-Queens on `master` on the nova-cloud-controller charm in a trilio gate:
https://review.opendev.org/c/openstack/charm-trilio-wlm/+/816065
https://openstack-ci-reports.ubuntu.com/artifacts/5c0/816065/9/check/bionic-queens-41_trilio-wlm/5c0535f/

For example on Bionic-Train on `master` on the swift-proxy charm in the gnocchi gate: (juju crashdump attached)
https://review.opendev.org/c/openstack/charm-gnocchi/+/817785
https://openstack-ci-reports.ubuntu.com/artifacts/0aa/817785/3/check/test-s3-bionic-train-s3/0aa151a/

Maybe caused by lp:1795918

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :
Revision history for this message
Felipe Reyes (freyes) wrote :
Download full text (3.2 KiB)

this is not looking right, I'm troubleshooting a ci job failure https://openstack-ci-reports.ubuntu.com/artifacts/5d4/819438/2/check/bionic-ussuri-ha-ovn/5d44828/

```
2021-11-26 17:04:54.656762 | focal-medium | 2021-11-26 17:04:54 [INFO] Timed out waiting for 'neutron-api/0'. The workload status message is 'Services not running that should be: apache2' which is not one of '['ready', 'Ready', 'Unit is ready']'
```

```
neutron-api/0* blocked idle 4 172.16.0.254 9696/tcp Services not running that should be: apache2
```

Looking at it closer I found this:

In syslog:
```
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 systemd[1]: Starting The Apache HTTP Server...
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 apachectl[5947]: AH00526: Syntax error on line 13 of /etc/apache2/sites-enabled/openstack_https_frontend.conf:
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 apachectl[5947]: SSLCertificateFile: file '/etc/apache2/ssl/neutron/cert_172.16.0.254' does not exist or is empty
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 apachectl[5947]: Action 'start' failed.
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 apachectl[5947]: The Apache error log may have more information.
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 systemd[1]: apache2.service: Control process exited, code=exited status=1
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 systemd[1]: apache2.service: Failed with result 'exit-code'.
Nov 26 16:38:34 juju-98a11d-zaza-4d85c92c4d57-4 systemd[1]: Failed to start The Apache HTTP Server.
```

Then in the apache config `/etc/apache2/sites-enabled/openstack_https_frontend.conf`:
```
    SSLCertificateFile /etc/apache2/ssl/neutron/cert_172.16.0.254
    # See LP 1484489 - this is to support <= 2.4.7 and >= 2.4.8
    SSLCertificateChainFile /etc/apache2/ssl/neutron/cert_172.16.0.254
    SSLCertificateKeyFile /etc/apache2/ssl/neutron/key_172.16.0.254
```

Then in `/var/log/juju/unit-neutron-api-0.log`:

```
2021-11-26 15:56:29 DEBUG juju-log certificates:41: Writing file /usr/local/share/ca-certificates/vault_juju_ca_cert.crt root:root 444
2021-11-26 15:56:29 DEBUG certificates-relation-changed Clearing symlinks in /etc/ssl/certs...
2021-11-26 15:56:29 DEBUG certificates-relation-changed done.
2021-11-26 15:56:29 DEBUG certificates-relation-changed Updating certificates in /etc/ssl/certs...
2021-11-26 15:56:30 DEBUG certificates-relation-changed 129 added, 0 removed; done.
2021-11-26 15:56:30 DEBUG certificates-relation-changed Running hooks in /etc/ca-certificates/update.d...
2021-11-26 15:56:30 DEBUG certificates-relation-changed done.
2021-11-26 15:56:30 DEBUG jujuc server.go:211 running hook tool "juju-log" for neutron-api/0-certificates-relation-changed-4815572091935494528
2021-11-26 15:56:30 DEBUG juju-log certificates:41: Writing file /etc/apache2/ssl/neutron/cert_host-172-16-0-254.project.serverstack root:root 640
2021-11-26 15:56:31 DEBUG jujuc server.go:211 running hook tool "juju-log" for neutron-api/0-certificates-relation-changed-4815572091935494528
2021-11-26 15:56:31 DEBUG juju-log certificates:41: Writing file /etc/apache2/ssl/neutron/key_host-172-16-0-254.project.serverstack roo...

Read more...

tags: added: unstable-test
Revision history for this message
Felipe Reyes (freyes) wrote :

I think everyhing boils down to get_network_addresses()[0] , according to the docstring it may return any of these lists:

[(address_in_net_a, hostname_in_net_a),...]
[(address_in_net_a, vip_in_net_a),...]
[(address_in_net_a, address_in_net_a),...]

For the job I am troubleshooting neutron-api has a single unit (non-HA), so we have security that there is no vip set which rules out the possibility of return a list like the second one. So what if one run of get_network_addressess() returned a list like the first one while other runs returned a list like the third one?, I have no evidence, but it would explain the behavior.

[0] https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/openstack/context.py#L1071

Revision history for this message
Felipe Reyes (freyes) wrote :
Revision history for this message
Corey Bryant (corey.bryant) wrote :
Revision history for this message
Felipe Reyes (freyes) wrote :
Revision history for this message
Felipe Reyes (freyes) wrote :

https://github.com/juju/charm-helpers/pull/661 - the get_network_addresses() is now logging what the returned set contains, so when this bug pops up again we need to grep for "get_network_addresses" in /var/log/juju/unit-*.log to confirm or discard my hypothesis in comment #3 .

Revision history for this message
Nobuto Murata (nobuto) wrote :
Download full text (6.7 KiB)

> we need to grep for "get_network_addresses" in /var/log/juju/unit-*.log
> to confirm or discard my hypothesis in comment #3 .

Hope this helps:

"get_network_addresses: [('192.168.151.123', '192.168.151.123')]"

$ ll /etc/apache2/ssl/horizon/
total 16
dr-xr-xr-x 2 root root 4096 Jun 22 03:45 ./
drwxr-xr-x 3 root root 4096 Jun 22 03:29 ../
-rw-r----- 1 root root 1492 Jun 22 03:45 cert_juju-31eb60-1-lxd-5.maas
-rw-r----- 1 root root 1678 Jun 22 03:45 key_juju-31eb60-1-lxd-5.maas

$ journalctl -u apache2 -n 30 --no-pager
-- Logs begin at Wed 2022-06-22 03:08:02 UTC, end at Wed 2022-06-22 04:24:33 UTC. --
Jun 22 03:37:32 juju-31eb60-1-lxd-5 systemd[1]: Stopped The Apache HTTP Server.
Jun 22 03:37:38 juju-31eb60-1-lxd-5 systemd[1]: Starting The Apache HTTP Server...
Jun 22 03:37:39 juju-31eb60-1-lxd-5 apachectl[30701]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 192.168.151.123. Set the 'ServerName' directive globally to suppress this message
Jun 22 03:37:39 juju-31eb60-1-lxd-5 systemd[1]: Started The Apache HTTP Server.
Jun 22 03:46:02 juju-31eb60-1-lxd-5 systemd[1]: Reloading The Apache HTTP Server.
Jun 22 03:46:02 juju-31eb60-1-lxd-5 apachectl[34513]: AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/default-ssl.conf:3
Jun 22 03:46:02 juju-31eb60-1-lxd-5 apachectl[34513]: AH00526: Syntax error on line 43 of /etc/apache2/sites-enabled/default-ssl.conf:
Jun 22 03:46:02 juju-31eb60-1-lxd-5 apachectl[34513]: SSLCertificateFile: file '/etc/apache2/ssl/horizon/cert_192.168.151.123' does not exist or is empty
Jun 22 03:46:02 juju-31eb60-1-lxd-5 apachectl[34510]: Action 'graceful' failed.
Jun 22 03:46:02 juju-31eb60-1-lxd-5 apachectl[34510]: The Apache error log may have more information.
Jun 22 03:46:02 juju-31eb60-1-lxd-5 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jun 22 03:46:02 juju-31eb60-1-lxd-5 systemd[1]: Reload failed for The Apache HTTP Server.
Jun 22 03:50:30 juju-31eb60-1-lxd-5 systemd[1]: Stopping The Apache HTTP Server...
Jun 22 03:50:30 juju-31eb60-1-lxd-5 apachectl[36531]: AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/default-ssl.conf:3
Jun 22 03:50:30 juju-31eb60-1-lxd-5 apachectl[36531]: AH00526: Syntax error on line 43 of /etc/apache2/sites-enabled/default-ssl.conf:
Jun 22 03:50:30 juju-31eb60-1-lxd-5 apachectl[36531]: SSLCertificateFile: file '/etc/apache2/ssl/horizon/cert_192.168.151.123' does not exist or is empty
Jun 22 03:50:30 juju-31eb60-1-lxd-5 apachectl[36529]: Action 'stop' failed.
Jun 22 03:50:30 juju-31eb60-1-lxd-5 apachectl[36529]: The Apache error log may have more information.
Jun 22 03:50:30 juju-31eb60-1-lxd-5 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jun 22 03:50:30 juju-31eb60-1-lxd-5 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jun 22 03:50:30 juju-31eb60-1-lxd-5 systemd[1]: Stopped The Apache HTTP Server.
Jun 22 03:50:37 juju-31eb60-1-lxd-5 systemd[1]: Starting The Apache HTTP Server...
Jun 22 03:50:37 juju-31eb60-1-lxd-5 apachectl[36545]: AH00548...

Read more...

Revision history for this message
Nobuto Murata (nobuto) wrote :

Another unit with "Services not running that should be: apache2" in the same deployment.

tags: added: cdo-qa
Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.