upgrade from bionic-queens to bionic-rocky results in "nova-os-api-compute.service is not running" in nagios (used to say "to-stein", but actually happens on upgrade to rocky)

Bug #1849897 reported by Drew Freiberger
This bug affects 2 people
Affects: OpenStack Nova Cloud Controller Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

After upgrading from xenial-queens through to bionic-stein, a stale nagios check is left defined:

nova-api-os-compute - CRITICAL: nova-api-os-compute.service is not running

When I investigate, I find nova-api-os-compute service is masked on the system.
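The masked state can be confirmed from a shell on the unit. The helper below is a minimal sketch: `unit_is_masked` is a hypothetical name, and it relies on `systemctl is-enabled` printing `masked` for a masked unit.

```shell
# Return 0 if the given systemd unit is masked, non-zero otherwise.
# unit_is_masked is an illustrative helper, not part of the charm.
unit_is_masked() {
    # is-enabled exits non-zero for masked units, so ignore its exit status
    # and compare only the state it prints.
    state="$(systemctl is-enabled "$1" 2>/dev/null || true)"
    [ "$state" = "masked" ]
}

# Example:
#   unit_is_masked nova-api-os-compute && echo "nova-api-os-compute is masked"
```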

It appears this has moved to an apache2 wsgi service, configured in:
/etc/apache2/sites-enabled/wsgi-api-os-compute.conf

I would like to suggest changing this nagios check to validate content availability of this wsgi service rather than checking status of the systemd service that is no longer valid.
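A check along these lines could probe the wsgi endpoint over HTTP instead of looking at systemd. This is only a sketch and not the charm's actual check: the function names are hypothetical, the URL has to be supplied by the caller, and treating any 2xx or 3xx response as healthy is an assumption.

```shell
# Map an HTTP status code to a Nagios plugin result.
# Nagios convention: exit 0 = OK, exit 2 = CRITICAL.
nagios_result_for_http_code() {
    code="$1"
    case "$code" in
        2??|3??)
            echo "OK: nova-api-os-compute answered HTTP $code"
            return 0 ;;
        000|"")
            # curl reports 000 when no HTTP response was received at all
            echo "CRITICAL: no HTTP response from nova-api-os-compute"
            return 2 ;;
        *)
            echo "CRITICAL: nova-api-os-compute answered HTTP $code"
            return 2 ;;
    esac
}

# Fetch only the status code from the given URL and turn it into a result.
check_wsgi_api() {
    url="$1"
    code="$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$url" || true)"
    nagios_result_for_http_code "$code"
}

# Example (the address and port are placeholders, not taken from this report):
#   check_wsgi_api "http://<api-address>:<port>/"
```

Keeping the status mapping separate from the `curl` call means the thresholds can be adjusted, for example to accept a 401 as proof that the service is answering, without touching the network code.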

Changed in charm-nova-cloud-controller:
status: New → Triaged
importance: Undecided → High
tags: added: openstack-upgrade
summary: - upgrade from rocky to stein results in "nova-os-api-compute.service is
- not running" in nagios
+ upgrade from rocky to bionic results in "nova-os-api-compute.service is
+ not running" in nagios (used to say "to-stein", but actually happens on
+ upgrade to rocky)
Alex Kavanagh (ajkavanagh) wrote : Re: upgrade from rocky to bionic results in "nova-os-api-compute.service is not running" in nagios (used to say "to-stein", but actually happens on upgrade to rocky)

So, I've reproduced it, and it actually happens on the upgrade from bionic-queens (distro) to bionic-rocky. At that point the API moves from its "own" executable to being run under wsgi in apache2. The fix (as Drew suggested) is to migrate the check to validating the wsgi service. An interim fix is to rely on the apache2 check as evidence that the API is running at bionic, and to remove the defunct nova-api-os-compute check altogether.

I'll investigate the difficulty of the former, but will put a patch in to clean up the defunct check at bionic+.

Changed in charm-nova-cloud-controller:
assignee: nobody → Alex Kavanagh (ajkavanagh)
status: Triaged → In Progress
Andrea Ieri (aieri) wrote :

As a workaround, the check can be removed by injecting updated relation data.

Example:

$ juju run -u nova-cloud-controller/0 -- relation-ids nrpe-external-master
nrpe-external-master:257

$ juju run -u nova-cloud-controller/0 -- relation-list -r257
nrpe-container/38

$ juju run -u nrpe-container/38 -- relation-get -r257 - nova-cloud-controller/0

[...checks are here...]

Save the monitors in a file, remove the nova-api-os-compute check:

$ cat monitors.lp1849897.out
monitors:
  remote:
    nrpe:
      apache2: {command: check_apache2}
      haproxy: {command: check_haproxy}
      haproxy_queue: {command: check_haproxy_queue}
      haproxy_servers: {command: check_haproxy_servers}
      memcached: {command: check_memcached}
      nova-conductor: {command: check_nova-conductor}
      nova-consoleauth: {command: check_nova-consoleauth}
      nova-novncproxy: {command: check_nova-novncproxy}
      nova-scheduler: {command: check_nova-scheduler}
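The manual edit could also be scripted. The sketch below assumes every check sits on its own line in the `name: {command: ...}` form shown above; `remove_check` and the input file name `monitors.raw` are hypothetical.

```shell
# Print a saved monitors file with one single-line check entry removed.
# Assumes each check occupies exactly one line: "  name: {command: ...}".
remove_check() {
    check_name="$1"
    monitors_file="$2"
    # -v inverts the match; the pattern is anchored on the leading
    # indentation and the trailing colon so that only this key is dropped.
    grep -v -E "^[[:space:]]*${check_name}:" "$monitors_file"
}

# Example:
#   remove_check nova-api-os-compute monitors.raw > monitors.lp1849897.out
```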

Now set the amended relation data:

$ juju run -u nova-cloud-controller/0 -- relation-set -r257 monitors="$(cat monitors.lp1849897.out)"

Changed in charm-nova-cloud-controller:
status: In Progress → Triaged
assignee: Alex Kavanagh (ajkavanagh) → nobody
summary: - upgrade from rocky to bionic results in "nova-os-api-compute.service is
- not running" in nagios (used to say "to-stein", but actually happens on
- upgrade to rocky)
+ upgrade from bionic-queens to bionic-rocky results in "nova-os-api-
+ compute.service is not running" in nagios (used to say "to-stein", but
+ actually happens on upgrade to rocky)
tags: added: aubergine