process-monitor: updating nova venv results in downtime

Bug #1839928 reported by Merlin on 2019-08-13
This bug affects 1 person
Affects: masakari-monitors
Importance: Undecided
Assigned to: Unassigned

Bug Description

Today we ran openstack-ansible after switching our checkout from the stable/stein branch to the 19.0.0 tag, which included running setup-openstack.yml. Afterwards ALL compute nodes were out of service.

Analyzing this state, we found in /var/log/daemon.log that the nova-compute service was stuck in a start-stop loop:

Aug 7 12:07:56 host1 systemd[1]: Stopping nova-compute service...
Aug 7 12:07:57 host1 systemd[1]: Stopped nova-compute service.
Aug 7 12:07:57 host1 systemd[1]: nova-compute.service: Consumed 3.442s CPU time
Aug 7 12:07:57 host1 systemd[1]: Started nova-compute service.
Aug 7 12:07:59 host1 systemd[1]: Stopping nova-compute service...
Aug 7 12:07:59 host1 systemd[1]: Stopped nova-compute service.
Aug 7 12:07:59 host1 systemd[1]: nova-compute.service: Consumed 2.544s CPU time
Aug 7 12:07:59 host1 systemd[1]: Started nova-compute service.
Aug 7 12:08:02 host1 systemd[1]: Stopping nova-compute service...
Aug 7 12:08:02 host1 systemd[1]: Stopped nova-compute service.
Aug 7 12:08:02 host1 systemd[1]: nova-compute.service: Consumed 2.614s CPU time
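
The restart cadence can be read off the log mechanically. The following is a minimal Python sketch, not a tool we actually used; the function names `stop_times` and `restart_intervals` are hypothetical, and it assumes the syslog layout shown above, where the third whitespace-separated field is the HH:MM:SS timestamp.

```python
from datetime import datetime


def stop_times(log_lines, unit="nova-compute"):
    """Collect the timestamps of every 'Stopping <unit> service' line."""
    marker = "Stopping {} service".format(unit)
    times = []
    for line in log_lines:
        if marker in line:
            # Assumption: field 3 is the HH:MM:SS timestamp ("Aug 7 12:07:56 ...").
            times.append(datetime.strptime(line.split()[2], "%H:%M:%S"))
    return times


def restart_intervals(log_lines, unit="nova-compute"):
    """Seconds between consecutive stops (ignores midnight wrap-around)."""
    times = stop_times(log_lines, unit)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Lines copied from the daemon.log excerpt above.
log_lines = [
    "Aug 7 12:07:56 host1 systemd[1]: Stopping nova-compute service...",
    "Aug 7 12:07:57 host1 systemd[1]: Started nova-compute service.",
    "Aug 7 12:07:59 host1 systemd[1]: Stopping nova-compute service...",
    "Aug 7 12:07:59 host1 systemd[1]: Started nova-compute service.",
    "Aug 7 12:08:02 host1 systemd[1]: Stopping nova-compute service...",
]

print(restart_intervals(log_lines))  # [3.0, 3.0]
```

For the excerpt above this gives a stop roughly every three seconds, which is far too frequent for the service to do any useful work in between.
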
Since we use masakari-process-monitor and suspected it of being at fault, we checked /var/log/syslog and found:

Aug 7 16:17:33 host1 masakari-processmonitor[48168]: 2019-08-07 16:17:33.833 48168 WARNING masakarimonitors.processmonitor.process_handler.handle_process [-] Process '/openstack/venvs/nova-19.0.0.0rc3.dev5/bin/nova-compute' is not found.
Aug 7 16:17:33 host1 masakari-processmonitor[48168]: 2019-08-07 16:17:33.945 48168 INFO masakarimonitors.processmonitor.process_handler.handle_process [-] Restart of process with executing command: systemctl restart nova-compute
Checking the process list on the system, we found the nova-compute process:

:~# ps uax | grep nova
nova 38205 2.2 0.0 2618068 168120 ? Ssl 16:20 0:05 /openstack/venvs/nova-19.0.0/bin/python2 /openstack/venvs/nova-19.0.0/bin/nova-compute
From this we deduced that masakari-process-monitor looks for the full path of the nova-compute binary.

After the venv update, the actual path (nova-19.0.0) differs from the path masakari-process-monitor expects (nova-19.0.0.0rc3.dev5), so the monitor considers the process missing and restarts the service.
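
The mismatch can be illustrated with a minimal Python sketch. This is not the masakari-monitors implementation; it assumes the monitor treats a process as alive only if the configured process name string appears in the process list, which is consistent with the log messages above. The helper name `is_process_found` is hypothetical.

```python
# Hypothetical illustration, not the masakari-monitors implementation.
# Assumption: the monitor treats a process as alive only if the configured
# process name string appears in a line of the process list.

def is_process_found(process_name, ps_lines):
    """Return True if any process-list line contains process_name."""
    return any(process_name in line for line in ps_lines)


# Path the monitor was still looking for (taken from the syslog message).
configured = "/openstack/venvs/nova-19.0.0.0rc3.dev5/bin/nova-compute"

# Command line actually running after the venv update (taken from ps output).
ps_lines = [
    "/openstack/venvs/nova-19.0.0/bin/python2 "
    "/openstack/venvs/nova-19.0.0/bin/nova-compute",
]

print(is_process_found(configured, ps_lines))      # False: stale venv path
print(is_process_found("nova-compute", ps_lines))  # True: path-independent name
```

Under this assumption every check against the stale path fails even though nova-compute is running, so the monitor issues its restart command on each check cycle.
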

This behavior is obviously harmful.

Especially with openstack-ansible, a long time passes between the runs of os-nova-install.yml and os-masakari-install.yml:

- import_playbook: os-keystone-install.yml
...
- import_playbook: os-nova-install.yml
- import_playbook: os-neutron-install.yml
- import_playbook: os-heat-install.yml
- import_playbook: os-horizon-install.yml
- import_playbook: os-designate-install.yml
- import_playbook: os-gnocchi-install.yml
- import_playbook: os-swift-install.yml
- import_playbook: os-ceilometer-install.yml
- import_playbook: os-aodh-install.yml
- import_playbook: os-panko-install.yml
- import_playbook: os-ironic-install.yml
- import_playbook: os-magnum-install.yml
- import_playbook: os-trove-install.yml
- import_playbook: os-sahara-install.yml
- import_playbook: os-octavia-install.yml
- import_playbook: os-tacker-install.yml
- import_playbook: os-blazar-install.yml
- import_playbook: os-masakari-install.yml
...
The problem eventually resolves itself during the run of setup-openstack.yml (provided the OSA configuration is correct), once os-masakari-install.yml has run.

Nonetheless, ALL openstack-ansible-based installations will experience downtime of their nova-compute service in the meantime.

This has to be fixed ASAP, please.
