Nova services do not restart on N->O upgrade

Bug #1673889 reported by Dan Kolb
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Critical
Assigned to: Jesse Pretorius

Bug Description

OS Ubuntu 16.04
Version 14.1.1 upgrading to stable/ocata
Docs - https://docs.openstack.org/developer/openstack-ansible/upgrade-guide/script-upgrade.html
Possibly a duplicate of https://bugs.launchpad.net/openstack-ansible/+bug/1667130

On an environment with successful Tempest results on stable/newton, upgraded to stable/ocata via the upgrade script, the post-upgrade tests fail. Nova logs report IncompatibleVersion errors, and all nova services report that they are still running version 14.1.1. Other services correctly run version 15.0.0 after the upgrade.

http://paste.openstack.org/show/603195/

Restarting all nodes/containers then results in all services running the 15.0.0 version and Tempest tests running successfully.
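The symptom above (some services still on 14.1.1 while others are on 15.0.0) can be checked mechanically. This is a minimal sketch, assuming the per-service versions have already been collected into a dictionary; the function name, the service names, and the sample data are hypothetical and not taken from the linked paste.

```python
def stale_services(reported, expected):
    """Return the sorted names of services whose version differs from `expected`."""
    return sorted(name for name, version in reported.items() if version != expected)


# Hypothetical sample data mirroring the symptom described in this report.
reported = {
    "nova-conductor": "14.1.1",
    "nova-scheduler": "14.1.1",
    "glance-api": "15.0.0",
}

# Lists the services that did not pick up the new version.
print(stale_services(reported, "15.0.0"))
```

An empty list after the upgrade would indicate that every listed service picked up the new version.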

The Ansible output of the upgrade shows that the nova containers were not restarted:

TASK [Lxc container restart] ***************************************************
Friday 17 March 2017 15:09:03 -0500 (0:00:01.844) 0:07:59.862 **********
skipping: [infra01_nova_console_container-6c00742d]
skipping: [infra02_nova_console_container-34ac99c3]
skipping: [infra01_nova_conductor_container-bda26a60]
skipping: [infra03_nova_conductor_container-ad862a38]
skipping: [infra02_nova_conductor_container-11d4b383]
skipping: [infra01_nova_api_metadata_container-9a0788d7]
skipping: [infra03_nova_api_metadata_container-c8765b95]
skipping: [infra02_nova_api_metadata_container-7827b6fd]
skipping: [infra01_nova_api_os_compute_container-96f37e5e]
skipping: [infra03_nova_api_os_compute_container-2a1d8c4c]
skipping: [infra02_nova_api_os_compute_container-65538299]
skipping: [infra01_nova_cert_container-67530cee]
skipping: [infra03_nova_cert_container-c17f638e]
skipping: [infra02_nova_cert_container-6265c5e0]
skipping: [infra01_nova_scheduler_container-50325832]
skipping: [infra03_nova_scheduler_container-2225daf1]
skipping: [infra02_nova_scheduler_container-3446e7c8]
skipping: [compute02]
skipping: [compute03]
skipping: [compute01]
skipping: [compute06]
skipping: [compute07]
skipping: [compute04]
skipping: [compute05]
skipping: [compute08]
skipping: [compute09]
skipping: [infra01_nova_api_placement_container-b6acbeb3]
skipping: [infra03_nova_api_placement_container-c9e430a9]
skipping: [infra03_nova_console_container-3c16b6e4]
skipping: [infra02_nova_api_placement_container-f3bd6d65]
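Since the full console log is very large, it may help to summarize which hosts were skipped for a given task rather than reading the output by eye. The following is a rough sketch that assumes the log uses the same `TASK [...]` and `status: [host]` line shapes shown above; the function name and regular expressions are illustrative and not part of OpenStack-Ansible.

```python
import re
from collections import defaultdict

# Matches a task banner such as "TASK [Lxc container restart] *****".
TASK_RE = re.compile(r"^TASK \[(?P<name>.+?)\] \*+\s*$")
# Matches a per-host result line such as "skipping: [compute02]".
HOST_RE = re.compile(r"^(?P<status>skipping|ok|changed|fatal|failed): \[(?P<host>[^\]]+)\]")


def hosts_by_status(log_text):
    """Group host names by task name and result status."""
    result = defaultdict(lambda: defaultdict(list))
    task = None
    for line in log_text.splitlines():
        match = TASK_RE.match(line)
        if match:
            task = match.group("name")
            continue
        match = HOST_RE.match(line)
        if match and task is not None:
            result[task][match.group("status")].append(match.group("host"))
    return result


sample = """TASK [Lxc container restart] ***************************************************
Friday 17 March 2017 15:09:03 -0500 (0:00:01.844) 0:07:59.862 **********
skipping: [infra01_nova_console_container-6c00742d]
skipping: [compute02]
"""

summary = hosts_by_status(sample)
print(summary["Lxc container restart"]["skipping"])
```

For a log of the size mentioned in this report, reading the file from disk and passing its text to `hosts_by_status` would be the natural usage.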

Full installation/upgrade logs can be found at (HUGE_FILE_ALERT):
http://172.99.106.115/jenkins/job/22Node_BME_Upgrade(danko)/131/console

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Critical
Kevin Carter (kevin-carter) wrote :

The nova services within the containers should have restarted when the new set of venvs, config, and init scripts were dropped. If any of that didn't happen, we'd be facing an issue with the nova service as found within the os_nova role. I could see one possibility where the handler didn't fire. Was this an upgrade that ran into an issue mid-upgrade and was rerun?

As for the container restarts, we don't restart all containers on every upgrade; in fact, we do our best not to restart containers in an effort to maximize uptime. So the fact that the container restarts were all skipped means, in my mind, that we did a decent job of not impacting uptime during the major migration.

As for the fact that all of the services were running the proper version once the containers were restarted, I suspect that something else caused the handler not to fire, which could be a bug in the role or something else. Would it be possible to get the entire log for the run? Also, have we confirmed this issue on multiple runs?

Changed in openstack-ansible:
assignee: nobody → Kevin Carter (kevin-carter)
Dan Kolb (dankolbrs) wrote :

Log of the entire run is available in the link in the bug description at:
http://172.99.106.115/jenkins/job/22Node_BME_Upgrade(danko)/131/console

This includes deployment, Tempest test runs, and some additional tests. The upgrade was run via the upgrade script, which completed with a 0 return code; no additional configuration was applied and the upgrade was not restarted. I'd recommend fetching the file with wget and viewing the console output with less, as it's around 65 MB and will probably crash a browser attempting to load it.

The need to restart Nova services has been seen on all stable/newton to stable/ocata upgrade runs using the upgrade script on a 22 node bare metal environment, and I have reproduced it in an AIO by following the deploy and upgrade instructions provided in the OSA docs linked in the bug report.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/451920

Changed in openstack-ansible:
assignee: Kevin Carter (kevin-carter) → Dan Kolb (dankolbrs)
status: Confirmed → In Progress
Changed in openstack-ansible:
assignee: Dan Kolb (dankolbrs) → Jesse Pretorius (jesse-pretorius)
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (master)

Reviewed: https://review.openstack.org/451920
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=5fbbff6b4697577ecbcda8c55b9d1182ce22e00d
Submitter: Jenkins
Branch: master

commit 5fbbff6b4697577ecbcda8c55b9d1182ce22e00d
Author: Dan Kolb <email address hidden>
Date: Thu Mar 30 12:27:29 2017 -0500

    Reload service files on Nova services restart

    During an upgrade new service files are added, but systemd is not
    reloaded during restart of nova services to pick up these file
    changes. This performs a daemon-reload when restarting nova
    services.

    Change-Id: I98b3f66429ee045f052ad491847cf82d2f5d4efc
    Closes-Bug: #1673889
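
The ordering the commit message describes (reload systemd so it sees the new service files, then restart the services) can be illustrated outside of Ansible. This is only a sketch of that sequence and not the role's actual handler code; the helper names and the example unit names are hypothetical, while `systemctl daemon-reload` and `systemctl restart` are the standard systemd commands.

```python
import subprocess


def restart_commands(units):
    """Build the systemctl calls: one daemon-reload first, then a restart per unit."""
    commands = [["systemctl", "daemon-reload"]]
    commands.extend(["systemctl", "restart", unit] for unit in units)
    return commands


def restart_units(units, run=subprocess.check_call):
    """Run the commands in order; `run` can be swapped for a stub in tests."""
    for argv in restart_commands(units):
        run(argv)


if __name__ == "__main__":
    # Print only. Calling restart_units() needs root on a systemd host.
    for argv in restart_commands(["nova-conductor", "nova-scheduler"]):
        print(" ".join(argv))
```

Without the reload step, systemd keeps using the unit definitions it loaded before the upgrade, which is consistent with the services continuing to run the old version until the containers were restarted.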

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/452327

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (stable/ocata)

Reviewed: https://review.openstack.org/452327
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=a09316e3d1dec0de69c0deb35bb5b61fb0b2ef52
Submitter: Jenkins
Branch: stable/ocata

commit a09316e3d1dec0de69c0deb35bb5b61fb0b2ef52
Author: Dan Kolb <email address hidden>
Date: Thu Mar 30 12:27:29 2017 -0500

    Reload service files on Nova services restart

    During an upgrade new service files are added, but systemd is not
    reloaded during restart of nova services to pick up these file
    changes. This performs a daemon-reload when restarting nova
    services.

    Change-Id: I98b3f66429ee045f052ad491847cf82d2f5d4efc
    Closes-Bug: #1673889
    (cherry picked from commit 5fbbff6b4697577ecbcda8c55b9d1182ce22e00d)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/ocata)

Reviewed: https://review.openstack.org/452537
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=ee9a366effeba3d7716220597444c779e126af51
Submitter: Jenkins
Branch: stable/ocata

commit ee9a366effeba3d7716220597444c779e126af51
Author: Dan Kolb <email address hidden>
Date: Sun Apr 2 08:23:56 2017 -0500

    Updates SHA to nova-services restart fix.

    Modifies SHA to include the recent merge to daemon-reload during
    nova services restart.

    Closes-Bug: #1673889

    Change-Id: Ie2336d09a11c043ad1a37c8830829ee61ed71b30

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-os_nova 16.0.0.0b1

This issue was fixed in the openstack/openstack-ansible-os_nova 16.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 15.1.1

This issue was fixed in the openstack/openstack-ansible 15.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-os_nova 15.1.1

This issue was fixed in the openstack/openstack-ansible-os_nova 15.1.1 release.
