N -> O upgrade: wrong nova placement parameters.

Bug #1684058 reported by Sofer Athlan-Guyot on 2017-04-19
Affects: tripleo
Importance: Critical
Assigned to: Sofer Athlan-Guyot

Bug Description

Hi,

during the upgrade we configure the compute nodes in batch (by default) and set some options needed for Ocata. One of them is the placement configuration, see [1].

Unfortunately, the configuration is wrong and we get:

    ERROR oslo_service.service PlacementNotConfigured: This compute is not configured to talk to the placement service. Configure the [placement] section of nova.conf and restart the service.

in the logs.

That seems to lead to an orchestration problem where the nova-compute service cannot be safely restarted.

I wonder if the restart [2] is needed as the packages are not yet updated at this point and we still have newton nova-compute.

[1] https://github.com/openstack/tripleo-heat-templates/blob/299b9f532377a3a0c16ba9cb4fe92c637fc38eeb/puppet/major_upgrade_steps.j2.yaml#L48-L63

[2] https://github.com/openstack/tripleo-heat-templates/blob/299b9f532377a3a0c16ba9cb4fe92c637fc38eeb/puppet/major_upgrade_steps.j2.yaml#L63

Marios Andreou (marios-b) wrote :

12:40 < chem> marios: https://bugs.launchpad.net/tripleo/+bug/1684058, I'm not sure we should restart the nova-compute at all
12:41 < chem> marios: it's still newton at that moment and don't need the new parameter or do I miss something ?
12:43 < marios> chem: well its meant to be for picking up that placement config i think owalsh was helping us with that at the time.. owalsh we are talking about
                https://github.com/openstack/tripleo-heat-templates/blob/299b9f532377a3a0c16ba9cb4fe92c637fc38eeb/puppet/major_upgrade_steps.j2.yaml#L63 which happens on
                nova-compute nodes before the controller ansible upgrade
12:43 < marios> chem: i think it was 'in order to allow nova compute to talk to the upgraded nova and other services on controllers after we upgrade _those_ we should set
                these things on nova node first

So for the restart of nova-compute: as it's still newton, it completely ignores the nova placement [1] parameters. But those are required, as yum upgrade will try to restart the service and, as we saw, it fails if the placement parameters are not set.

[1] nova placement for newton is really not clear:
  - we have https://review.openstack.org/#/c/442035/ which says remove it as it's unused;
  - we have https://docs.openstack.org/developer/nova/placement.html that says it's introduced in newton ....

All in all I think we shouldn't restart it as it's confusing.

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Confirmed → In Progress
tags: added: upgrade

So, this was originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=1440680.

So what we are observing here is:
 - 1. we restart nova-compute during the controller phase after adding the placement parameters: the restart is useless but not harmful;
 - 2. nova.conf now has a wrong placement parameter, see [1];
 - 3. during the non-controller upgrade script, yum upgrades openstack-nova-compute and then tries to restart it, which fails because of the wrong placement parameter; it goes on restarting it continuously;
 - 4. puppet runs and updates the parameters in nova.conf one at a time (like sequential crudini) with a purge (recreating all the configuration);
 - 5. systemd restarts openstack-nova-compute at the wrong time, it gets the default value for rabbitmq (127.0.0.1), and openstack-nova-compute is stuck at boot;
 - 6. puppet hangs as it cannot restart the service, which is already in the "starting" state.
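
Steps 4 and 5 can be pictured with a toy model. This is only a sketch of the sequence described above, not how puppet or systemd are implemented: the option names (`rabbit_host`, `os_region_name`) and the address `192.0.2.10` are hypothetical, and only the `127.0.0.1` default comes from this report. It shows which rabbitmq address a service would read if it restarted at each point of a purge followed by one-at-a-time writes.

```python
# Toy model only: option names and the non-default address are hypothetical.
DEFAULTS = {"rabbit_host": "127.0.0.1"}  # default quoted in step 5
TARGET = [
    ("os_region_name", "regionOne"),
    ("rabbit_host", "192.0.2.10"),
]


def rewrite_one_at_a_time(target):
    """Yield a snapshot of the config after the purge and after each write."""
    conf = {}  # the purge: every previously set option is gone
    yield dict(conf)
    for name, value in target:
        conf[name] = value  # one sequential write, crudini-style
        yield dict(conf)


def effective(conf, name):
    """Value a service would read if it (re)started on this snapshot."""
    return conf.get(name, DEFAULTS[name])


snapshots = list(rewrite_one_at_a_time(TARGET))
seen = [effective(snap, "rabbit_host") for snap in snapshots]
print(seen)
```

In this model the first two snapshots still resolve to `127.0.0.1`, so a restart loop that fires before the last write lands would pick up the default, which matches the stuck service in step 5.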

So I think that making sure we get the placement parameters right should solve the issue. Having openstack-nova-compute down before yum upgrade shouldn't be necessary. It's just a matter of having the parameters right.

Reviewed: https://review.openstack.org/457965
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=88a3168b3019f7c8232c14b95d4c7c6fb5080f03
Submitter: Jenkins
Branch: master

commit 88a3168b3019f7c8232c14b95d4c7c6fb5080f03
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Apr 19 11:26:45 2017 +0200

    N->O upgrade, fix wrong parameters to nova placement.

    According to [1] we need os_region_name, not region_name. Furthermore
    the os_interface is configured as well. The hard check on this
    parameter was introduced in ocata[2], explaining why the newton version
    did not chock on it.

    [1] https://docs.openstack.org/ocata/config-reference/compute/config-options.html
    [2] https://github.com/openstack/nova/commit/d486315e0

    Closes-Bug: #1684058
    Change-Id: If6118bf03e832fe3fa5ea4fcb1b436afd2adf80a
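
For anyone wanting to check a compute node by hand, here is a minimal sketch of a pre-flight check on the [placement] section, based only on the option names given in the commit message above (os_region_name instead of region_name, plus os_interface). The helper name `check_placement`, the sample values, and the idea of running such a check are assumptions for illustration; this is not part of the actual fix.

```python
import configparser

# Option names taken from the commit message; treating both as expected
# is an assumption of this sketch.
EXPECTED = ("os_region_name", "os_interface")
# The misnamed option that the fix replaced, mapped to its correct name.
MISNAMED = {"region_name": "os_region_name"}


def check_placement(conf_text):
    """Return a list of problems found in the [placement] section."""
    parser = configparser.ConfigParser(
        interpolation=None,               # values may contain '%'
        strict=False,                     # tolerate repeated options
        default_section="__no_default__", # treat [DEFAULT] as a plain section
    )
    parser.read_string(conf_text)
    if not parser.has_section("placement"):
        return ["missing [placement] section"]
    problems = []
    for wrong, right in MISNAMED.items():
        if parser.has_option("placement", wrong):
            problems.append(f"{wrong} is set; expected {right}")
    for name in EXPECTED:
        if not parser.has_option("placement", name):
            problems.append(f"{name} is not set")
    return problems


# Hypothetical sample contents, not taken from a real deployment.
BROKEN = """
[placement]
auth_url = http://192.0.2.1:5000
region_name = regionOne
"""

FIXED = """
[placement]
auth_url = http://192.0.2.1:5000
os_region_name = regionOne
os_interface = internal
"""

print(check_placement(BROKEN))
print(check_placement(FIXED))
```

The first call reports the misnamed region_name and the two absent options; the second returns an empty list.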

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/458416
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6f75d76d42203657a2b39af5269d2a8f586e93bc
Submitter: Jenkins
Branch: stable/ocata

commit 6f75d76d42203657a2b39af5269d2a8f586e93bc
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Apr 19 11:26:45 2017 +0200

    N->O upgrade, fix wrong parameters to nova placement.

    According to [1] we need os_region_name, not region_name. Furthermore
    the os_interface is configured as well. The hard check on this
    parameter was introduced in ocata[2], explaining why the newton version
    did not chock on it.

    [1] https://docs.openstack.org/ocata/config-reference/compute/config-options.html
    [2] https://github.com/openstack/nova/commit/d486315e0

    Closes-Bug: #1684058
    Change-Id: If6118bf03e832fe3fa5ea4fcb1b436afd2adf80a
    (cherry picked from commit 88a3168b3019f7c8232c14b95d4c7c6fb5080f03)

tags: added: in-stable-ocata

This issue was fixed in the openstack/tripleo-heat-templates 6.1.0 release.

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b2 development milestone.
