newton->ocata upgrade hangs starting nova-api

Bug #1665717 reported by Steven Hardy
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Oliver Walsh

Bug Description

I'm testing composable upgrades, and with nova enabled we get stuck at step2 with this error in the nova API log:

2017-02-17 17:11:06.341 51250 ERROR nova Traceback (most recent call last):
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/bin/nova-api", line 10, in <module>
2017-02-17 17:11:06.341 51250 ERROR nova sys.exit(main())
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/lib/python2.7/site-packages/nova/cmd/api.py", line 59, in main
2017-02-17 17:11:06.341 51250 ERROR nova server = service.WSGIService(api, use_ssl=should_use_ssl)
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/lib/python2.7/site-packages/nova/service.py", line 319, in __init__
2017-02-17 17:11:06.341 51250 ERROR nova self.workers = (getattr(CONF, '%s_workers' % wname, None) or
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2267, in __getattr__
2017-02-17 17:11:06.341 51250 ERROR nova return self._get(name)
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2672, in _get
2017-02-17 17:11:06.341 51250 ERROR nova value = self._do_get(name, group, namespace)
2017-02-17 17:11:06.341 51250 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2017-02-17 17:11:06.341 51250 ERROR nova % (opt.name, str(ve)))
2017-02-17 17:11:06.341 51250 ERROR nova ConfigFileValueError: Value for option osapi_compute_workers is not valid: Should be greater than or equal to 1
2017-02-17 17:11:06.341 51250 ERROR nova

The problem seems to be we update nova-api packages, then try to start the service, but puppet hasn't yet run to set the value for osapi_compute_workers.

I'm currently not sure why others testing manually aren't seeing this; I guess my (all-upstream) environment is different. Either way, we need to resolve this to enable upgrade CI testing with nova enabled.

Steven Hardy (shardy)
tags: added: upgrade
Steven Hardy (shardy) wrote :

There seem to be multiple problems - as well as the *workers config above, the package updates are trying to restart nova* while rabbit is down, so we may have to adjust the steps.

Ideally we'd just stop the packages messing with the service state (--noscripts?) as it makes the ordering more difficult here I think.

Still unclear how/why this is working for others so it'd be good to compare environments and figure that out and/or get confirmation from someone with an upstream env that they see the same as me.

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → ocata-rc2
Alex Schultz (alex-schultz) wrote :

Nova changed the valid values for this option: 0 used to be accepted but is no longer allowed. I'll have to dig up the bug, but as I recall the change was made in Ocata, which might be why this only affects upgrades.
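
To illustrate the kind of check that rejects the old value, here is a minimal sketch in plain Python. It is not oslo.config's implementation: the sample config text, the value 0 in it, and the helper name read_workers are assumptions made for illustration, with the error wording borrowed from the traceback in the bug description.

```python
import configparser

# Hypothetical stand-in for a Newton-era nova.conf; the value 0 is an assumption.
NEWTON_CONF = """
[DEFAULT]
osapi_compute_workers = 0
"""


def read_workers(conf_text, minimum=1):
    """Return osapi_compute_workers, rejecting values below `minimum`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    value = parser.getint("DEFAULT", "osapi_compute_workers")
    if value < minimum:
        # Same wording as the ConfigFileValueError seen in the nova API log.
        raise ValueError(
            "Value for option osapi_compute_workers is not valid: "
            "Should be greater than or equal to %d" % minimum
        )
    return value
```

With minimum=0 (the older, more permissive bound in this sketch) the same text is accepted, which mirrors why a config file that was fine before the package update can stop the service from starting afterwards.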

Alex Schultz (alex-schultz) wrote :
tags: added: ocata-backport-potential
Steven Hardy (shardy) wrote :

Yeah I think the problem is the nova packaging tries to restart the nova services on upgrade, but the services aren't yet reconfigured by puppet, so the config files still contain the "old" (e.g. Newton-appropriate) values.

Really, I think we need a way to disable the RPM scriptlets from messing with service state, as it's a major headache when we just want to stop the ctlplane services then have puppet handle starting the services in the right order.

I expect there are workarounds we can do in the ansible tasks to get past this, but we probably need investigation into the packaging to figure out exactly what is bouncing services, and how we can make it conditional for this environment (and others where we don't want packages restarting services).

Oliver Walsh (owalsh) wrote :

I think we just need to reorder the ansible tasks so that the upgrade is run after we stop the service. The rpm is running a systemctl try-restart, which should do nothing if the service isn't running.
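
The ordering argument can be modelled with a toy simulation. This is only a sketch: the task names and both functions are hypothetical, and try_restart merely mimics the behaviour described above for systemctl try-restart (restart a unit only if it is already running).

```python
def try_restart(service_active):
    """Mimic `systemctl try-restart`: only a running unit gets restarted."""
    return service_active


def run_step2(tasks, service_active=True):
    """Walk the tasks in order; return True if the upgrade restarted the service."""
    restarted = False
    for task in tasks:
        if task == "stop":
            service_active = False
        elif task == "upgrade":
            # The RPM scriptlet fires here, before puppet has rewritten nova.conf.
            if try_restart(service_active):
                restarted = True
    return restarted


original_order = ["upgrade", "stop"]  # package update first, then stop
fixed_order = ["stop", "upgrade"]     # stop first, then package update
```

In this model run_step2(original_order) reports a restart against the not-yet-reconfigured service, while run_step2(fixed_order) does not.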

Michele Baldessari (michele) wrote :

So, completely untested, but there does seem to be a way to disable scripts execution in yum. From the yum.conf manpage (main section):
tsflags
    Comma or space separated list of transaction flags to pass to the rpm transaction set. These include 'noscripts', 'notriggers', 'nodocs', 'test', 'justdb' and 'nocontexts'. 'repackage' is also available but that does nothing with newer rpm versions. You can set all/any of them. However, if you don't know what these do in the context of an rpm transaction set you're best leaving it alone. Default is an empty list. Also see the "yum fs" command, for excluding docs.

So adding the following to /etc/yum.conf:
[main]
tsflags=noscripts

might do the trick? (I could not find a way in yum's code to use an alternative /etc/yum.conf file, so in this route we'd have to tweak yum.conf, run the transaction and then reset it back as it was).
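
The tweak, run, reset sequence described above could be wrapped so that the reset always happens. A rough sketch follows, not tested against a real yum installation: temporary_tsflags is a hypothetical helper, and it assumes the file parses as INI.

```python
import configparser
import contextlib


@contextlib.contextmanager
def temporary_tsflags(conf_path, flags="noscripts"):
    """Set tsflags in [main] of a yum.conf-style file, then restore it."""
    with open(conf_path, "rb") as handle:
        original = handle.read()

    # interpolation=None so values containing '%' or '$' are left alone;
    # strict=False tolerates duplicate keys in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(original.decode("utf-8"))
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "tsflags", flags)

    try:
        # Rewriting through configparser drops comments; that is acceptable
        # here only because the original bytes are written back afterwards.
        with open(conf_path, "w") as handle:
            parser.write(handle)
        yield conf_path
    finally:
        # Restore even if the work done inside the block fails.
        with open(conf_path, "wb") as handle:
            handle.write(original)
```

A caller would run the yum transaction inside the with block; the original file contents are written back afterwards even if that transaction raises.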

Steven Hardy (shardy) wrote :

Ok I'll try reordering the tasks so we ensure the service is stopped before attempting the upgrade - I'm not clear why we special-cased the nova-api upgrade here though, as it is already stopped before the package upgrade happens for all packages:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/nova-api.yaml#L197

        - name: update nova api
          tags: step2
          yum: name=openstack-nova-api state=latest
        - name: Stop and disable nova_api service (pre-upgrade not under httpd)
          tags: step2
          service: name=openstack-nova-api state=stopped enabled=no

So we can reverse the order of those tasks, but I'm still not clear why we can't just defer the nova-api upgrade to the tripleo-packages upgrade in step3 here:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L44

        - name: Update all packages
          tags: step3
          yum: name=* state=latest

The way this is supposed to work is: we stop all services in step2, upgrade the packages in step3, do any needed migrations etc. in steps 4+, and finally run puppet, which reconfigures the services and starts them. So it'd be good to confirm the special-casing of the nova-api yum update is actually needed (and if so why) - matbu can perhaps help clarify that?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/436150

Changed in tripleo:
assignee: nobody → Oliver Walsh (owalsh)
status: Triaged → In Progress
Steven Hardy (shardy) wrote :

Oh and thanks bandini for the tsflags=noscripts tip - that's interesting but I don't think we want to globally disable the RPM scripts, only stop the service states getting modified by any package upgrade operations.

Oliver Walsh (owalsh) wrote :

Good point re the need for the individual upgrade tasks. All of the nova services are doing this but the only special case I can see is the placement-api.

Also nova-conductor is still setting the upgrade pin. It should leave this to puppet.

Michele Baldessari (michele) wrote :

Fully agreed Steven, we'd have to set tsflags=noscripts only for a very specific upgrade run and then make sure that flag is gone. Only to be used if we can't come up with other solutions/approaches, I think.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/436228

Marios Andreou (marios-b) wrote :

o/ hey guys how's Atlanta :)

shardy I've definitely hit this - last week was the first time, but i was still testing the nova review https://review.openstack.org/#/c/405241/ and that went through a few revisions before landing. Trouble is, once you've hit this you'd have to start downgrading nova to hit it again (i.e. you recover by manually setting osapi_compute_workers in nova.conf and re-running the upgrade, so you won't hit it a second time). I hit it again yesterday and came looking for this bug a few minutes ago :)

I haven't reset my environment yet so can't fully test https://review.openstack.org/#/c/436150/ yet (it has to be on a clean pre yum update env) but lgtm... for what it's worth i did ask a couple of times about the special case step2 for the nova-api update (see https://review.openstack.org/#/c/405241/26/puppet/services/nova-api.yaml for example) - and it still isn't clear to me why that is the case.

For the disabling of scripts (even though we probably won't need it here) - fyi i think we can have better flexibility (for the single package case) using 'rpm' directly like we do for the openvswitch package for example https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/pacemaker_common_functions.sh#L314 (skip the postuninstall scripts) and now at https://review.openstack.org/#/c/434346/6/puppet/services/tripleo-packages.yaml

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/436150
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=10ba1fa6068978d5779da4b3c6966d73e893a7e5
Submitter: Jenkins
Branch: master

commit 10ba1fa6068978d5779da4b3c6966d73e893a7e5
Author: Oliver Walsh <email address hidden>
Date: Mon Feb 20 14:10:45 2017 -0500

    Stop nova-api before upgrading package

    If the service is running then the rpm upgrade will attempt to restart.
    Ensuring the service is stopped before upgrade should resolve this.

    Change-Id: I4179cb773616721640490d26082eacac45f92dff
    Closes-Bug: 1665717

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/436228
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9014197a7a1af563fcee81e45b547747c6cea492
Submitter: Jenkins
Branch: stable/ocata

commit 9014197a7a1af563fcee81e45b547747c6cea492
Author: Oliver Walsh <email address hidden>
Date: Mon Feb 20 14:10:45 2017 -0500

    Stop nova-api before upgrading package

    If the service is running then the rpm upgrade will attempt to restart.
    Ensuring the service is stopped before upgrade should resolve this.

    Change-Id: I4179cb773616721640490d26082eacac45f92dff
    Closes-Bug: 1665717
    (cherry picked from commit 10ba1fa6068978d5779da4b3c6966d73e893a7e5)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.0.0.0rc2

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b1 development milestone.
