nova-compute fails to resume after upgrade from Queens to Rocky

Bug #1816299 reported by Vern Hart
This bug affects 2 people
Affects: OpenStack Nova Compute Charm
Status: Fix Released
Importance: Undecided
Assigned to: Corey Bryant

Bug Description

I am trying to upgrade a deployment from Queens to Rocky.
The process I'm following is:

  juju run-action nova-compute-kvm/12 --wait pause
  juju run-action nova-compute-kvm/12 --wait openstack-upgrade
  juju run-action nova-compute-kvm/12 --wait resume

The upgrade succeeds but the unit fails to resume, saying the nova-compute service isn't running.

The nova-compute service failed to restart with the same privsep helper error in nova-compute.log as mentioned in https://bugs.launchpad.net/charm-nova-compute/+bug/1802304.

Because the environment is offline and I don't have keyboard access, I've typed this from the screen. This is the bottom of the trace.

2019-02-14 15:56:01.849 43535 ERROR os_vif [req-179a3b10-306a-11e9-9dc6-02f2b9b313b8 - - - - -] Failed to plug vif VIFOpenVSwitch(active=True,address=fa:ab:16:aa:2a:bb,bridge_name='br-int',has_traffic_filtering=True,id=41d0becc-306a-11e9-b22b-02f2b9b313b8,network=Network(4be5c3c6-306a-11e9-9fbf-02f2b9b313b8),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap41d0becc-30'): oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1)

I was able to recover from this by purging the python2 rootwrap and privsep packages before resuming the unit. The scripted upgrade looks something like this:

  juju run-action nova-compute-kvm/12 --wait pause
  juju run-action nova-compute-kvm/12 --wait openstack-upgrade
  juju ssh nova-compute-kvm/12 'sudo apt purge -y python-oslo.rootwrap python-oslo.privsep;
    sudo apt autoremove -y'
  juju run-action nova-compute-kvm/12 --wait resume
  # ceilometer-agent and neutron-openvswitch services break after the above apt commands
  # the units here are the units subordinate to the particular nova-compute unit
  juju run --unit ceilometer-agent/20,neutron-openvswitch-kvm/32 hooks/install
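Before running `resume`, it may help to confirm that no python2 oslo packages are still installed on the unit. The following is a minimal sketch, not part of any charm: the function name `leftover_py2_oslo` is hypothetical, and it assumes its input is `dpkg-query` output in `<status> <package>` form. It prints only packages in the fully installed (`ii`) state whose names start with `python-oslo.`, so python3 packages and removed packages that only keep config files (`rc`) are ignored.

```shell
# Hypothetical helper (not part of the charms): read lines of the form
# "<status-abbrev> <package>" on stdin and print python2 oslo packages
# that are still fully installed ("ii"). Packages in the "rc" state
# (removed, config files kept) and python3-* packages are skipped.
leftover_py2_oslo() {
    awk '$1 == "ii" && $2 ~ /^python-oslo\./ { print $2 }'
}

# Usage on the compute unit (e.g. inside `juju ssh nova-compute-kvm/12`):
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'python-oslo.*' 2>/dev/null \
#       | leftover_py2_oslo
# Empty output means no python2 oslo packages remain installed.
```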

I thought this might be resolved by the fix for https://bugs.launchpad.net/charm-nova-compute/+bug/1802304, because we were running old charms, but the problem persists after updating nova-compute, ceilometer-agent, and neutron-openvswitch.

The revisions of those charms we're running are:

  e464bba nova-compute-292
  ee81e0e neutron-openvswitch-255
  6650284 ceilometer-agent-248

Tags: cpe-onsite
Revision history for this message
Alexander Litvinov (alitvinov) wrote :

I was able to reproduce this issue on my orangebox.
Attaching juju crashdump and the bundle to this bug.

Steps:
Deploy the Queens bundle with enable-dvr: true.
Create an instance.
Upgrade the unit that hosts the instance:

alex@xx:$ juju run-action nova-compute/2 --wait pause
unit-nova-compute-2:
  id: d30a0370-94f4-41ae-84e7-7abf3b37bbc4
  status: completed
  timing:
    completed: 2019-02-19 11:57:50 +0000 UTC

alex@xx:$ juju run-action nova-compute/2 --wait openstack-upgrade
unit-nova-compute-2:
  id: b93a1681-8408-4f5b-8a38-59cacdd10357
  results:
    outcome: success, upgrade completed.
  status: completed
  timing:
    completed: 2019-02-19 12:01:03 +0000 UTC

alex@xx:$ juju run-action nova-compute/2 --wait resume
unit-nova-compute-2:
  id: 9a339a2c-3802-4540-83a3-174bef2cbc2f
  message: 'Action resume failed: Couldn''t resume: Services not running that should
    be: nova-compute'
  status: failed
  timing:
    completed: 2019-02-19 12:01:33 +0000 UTC

cat nova-compute.log

2019-02-19 12:01:29.240 338412 INFO oslo.privsep.daemon [req-17603cb1-f634-41c2-9699-1b2ad109a143 - - - - -] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/nova-compute.conf', '--privsep_context', 'vif_plug_ovs.privsep.vif_plug', '--privsep_sock_path', '/tmp/tmpi55osjtc/privsep.sock']
2019-02-19 12:01:29.788 338412 WARNING oslo.privsep.daemon [-] privsep log: Deprecated: Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
2019-02-19 12:01:29.880 338412 CRITICAL oslo.privsep.daemon [req-17603cb1-f634-41c2-9699-1b2ad109a143 - - - - -] privsep helper command exited non-zero (1)
2019-02-19 12:01:29.880 338412 ERROR os_vif [req-17603cb1-f634-41c2-9699-1b2ad109a143 - - - - -] Failed to plug vif VIFBridge(active=True,address=fa:16:3e:3e:bd:25,bridge_name='qbr2b2c956a-cb',has_traffic_filtering=True,id=2b2c956a-cb49-4131-b2c8-fb124d775c73,network=Network(874247e5-fa3a-4421-8986-d6f9bb8f13b4),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap2b2c956a-cb'): oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1)

Revision history for this message
Alexander Litvinov (alitvinov) wrote :
tags: added: cpe-onsite
Revision history for this message
Alexander Litvinov (alitvinov) wrote :
Revision history for this message
Alexander Litvinov (alitvinov) wrote :

Purging the python2 rootwrap and privsep packages before resuming the unit (around 12:10 in the logs) also recovers the unit.

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

Subscribing field-high, as this issue is blocking user acceptance testing of our cloud
(which requires a fully working upgrade).

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

Actually, the behaviour is the same with enable-dvr: false,
so DVR is unrelated to this particular problem.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hi Vern,

Something is wrong if you have py2 packages remaining after upgrading with the charms to rocky. The rocky charms should be purging all py2 openstack packages.
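One quick way to look for the condition described here is to list python2 packages (`python-X`) whose python3 counterpart (`python3-X`) is also installed. This is a hypothetical helper, not part of the charms; it assumes `dpkg-query` output in `<status> <package>` form, and its output is only a list of candidates, since some python2 libraries may legitimately remain for non-OpenStack software.

```shell
# Hypothetical check: print each installed python2 package "python-X"
# for which "python3-X" is also installed. Input lines on stdin look like
# "<status-abbrev> <package>", e.g. from:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'python*'
py2_with_py3_counterpart() {
    awk '
        $1 == "ii" { installed[$2] = 1 }
        END {
            for (pkg in installed) {
                if (pkg ~ /^python-/) {
                    # "python-" is 7 characters; keep the remainder.
                    py3 = "python3-" substr(pkg, 8)
                    if (py3 in installed) print pkg
                }
            }
        }
    ' | sort   # awk array order is unspecified, so sort the result
}
```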

Thanks,
Corey

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Vern, can you make sure you have these commits in your version of the charm? There are three patches in the paste; the easiest way to find the start of each one is to search for "from:".

http://paste.ubuntu.com/p/ysVvWh57BH/

Revision history for this message
Vern Hart (vern) wrote : Re: [Bug 1816299] Re: nova-compute fails to resume after upgrade from Queens to Rocky

I have a copy of the charms we most recently deployed, let me check.

First patch, remove_old_packages: check.
Second, determine_held_packages: check.
Third, determine_purge_packages: check.

This is charm version nova-compute-292, btw.

Vern


Revision history for this message
Corey Bryant (corey.bryant) wrote :

Vern, thanks for checking.

I did some further debugging on my own deployment.

I added some "CCB" debug statements and confirmed that, when the py2 packages are removed, python-nova is removed but python-oslo.privsep is not:
https://paste.ubuntu.com/p/shCZbbjVpZ/
https://paste.ubuntu.com/p/gKSYY72MBn/

Revision history for this message
Corey Bryant (corey.bryant) wrote :

OK, I think this is the problem. I thought we had fixed this, but maybe we missed something. Subordinate charms (such as neutron-openvswitch) should be handled properly on upgrade, i.e., they should be migrated to py2 rocky packages as well and py2 packages should be removed.

https://paste.ubuntu.com/p/PBmQQqfJ5p/
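To see which installed packages still hold on to `python-oslo.privsep` once `python-nova` has been purged, `apt-cache rdepends --installed` can be run on the unit; if the subordinate charms' python2 packages are the culprit, they would be expected to appear in that list. The sketch below only tidies that output. The helper name `rdepends_names` is hypothetical, and it assumes the usual `apt-cache rdepends` layout: the package name, a `Reverse Depends:` header, then one indented name per line, with `|` marking alternatives.

```shell
# Hypothetical helper: turn `apt-cache rdepends` output into a sorted,
# de-duplicated list of package names. It drops everything up to and
# including the "Reverse Depends:" header, strips leading whitespace and
# "|" alternative markers, and removes blank lines.
rdepends_names() {
    sed -e '1,/^Reverse Depends:/d' \
        -e 's/^[[:space:]|]*//' \
        -e '/^$/d' | sort -u
}

# Usage on the compute unit after the charm's py2 purge:
#   apt-cache rdepends --installed python-oslo.privsep | rdepends_names
```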

Revision history for this message
Corey Bryant (corey.bryant) wrote :

that should say "they should be migrated to py3 rocky packages as well and py2 packages should be removed"

Revision history for this message
Corey Bryant (corey.bryant) wrote :

I think the fix here is to ensure that all subordinates (such as neutron-openvswitch and ceilometer-agent) are moved to py3 packages and have their py2 packages removed, similar to:

https://github.com/openstack/charm-nova-compute/commit/7f15c2f5ce230ec3d1724d924a164e6a49b8584a

Revision history for this message
Corey Bryant (corey.bryant) wrote :
Changed in charm-nova-compute:
assignee: nobody → Corey Bryant (corey.bryant)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.openstack.org/638160
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=32ef5b4cca70f184a5652de234682ecf5413427c
Submitter: Zuul
Branch: master

commit 32ef5b4cca70f184a5652de234682ecf5413427c
Author: Corey Bryant <email address hidden>
Date: Tue Feb 19 20:15:30 2019 +0000

    py3: deal with more subordinate dependencies

    Ensure subordinate py3 packages are installed if their py2
    counter-parts are currently installed for neutron-openvswitch
    and ceilometer-agent.

    Change-Id: I940fb2ce9d671e919c817cee7adda2ac22ecb3fb
    Closes-Bug: #1816299
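The commit message describes the selection rule: install a subordinate's py3 package whenever its py2 counterpart is currently installed. A minimal sketch of that rule follows. The package map and the function name `subordinate_py3_packages` are assumptions for illustration only; the charm's real implementation and package lists may differ.

```shell
# Hypothetical py2 -> py3 mapping for subordinate packages, as
# space-separated "py2:py3" pairs. The real charm's list may differ.
SUBORDINATE_PY3_MAP="python-neutron:python3-neutron python-ceilometer:python3-ceilometer"

# Read installed package names (one per line) on stdin and print the
# py3 packages that should be installed because their py2 counterpart
# is present. Output follows input order.
subordinate_py3_packages() {
    awk -v map="$SUBORDINATE_PY3_MAP" '
        BEGIN {
            n = split(map, pairs, " ")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")
                py3_for[kv[1]] = kv[2]
            }
        }
        ($1 in py3_for) { print py3_for[$1] }
    '
}

# Usage (installed packages only):
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
#       | awk '$1 == "ii" { print $2 }' | subordinate_py3_packages
```

Installing an already-present py3 package is a no-op for apt, so the sketch does not filter those out; the py2 packages can then be purged without leaving the subordinate services broken.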

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/18.11)

Fix proposed to branch: stable/18.11
Review: https://review.openstack.org/638203

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/18.11)

Reviewed: https://review.openstack.org/638203
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=685a2e54e14fc18b65fdcc2ad78ab156bc1ea90e
Submitter: Zuul
Branch: stable/18.11

commit 685a2e54e14fc18b65fdcc2ad78ab156bc1ea90e
Author: Corey Bryant <email address hidden>
Date: Tue Feb 19 20:15:30 2019 +0000

    py3: deal with more subordinate dependencies

    Ensure subordinate py3 packages are installed if their py2
    counter-parts are currently installed for neutron-openvswitch
    and ceilometer-agent.

    Change-Id: I940fb2ce9d671e919c817cee7adda2ac22ecb3fb
    Closes-Bug: #1816299
    (cherry picked from commit 32ef5b4cca70f184a5652de234682ecf5413427c)

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

I checked with the next charm (nova-compute 423), and the problem seems to be resolved.
After the upgrade to Rocky, the unit resumes successfully and is in an idle state.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Alexander, thanks very much for giving it a test!

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is fixed in the stable/18.11 branch and is now available in the charm store.

Changed in charm-nova-compute:
status: Fix Committed → Fix Released