remove-unit of nova-compute does not actually remove the nova service or the neutron agent

Bug #1317560 reported by Nicolas Thomas
This bug affects 9 people
Affects Status Importance Assigned to Milestone
OpenStack Nova Compute Charm
Triaged
Medium
Unassigned
nova-compute (Juju Charms Collection)
Invalid
Medium
Unassigned

Bug Description

When units are removed successfully from Juju's point of view, the removal does not appear in Horizon as expected.

How to reproduce: deploy Icehouse on precise ... with several units and check Horizon. Then remove units and check Horizon again: nothing has changed.

Expected result: the compute node is removed.

Revision history for this message
James Page (james-page) wrote :

Believe it or not, this is quite tricky to fix: even if you terminate the service unit, nova will still have a record of it in its database.

That said, it won't be used for scheduling as it should be marked as down in some way.

Changed in nova-compute (Juju Charms Collection):
importance: Undecided → Medium
status: New → Triaged
tags: added: openstack
Revision history for this message
Liang Chen (cbjchen) wrote :

If you navigate to http://<horizon-host>/horizon/admin/info/ and look under the "Compute Service" tab, the removed service will be marked as down.

Revision history for this message
Nicolas Thomas (thomnico) wrote : Re: [Bug 1317560] Re: remove-unit of nova-compute does not actually remove

I assume it will be the same if the server crashed, whereas here the
operator made a conscious decision to remove a unit. A possible scenario
is a rolling hardware upgrade.

--
Best Regards,
Nicolas Thomas - Solution Architect - Canonical
http://insights.ubuntu.com/?p=889
GPG FPR: D592 4185 F099 9031 6590 6292 492F C740 F03A 7EB9

Liang Chen (cbjchen)
Changed in nova-compute (Juju Charms Collection):
assignee: nobody → Liang Chen (cbjchen)
Revision history for this message
Liang Chen (cbjchen) wrote : Re: remove-unit of nova-compute does not actually remove

Sure. Icehouse added a new API, os-extended-services-delete, that seems to be a useful one for this case.
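
[Editor's note] The extension mentioned above boils down to a single REST call against the compute endpoint. A minimal sketch of the request shape, assuming the usual v2 endpoint layout (the exact path should be verified against your cloud's API reference):

```python
# Sketch of the request behind os-extended-services-delete.
# The /os-services/{id} path layout is my reading of the nova v2 API
# and is an assumption, not taken from this bug report.
def service_delete_request(compute_endpoint, service_id):
    """Build the (method, URL) pair for deleting a nova service record."""
    return ("DELETE", f"{compute_endpoint}/os-services/{service_id}")
```

In practice this would be issued with an admin token, which is exactly why the later comments hesitate to call it from inside a charm.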

Revision history for this message
Liang Chen (cbjchen) wrote :

But calling the admin API from inside charms doesn't seem like a good approach. We need to find another way to do this.

Revision history for this message
JuanJo Ciarlante (jjo) wrote :

Just a note on the operational side of this issue: current behavior
has actually saved us from having to migrate away hosted VMs for a
hardware upgrade.

Although it was not needed (we could have just taken nova-compute
down on the node in the meantime), we did remove-unit/add-unit
against the same MAAS'd host, and it was great to see the hosted
VMs still there.

Revision history for this message
Liang Chen (cbjchen) wrote :

That's the reason nova only exposed enable/disable for services before. os-extended-services-delete is used when retiring a compute node, though it doesn't destroy the hosted VMs either.

Revision history for this message
Liang Chen (cbjchen) wrote :

I submitted a patch to remove services from the nova-manage util - https://review.openstack.org/#/c/141105/. Once that gets merged, the nova-compute charm can use a stop hook to completely remove the corresponding service record.

Liang Chen (cbjchen)
Changed in nova-compute (Juju Charms Collection):
status: Triaged → In Progress
Revision history for this message
Nicolas Thomas (thomnico) wrote : Re: [Bug 1317560] Re: remove-unit of nova-compute does not actually remove

If live migration is enabled, the expected behavior would be that the
VMs on that node get migrated before it stops.

My 2cents

Revision history for this message
Liang Chen (cbjchen) wrote : Re: remove-unit of nova-compute does not actually remove

That seems good. And even if live migration is not enabled, the VMs can still be evacuated to other compute nodes. But let's focus for now on the removal of the orphaned service records - doing a similar thing to what the os-extended-services-delete admin API does. We can file another bug/bp for the migration/evacuation feature and handle it there.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Note that nova-compute nodes do not have direct DB access (they go through nova-conductor), so a nova-manage operation would have to be performed by a service that does have direct access, e.g. nova-cloud-controller.

Revision history for this message
Liang Chen (cbjchen) wrote :

Thank you very much for the reminder; you are absolutely right. The nova-manage operation will have to be performed where nova-conductor is deployed (nova-cloud-controller), so that database access is ensured. I am therefore thinking of putting the nova-manage invocation somewhere like the compute relation-departed hook in the nova-cloud-controller charm.
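
[Editor's note] A hypothetical sketch of what such a relation-departed hook could run on the controller node. The nova-manage subcommand shown is the one proposed in the review linked above; it is an assumption, not a shipped CLI, so treat the command name and argument order as placeholders:

```python
# Hypothetical command a compute relation-departed hook on the
# nova-cloud-controller charm could hand to subprocess. On that node,
# nova.conf provides the direct database access that nova-manage needs.
# "service remove" is the subcommand proposed in review 141105 (assumed).
def departed_hook_cmd(host, binary="nova-compute"):
    """Build the argv list the hook would run for a departing unit."""
    return ["nova-manage", "service", "remove", binary, host]
```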

Changed in nova-compute (Juju Charms Collection):
assignee: Liang Chen (cbjchen) → nobody
status: In Progress → Triaged
James Page (james-page)
Changed in charm-nova-compute:
importance: Undecided → Medium
status: New → Triaged
Changed in nova-compute (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
Junien F (axino) wrote :

This is still the case with pike and 17.11 charms. Neither the "nova-compute" nova service nor the Open vSwitch neutron agent are removed from the DB. My experience is that this causes problems when reinstalling a node.

summary: - remove-unit of nova-compute does not actually remove
+ remove-unit of nova-compute does not actually remove the nova service or
+ the neutron agent
Revision history for this message
Wouter van Bommel (woutervb) wrote :

Just hit this bug today on a cloud running 17.04 charms. It was quite annoying, especially as the host was in multiple aggregates and I removed the host first, before realizing this. So that became quite a challenge.

Revision history for this message
Ramon Grullon (rgrullon) wrote :

Hit this issue on Bionic running Queens.

Had to go into the MySQL DB and search the nova-api database until I found the node in the host_mappings table and removed it.
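
[Editor's note] The manual cleanup described above, written out as parameterised SQL. The host_mappings table name is from the report; the "host" column is an assumption about the nova_api schema, so check the table definition before running anything like this:

```python
# Sketch of the manual host_mappings cleanup, as (statement, parameter)
# pairs meant for a DB-API cursor rather than shell interpolation.
# The "host" column name is an assumption about the nova_api schema.
def host_mapping_cleanup(hostname):
    """Return (lookup, delete) statements for the stale host row."""
    lookup = "SELECT id FROM nova_api.host_mappings WHERE host = %s;"
    delete = "DELETE FROM nova_api.host_mappings WHERE host = %s;"
    return (lookup, hostname), (delete, hostname)
```

Running the lookup first, as Ramon did, confirms exactly one row matches before the delete is issued.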

tags: added: scaleback
Revision history for this message
Peter Matulis (petermatulis) wrote :

The first part of this bug (nova service) was fixed via bug 1691998.

tags: added: openstack-advocacy
Revision history for this message
macchese (max-liccardo) wrote :

This bug also affects OpenStack 2023.2 Bobcat.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi @macchese; when you say it still affects 2023.2, does that include both the nova service and the neutron agent? (i.e. did you perform the actions indicated in bug 1691998 and in the charm guide? https://docs.openstack.org/charm-guide/latest/admin/ops-scale-back-nova-compute.html) Thanks.
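
[Editor's note] The charm-guide procedure referenced above is driven by hand with the openstack CLI; the same cleanup could be sketched programmatically with openstacksdk. The call names used here (compute.services, compute.delete_service, network.agents, network.delete_agent) are my reading of the SDK and should be verified against its documentation:

```python
# Rough sketch of the scale-back cleanup via an openstacksdk-style
# connection object: delete the stale nova service and neutron agent
# records left behind after remove-unit of a compute node.
# SDK method names are assumptions, not confirmed by this bug report.
def scale_back_cleanup(conn, hostname):
    """Delete orphaned service/agent records for a removed host."""
    removed = []
    for svc in conn.compute.services(host=hostname):
        conn.compute.delete_service(svc)
        removed.append(("service", svc))
    for agent in conn.network.agents(host=hostname):
        conn.network.delete_agent(agent)
        removed.append(("agent", agent))
    return removed
```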

Revision history for this message
macchese (max-liccardo) wrote :

Sorry, I didn't.
I used remove-unit only, and then some SQL commands to remove the hypervisor.
