very slow juju unit removals for nova compute charm

Bug #1926189 reported by Steven Parker
This bug affects 1 person
Affects: OpenStack Nova Compute Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Juju charm version (row from juju status):
application nova-compute-kvm, charm nova-compute, store jujucharms, revision 314, OS ubuntu

I am removing the only two principal units on a hyperconverged node (subordinate charms not included):
juju remove-unit nova-compute-kvm/#
juju remove-unit ceph-osd-ssd/#

All other charms are removed within about 45 minutes, but nova-compute takes longer than 3 hours to remove.
This cloud has a total of 43 compute nodes.

The nodes have 40 cores and about 1 TB of RAM, so node performance is not a concern.

I was able to capture the entire removal process in the unit's Juju logs and have attached them for review.

----

From what I can tell, some relation hooks trigger more than once. For example, the ceph-mon relation (unit ceph-mon/6) fires relation-changed twice before relation-departed.

On the nova-compute unit:
2021-03-29 13:40:40 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for nova-compute-kvm/52 uniter (run relation-changed (96; unit: ceph-mon/6) hook)
2021-03-29 13:40:40 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 13:40:45 DEBUG juju.worker.uniter.operation executor.go:132 executing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 13:41:12 DEBUG juju.worker.uniter.operation executor.go:132 committing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 13:41:21 DEBUG juju.machinelock machinelock.go:186 machine lock released for nova-compute-kvm/52 uniter (run relation-changed (96; unit: ceph-mon/6) hook)
2021-03-29 15:29:08 DEBUG juju.worker.uniter.operation executor.go:85 running operation run relation-changed (96; unit: ceph-mon/6) hook for nova-compute-kvm/52
2021-03-29 15:29:08 DEBUG juju.machinelock machinelock.go:162 acquire machine lock for nova-compute-kvm/52 uniter (run relation-changed (96; unit: ceph-mon/6) hook)
2021-03-29 15:29:08 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for nova-compute-kvm/52 uniter (run relation-changed (96; unit: ceph-mon/6) hook)
2021-03-29 15:29:08 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 15:29:10 DEBUG juju.worker.uniter.operation executor.go:132 executing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 15:29:32 DEBUG juju.worker.uniter.operation executor.go:132 committing operation "run relation-changed (96; unit: ceph-mon/6) hook" for nova-compute-kvm/52
2021-03-29 15:29:38 DEBUG juju.machinelock machinelock.go:186 machine lock released for nova-compute-kvm/52 uniter (run relation-changed (96; unit: ceph-mon/6) hook)
2021-03-29 19:21:37 DEBUG juju.worker.uniter.operation executor.go:85 running operation run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook for nova-compute-kvm/52
2021-03-29 19:21:37 DEBUG juju.machinelock machinelock.go:162 acquire machine lock for nova-compute-kvm/52 uniter (run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook)
2021-03-29 19:21:37 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for nova-compute-kvm/52 uniter (run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook)
2021-03-29 19:21:37 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook" for nova-compute-kvm/52
2021-03-29 19:21:38 DEBUG juju.worker.uniter.operation executor.go:132 executing operation "run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook" for nova-compute-kvm/52
2021-03-29 19:21:39 DEBUG juju.worker.uniter.operation executor.go:132 committing operation "run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook" for nova-compute-kvm/52
2021-03-29 19:21:41 DEBUG juju.machinelock machinelock.go:186 machine lock released for nova-compute-kvm/52 uniter (run relation-departed (96; unit: ceph-mon/6, departee: nova-compute-kvm/52) hook)
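To make the timing in the excerpt above easier to read, here is a minimal Python sketch that pairs the "machine lock acquired" and "machine lock released" lines and reports how long each hook ran and how long the unit sat idle before it. The regular expression is an assumption based only on the log lines quoted in this report, and the names hook_runs and summarize are hypothetical helpers, not part of Juju.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the machinelock lines quoted in this report.
LOCK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DEBUG juju\.machinelock \S+ "
    r"machine lock (acquired|released) for (\S+) uniter \(run (.+) hook\)$"
)


def hook_runs(lines):
    """Yield (hook, acquired_at, released_at) for each acquire/release pair."""
    pending = {}
    for line in lines:
        m = LOCK_RE.match(line.strip())
        if not m:
            continue  # skip non-machinelock lines and "acquire machine lock" requests
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        event, hook = m.group(2), m.group(4)
        if event == "acquired":
            pending[hook] = ts
        elif hook in pending:
            yield hook, pending.pop(hook), ts


def summarize(lines):
    """Return (hook, run_seconds, idle_seconds_before) for each hook run.

    idle_seconds_before is None for the first run, where no earlier
    release timestamp is available.
    """
    rows = []
    last_release = None
    for hook, start, end in hook_runs(lines):
        idle = None
        if last_release is not None:
            idle = (start - last_release).total_seconds()
        rows.append((hook, (end - start).total_seconds(), idle))
        last_release = end
    return rows


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], errors="replace") as fh:
        for hook, ran, idle in summarize(fh):
            print(f"{hook}: ran {ran:.0f}s, idle before {idle}")
```

Applied to the three hook runs quoted above, each hook holds the lock for well under a minute (41 s, 30 s, and 4 s), while the gaps between them are 6467 s (about 1 h 48 min) and 13919 s (about 3 h 52 min). That suggests the time is spent waiting between hooks rather than inside them.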

On unit ceph-mon/6:
On the ceph-mon unit I see a large number of relation-changed events.
ubuntu@juju-45c58f-82-lxd-17:/var/log/juju$ zgrep relation-changed unit-ceph-mon-6-2021-03-29T04-25-38.717.log.gz | wc -l
20050
2021-03-26 15:44:16 DEBUG jujuc server.go:211 running hook tool "relation-get" for ceph-mon/6-client-relation-changed-5074781327933005362
--
zgrep relation-joined unit-ceph-mon-6-2021-03-29T04-25-38.717.log.gz | wc -l
9327
2021-03-26 14:16:50 DEBUG jujuc server.go:211 running hook tool "relation-get" for ceph-mon/6-osd-relation-joined-5545706669755477128
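The raw grep counts above do not show which relation endpoint produces the events. As a rough sketch, the jujuc lines can be tallied per hook name and hook tool. The regular expression assumes the hook context always looks like the two sample lines above (unit name, hook name, then a numeric id), which may not hold for every Juju version, and tally_hook_tools is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed context shape: <unit>-<hook name>-<numeric id>, as in the samples above.
TOOL_RE = re.compile(
    r'running hook tool "(?P<tool>[^"]+)" for '
    r"(?P<unit>\S+?/\d+)-(?P<hook>.+)-(?P<ctx>\d+)$"
)


def tally_hook_tools(lines):
    """Count hook-tool invocations keyed by (hook name, tool name)."""
    tally = Counter()
    for line in lines:
        m = TOOL_RE.search(line.strip())
        if m:
            tally[(m.group("hook"), m.group("tool"))] += 1
    return tally


if __name__ == "__main__":
    import gzip
    import sys

    # Reads a gzip-compressed unit log given on the command line.
    with gzip.open(sys.argv[1], "rt", errors="replace") as fh:
        for (hook, tool), n in tally_hook_tools(fh).most_common(20):
            print(f"{n:8d}  {hook}  {tool}")
```

Sorting by count should show whether the client relation (nova-compute and other Ceph clients) or the osd relation accounts for most of the activity on ceph-mon/6.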

On the actual nova compute node we have these counts for various hooks that are run.

fgrep -c relation-joined *.log
unit-ceph-osd-ssd-8.log:0
unit-clamav-37.log:9
unit-filebeat-1039.log:0
unit-hw-health-93.log:0
unit-landscape-client-1018.log:40
unit-lldpd-818.log:9
unit-neutron-openvswitch-652.log:0
unit-nova-compute-kvm-52.log:1330 <<<---
unit-nrpe-host-697.log:142
unit-nrpe-host-707.log:145
unit-ntp-832.log:679

relation-changed
unit-neutron-openvswitch-652.log:157
unit-nova-compute-kvm-52.log:7650 << -- higher
unit-nrpe-host-697.log:206
unit-nrpe-host-707.log:204
unit-ntp-832.log:679

relation-departed
unit-nova-compute-kvm-52.log:1528
unit-nrpe-host-697.log:26
unit-nrpe-host-707.log:27
unit-ntp-832.log:4727 <<-- bigger for ntp

config-changed
unit-nova-compute-kvm-52.log:0
unit-nrpe-host-697.log:111
unit-nrpe-host-707.log:98
unit-ntp-832.log:0
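The per-hook counts above were gathered with one fgrep run per hook name. A minimal Python sketch that collects all four counts in a single pass over each log, plain or gzip-compressed, might look like the following. Like fgrep -c, it counts matching lines rather than individual occurrences. The function names are hypothetical, and the hook list is simply the four hooks examined in this report.

```python
import gzip
from collections import Counter
from pathlib import Path

HOOKS = ("relation-joined", "relation-changed", "relation-departed", "config-changed")


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def count_hooks(path, hooks=HOOKS):
    """Count the lines mentioning each hook name, as fgrep -c would."""
    counts = Counter({hook: 0 for hook in hooks})
    with open_log(path) as fh:
        for line in fh:
            for hook in hooks:
                if hook in line:
                    counts[hook] += 1
    return counts


def count_all(paths, hooks=HOOKS):
    """Return {file name: Counter} for every log in paths."""
    return {Path(p).name: count_hooks(p, hooks) for p in paths}


if __name__ == "__main__":
    # Assumes it is run from the directory holding the unit logs.
    for name, counts in sorted(count_all(Path(".").glob("unit-*.log")).items()):
        print(name, dict(counts))
```

Running it in the unit log directory on each compute node gives one table per node, which makes it easier to compare nova-compute-kvm against the subordinate charms.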

Tags: scaleback
Steven Parker (sbparke) wrote :

I have attached the nova log which contains the removal logs.

tags: added: scaleback