Config change causes 2 of 3 n-c-c units to mask all services

Bug #1930636 reported by Paul Goins
This bug affects 3 people
Affects: OpenStack Nova Cloud Controller Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

This was encountered after upgrading the charms to the latest version, then doing a stein->train upgrade followed by a train->ussuri upgrade. The issue did not manifest until the ussuri upgrade, and now appears on a config change as simple as toggling the debug/verbose flags.
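
For context, both the OpenStack upgrades and the triggering config change are driven through juju config; the steps would have looked roughly like this (the application name and the bionic series are assumptions based on later comments):

   juju config nova-cloud-controller openstack-origin=cloud:bionic-train
   # ...after that upgrade completed:
   juju config nova-cloud-controller openstack-origin=cloud:bionic-ussuri
   # a change as trivial as this now triggers the masking:
   juju config nova-cloud-controller debug=true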

After a config change, 2 of my 3 n-c-c units end up with all of their nova services masked, as well as the haproxy service.
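
The masked state should be confirmable on an affected unit with something like the following (the service list is illustrative); "masked" in the output indicates the broken state:

   sudo systemctl is-enabled nova-scheduler nova-conductor nova-spiceproxy haproxy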

Running the pause action followed by the resume action is enough to re-enable the nova services; however, the haproxy service requires manual unmasking.
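
In juju 2.x terms, that workaround looks roughly like this (unit name is an example):

   juju run-action nova-cloud-controller/1 pause --wait
   juju run-action nova-cloud-controller/1 resume --wait
   # haproxy stays masked and needs manual intervention on the unit:
   juju ssh nova-cloud-controller/1 'sudo systemctl unmask haproxy && sudo systemctl start haproxy'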

I suspect something got into a weird state in the local state DBs on the 2 "broken" units, but I have no idea what that may have been. I can pull DB flags to help determine what might be causing these services to get re-masked on any config change. I don't have time right now, but I wanted to make sure this gets filed so it does not get forgotten.
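
For anyone pulling those flags: the charm's local state lives in a sqlite DB, so a dump along these lines should capture it (the path assumes the usual charmhelpers unitdata location, and the unit name is an example):

   sudo sqlite3 /var/lib/juju/agents/unit-nova-cloud-controller-2/charm/.unit-state.db 'SELECT key, data FROM kv;'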

Revision history for this message
Liam Young (gnuoy) wrote :

Could you please attach the contents of /var/log/juju from the nova-cc units to this bug? Thanks.

Changed in charm-nova-cloud-controller:
status: New → Incomplete
Revision history for this message
Paul Goins (vultaire) wrote :

As there may be sensitive details in the logs, I've sanitized what I could and have sent the logs internally to Liam for review.

Revision history for this message
Liam Young (gnuoy) wrote :

A few questions from looking at the logs:

* Could you please provide approximate times for when each step was taken (charm upgrade, upgrade to train, upgrade to ussuri)?
* Was the upgrade done via action-managed upgrades?
* Are the units still going into a masked state on config-changed events? If so, the db flags would be very useful.

FWIW, looking at the logs, all I can see is the units initially masking their services while waiting for the leader to signal that the db sync is done, and then the services being unmasked when the signal arrives:

nova-cloud-controller/2

Unit starts
2021-06-07 23:49:45 INFO juju unit_agent.go:253 Starting unit workers for "nova-cloud-controller/2"

2021-06-07 23:52:59 INFO juju-log Disabling services into db relation joined

2021-06-08 00:04:58 INFO juju-log Loaded template from templates/train/nova.conf
2021-06-08 00:05:00 INFO juju-log Database sync not ready. Shutting down services
2021-06-08 00:10:50 INFO juju-log cluster:33: Database sync not ready. Shutting down services
2021-06-08 00:12:01 INFO juju-log cluster:33: Database sync not ready. Shutting down services

nova-cloud-controller/0
2021-06-08 00:14:43 INFO juju-log shared-db:189: Informing peers that dbsync is complete

nova-cloud-controller/2
2021-06-08 00:14:44 DEBUG jujuc server.go:211 running hook tool "juju-log" for nova-cloud-controller/2-leader-settings-changed-1071374215362069442
2021-06-08 00:14:49 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable apache2
2021-06-08 00:14:50 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable memcached
2021-06-08 00:14:51 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-scheduler
2021-06-08 00:14:52 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-spiceproxy
2021-06-08 00:14:54 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable haproxy
2021-06-08 00:14:55 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-conductor
2021-06-08 00:15:08 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-scheduler
2021-06-08 00:15:09 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-spiceproxy
2021-06-08 00:15:10 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable memcached
2021-06-08 00:15:12 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable nova-conductor
2021-06-08 00:15:13 WARNING unit.nova-cloud-controller/2.leader-settings-changed logger.go:60 Executing: /lib/systemd/systemd-sysv-install enable haproxy
2021-06-08 00:15:14 WARNING unit.nova-cloud-controller/2.lead...
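
To pull the same trace from the other units, a filter over the unit logs along these lines should work (pattern taken from the messages above; paths per the standard juju layout):

   grep -E 'Disabling services|Database sync not ready|dbsync is complete|systemd-sysv-install' /var/log/juju/unit-nova-cloud-controller-*.log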


tags: added: openstack-upgrade
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack nova-cloud-controller charm because there has been no activity for 60 days.]

Changed in charm-nova-cloud-controller:
status: Incomplete → Expired
Steven Parker (sbparke)
Changed in charm-nova-cloud-controller:
status: Expired → New
Revision history for this message
Steven Parker (sbparke) wrote :

I'm happy to provide the DB flags; just let me know the commands I would need to run.

Action-managed upgrades were set to false:
   action-managed-upgrade=false

Thanks

Revision history for this message
Nick DiLernia (ndilernia) wrote :

Running into this as well during train to ussuri upgrades (Bionic).

Let me know if there's any info we can provide.

Revision history for this message
Nick DiLernia (ndilernia) wrote :

We did collect SOS reports on each unit that experienced the issue, in case those would be helpful.

After generating the SOS reports, we tried setting action-managed-upgrade=true, letting juju settle, and then setting action-managed-upgrade=false, but that didn't correct the issue either.
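
That toggle would have been roughly (application name assumed):

   juju config nova-cloud-controller action-managed-upgrade=true
   # wait for the model to settle (watch juju status), then:
   juju config nova-cloud-controller action-managed-upgrade=false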

As a workaround, we ended up pausing the affected units so juju would mask the services, then resuming them so it would unmask those services, and that worked.
