neutron services continuously restarted following upgrade

Bug #1893008 reported by Edward Hope-Morley
This bug affects 1 person
Affects: OpenStack Neutron API Charm
Status: Fix Released
Importance: Critical
Assigned to: Edward Hope-Morley

Bug Description

Starting an action-managed upgrade of Neutron using the neutron-api charm appears to have triggered something that causes the charm to continuously restart the neutron services, which in turn causes other neutron agents to fail. We see this happening over and over and have had to stop the juju unit agents.

2020-08-26 02:36:01 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:36:50 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:37:27 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:37:51 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:38:16 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:39:17 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:39:42 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:41:17 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
2020-08-26 02:41:47 DEBUG juju-log cluster:12: Restarting neutron services following db initialisation
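
To quantify how often the restarts occur, the timestamps in the log excerpt above can be parsed and compared. The following is a minimal sketch only: the helper name `restart_intervals` is hypothetical, and it assumes the juju log layout shown above, with the date and time as the first two whitespace-separated fields.

```python
from datetime import datetime

MARKER = "Restarting neutron services following db initialisation"


def restart_intervals(log_lines):
    """Return the seconds between consecutive restart messages."""
    stamps = []
    for line in log_lines:
        if MARKER not in line:
            continue  # ignore unrelated log lines
        date_part, time_part = line.split()[:2]
        stamps.append(
            datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# First three lines of the excerpt above.
sample = [
    "2020-08-26 02:36:01 DEBUG juju-log cluster:12: " + MARKER,
    "2020-08-26 02:36:50 DEBUG juju-log cluster:12: " + MARKER,
    "2020-08-26 02:37:27 DEBUG juju-log cluster:12: " + MARKER,
]
print(restart_intervals(sample))  # [49, 37]
```

Applied to the full excerpt, this gives restarts every 24 to 95 seconds across just under six minutes.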

This is using the 20.08 charms, upgrading from Rocky to Stein on Bionic.

description: updated
Trent Lloyd (lathiat) wrote :

From an initial look it seems likely this is happening due to check_local_db_actions_complete() and its subsequent call to is_new_dbinit_notification(). That check has to fail for the unit to then "echo" the DB notification. The echo is then received by a peer, which again fails the check that it's new and echoes it again.

But I haven't yet debugged why that check is failing.

See https://github.com/openstack/charm-neutron-api/blob/ec9304f50ea47d76592ab26fe522f1e582031565/hooks/neutron_api_utils.py#L292

Probably need to add some debugging of the compared values. My only guess was some kind of leadership confusion.
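
As a rough illustration of the kind of debugging suggested here, the sketch below logs both compared values each time the check runs. It is not the charm's code: `is_new_notification` is a hypothetical stand-in, and its logic (a notification is "new" when it is non-empty and differs from the last echoed value) is an assumption made for the example.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("dbinit-debug")


def is_new_notification(init_id, echoed_init_id):
    """Toy stand-in for the notification check (assumed logic, not the charm's)."""
    is_new = bool(init_id) and init_id != echoed_init_id
    # Record both sides of the comparison so a looping unit shows what it saw.
    log.debug(
        "db init check: init_id=%r echoed_init_id=%r is_new=%s",
        init_id,
        echoed_init_id,
        is_new,
    )
    return is_new
```

With output like this in the unit log, consecutive hook runs show whether the two values ever converge or keep alternating.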

Edward Hope-Morley (hopem) wrote :

The peer units appear to be ping-ponging information like this:

neutron-db-initialised: neutron-api/0-shared-db:65-d75be875-ceab-4a1f-9020-25900e7a0119
neutron-db-initialised-echo: neutron-api/2-shared-db:65-22e15b94-cc14-46d1-8ce2-196c505a929d
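
To compare such values across units it can help to split them into their parts. The sketch below assumes the layout `<unit>-<relation id>-<uuid>`, which is inferred only from the two values above and not confirmed against the charm source. `parse_notification` and `InitNotification` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class InitNotification(NamedTuple):
    unit: str
    relation_id: str
    token: str


# Assumed layout: "<app>/<n>-<relation>:<id>-<uuid>"
_PATTERN = re.compile(
    r"^(?P<unit>[^/]+/\d+)"
    r"-(?P<rel>[^:]+:\d+)"
    r"-(?P<token>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})$"
)


def parse_notification(value: str) -> Optional[InitNotification]:
    """Split a notification value into unit, relation id and token."""
    match = _PATTERN.match(value)
    if match is None:
        return None  # value does not follow the assumed layout
    return InitNotification(match["unit"], match["rel"], match["token"])


init = parse_notification(
    "neutron-api/0-shared-db:65-d75be875-ceab-4a1f-9020-25900e7a0119"
)
echo = parse_notification(
    "neutron-api/2-shared-db:65-22e15b94-cc14-46d1-8ce2-196c505a929d"
)
print(init.unit, echo.unit)  # neutron-api/0 neutron-api/2
```

Parsed this way, the two values carry the same relation id but different originating units and different tokens.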

Edward Hope-Morley (hopem) wrote :

Starting the units back up again re-introduced the issue.

Edward Hope-Morley (hopem) wrote :

I think I might have isolated the issue. Reading the code, it looks like check_local_db_actions_complete() is called by any unit, when it probably shouldn't be called by the leader since the leader will already have performed those tasks. A subsequent leader switch results in this getting called continuously, since the new leader has old relation data that it thinks is new.
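
The idea of exempting the leader can be sketched with a toy decision function. This is illustrative only and not the charm's implementation: `should_restart` and its parameters are hypothetical, and the "new notification" test (non-empty and different from the last echoed value) is an assumption.

```python
def should_restart(is_leader, init_id, echoed_init_id, leader_guard=True):
    """Decide whether a unit should restart services for a db init notification.

    With leader_guard enabled, the leader never reacts, on the assumption
    that it has already performed the db init work itself.
    """
    if leader_guard and is_leader:
        return False
    return bool(init_id) and init_id != echoed_init_id


# A newly elected leader still sees the previous leader's notification,
# which differs from the value it last echoed.
stale_id = "id-from-previous-leader"
last_echo = "id-echoed-earlier"

print(should_restart(True, stale_id, last_echo, leader_guard=False))  # True
print(should_restart(True, stale_id, last_echo, leader_guard=True))   # False
```

In this model non-leader units behave as before: they react once to a notification they have not yet echoed and ignore it afterwards.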

Changed in charm-neutron-api:
milestone: none → 20.10
assignee: nobody → Edward Hope-Morley (hopem)
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-api (master)

Fix proposed to branch: master
Review: https://review.opendev.org/748143

Changed in charm-neutron-api:
status: New → In Progress
tags: added: backport-potential stable-backport
tags: added: sts
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-api (stable/20.08)

Fix proposed to branch: stable/20.08
Review: https://review.opendev.org/748146

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-api (master)

Reviewed: https://review.opendev.org/748143
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-api/commit/?id=104626a19f7edbdee04725ec502f74c54bbfedd2
Submitter: Zuul
Branch: master

commit 104626a19f7edbdee04725ec502f74c54bbfedd2
Author: Edward Hope-Morley <email address hidden>
Date: Wed Aug 26 10:42:40 2020 +0100

    Fix db init notifications

    Ensures that leader does not respond to db init
    notifications to avoid infinite looping after
    leader switches to a different unit.

    Also ensures that leader only restarts its neutron-server
    once on db init.

    Closes-Bug: #1893008

    Change-Id: I59b9d5e0caab62b72380879bf16cb0fd8703bb32

Changed in charm-neutron-api:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-api (stable/20.08)

Reviewed: https://review.opendev.org/748146
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-api/commit/?id=e7c233c846f54606181fdb60d37ab9038fb1c032
Submitter: Zuul
Branch: stable/20.08

commit e7c233c846f54606181fdb60d37ab9038fb1c032
Author: Edward Hope-Morley <email address hidden>
Date: Wed Aug 26 10:42:40 2020 +0100

    Fix db init notifications

    Ensures that leader does not respond to db init
    notifications to avoid infinite looping after
    leader switches to a different unit.

    Also ensures that leader only restarts its neutron-server
    once on db init.

    Closes-Bug: #1893008

    Change-Id: I59b9d5e0caab62b72380879bf16cb0fd8703bb32
    (cherry picked from commit 104626a19f7edbdee04725ec502f74c54bbfedd2)

Changed in charm-neutron-api:
status: Fix Committed → Fix Released