There is a possibility that 'running' notification will remain

Bug #1773765 reported by takahara.kengo on 2018-05-28
This bug affects 5 people
Affects                                              Importance   Assigned to
OpenStack Masakari Charm                             Undecided    Billy Olsen
Ubuntu Cloud Archive (status tracked in Victoria)
  Stein                                              Medium       Billy Olsen
  Train                                              Medium       Billy Olsen
  Ussuri                                             Medium       Billy Olsen
  Victoria                                           Medium       Billy Olsen
masakari (status tracked in Victoria)
  Stein                                              Medium       Unassigned
  Train                                              Medium       Unassigned
  Ussuri                                             Medium       Unassigned
  Victoria                                           Medium       suzhengwei
masakari (Ubuntu)                                    High         Unassigned
  Focal                                              High         Unassigned
  Groovy                                             High         Unassigned

Bug Description

[Impact]
masakari-engine has two periodic tasks, one for processing 'new' notifications and the other for processing 'error' notifications, but it has no periodic task for processing 'running' notifications.

Looking at the masakari-engine code, if the engine process goes down immediately after it changes a notification's status from 'new' to 'running', that notification remains in 'running' status and is never processed by the periodic tasks.

So, should masakari-engine's periodic task process the 'running' notification?
(This would require logic that keeps the main process from competing with the periodic tasks.)
Or should the 'running' notification be handled by the operator?
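To make the gap concrete, here is a minimal, hypothetical model of the situation described above. It assumes only what the report states: the two existing periodic tasks look at 'new' and 'error' notifications, and the engine sets 'running' before doing the work. The class, function names and the 'finished' end state are illustrative, not masakari's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Notification:
    notification_uuid: str
    status: str


# Statuses picked up by the two existing periodic tasks, per the report above.
PERIODICALLY_PROCESSED = {"new", "error"}


def start_processing(notification: Notification, crash_after_status_change: bool) -> None:
    """Simulate the engine: switch to 'running', then finish or die."""
    notification.status = "running"
    if crash_after_status_change:
        return  # the engine process went down; nothing advances the status
    notification.status = "finished"


def periodic_candidates(notifications: List[Notification]) -> List[Notification]:
    """Notifications the existing periodic tasks would retry."""
    return [n for n in notifications if n.status in PERIODICALLY_PROCESSED]


def stranded(notifications: List[Notification]) -> List[Notification]:
    """'running' notifications that no periodic task will ever revisit."""
    return [n for n in notifications if n.status == "running"]


notifications = [Notification("a", "new"), Notification("b", "new")]
start_processing(notifications[0], crash_after_status_change=False)
start_processing(notifications[1], crash_after_status_change=True)

print([n.notification_uuid for n in periodic_candidates(notifications)])  # []
print([n.notification_uuid for n in stranded(notifications)])  # ['b']
```

Notification "b" is neither retried nor expired, which is exactly the state the rest of this thread is about.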

[Test Case]
lxc launch ubuntu-daily:groovy g1 (or other corresponding release combination)
lxc exec g1 /bin/bash
sudo apt install masakari-engine

== expect test failure with old code ==
setup:
* copy the new test code from the patch to /usr/lib/python3/dist-packages/masakari/tests/unit/engine/test_engine_mgr.py
* modify /usr/lib/python3/dist-packages/masakari/tests/unit/engine/test_engine_mgr.py to set EXPIRED_TIME = NOW, and comment out the call to test_check_expired_notifications, since it does not exist without the new patch applied.
test:
* cd /usr/lib/python3/dist-packages
* python3 -m unittest masakari.tests.unit.engine.test_engine_mgr.EngineManagerUnitTestCase.test_check_expired_notifications

== expect test success with patched code ==
setup: enable corresponding -proposed pocket
test:
* cd /usr/lib/python3/dist-packages
* python3 -m unittest masakari.tests.unit.engine.test_engine_mgr.EngineManagerUnitTestCase.test_check_expired_notifications

[Regression Potential]
A regression in this code could occur if either time interval were calculated incorrectly, in which case a notification could be marked as failed long before the expiration interval has elapsed. The defaults can be changed via the masakari-engine config options check_expired_notifications_interval and notifications_expired_interval, which helps mitigate that risk. The defaults seem reasonable: a periodic check every 10 minutes, with expired notifications set to 'failed' only after 24 hours.
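A quick back-of-the-envelope check of what those defaults imply, sketched under one assumption not stated in the report: expiry is measured from the notification's last update, and the periodic check fires every check_interval seconds. Under that assumption a stuck notification is set to 'failed' somewhere between 24 hours and 24 hours plus one check interval after it got stuck. The helper name is hypothetical.

```python
from typing import Tuple

# Defaults quoted above, in seconds.
CHECK_EXPIRED_NOTIFICATIONS_INTERVAL = 10 * 60   # periodic check every 10 minutes
NOTIFICATIONS_EXPIRED_INTERVAL = 24 * 60 * 60    # expire after 24 hours


def failure_window(check_interval: int, expired_interval: int) -> Tuple[int, int]:
    """Earliest and latest delay (seconds) before a stuck notification is set
    to 'failed', assuming expiry is measured from the notification's last
    update and the check fires every check_interval seconds."""
    earliest = expired_interval                   # a check fires right at expiry
    latest = expired_interval + check_interval    # expiry just missed a check
    return earliest, latest


earliest, latest = failure_window(CHECK_EXPIRED_NOTIFICATIONS_INTERVAL,
                                  NOTIFICATIONS_EXPIRED_INTERVAL)
print(earliest, latest)  # 86400 87000
```

The expired interval bounds `earliest` directly, so a miscalculated or misconfigured value there is what would cause notifications to be failed prematurely, the regression mode described above.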

Peter De Sousa (pjds) wrote :

I hit this bug yesterday: I was unable to reset my compute node status using

`openstack segment host update $SEGMENT_UUID node01 --on_maintenance False`.

After much digging, I noticed that there was still a notification in "running" status.

I worked around this by logging into the DB and deleting the running record from the notifications table.
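Why a leftover notification blocks the host update: the commit message later in this report notes that notifications stuck in 'new', 'error' or 'running' prevent the operator from updating the segment or host. The sketch below models that kind of guard. The field name `host`, the `HostBusy` exception and `update_host_maintenance` are hypothetical illustrations, not masakari's real API.

```python
from dataclasses import dataclass
from typing import List

UNFINISHED_STATUSES = {"new", "error", "running"}


@dataclass
class Notification:
    host: str
    status: str


class HostBusy(Exception):
    """Raised when a host still has unfinished notifications."""


def update_host_maintenance(host: str, on_maintenance: bool,
                            notifications: List[Notification]) -> bool:
    """Hypothetical guard: refuse the update while any notification for this
    host is unfinished; otherwise return the new on_maintenance value."""
    blocking = [n for n in notifications
                if n.host == host and n.status in UNFINISHED_STATUSES]
    if blocking:
        raise HostBusy(f"host {host} has {len(blocking)} unfinished notification(s)")
    return on_maintenance


notifications = [Notification("node01", "running")]
try:
    update_host_maintenance("node01", False, notifications)
except HostBusy as exc:
    print(exc)  # host node01 has 1 unfinished notification(s)

# Once the stuck notification leaves the unfinished set (deleted, as in the
# workaround above, or marked 'failed'), the update goes through.
notifications[0].status = "failed"
print(update_host_maintenance("node01", False, notifications))  # False
```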

suzhengwei (sue.sam) wrote :

I ran into the same problem recently.

I think there are two solutions.
1. Add a Notification API that allows the operator to change a notification's state.
2. As suggested by takahara.kengo, add a periodic task to masakari-engine that checks 'running' notifications. If a notification has been processing for longer than a configured time, force its state to 'failed'.

Changed in masakari:
assignee: nobody → suzhengwei (sue.sam)
suzhengwei (sue.sam) on 2020-04-17
Changed in masakari:
status: New → In Progress
Nobuto Murata (nobuto) wrote :

Reviewed: https://review.opendev.org/720623
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=3a4f782441f6bdcfb8ee49a393937267fc246c56
Submitter: Zuul
Branch: master

commit 3a4f782441f6bdcfb8ee49a393937267fc246c56
Author: suzhengwei <sugar-2008@163.com>
Date: Fri Apr 17 08:59:35 2020 +0800

    check expired notifications

    Occasionally, there would be notifications which will remain 'new',
    'error' or 'running' status all times, and not to be processed again.
    Due to this, operator can not update the segment or host.

    This patch add one task to periodically check unfinished notifications.
    If one unfinished notification is expired, just set its status to
    'failed'.

    Close-Bug: #1773765
    Change-Id: If49635639dd976aeec3ea73e702ad2636fcf1e0a
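The commit message above describes the fix in one sentence: periodically find unfinished notifications ('new', 'error' or 'running') that have expired and set them to 'failed'. A minimal in-memory sketch of that sweep follows, assuming expiry is judged from an `updated_at` timestamp; the field and function names are illustrative and this is not the patch's actual implementation, which works against the masakari database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

UNFINISHED_STATUSES = {"new", "error", "running"}


@dataclass
class Notification:
    notification_uuid: str
    status: str
    updated_at: datetime


def check_expired_notifications(notifications: List[Notification],
                                now: datetime,
                                expired_interval: timedelta) -> List[str]:
    """Mark expired unfinished notifications as 'failed'; return their UUIDs."""
    cutoff = now - expired_interval
    expired = []
    for n in notifications:
        if n.status in UNFINISHED_STATUSES and n.updated_at < cutoff:
            n.status = "failed"
            expired.append(n.notification_uuid)
    return expired


now = datetime(2020, 10, 26, 18, 0, tzinfo=timezone.utc)
day = timedelta(hours=24)
notifications = [
    Notification("stuck", "running", now - timedelta(hours=30)),
    Notification("busy", "running", now - timedelta(minutes=5)),
    Notification("done", "finished", now - timedelta(days=3)),
]
print(check_expired_notifications(notifications, now, day))  # ['stuck']
```

Only the stale 'running' notification is failed; a recently started one is left alone, and finished notifications are never touched regardless of age. Choosing 'failed' rather than retrying sidesteps the race with the main process that the original report worried about.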

suzhengwei (sue.sam) on 2020-09-24
Changed in masakari:
status: In Progress → Fix Committed
Changed in masakari:
status: Fix Committed → Fix Released

Reviewed: https://review.opendev.org/753927
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=cd05dc31bb8c4ce529dac4698dd7bb80866a4139
Submitter: Zuul
Branch: stable/stein

commit cd05dc31bb8c4ce529dac4698dd7bb80866a4139
Author: suzhengwei <sugar-2008@163.com>
Date: Fri Apr 17 08:59:35 2020 +0800

    check expired notifications

    Occasionally, there would be notifications which will remain 'new',
    'error' or 'running' status all times, and not to be processed again.
    Due to this, operator can not update the segment or host.

    This patch add one task to periodically check unfinished notifications.
    If one unfinished notification is expired, just set its status to
    'failed'.

    Conflicts:
            masakari/tests/unit/engine/test_engine_mgr.py
            - difference of import mock and import unittest.mock

    Closes-Bug: #1773765
    Change-Id: If49635639dd976aeec3ea73e702ad2636fcf1e0a
    (cherry picked from commit 3a4f782441f6bdcfb8ee49a393937267fc246c56)

tags: added: in-stable-stein
tags: added: in-stable-train

Reviewed: https://review.opendev.org/753926
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=e0423906a7aabb4ea02b754e041f725c16535be5
Submitter: Zuul
Branch: stable/train

commit e0423906a7aabb4ea02b754e041f725c16535be5
Author: suzhengwei <sugar-2008@163.com>
Date: Fri Apr 17 08:59:35 2020 +0800

    check expired notifications

    Occasionally, there would be notifications which will remain 'new',
    'error' or 'running' status all times, and not to be processed again.
    Due to this, operator can not update the segment or host.

    This patch add one task to periodically check unfinished notifications.
    If one unfinished notification is expired, just set its status to
    'failed'.

    Conflicts:
            masakari/tests/unit/engine/test_engine_mgr.py
            - Update test for difference of unittest.mock and mock

    Closes-Bug: #1773765
    Change-Id: If49635639dd976aeec3ea73e702ad2636fcf1e0a
    (cherry picked from commit 3a4f782441f6bdcfb8ee49a393937267fc246c56)

tags: added: in-stable-ussuri

Reviewed: https://review.opendev.org/753921
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=b53155b314068a46cd097d3788add1b6d4fc0aff
Submitter: Zuul
Branch: stable/ussuri

commit b53155b314068a46cd097d3788add1b6d4fc0aff
Author: suzhengwei <sugar-2008@163.com>
Date: Fri Apr 17 08:59:35 2020 +0800

    check expired notifications

    Occasionally, there would be notifications which will remain 'new',
    'error' or 'running' status all times, and not to be processed again.
    Due to this, operator can not update the segment or host.

    This patch add one task to periodically check unfinished notifications.
    If one unfinished notification is expired, just set its status to
    'failed'.

    Close-Bug: #1773765
    Change-Id: If49635639dd976aeec3ea73e702ad2636fcf1e0a
    (cherry picked from commit 3a4f782441f6bdcfb8ee49a393937267fc246c56)

Billy Olsen (billy-olsen) wrote :

The attachment "groovy victoria patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch

Groovy should really just use the newly-released RC1 of Victoria.

Billy Olsen (billy-olsen) wrote :

Agreed that Groovy should just use the newly-released RC1.

Mathew Hodson (mhodson) on 2020-10-03
Changed in masakari (Ubuntu Focal):
importance: Undecided → Medium
Changed in masakari (Ubuntu Groovy):
importance: Undecided → Medium
description: updated
description: updated
Changed in masakari (Ubuntu Focal):
importance: Medium → High
status: New → Triaged
Changed in masakari (Ubuntu Groovy):
importance: Medium → High
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package masakari - 10.0.0~rc1-0ubuntu1

---------------
masakari (10.0.0~rc1-0ubuntu1) groovy; urgency=medium

  [ Chris MacNaughton ]
  * d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.

  [ Corey Bryant ]
  * d/watch: Scope to 10.x series.
  * New upstream release candidate for OpenStack Victoria (LP: #1773765).
  * d/p/monkey-patch-original-current-thread.patch: Dropped. Fixed in rc1.
  * d/p/allow-bare-hostnames.patch: Dropped. Alternative patch merged in rc1.

 -- Corey Bryant <email address hidden> Thu, 08 Oct 2020 13:20:37 -0400

Changed in masakari (Ubuntu Groovy):
status: Triaged → Fix Released
Changed in cloud-archive:
status: In Progress → Fix Committed

Hello takahara.kengo, or anyone else affected,

Accepted masakari into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/masakari/9.0.0-0ubuntu0.20.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in masakari (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
Corey Bryant (corey.bryant) wrote :

Hello takahara.kengo, or anyone else affected,

Accepted masakari into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed
Corey Bryant (corey.bryant) wrote :

Hello takahara.kengo, or anyone else affected,

Accepted masakari into train-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-train-needed to verification-train-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-train-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-train-needed
Corey Bryant (corey.bryant) wrote :

Hello takahara.kengo, or anyone else affected,

Accepted masakari into stein-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:stein-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-stein-needed to verification-stein-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-stein-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-stein-needed
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package masakari - 10.0.0~rc1-0ubuntu1~cloud0
---------------

 masakari (10.0.0~rc1-0ubuntu1~cloud0) focal-victoria; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 masakari (10.0.0~rc1-0ubuntu1) groovy; urgency=medium
 .
   [ Chris MacNaughton ]
   * d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.
 .
   [ Corey Bryant ]
   * d/watch: Scope to 10.x series.
   * New upstream release candidate for OpenStack Victoria (LP: #1773765).
   * d/p/monkey-patch-original-current-thread.patch: Dropped. Fixed in rc1.
   * d/p/allow-bare-hostnames.patch: Dropped. Alternative patch merged in rc1.

Changed in cloud-archive:
status: Fix Committed → Fix Released
Changed in charm-masakari:
assignee: nobody → Billy Olsen (billy-olsen)
Corey Bryant (corey.bryant) wrote :

Verified successfully for all corresponding proposed package versions:

== focal-proposed ==
root@f1:/usr/lib/python3/dist-packages# apt policy masakari-engine
masakari-engine:
  Installed: 9.0.0-0ubuntu0.20.04.3
  Candidate: 9.0.0-0ubuntu0.20.04.3
  Version table:
 *** 9.0.0-0ubuntu0.20.04.3 500
        500 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
root@f1:~# cd /usr/lib/python3/dist-packages
root@f1:/usr/lib/python3/dist-packages# python3 -m unittest masakari.tests.unit.engine.test_engine_mgr.EngineManagerUnitTestCase.test_check_expired_notifications
2020-10-26 18:39:39.505 4505 INFO masakari.engine.driver [-] Loading masakari notification driver 'taskflow_driver'
/usr/lib/python3/dist-packages/taskflow/atom.py:31: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  _sequence_types = (list, tuple, collections.Sequence)
/usr/lib/python3/dist-packages/taskflow/atom.py:32: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  _set_types = (set, collections.Set)
/usr/lib/python3/dist-packages/masakari/context.py:92: DeprecationWarning: Using the 'user' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'user_id' argument instead
  super(RequestContext, self).__init__(
/usr/lib/python3/dist-packages/debtcollector/renames.py:43: DeprecationWarning: Using the 'tenant' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'project_id' argument instead
  return wrapped(*args, **kwargs)
/usr/lib/python3/dist-packages/debtcollector/renames.py:43: DeprecationWarning: Using the 'domain' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'domain_id' argument instead
  return wrapped(*args, **kwargs)
/usr/lib/python3/dist-packages/debtcollector/renames.py:43: DeprecationWarning: Using the 'user_domain' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'user_domain_id' argument instead
  return wrapped(*args, **kwargs)
/usr/lib/python3/dist-packages/debtcollector/renames.py:43: DeprecationWarning: Using the 'project_domain' argument is deprecated in version '2.18' and will be removed in version '3.0', please use the 'project_domain_id' argument instead
  return wrapped(*args, **kwargs)
2020-10-26 18:39:40.122 4505 ERROR masakari.engine.manager [req-03b9518f-a271-4d8c-aeee-f6c3d3252a09 - - - - -] Periodic task 'check_expired_notifications': Notification a03ac64b-a29e-474f-b94c-77767454b323 is expired.
.
----------------------------------------------------------------------
Ran 1 test in 0.917s

OK

== ussuri-proposed ==
root@b1:~# apt policy masakari-engine
masakari-engine:
  Installed: 9.0.0-0ubuntu0.20.04.3~cloud0
  Candidate: 9.0.0-0ubuntu0.20.04.3~cloud0
  Version table:
 *** 9.0.0-0ubuntu0.20.04.3~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/ussuri/main amd64 Pa...

tags: added: verification-done verification-done-focal verification-stein-done verification-train-done verification-ussuri-done
removed: verification-needed verification-needed-focal verification-stein-needed verification-train-needed verification-ussuri-needed

The verification of the Stable Release Update for masakari has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package masakari - 9.0.0-0ubuntu0.20.04.3

---------------
masakari (9.0.0-0ubuntu0.20.04.3) focal; urgency=medium

  [ Chris MacNaughton ]
  * d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.

  [ Billy Olsen ]
  * Check expired notifications and clean running notifications past an expiration
    to allow for a host to become manageable again (LP: #1773765).
    - d/p/check-expired-notifications.patch: Adds a periodic task to check for running
      notifications that have exceeded a threshold and removes them if necessary.

 -- Billy Olsen <email address hidden> Wed, 23 Sep 2020 20:19:36 -0700

Changed in masakari (Ubuntu Focal):
status: Fix Committed → Fix Released