There is a possibility that 'running' notification will remain
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Masakari Charm | Fix Released | Undecided | Billy Olsen |
Ubuntu Cloud Archive | Fix Released | Medium | Billy Olsen |
Stein | Fix Released | Medium | Billy Olsen |
Train | Fix Released | Medium | Billy Olsen |
Ussuri | Fix Released | Medium | Billy Olsen |
Victoria | Fix Released | Medium | Billy Olsen |
masakari | Fix Released | Medium | suzhengwei |
Stein | Fix Released | Medium | Unassigned |
Train | Fix Released | Medium | Unassigned |
Ussuri | Fix Released | Medium | Unassigned |
Victoria | Fix Released | Medium | suzhengwei |
masakari (Ubuntu) | Fix Released | High | Unassigned |
Focal | Fix Released | High | Unassigned |
Groovy | Fix Released | High | Unassigned |
Bug Description
[Impact]
masakari-engine has two periodic tasks, one for processing 'new' notifications and the other for processing 'error' notifications. But it doesn't have a periodic task for processing 'running' notifications.
Looking at the code of masakari-engine, if its process goes down immediately after it changes a notification's status from 'new' to 'running', then the notification whose status is 'running' will remain and will not be processed by the periodic tasks.
So, should masakari-engine's periodic task process the 'running' notification?
(Although it would need logic to ensure that the main process doesn't compete with the periodic tasks.)
Or should the 'running' notification be handled by the operator?
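The gap described above can be illustrated with a minimal sketch. This is not masakari's code: the `Notification` class and `select_for_periodic_processing` function are hypothetical names, and the only assumption taken from the report is that the periodic tasks pick up 'new' and 'error' notifications but not 'running' ones.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    # Hypothetical stand-in for a notification record.
    uuid: str
    status: str


def select_for_periodic_processing(notifications):
    """Mimic periodic tasks that only look at 'new' and 'error' statuses."""
    return [n for n in notifications if n.status in ("new", "error")]


notifications = [
    Notification("a", "new"),
    Notification("b", "error"),
    # The engine went down right after switching this one from 'new' to 'running'.
    Notification("c", "running"),
]

picked = select_for_periodic_processing(notifications)
picked_ids = [n.uuid for n in picked]
# Anything still 'running' and never picked up stays in that state indefinitely.
stuck_ids = [n.uuid for n in notifications
             if n.status == "running" and n.uuid not in picked_ids]
print("picked:", picked_ids, "stuck:", stuck_ids)
```

In this sketch notification `c` is never selected, which matches the behavior reported here: nothing moves it out of 'running' unless an operator intervenes.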
[Test Case]
lxc launch ubuntu-daily:groovy g1 (or other corresponding release combination)
lxc exec g1 /bin/bash
sudo apt install masakari-engine
== expect test failure with old code ==
setup:
* copy new test code from patch to /usr/lib/
* modify /usr/lib/
test:
* cd /usr/lib/
* python3 -m unittest masakari.
== expect test success with patched code ==
setup: enable corresponding -proposed pocket
test:
* cd /usr/lib/
* python3 -m unittest masakari.
[Regression Potential]
A regression in this code could occur if either of the time intervals were calculated incorrectly, which would mean a notification could be marked as failed, perhaps long before the expiration interval. The defaults can be changed for check_expired_
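A rough sketch of the kind of interval check this regression note is concerned with follows. The function name `is_expired`, the constant `EXPIRED_INTERVAL_SECONDS`, and its value are hypothetical and chosen only for illustration; they are not masakari's option names or defaults.

```python
from datetime import datetime, timedelta

# Hypothetical value for illustration only; not masakari's actual default.
EXPIRED_INTERVAL_SECONDS = 3600


def is_expired(last_updated, now, expired_interval_seconds=EXPIRED_INTERVAL_SECONDS):
    """Return True once a notification has gone unprocessed for the full interval."""
    return now - last_updated >= timedelta(seconds=expired_interval_seconds)


now = datetime(2020, 10, 1, 12, 0, 0)
recent = now - timedelta(minutes=5)
old = now - timedelta(hours=2)

correct_recent = is_expired(recent, now)  # 5 minutes old: not expired
correct_old = is_expired(old, now)        # 2 hours old: expired

# Example of a miscalculated interval: 60 passed as if the unit were minutes,
# while the function interprets it as seconds.
premature = is_expired(recent, now, expired_interval_seconds=60)
print(correct_recent, correct_old, premature)
```

The last call shows the failure mode: with a wrongly scaled interval, a notification that is only five minutes old would already be treated as expired.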
Changed in masakari: | |
status: | New → In Progress |
Changed in masakari: | |
status: | In Progress → Fix Committed |
Changed in masakari: | |
status: | Fix Committed → Fix Released |
Changed in masakari (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in masakari (Ubuntu Groovy): | |
importance: | Undecided → Medium |
description: | updated |
description: | updated |
Changed in masakari (Ubuntu Focal): | |
importance: | Medium → High |
status: | New → Triaged |
Changed in masakari (Ubuntu Groovy): | |
importance: | Medium → High |
status: | New → Triaged |
Changed in cloud-archive: | |
status: | In Progress → Fix Committed |
Changed in charm-masakari: | |
assignee: | nobody → Billy Olsen (billy-olsen) |
Changed in charm-masakari: | |
milestone: | none → 20.10 |
status: | Fix Committed → Fix Released |
I hit this bug yesterday: I was unable to reset my compute node status using
`openstack segment host update $SEGMENT_UUID node01 --on_maintenance False`.
After much digging, I noticed that there was still a notification with status "running".
I worked around this by logging into the DB and deleting the running record from the notifications table.