Constant exceptions "NotFound: Basic.consume: (404) NOT_FOUND - no queue abc in vhost '/'" in log
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.messaging | Fix Released | Undecided | Kirill Bespalov |
Bug Description
Version: 9.0
Steps to reproduce:
1. Deploy a MOS environment.
2. Run some tests on it (the exact trigger is not yet known).
Expected results:
All logs are clean
Actual results:
The log of one of the OpenStack components contains many exceptions like:
NotFound: Basic.consume: (404) NOT_FOUND - no queue 'reply_
(full stack trace from neutron-server logs - http://
This is caused by the following HA race condition:
(1) A cluster consists of two nodes: A and B.
(2) The queue 'abc' is hosted on node A.
(3) While reconnecting, a consumer declares the queue through node B, but has not yet registered itself as a consumer on it.
(4) Node A goes down and the queue 'abc' is lost.
(5) Node B deletes the queue metadata (because the queue's home node is down) and sends no basic.cancel, because at this moment no consumers are registered on the queue.
(6) The consumer then tries to register itself on the now-missing queue and receives a 404.
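The sequence above can be replayed in a toy, in-memory model. Everything here (`Cluster`, `declare_queue`, `basic_consume`, `node_down`, `NotFoundError`) is a hypothetical stand-in, not the RabbitMQ or oslo.messaging API. The model simply assumes the behaviour stated in step (5): when a queue's home node dies, the surviving node drops the queue and cancels only consumers that were already registered on it.

```python
# Toy model of the HA race described in the bug report.
# All names are hypothetical; this is not the RabbitMQ or oslo.messaging API.


class NotFoundError(Exception):
    """Stands in for the AMQP 404 NOT_FOUND channel error."""


class Cluster:
    def __init__(self, nodes):
        self.up = set(nodes)   # nodes that are currently running
        self.queues = {}       # queue name -> home node
        self.consumers = {}    # queue name -> set of registered consumer tags

    def declare_queue(self, name, via_node):
        # Declaring an existing queue through any node is a no-op;
        # a new queue is homed on the node the client is connected to.
        if via_node not in self.up:
            raise ConnectionError(f"node {via_node} is down")
        self.queues.setdefault(name, via_node)
        self.consumers.setdefault(name, set())

    def basic_consume(self, name, tag, via_node):
        if via_node not in self.up:
            raise ConnectionError(f"node {via_node} is down")
        if name not in self.queues:
            raise NotFoundError(
                f"Basic.consume: (404) NOT_FOUND - no queue '{name}' in vhost '/'")
        self.consumers[name].add(tag)

    def node_down(self, node):
        """Take a node down; return the tags that would receive basic.cancel."""
        self.up.discard(node)
        lost = [q for q, home in self.queues.items() if home == node]
        cancelled = []
        for q in lost:
            # Only consumers already registered on the queue get cancelled.
            cancelled.extend(sorted(self.consumers.pop(q)))
            del self.queues[q]
        return cancelled


def replay_race():
    cluster = Cluster(["A", "B"])
    cluster.declare_queue("abc", via_node="A")   # (2) 'abc' is homed on A
    cluster.declare_queue("abc", via_node="B")   # (3) re-declared via B, not consuming yet
    cancelled = cluster.node_down("A")           # (4)+(5) A dies, nobody to cancel
    try:
        cluster.basic_consume("abc", tag="c1", via_node="B")  # (6)
    except NotFoundError as exc:
        return cancelled, str(exc)
    return cancelled, None
```

`replay_race()` returns an empty cancel list together with the 404 message: no consumer was told the queue had gone, so the late `basic.consume` fails, which matches step (6).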
Losing a queue means the server stops processing messages from it, which can be critical to its operation (depending on the queue).
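The report does not say how the proposed fix works. Purely as an illustration, and not as a description of the change under review, one way a consumer could avoid staying stuck is to treat a 404 on consume as "queue lost" and re-declare the queue before retrying. `FakeBroker`, `QueueNotFound` and `consume_with_redeclare` are hypothetical names, not oslo.messaging or kombu classes.

```python
# Hypothetical mitigation sketch; FakeBroker and QueueNotFound are stand-ins,
# not oslo.messaging or kombu classes.


class QueueNotFound(Exception):
    """Stands in for a 404 NOT_FOUND reply to basic.consume."""


class FakeBroker:
    def __init__(self):
        self.queues = set()
        self.consumers = {}  # consumer tag -> queue name

    def queue_declare(self, name):
        self.queues.add(name)  # idempotent, like an AMQP queue.declare

    def queue_lost(self, name):
        self.queues.discard(name)  # simulate the queue's home node dying

    def basic_consume(self, name, tag):
        if name not in self.queues:
            raise QueueNotFound(name)
        self.consumers[tag] = name


def consume_with_redeclare(broker, name, tag, max_redeclares=1):
    """Start consuming; on 404, re-declare the queue and retry.

    Returns the number of re-declarations that were needed.
    """
    redeclares = 0
    while True:
        try:
            broker.basic_consume(name, tag)
            return redeclares
        except QueueNotFound:
            if redeclares >= max_redeclares:
                raise
            broker.queue_declare(name)
            redeclares += 1
```

Re-declaring only restores consumption: any messages that were in the lost queue are gone.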
Changed in oslo.messaging:
assignee: nobody → Kirill Bespalov (k-besplv)
description: updated
Fix proposed to branch: master
Review: https://review.openstack.org/315700