API hangs when queue is full

Bug #1422419 reported by Douglas Mendizábal
This bug affects 1 person
Affects: Barbican
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When RabbitMQ is used to queue orders, RabbitMQ may become full (out of disk space, for example). When this happens, the API hangs while waiting to send the message.

There are a few problems with this. First, if the API node is killed, the DB transaction is lost, since the transaction is not committed until the queue accepts the message.

RabbitMQ does warn Barbican about the low disk space; however, Barbican (or possibly oslo.messaging) did not log any of the low-space warnings.

John Wood (john-wood-w) wrote :

Based on discussions at the mid-cycle, this situation should return a 503 status code to the client rather than trying to save the transaction. This would give clients immediate feedback that something was wrong, with no expectation that Barbican will fulfill their request at that time. So I think the true bug is to figure out why the request is hanging, and to return the 503 instead.

The low disk space warning should be surfaced in the Barbican logs; otherwise, we should file a bug against oslo.messaging to add it to their logs.
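
The behaviour proposed in this comment (fail fast with a 503 instead of hanging) might look roughly like the sketch below. It is not Barbican code: `QueueUnavailable`, `publish_with_deadline`, and `create_order` are hypothetical names, the blocking publish is run in a daemon thread only to illustrate a deadline, and 202 as the success status is an assumption.

```python
import threading


class QueueUnavailable(Exception):
    """Hypothetical error: the queue did not accept the message."""


def publish_with_deadline(publish, message, timeout_seconds):
    """Run publish(message) in a daemon thread and give up after the deadline."""
    outcome = {}

    def worker():
        try:
            publish(message)
        except Exception as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)
    if thread.is_alive():
        # The publish call is still blocked, e.g. because the broker is full.
        raise QueueUnavailable("queue did not accept the message in time")
    if "error" in outcome:
        raise QueueUnavailable(str(outcome["error"]))


def create_order(publish, order, timeout_seconds=5.0):
    """Return (HTTP status, body); 503 if the message cannot be queued."""
    try:
        publish_with_deadline(publish, order, timeout_seconds)
    except QueueUnavailable as exc:
        return 503, {"error": str(exc)}
    return 202, {"order": order}
```

With this shape the client gets an answer within `timeout_seconds` even if the underlying publish call never returns. The blocked thread itself is not cancelled here, so a real fix would still need the messaging layer to stop waiting.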

Sungjin Yook (sungyook)
Changed in barbican:
assignee: nobody → Sungjin Yook (sungyook)
Sungjin Yook (sungyook) wrote :

Reproduced the defect and will start working on the fix. Thanks for the details on IRC!

Sungjin Yook (sungyook) wrote :

The bug was confirmed as shown below (from barbican-svc.log):
2015-11-06 14:18:09.645 ERROR oslo.messaging._drivers.impl_rabbit [req-3a35b2bc-63f8-494b-90d0-ce6618ec8781 31028101ca184086ad2a7f43006e88ba fe0915afe2cf4b7daacd6094c0b1e34b] The broker has blocked the connection: low on disk

I also confirmed that order operations hung.

Changed in barbican:
status: New → Confirmed
Sungjin Yook (sungyook) wrote :

This defect may need to have fixes from both oslo.messaging and barbican.

Once an order is posted to the server, the orders table contains this entry:

mysql> select * from orders;
+--------------------------------------+---------------------+---------------------+------------+---------+---------+------+--------------------------------------+-------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------+------------+--------------------+----------------------------------+
| id | created_at | updated_at | deleted_at | deleted | status | type | project_id | error_status_code | error_reason | meta | secret_id | container_id | sub_status | sub_status_message | creator_id |
+--------------------------------------+---------------------+---------------------+------------+---------+---------+------+--------------------------------------+-------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------+------------+--------------------+----------------------------------+
| 009ffd84-a31c-4e1a-a558-bcca16b629cb | 2015-11-09 06:02:45 | 2015-11-09 06:02:45 | NULL | 0 | PENDING | key | c5c20c03-cffc-4553-b485-7c69758a80aa | NULL | NULL | {"name": "secretname", "algorithm": "AES", "payload_content_type": "application/octet-stream", "expiration": null, "bit_length": 256, "mode": "cbc"} | NULL | NULL | NULL | NULL | f0d9becd7fa2455497e8613ba051f181 |

The order is in the PENDING state and would stay PENDING forever, because RabbitMQ is blocking the connection.

Also, the reason the Barbican order operation hangs is that oslo_messaging logs an ERROR about 'low on disk' but never raises a proper exception.
So Barbican does not know what to do.

I think the fixes have to come from both sides (oslo_messaging and Barbican).

My fix proposal is:
1. Fix oslo_messaging to raise a 'MessageDeliveryFailure' exception when disk is low and the messaging service is blocked on RabbitMQ's end.

2. Catch the 'MessageDeliveryFailure' exception when Barbican casts the order to the oslo_messaging RPC framework (from barbican/queue/client.py).

3. Log the 'low on disk' error in barbican.log.

4. Move the order's DB state from PENDING to ERROR to indicate that a permanent error occurred, and return a 503 HTTP status code.

    I consider an order as 'order history'. So even a failed order needs to be in the record, but as a 'failed order'. Since we have logged why this order failed in the Barbican log, the failed order has a valid reason to stay in the DB.
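
Steps 2 to 4 of the proposal could be sketched as follows. This is only an illustration under assumptions: `MessageDeliveryFailure` is defined locally as a stand-in for the exception that step 1 would have oslo_messaging raise, `Order` and `submit_order` are hypothetical and are not the code in barbican/queue/client.py, and 202 as the success status is assumed.

```python
import logging

LOG = logging.getLogger("barbican.sketch")


class MessageDeliveryFailure(Exception):
    """Local stand-in for the exception named in step 1."""


class Order:
    """Hypothetical order record mirroring a few columns of the orders table."""

    def __init__(self, order_id):
        self.id = order_id
        self.status = "PENDING"
        self.error_status_code = None
        self.error_reason = None


def submit_order(order, cast):
    """Cast the order to the queue and return the HTTP status for the client."""
    try:
        cast(order.id)
    except MessageDeliveryFailure as exc:
        # Step 3: surface the broker problem in the service log.
        LOG.error("Could not queue order %s: %s", order.id, exc)
        # Step 4: keep the order as history, but mark it as failed.
        order.status = "ERROR"
        order.error_status_code = "503"
        order.error_reason = str(exc)
        return 503
    return 202
```

The failed order stays in the record with its reason attached, which matches the 'order history' argument above; whether the real fix should also commit that state change is left open here.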

With this fix, it would be clear to a user since he/she c...


Sungjin Yook (sungyook) wrote :

Need to try the oslo max-retry option to see whether it can be used as a workaround for this issue.
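
The comment does not name the option or say whether it helps with a blocked connection. The sketch below only illustrates what a bounded retry means in general: `send_with_max_retries` and the local `MessageDeliveryFailure` class are hypothetical, and none of this is oslo.messaging code.

```python
class MessageDeliveryFailure(Exception):
    """Local stand-in for a delivery error raised by the messaging layer."""


def send_with_max_retries(send, message, max_retries):
    """Call send(message), retrying at most max_retries times after a failure.

    Returns the number of attempts made. Re-raises the last failure once the
    retry budget is used up, so the caller can fail fast instead of hanging.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(message)
            return attempts
        except MessageDeliveryFailure:
            if attempts > max_retries:
                raise
```

A bounded retry only helps if each attempt actually fails or times out; an attempt that blocks forever, as described in this bug, would never reach the retry logic.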

Changed in barbican:
assignee: Sungjin Yook (sungyook) → nobody
status: Confirmed → New
Grzegorz Grasza (xek) wrote :

Closing out bugs created before migration to StoryBoard. Please re-open if you are of the opinion it is still current.

Changed in barbican:
status: New → Won't Fix