Barbican

API hangs when queue is full

Bug #1422419 reported by Douglas Mendizábal on 2015-02-16

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Barbican	Won't Fix	Undecided	Unassigned

Bug Description

When using RabbitMQ to queue orders, the broker may become full (out of disk space, for example). When this happens, the API hangs while waiting to send the message.

There are a few problems with this. First, if the API node is killed, the DB transaction is lost, since the transaction is not committed until the queue accepts the message.

RabbitMQ does warn Barbican about the low disk space; however, Barbican (or possibly oslo.messaging) did not log any of the low-space warnings.

John Wood (john-wood-w) wrote on 2015-02-20:

Based on discussions at the mid-cycle, this situation should return a 503 status code to the client rather than trying to save the transaction. This would give clients immediate feedback that something went wrong, with no expectation that Barbican will fulfill their request at that time. So I think the true bug is to figure out why the request is hanging and return the 503 instead.

It would be good to surface the low disk space warning in the Barbican logs, or else file a bug against oslo.messaging to add it to their logs.

Sungjin Yook (sungyook) on 2015-11-06

Changed in barbican:
assignee:	nobody → Sungjin Yook (sungyook)

Sungjin Yook (sungyook) wrote on 2015-11-06:

Reproduced the defect and will start working on the fix. Thanks for the details on IRC!

Sungjin Yook (sungyook) wrote on 2015-11-07:

The bug was confirmed, as shown below (from barbican-svc.log):
2015-11-06 14:18:09.645 ERROR oslo.messaging._drivers.impl_rabbit [req-3a35b2bc-63f8-494b-90d0-ce6618ec8781 31028101ca184086ad2a7f43006e88ba fe0915afe2cf4b7daacd6094c0b1e34b] The broker has blocked the connection: low on disk

I also confirmed that order operations hung.

Changed in barbican:
status:	New → Confirmed

Sungjin Yook (sungyook) wrote on 2015-11-18:

This defect may need fixes in both oslo.messaging and Barbican.

Once an order is posted to the server, we have this entry:

mysql> select * from orders;
+--------------------------------------+---------------------+---------------------+------------+---------+---------+------+--------------------------------------+-------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------+------------+--------------------+----------------------------------+
| id                                   | created_at          | updated_at          | deleted_at | deleted | status  | type | project_id                           | error_status_code | error_reason | meta                                                                                                                                                 | secret_id                            | container_id | sub_status | sub_status_message | creator_id                       |
+--------------------------------------+---------------------+---------------------+------------+---------+---------+------+--------------------------------------+-------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------+------------+--------------------+----------------------------------+
| 009ffd84-a31c-4e1a-a558-bcca16b629cb | 2015-11-09 06:02:45 | 2015-11-09 06:02:45 | NULL       |       0 | PENDING | key  | c5c20c03-cffc-4553-b485-7c69758a80aa | NULL              | NULL         | {"name": "secretname", "algorithm": "AES", "payload_content_type": "application/octet-stream", "expiration": null, "bit_length": 256, "mode": "cbc"} | NULL                                 | NULL         | NULL       | NULL               | f0d9becd7fa2455497e8613ba051f181 |

The order is in the PENDING state and would stay PENDING forever, because RabbitMQ is blocking the connection.

The Barbican order operation hangs because oslo_messaging logs an ERROR about 'low on disk' but never raises a proper exception, so Barbican does not know what to do.

I think the fixes have to come from both sides (oslo_messaging and Barbican).

My fix proposal is:
1. Fix oslo_messaging to raise a 'MessageDeliveryFailure' exception when disk space is low and the message broker is blocked on RabbitMQ's end.

2. Catch the 'MessageDeliveryFailure' exception when Barbican casts the order to the oslo_messaging RPC framework (from barbican/queue/client.py).

3. Log the 'low on disk' error in barbican.log.

4. Move the order's DB state from PENDING to ERROR to indicate that a permanent error occurred, and return a 503 HTTP status code.

I consider an order to be 'order history', so even a failed order needs to stay in the record, marked as a failed order. Since the reason for the failure is logged in the Barbican log, the failed order has a valid reason to stay in the DB.

With this fix, the outcome would be clear to users, since they can relate the 'low on disk' error in the log to the failed order's state (ERROR).
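
Steps 2 to 4 of the proposal could look roughly like the sketch below. It is self-contained and purely illustrative: `MessageDeliveryFailure` is a local stand-in for the oslo.messaging exception named in the proposal, `Order` and `submit_order` are hypothetical, and the 202 success code is an assumption. It also presumes step 1, that is, that the messaging layer actually raises instead of blocking.

```python
import logging

LOG = logging.getLogger("barbican.sketch")


class MessageDeliveryFailure(Exception):
    """Local stand-in for the oslo.messaging exception in the proposal."""


class Order:
    """Minimal stand-in for an order record (hypothetical)."""

    def __init__(self, order_id):
        self.id = order_id
        self.status = "PENDING"
        self.error_status_code = None
        self.error_reason = None


def submit_order(order, cast):
    """Queue an order; on delivery failure mark it ERROR and return 503."""
    try:
        cast(order.id)  # step 2: the cast that may fail
    except MessageDeliveryFailure as exc:
        # Step 3: surface the broker problem in the service log.
        LOG.error("Could not queue order %s: %s", order.id, exc)
        # Step 4: keep the order as history, but flag it as failed.
        order.status = "ERROR"
        order.error_status_code = "503"
        order.error_reason = str(exc)
        return 503
    return 202
```

The `error_status_code` and `error_reason` attributes mirror the column names visible in the `orders` table output; how the real model would persist them is not covered here.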

I am currently working with the oslo.messaging community on raising an exception.
The discussion is ongoing at
https://bugs.launchpad.net/oslo.messaging/+bug/1516745

Sungjin Yook (sungyook) wrote on 2016-01-18:

Need to try the oslo max retry option to see whether it can be used as a workaround for this issue.
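
The general idea of capping retries can be illustrated in plain Python. The helper below is a hypothetical sketch, not the oslo option itself (its exact name and semantics are not given in this report): it retries a send a fixed number of times and then re-raises so the caller can fail the request. Note the assumption that a failed attempt raises; a send that blocks forever, as described in this bug, would not be helped by retries alone.

```python
import time


class DeliveryError(Exception):
    """Hypothetical error raised by a failed send attempt."""


def send_with_max_retries(send, message, max_retries=3, delay=0.5):
    """Try `send` once plus up to `max_retries` retries, then re-raise."""
    attempt = 0
    while True:
        try:
            return send(message)
        except DeliveryError:
            if attempt >= max_retries:
                raise  # retries exhausted: let the caller report the failure
            attempt += 1
            time.sleep(delay)  # fixed pause between attempts
```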

Douglas Mendizábal (dougmendizabal) on 2017-02-22

Changed in barbican:
assignee:	Sungjin Yook (sungyook) → nobody
status:	Confirmed → New

Grzegorz Grasza (xek) wrote on 2023-04-25:

Closing out bugs created before migration to StoryBoard. Please re-open if you are of the opinion it is still current.

Changed in barbican:
status:	New → Won't Fix
