Concurrency problem for claim mechanism

Bug #1732080 reported by yangzhenyu
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
zaqar
Invalid
Undecided
Unassigned

Bug Description

At present, zaqar's claim mechanism allows users to exclusively obtain messages in the queue, that is, the message can only be obtained once in a while. This exclusive cycle is defined using claim ttl. In this cycle, the message can only be claimed once.
But when I was doing the concurrency test, I found that the message could be claimed multiple times at the same time. The probability of such a situation is very small, dozens of tests may occur once.
So, I looked at zaqar's source code for the claim section. When zaqar is processing the claim, it first gets a batch of messages that can be used for the claim, and then updates the metadata of the messages about the claim, including the claim expiration time, the claim id, and so on. Then through the id to get the message and return.
The above steps have 3 processing mongodb process, but this process is not locked, it will lead to concurrency problems. So I think here need a lock mechanism, locked in the first step, unlocked after the second update of the database, and the lock will expire, to prevent deadlock.

Changed in zaqar:
assignee: nobody → yangzhenyu (cdyangzhenyu)
Changed in zaqar:
status: New → Invalid
assignee: yangzhenyu (cdyangzhenyu) → nobody
Revision history for this message
Feilong Wang (flwang) wrote :

hmm... May I know why it's tagged as 'Invalid'? Is it still a problem?

Revision history for this message
Feilong Wang (flwang) wrote :

I reviewed the code of MongoDB claim.py again, based on the comments at https://github.com/openstack/zaqar/blob/master/zaqar/storage/mongodb/claims.py#L169-L179

We shouldn't run into problem as you mentioned above. Even though there is a potential bug, we should be able to fix it without introducing a global lock. Thoughts?

Revision history for this message
yangzhenyu (cdyangzhenyu) wrote :

hi, Feilong
I have carefully studied this part of the source code, although no obvious locking behavior, but through the expiration time and c.id can achieve the same effect. In other words, there should be no problem here. The problem I encountered was my own modification, I'm sorry.

Revision history for this message
Feilong Wang (flwang) wrote :

Cool, thanks for the confirmation, Zhenyu.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.