race happens in api_nb
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
DragonFlow | Fix Released | High | Li Ma | | |
Bug Description
Code: https:/
If other operations run concurrently, the data stored under the key becomes unpredictable. We need general transaction operations provided by the db_api layer to handle synchronization, just like the transactions provided by sqlalchemy.
Note also that almost all the operations provided by api_nb have the same potential race, because they follow the same pattern:
(1) read the key
(2) do some calculation, such as adding some JSON data to the previous value
(3) overwrite the key
If a concurrent write happens during step (2), it is overwritten in step (3) and the concurrently written data is lost.
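The lost update can be reproduced deterministically without any real backend. The following is a minimal sketch, not the api_nb code: `InMemoryStore`, its `get_key`/`set_key` methods, the key `lswitch1` and the `subnets` field are all hypothetical names chosen for illustration. Two writers are interleaved by hand so that both read before either writes.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a key-value db driver (not Dragonflow's API)."""

    def __init__(self):
        self._data = {}

    def get_key(self, key):
        return self._data[key]

    def set_key(self, key, value):
        self._data[key] = value


store = InMemoryStore()
store.set_key("lswitch1", json.dumps({"subnets": []}))

# Step (1): both writers read the same initial value.
a_value = json.loads(store.get_key("lswitch1"))
b_value = json.loads(store.get_key("lswitch1"))

# Step (2): each writer modifies its own private copy.
a_value["subnets"].append("subnet-a")
b_value["subnets"].append("subnet-b")

# Step (3): both writers overwrite the key; the later write wins.
store.set_key("lswitch1", json.dumps(a_value))
store.set_key("lswitch1", json.dumps(b_value))

final = json.loads(store.get_key("lswitch1"))
print(final)  # writer A's subnet is missing
```

Writer A's update is silently discarded because writer B computed its new value from a stale read.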
Changed in dragonflow: | |
assignee: | nobody → Li Ma (nick-ma-z) |
summary: |
- race happens in add_subnet of api_nb |
+ race happens in api_nb |
description: | updated |
Changed in dragonflow: | |
importance: | Undecided → Medium |
I tested this situation with the etcd driver and the problem described above really happened. If concurrent writes hit the same key while an operation is in progress, the value of that key is corrupted.
(1) Solution 1
I suggest defining a set of transaction interfaces at the db-api layer and requiring all db-backend drivers to implement them. Then rewrite nb-api to perform every operation inside a transaction.
Pros: it is straightforward and does not need to introduce another mechanism.
Cons: the db-api layer and the nb-api layer need to be modified, along with all the db drivers.
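One possible shape for such an interface is an optimistic compare-and-swap primitive that every driver implements, with nb-api retrying on conflict. This is only a sketch under assumptions: `VersionedStore`, `compare_and_swap`, `update_key` and `add_subnet` are hypothetical names, the store is an in-process stand-in for a real backend, and this is one way to get transactional updates, not necessarily the interface that was adopted.

```python
import json
import threading


class VersionedStore:
    """Hypothetical driver that keeps a version number with every value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._mutex = threading.Lock()  # makes each driver call atomic

    def create_key(self, key, value):
        with self._mutex:
            self._data[key] = (0, value)

    def get_key(self, key):
        with self._mutex:
            return self._data[key]

    def compare_and_swap(self, key, expected_version, new_value):
        """Write only if nobody else has written since expected_version."""
        with self._mutex:
            version, _ = self._data[key]
            if version != expected_version:
                return False
            self._data[key] = (version + 1, new_value)
            return True


def update_key(store, key, modify, max_retries=1000):
    """Read, modify and write back, retrying if a concurrent write intervened."""
    for _ in range(max_retries):
        version, raw = store.get_key(key)
        new_raw = json.dumps(modify(json.loads(raw)))
        if store.compare_and_swap(key, version, new_raw):
            return
    raise RuntimeError("too many conflicting writes on %s" % key)


def add_subnet(store, key, subnet_id):
    def modify(value):
        value["subnets"].append(subnet_id)
        return value
    update_key(store, key, modify)


def run_demo(writers=20):
    store = VersionedStore()
    store.create_key("lswitch1", json.dumps({"subnets": []}))
    threads = [
        threading.Thread(target=add_subnet,
                         args=(store, "lswitch1", "subnet-%d" % i))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return json.loads(store.get_key("lswitch1")[1])["subnets"]


subnets = run_demo()
print(len(subnets))
```

A writer whose read has gone stale fails the version check and recomputes from the fresh value, so no update is lost. The cost is that every driver has to provide an atomic conditional write.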
(2) Solution 2
Introduce a DLM (distributed lock manager) for the nb-api layer. A distributed lock is designed to solve exactly this kind of synchronization problem. The only problem is that a DLM also needs a distributed coordination backend, like etcd or zookeeper. We need to share the same backend for these two distinct purposes.
Pros: moves the responsibility for synchronization from db-api to an external system, so db-api and the db drivers do not need to be modified.
Cons: introduces an external system to implement synchronization.
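The lock-based approach can be sketched as follows. Everything here is an assumption for illustration: `LocalLockManager` is an in-process stand-in for a real DLM (a deployment would obtain the lock from a coordination backend such as etcd or zookeeper), and `PlainStore` and `add_subnet_locked` are hypothetical names. The point is that the driver stays unchanged and nb-api wraps the read-modify-write in a per-key lock.

```python
import json
import threading
from collections import defaultdict
from contextlib import contextmanager


class LocalLockManager:
    """In-process stand-in for a distributed lock manager (assumption)."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    @contextmanager
    def lock(self, name):
        with self._guard:
            named_lock = self._locks[name]
        with named_lock:
            yield


class PlainStore:
    """Unmodified key-value driver with no transaction support."""

    def __init__(self):
        self._data = {}

    def get_key(self, key):
        return self._data[key]

    def set_key(self, key, value):
        self._data[key] = value


def add_subnet_locked(store, locks, key, subnet_id):
    # Hold the per-key lock across read, modify and write.
    with locks.lock(key):
        value = json.loads(store.get_key(key))
        value["subnets"].append(subnet_id)
        store.set_key(key, json.dumps(value))


def run_locked_demo(writers=20):
    store = PlainStore()
    locks = LocalLockManager()
    store.set_key("lswitch1", json.dumps({"subnets": []}))
    threads = [
        threading.Thread(target=add_subnet_locked,
                         args=(store, locks, "lswitch1", "subnet-%d" % i))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return json.loads(store.get_key("lswitch1"))["subnets"]


locked_subnets = run_locked_demo()
print(len(locked_subnets))
```

This only works if every writer of a key takes the same lock; a client that bypasses the lock can still cause a lost update.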