Control node deadlock in RibOutUpdates::TailDequeue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R1.1 | Fix Committed | Critical | Nischal Sheth |
R2.0 | Fix Released | Critical | Nischal Sheth |
R2.1 | Fix Released | Critical | Nischal Sheth |
Trunk | Fix Committed | Critical | Nischal Sheth |
OpenContrail | Fix Released | Critical | Nischal Sheth |
Bug Description
The control node deadlocked in RibOutUpdates::TailDequeue when UpdatePack tried to acquire a lock on the next RouteUpdate in the attr_set_ in UpdateQueue.
TailDequeue already held a lock on that RouteUpdate, because it was the previous element in the (temporal) queue_ in UpdateQueue.
The root cause is that calling clock_gettime with CLOCK_MONOTONIC does not
guarantee strictly increasing timestamps: two successive calls can return the same value.
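The root cause can be checked with a small probe. The following is a minimal sketch, not code from the control node: the names MonotonicNanos, ClockSample and SampleMonotonicClock are hypothetical. It takes back-to-back CLOCK_MONOTONIC readings and counts how often two adjacent readings are equal or go backwards. The clock should never go backwards, but whether ties show up depends on the platform and clock resolution.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Read CLOCK_MONOTONIC as a single nanosecond count (hypothetical helper).
uint64_t MonotonicNanos() {
    timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  // keep release builds (NDEBUG) free of unused-variable warnings
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<uint64_t>(ts.tv_nsec);
}

struct ClockSample {
    uint64_t ties;       // adjacent readings that were equal
    uint64_t decreases;  // adjacent readings that went backwards
};

// Take `samples` back-to-back readings and classify each adjacent pair.
ClockSample SampleMonotonicClock(uint64_t samples) {
    ClockSample result = {0, 0};
    uint64_t prev = MonotonicNanos();
    for (uint64_t i = 1; i < samples; ++i) {
        uint64_t now = MonotonicNanos();
        if (now == prev) ++result.ties;       // non-decreasing, but not unique
        if (now < prev) ++result.decreases;   // would violate monotonicity
        prev = now;
    }
    return result;
}
```

A non-zero `ties` count on a given machine means a timestamp from this clock cannot serve as a unique ordering key there.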
(gdb) info threads
Id Target Id Frame
4 Thread 0x7f86bd323700 (LWP 25182) "contrail-contro" 0x00007f86c4ea3f2c in __lll_lock_wait ()
from /lib/x86_
3 Thread 0x7f86bcf22700 (LWP 25183) "contrail-contro" 0x00007f86c4ea3f2c in __lll_lock_wait ()
from /lib/x86_
2 Thread 0x7f86bcb21700 (LWP 25184) "contrail-contro" 0x00007f86c4ea1414 in pthread_
from /lib/x86_
* 1 Thread 0x7f86c69487c0 (LWP 25177) "contrail-contro" 0x00007f86c3f6e5fa in epoll_ctl () from /lib/x86_
Thread 4 is the lock owner
(gdb) frame 5
#5 0x00000000007622df in RibUpdateMonito
next_
973 RouteUpdatePtr update(mp, rt_update, &mutex_, &cond_var_);
Next queue element by attribute:
(gdb) p mp
$1 = (tbb::mutex *) 0x7f86b02eedb0
(gdb) p rt_update
$2 = (RouteUpdate *) 0x7f86b02eed90
#8 0x000000000071f95d in RibOutUpdates:
at controller/
187 if (!DequeueCommon
The element previously processed by TailDequeue (which is still locked) is:
(gdb) p next_update
$3 = {entry_mutexp_ = 0x7f86b02eedb0, rt_update_ = 0x7f86b02eed90, monitor_mutexp_ = 0x7f86b0096b50, cond_var_ = 0x7f86b0096b78}
Previous and current elements have the same timestamp:
(gdb) p update.
$6 = 851624330824509
(gdb) p next_update.
$7 = 851624330824509
This causes the order of the attribute queue to depend on the UpdateInfo pointer.
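The effect of a timestamp tie on ordering can be illustrated in isolation. This is a sketch under assumptions: Update, ByTimestampThenAddress and OrdersDisagreeOnTie are hypothetical names, and the comparator is only a simplified model of an ordering that falls back to a pointer when timestamps are equal. It shows that a FIFO queue and a sorted set can then disagree about which of two entries comes first, which is the kind of disagreement that lets two walkers take locks in opposite orders.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <set>

// Hypothetical stand-in for an update entry; only the timestamp matters here.
struct Update {
    uint64_t tstamp;
};

// Order by timestamp, falling back to the object's address on a tie.
struct ByTimestampThenAddress {
    bool operator()(const Update *lhs, const Update *rhs) const {
        if (lhs->tstamp != rhs->tstamp) return lhs->tstamp < rhs->tstamp;
        return std::less<const Update *>()(lhs, rhs);
    }
};

// Returns true when FIFO order and sorted-set order disagree for two
// updates that were enqueued with the same timestamp.
bool OrdersDisagreeOnTie() {
    Update storage[2] = {{42}, {42}};  // identical timestamps
    Update *first = &storage[1];       // enqueued first, higher address
    Update *second = &storage[0];      // enqueued second, lower address

    std::list<Update *> fifo;                           // insertion order
    std::set<Update *, ByTimestampThenAddress> sorted;  // tie broken by address
    fifo.push_back(first);
    sorted.insert(first);
    fifo.push_back(second);
    sorted.insert(second);

    // The FIFO head is `first`; the sorted head is the lower address, `second`.
    return fifo.front() != *sorted.begin();
}
```

With unique timestamps the two containers would agree, because the address fallback would never be consulted.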
description: updated
Changed in opencontrail:
status: New → In Progress
Changed in juniperopenstack:
importance: Undecided → Critical
assignee: nobody → Nischal Sheth (nsheth)
status: New → In Progress
tags: added: blocker
Changed in opencontrail:
status: In Progress → Fix Committed
Changed in opencontrail:
status: Fix Committed → Fix Released
Reviewed: https://review.opencontrail.org/6290
Committed: http://github.org/Juniper/contrail-controller/commit/6bb5d55d43d1e37dec2372b21f09ee1dacbb8c42
Submitter: Zuul
Branch: R1.10
commit 6bb5d55d43d1e37dec2372b21f09ee1dacbb8c42
Author: Nischal Sheth <email address hidden>
Date: Fri Jan 16 13:08:02 2015 -0800
Fix deadlock in RibOutUpdates::TailDequeue/PeerDequeue
RibOutUpdates::TailDequeue/PeerDequeue can deadlock if 2 RouteUpdates
have the same timestamp. Calling clock_gettime with CLOCK_MONOTONIC
does not guarantee monotonically increasing timestamps.
Use atomic uint64_t to implement relative timestamp for RouteUpdate.
Change-Id: I328fc96405c51dace5b5e8a79b80026631a0bb4b
Partial-Bug: 1411855
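The approach named in the commit message, an atomic uint64_t used as a relative timestamp, can be sketched as follows. This is an illustration only, not the committed change: the class name SequenceStamp is hypothetical, and the use of std::atomic (C++11) is an assumption about how such a counter might be written.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: hands out strictly increasing, never-repeating values
// that can replace a wall-clock reading as an ordering key.
class SequenceStamp {
public:
    SequenceStamp() : next_(0) {}

    // fetch_add atomically increments the counter and returns the previous
    // value, so no two callers can ever receive the same stamp.
    uint64_t Next() { return next_.fetch_add(1); }

private:
    std::atomic<uint64_t> next_;
};
```

Because every stamp is unique, an ordering keyed on it never has to fall back to a pointer comparison, so the temporal order and the per-attribute order stay consistent with each other.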