Comment 0 for bug 1451306

Nischal Sheth (nsheth) wrote:

Noticed that a peer Join to a table occasionally does not download all
routes in the table to the peer. Updates for some routes get stuck in
the peer's bulk update queue.

Root cause is a concurrency issue in RibUpdateMonitor::MergeUpdate.

If there's no DBState for the route, EnqueueUpdateUnlocked is called
without locking the monitor mutex. If another thread is in GetNextUpdate
at that moment, EnqueueUpdateUnlocked can return false even though the
last update on the queue is being dequeued by that thread. This happens
because UpdateQueue::NextUpdate and UpdateQueue::MoveMarker are two
separate operations: the enqueuing thread can observe the queue after
the dequeuer has picked up the last update but before it has moved the
marker past it, so the queue still looks non-empty.

As a result, BgpExport::Join does not kick the SchedulingGroup to start
a tail dequeue for the bulk queue, and the newly enqueued updates are
left in it.
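The interleaving can be replayed deterministically on a single thread. This is a toy model under stated assumptions, not the Contrail code: `ToyUpdateQueue`, `EnqueueUnlocked`, `EnqueueLocked`, `DequeueLocked`, the two `Replay...` helpers, and the convention that an enqueue returns true when the caller must kick a dequeue are all hypothetical. It also sketches one plausible remedy consistent with the report: take the same mutex for the enqueue check and for both dequeue steps.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical, heavily simplified stand-in for a per-peer update queue.
// Updates before marker_ have been consumed; updates at or after it are pending.
class ToyUpdateQueue {
public:
    // Returns true if nothing was pending, i.e. the caller must kick a dequeue
    // (assumed convention for this sketch).
    bool EnqueueUnlocked(const std::string &update) {
        bool was_idle = (marker_ == updates_.size());
        updates_.push_back(update);
        return was_idle;
    }

    // Dequeue step 1: look at the next pending update without consuming it.
    std::optional<std::string> NextUpdate() const {
        if (marker_ == updates_.size()) return std::nullopt;
        return updates_[marker_];
    }

    // Dequeue step 2: advance the marker past the update from NextUpdate.
    void MoveMarker() { ++marker_; }

    std::size_t Pending() const { return updates_.size() - marker_; }

    // Locked variants: both sides hold the same mutex, so an enqueuer can
    // never observe the state between NextUpdate and MoveMarker.
    bool EnqueueLocked(const std::string &update) {
        std::lock_guard<std::mutex> lock(mutex_);
        return EnqueueUnlocked(update);
    }

    std::optional<std::string> DequeueLocked() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::string> next = NextUpdate();
        if (next) MoveMarker();
        return next;
    }

private:
    std::deque<std::string> updates_;
    std::size_t marker_ = 0;
    std::mutex mutex_;
};

// Replays the racy interleaving step by step; returns the enqueuer's kick decision.
bool ReplayRacyInterleaving(ToyUpdateQueue &q) {
    q.EnqueueUnlocked("route-A");                   // route-A is the last pending update.
    std::optional<std::string> a = q.NextUpdate();  // Dequeuer holds route-A, marker not moved.
    assert(a && *a == "route-A");
    bool kick = q.EnqueueUnlocked("route-B");       // Enqueuer still sees route-A pending.
    q.MoveMarker();                                 // Dequeuer consumes route-A.
    return kick;                                    // false, yet route-B is pending.
}

// With the lock held across both dequeue steps, the enqueuer runs entirely
// before or entirely after a dequeue. Here it runs after.
bool ReplayLockedOrdering(ToyUpdateQueue &q) {
    q.EnqueueLocked("route-A");
    q.DequeueLocked();                   // NextUpdate + MoveMarker as one critical section.
    return q.EnqueueLocked("route-B");   // true: queue was idle, so the caller kicks.
}
```

In the racy replay the enqueuer gets `false`, so under this model it would not kick a dequeue even though `route-B` is left pending. With the locked variants the enqueuer only ever sees the queue before or after a complete dequeue, never in between.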