Activity log for bug #1577278

Date Who What changed Old value New value Message
2016-05-02 03:27:44 Nischal Sheth bug added bug
2016-05-02 03:27:59 Nischal Sheth nominated for series juniperopenstack/trunk
2016-05-02 03:27:59 Nischal Sheth bug task added juniperopenstack/trunk
2016-05-02 04:33:06 Nischal Sheth description
Old value: TBD
New value:
Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart.
This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash, or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem, wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables.
This causes a few problems:
1. The CN performs a large number of table walks that could potentially be combined into a much smaller number.
2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition).
3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. It would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet.
4. Since Join/Leave processing is done for one peer at a time, we also end up encoding each route update into a BGP/XMPP message for one or a few peers at a time. It would be ideal to encode each route once and send it to all interested peers, i.e. amortize the cost of encoding the update over many peers.
Proposal is to rework the implementation of the BGP membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.
2016-05-02 16:55:08 Nischal Sheth summary
Old value: Rework bgp membership manager to improve efficiency
New value: Rework bgp membership manager to improve scalability
2016-05-10 00:18:18 OpenContrail Admin juniperopenstack/trunk: status New In Progress
2016-06-15 06:09:40 Nischal Sheth juniperopenstack/trunk: status In Progress Fix Committed
2016-06-15 06:09:51 Nischal Sheth juniperopenstack/trunk: milestone r3.1.0.0-fcs
2016-06-15 15:51:20 OpenContrail Admin juniperopenstack/trunk: status Fix Committed In Progress
2016-06-15 15:52:40 Nischal Sheth description
Old value:
Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart.
This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash, or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem, wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables.
This causes a few problems:
1. The CN performs a large number of table walks that could potentially be combined into a much smaller number.
2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition).
3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. It would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet.
4. Since Join/Leave processing is done for one peer at a time, we also end up encoding each route update into a BGP/XMPP message for one or a few peers at a time. It would be ideal to encode each route once and send it to all interested peers, i.e. amortize the cost of encoding the update over many peers.
Proposal is to rework the implementation of the BGP membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.
New value:
Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart.
This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash, or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem, wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables.
This causes a few problems:
1. The CN performs a large number of unnecessary table walks. These could potentially be combined into a much smaller number.
2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition).
3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. It would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet.
4. Since Join/Leave processing is done for one peer at a time, we also end up encoding each route update into a BGP/XMPP message for one or a few peers at a time. It would be ideal to encode each route once and send it to all interested peers, i.e. amortize the cost of encoding the update over many peers.
Proposal is to rework the implementation of the BGP membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.
2016-06-18 17:23:09 OpenContrail Admin juniperopenstack/trunk: status In Progress Fix Committed
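
Illustration of the proposal in the description: the batching idea (accumulate pending (peer, table) requests, then walk one table at a time on behalf of every requesting peer) can be sketched roughly as below. This is a minimal Python model and not the actual C++ code in the controller. The class MembershipBatcher, its methods, and the peer and table names are hypothetical, and a later request from the same peer for the same table simply replaces the earlier one, which is a simplifying assumption.

```python
from collections import OrderedDict


class MembershipBatcher:
    """Hypothetical model of batched membership processing (illustration only)."""

    def __init__(self):
        # table name -> {peer: "join" or "leave"}, kept in arrival order of tables
        self._pending = OrderedDict()
        # one (table, joining_peers, leaving_peers) entry per walk performed
        self.walks = []

    def request(self, peer, table, action):
        """Queue a join or leave request; no table walk is triggered here."""
        if action not in ("join", "leave"):
            raise ValueError("action must be 'join' or 'leave'")
        # Assumption: a newer request from the same peer overrides the older one.
        self._pending.setdefault(table, {})[peer] = action

    def walk_next_table(self):
        """Walk the oldest pending table once for all peers that asked for it."""
        if not self._pending:
            return None
        table, requests = self._pending.popitem(last=False)
        # The frozensets stand in for the BitSet of peers mentioned in the description.
        joins = frozenset(p for p, a in requests.items() if a == "join")
        leaves = frozenset(p for p, a in requests.items() if a == "leave")
        walk = (table, joins, leaves)
        self.walks.append(walk)
        return walk

    def drain(self):
        """Process pending tables one at a time; return the total walk count."""
        while self.walk_next_table() is not None:
            pass
        return len(self.walks)


# Thundering herd: 100 peers each register to the same 20 tables.
batcher = MembershipBatcher()
peers = [f"peer-{i}" for i in range(100)]
tables = [f"table-{j}" for j in range(20)]
for peer in peers:
    for table in tables:
        batcher.request(peer, table, "join")

total_walks = batcher.drain()
print(total_walks)  # 20 walks, versus 100 * 20 = 2000 with one walk per (peer, table)
```

In this model the number of walks depends on the number of distinct tables with pending requests rather than on the number of (peer, table) pairs, and requests that arrive for other tables while one walk is in progress would keep accumulating in the pending map.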