Juniper Openstack

Activity log for bug #1577278

Date	Who	What changed	Old value	New value	Message
2016-05-02 03:27:44	Nischal Sheth	bug			added bug
2016-05-02 03:27:59	Nischal Sheth	nominated for series		juniperopenstack/trunk
2016-05-02 03:27:59	Nischal Sheth	bug task added		juniperopenstack/trunk
2016-05-02 04:33:06	Nischal Sheth	description	TBD	Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart. This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables. This causes a few problems: 1. The CN performs a large number of table walks that could potentially be combined into a much smaller number. 2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition). 3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. Would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet. 4. Since Join/Leave processing is done for 1 peer at a time, we also end up encoding each route update into a bgp/xmpp message for one or few peers at a time. Would be ideal to encode each route once and send it to all interested peers i.e. amortize the cost of encoding the update over many peers. Proposal is to rework implementation of bgp membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. 
It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.
2016-05-02 16:55:08	Nischal Sheth	summary	Rework bgp membership manager to improve efficiency	Rework bgp membership manager to improve scalability
2016-05-10 00:18:18	OpenContrail Admin	juniperopenstack/trunk: status	New	In Progress
2016-06-15 06:09:40	Nischal Sheth	juniperopenstack/trunk: status	In Progress	Fix Committed
2016-06-15 06:09:51	Nischal Sheth	juniperopenstack/trunk: milestone		r3.1.0.0-fcs
2016-06-15 15:51:20	OpenContrail Admin	juniperopenstack/trunk: status	Fix Committed	In Progress
2016-06-15 15:52:40	Nischal Sheth	description	Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart. This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables. This causes a few problems: 1. The CN performs a large number of table walks that could potentially be combined into a much smaller number. 2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition). 3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. Would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet. 4. Since Join/Leave processing is done for 1 peer at a time, we also end up encoding each route update into a bgp/xmpp message for one or few peers at a time. Would be ideal to encode each route once and send it to all interested peers i.e. amortize the cost of encoding the update over many peers. Proposal is to rework implementation of bgp membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. 
It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.	Existing implementation triggers a table walk for each (peer, table) join or leave request. It also triggers a separate walk per (peer, table) when all paths received from a peer need to be marked stale/deleted as part of graceful restart. This behavior is fine when a single peer comes up or goes down, but it's sub-optimal when a bunch of peers go down or come up at roughly the same time. This happens if multiple vrouters encounter the same problem and crash or when the CN crashes and comes back up. In the latter case, we run into the so-called thundering herd problem wherein all vrouters connect to the CN at roughly the same time and then register to a large number of common tables. This causes a few problems: 1. The CN performs a large number of unnecessary table walks. These could potentially be combined into a much smaller number. 2. If there's a large number of peers and a large number of tables, the CN ends up triggering a very large number of walks at roughly the same time. This puts an unnecessary burden on the TaskScheduler since each table walk results in the creation of multiple Tasks (one per partition). 3. Not only does 1) above cause redundant table walks, it also results in redundant calls to BgpExport::Join/Leave and its callees. Would be ideal to call the Join/Leave methods with a BitSet of peers to handle multiple peers at once. Note that the Join/Leave methods already handle a BitSet. 4. Since Join/Leave processing is done for 1 peer at a time, we also end up encoding each route update into a bgp/xmpp message for one or few peers at a time. 
Would be ideal to encode each route once and send it to all interested peers i.e. amortize the cost of encoding the update over many peers. Proposal is to rework implementation of bgp membership manager to address all the above issues. The membership manager can keep track of all pending (peer, table) requests and trigger a table walk for one table at a time. It can perform join/leave and receive path manipulation operations for all requesting peers for the table in question. Since each table is sharded across all partitions, triggering a single table walk still allows the Task infra to utilize all available threads/cores. Triggering one table walk at a time also allows the membership manager to accumulate multiple peer requests for all other tables.
2016-06-18 17:23:09	OpenContrail Admin	juniperopenstack/trunk: status	In Progress	Fix Committed
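
Illustration (not part of the original log): the batching idea proposed in the description can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual membership manager code. The class name MembershipManager, the methods request and walk_next_table, and the string message format are all hypothetical, and partitions and Tasks are not modeled. It only shows that pending (peer, table) requests accumulate per table, that each table is walked once for all requesting peers, and that each route is encoded once for the whole set of joining peers.

```python
from collections import OrderedDict


class MembershipManager:
    """Toy model: batch (peer, table) requests and walk one table at a time."""

    def __init__(self):
        # Hypothetical bookkeeping: table name -> {peer: "join" | "leave"}.
        # Tables are kept in the order their first request arrived.
        self.pending = OrderedDict()
        self.walk_count = 0

    def request(self, peer, table, action):
        # Record the request; a later request from the same peer
        # for the same table overrides the earlier one.
        self.pending.setdefault(table, {})[peer] = action

    def walk_next_table(self, routes_by_table):
        """Walk the oldest pending table once for all its requesting peers."""
        if not self.pending:
            return None
        table, requests = self.pending.popitem(last=False)
        joiners = frozenset(p for p, a in requests.items() if a == "join")
        leavers = frozenset(p for p, a in requests.items() if a == "leave")
        self.walk_count += 1
        sent = []
        for route in routes_by_table.get(table, []):
            if joiners:
                # Encode the update once, then address it to every joiner.
                message = f"update {table} {route}"
                sent.append((message, joiners))
        return table, joiners, leavers, sent


# Three peers register to two common tables at roughly the same time.
mgr = MembershipManager()
for peer in ("vr1", "vr2", "vr3"):
    for table in ("blue", "red"):
        mgr.request(peer, table, "join")

routes = {"blue": ["10.0.0.0/24"], "red": ["10.1.0.0/24", "10.2.0.0/24"]}
while mgr.walk_next_table(routes):
    pass
print(mgr.walk_count)  # 2 walks, versus 6 with one walk per (peer, table)
```

While one table is being walked, requests for the other tables keep accumulating in pending, which is what lets a later walk cover many peers at once.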