Activity log for bug #1677419

Date Who What changed Old value New value Message
2017-03-30 01:00:43 Jon Hickman bug added bug
2017-03-30 01:01:23 Jon Hickman bug added subscriber Netronome vRouter engineering
2017-03-30 01:02:38 Jon Hickman bug added subscriber Contrail Systems engineering
2017-03-30 01:03:59 Jon Hickman agiliovrouter: assignee Jon Hickman (jhickman8x3)
2017-03-30 14:01:53 Jon Hickman agiliovrouter: status New In Progress
2017-03-30 15:59:04 Jon Hickman bug added subscriber Savithru Lokanath
2017-03-30 18:11:22 Jon Hickman description Old value: Venu was running a test in which he set vr_flow_entries to 4M and ran a PPS test with 1600 active flows, but the performance was not as good as when we ran Contrail with vr_flow_entries at 512K. After looking at it, we found that the Agilio vRouter offload was not updating all the flow statistics when the number of flow entries was set to anything > 512K. This caused the flows to time out, and the extra work of adding and removing flows during the PPS test was noticeable. So we checked our Agilio vRouter requirement for the number of flows supported, and it says 1M flows. We are not doing 1M flows at the moment and that needs to be fixed, but as we will explain further, adding flow capability comes at a cost, so we should be judicious in our decisions. Let's explain a little about how the number of flow entries affects the way we must maintain flow statistics, and how that plays into the overhead burden even when flows are not active. The flow statistics table must have as many entries as vr_flow_entries + the flow table overflow limit. We have an internal flow statistics table that keeps the packet and byte counters associated with each flow. To keep the agent from timing out flows while packets are offloaded, we must send the agent the flow counters for every entry at a rate faster than the shortest timeout value of 1 sec. So right now we attempt to send back the entire flow stats table 3x per second to ensure a 1 sec timeout will still work. The default table size, at 512K flow entries + 103K overflow entries, is 615K entries (this is as large as we support with the current 154 release). Increasing that to 1024K flow entries + 200K overflow entries gives 1224K entries. Increasing it to 4096K flow entries + 800K overflow entries gives 4896K entries.
As you can see, the larger the number of flow entries supported in the offload, the more overhead is taken up updating the agent with flow counters. This happens whether or not the 1M or 4M flows are active. Our sizing is presently static, so we would have to live with the largest number allowed for operation with Agilio vRouter and then factor in the rate at which the updates would safely need to be sent to the agent to maintain a 1 sec timeout. Another option is that, if we are not required to maintain the 1 sec timeout, the update rate could be reduced. I have not yet done the math on how much of an impact maintaining 4M flow entries would have on fallback traffic. This also has ramifications for mirroring, because to offload mirroring we need to keep the mirroring metadata per flow entry. So there should probably be a discussion with Raja on: 1. What is the requirement for Agilio vRouter max flow entries? 2. Can the update rate be reduced by supporting a longer minimum timeout value? 3. How does this impact mirroring metadata?
New value: Venu was running a test in which he set vr_flow_entries to 4M and ran a PPS test with 1600 active flows, but the performance was not as good as when we ran Contrail with vr_flow_entries at 512K. After looking at it, we found that the Agilio vRouter offload was not updating all the flow statistics when the number of flow entries was set to anything > 512K. This caused the flows to time out, and the extra work of adding and removing flows during the PPS test was noticeable. So we checked our Agilio vRouter requirement for the number of flows supported, and it says 1M flows. We are not doing 1M flows at the moment and that needs to be fixed, but as we will explain further, adding flow capability comes at a cost, so we should be judicious in our decisions.
Let's explain a little about how the number of flow entries affects the way we must maintain flow statistics, and how that plays into the overhead burden even when flows are not active. The flow statistics table must have as many entries as vr_flow_entries + the flow table overflow limit. We have an internal flow statistics table that keeps the packet and byte counters associated with each flow. To keep the agent from timing out flows while packets are offloaded, we must send the agent the flow counters for every entry at a rate faster than the shortest timeout value of 1 sec. So right now we attempt to send back the entire flow stats table 3x per second to ensure a 1 sec timeout will still work. The default table size, at 512K flow entries + 103K overflow entries, is 615K entries (this is as large as we support with the current 154 release). Increasing that to 1024K flow entries + 200K overflow entries gives 1224K entries. Increasing it to 4096K flow entries + 800K overflow entries gives 4896K entries. As you can see, the larger the number of flow entries supported in the offload, the more overhead is taken up updating the agent with flow counters. This happens whether or not the 1M or 4M flows are active. Our sizing is presently static, so we would have to live with the largest number allowed for operation with Agilio vRouter and then factor in the rate at which the updates would safely need to be sent to the agent to maintain a 1 sec timeout. Another option is that, if we are not required to maintain the 1 sec timeout, the update rate could be reduced. I have not yet done the math on how much of an impact maintaining 4M flow entries would have on fallback traffic. This also has ramifications for mirroring, because to offload mirroring we need to keep the mirroring metadata per flow entry. So there should probably be a discussion with Raja on:
1. What is the requirement for Agilio vRouter max flow entries?
2. Can the update rate be reduced by supporting a longer minimum timeout value?
3. How does this impact mirroring metadata?
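The sizing arithmetic in the description can be sketched roughly as follows. The table sizes and the 3x-per-second rate come from the description; the function names are hypothetical, and the idea that the 3x margin would be kept if the minimum timeout were raised is an assumption, not something the report states.

```python
# Sketch only: table sizes are taken from the description (in units of K entries);
# the helper names and the timeout scaling are assumptions for illustration.

SAFETY_FACTOR = 3  # description: full flow stats are sent back 3x per 1 sec timeout


def table_entries_k(flow_entries_k, overflow_entries_k):
    """Flow statistics table size: vr_flow_entries plus the overflow limit."""
    return flow_entries_k + overflow_entries_k


def counters_sent_per_sec_k(flow_entries_k, overflow_entries_k, min_timeout_s=1.0):
    """Entries (in K) pushed to the agent each second, whether flows are active or not.

    Assumes the 3x margin would be kept if the minimum timeout were raised.
    """
    sweeps_per_sec = SAFETY_FACTOR / min_timeout_s
    return table_entries_k(flow_entries_k, overflow_entries_k) * sweeps_per_sec


if __name__ == "__main__":
    # The three configurations discussed in the description.
    for flow_k, overflow_k in [(512, 103), (1024, 200), (4096, 800)]:
        size_k = table_entries_k(flow_k, overflow_k)
        rate_k = counters_sent_per_sec_k(flow_k, overflow_k)
        print(f"{flow_k}K + {overflow_k}K = {size_k}K entries, {rate_k:.0f}K entries/sec")
```

Under these assumptions the 4M configuration pushes roughly 8x as many entries per second as the default (4896K vs 615K per sweep), and raising the minimum timeout to 3 sec would cut the sweep rate from 3 per second to 1 per second.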
2017-04-04 11:47:50 Jon Hickman bug added subscriber Gareth
2017-04-04 11:48:23 Jon Hickman information type Proprietary Public
2017-04-04 17:59:16 Jeba Paulaiyan agiliovrouter: status In Progress New
2017-04-04 17:59:29 Jeba Paulaiyan agiliovrouter: status New In Progress
2017-04-04 17:59:53 Jeba Paulaiyan tags iconic
2017-04-05 15:36:41 Jon Hickman information type Public Private
2017-04-05 15:37:31 Jon Hickman removed subscriber Netronome vRouter engineering
2017-04-05 15:37:31 Jon Hickman removed subscriber Savithru Lokanath
2017-04-05 15:37:31 Jon Hickman removed subscriber Contrail Systems engineering
2017-04-05 15:37:31 Jon Hickman removed subscriber Gareth
2017-04-05 17:42:42 Jon Hickman information type Private Public
2017-04-05 17:43:09 Jon Hickman bug added subscriber Contrail Systems engineering
2017-04-05 17:43:21 Jon Hickman bug added subscriber Netronome vRouter engineering
2017-04-05 17:43:32 Jon Hickman bug added subscriber Savithru Lokanath
2017-04-06 15:42:55 Jeba Paulaiyan tags iconic iconic releasenote