[RFE] [LBaaS] ssh connection timeout

Bug #1457556 reported by Kevin Fox on 2015-05-21
This bug affects 2 people.

Affects: octavia | Status: In Progress | Importance: Wishlist | Assigned to: Reedip

Bug Description

In the V2 api, we need a way to tune the lb connection timeouts so that we can have a pool of ssh servers that have long running tcp connections. ssh sessions can last days to weeks and users get grumpy if the session times out if they are in the middle of doing something. Currently the timeouts are tuned to drop connections that are too long running regardless of if there is traffic on the connection or not. This is good for http, but bad for ssh.

Kevin Fox (kevpn) on 2015-05-21
tags: added: lbaas
Alan (kaihongd) wrote :

Please try one of the workarounds below and see whether it helps:

1. Increase the MTU on all underlay interfaces, e.g. to 9000.
or
2. Decrease the MTU on the client and backend servers to less than 1500, e.g. 1450.

tags: added: rfe
Michael Johnson (johnsom) wrote :

I think we need to expose the following HAProxy options at a minimum:
retries
timeout connect
timeout client
timeout server
timeout http-keep-alive
timeout http-request
timeout tunnel

Nice to have:
timeout check
timeout client-fin
timeout queue
timeout server-fin
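For reference, the options listed above would land in an HAProxy configuration roughly like the fragment below. The values shown are illustrative only, not proposed defaults; note that once a connection is upgraded to a tunnel, `timeout tunnel` supersedes the client/server timeouts.

```
defaults
	log global
	retries 3
	option redispatch
	timeout connect 5000        # max time to establish a backend connection (ms)
	timeout client 86400000     # client-side inactivity timeout; 1 day for ssh
	timeout server 86400000     # server-side inactivity timeout
	timeout http-keep-alive 1000
	timeout http-request 10000
	timeout tunnel 86400000     # applies after a connection becomes a tunnel
```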

Kevin Fox (kevpn) wrote :

perhaps:
option tcpka

too.

Reedip (reedip-banerjee) wrote :

If this is currently open, can I take this up?

Kevin Fox (kevpn) wrote :

This is still a major problem for us... If you're willing, please do. :)

To work around it for now, I've had to patch the raw source code of the v1 LBaaS with something like the patch below. Being able to set it per LB would be much, much better. Since we're using v1, it's relatively easy to patch in this hackish way because we only have to do it in one place. In v2 with amphorae, it will be much harder. :/

Patch:
--- /usr/lib/python2.7/site-packages/neutron_lbaas/services/loadbalancer/drivers/haproxy/cfg.py.orig	2015-10-01 13:04:50.364724468 -0700
+++ /usr/lib/python2.7/site-packages/neutron_lbaas/services/loadbalancer/drivers/haproxy/cfg.py	2015-10-01 13:11:19.943959753 -0700
@@ -76,14 +76,26 @@
 
 def _build_defaults(config):
-    opts = [
-        'log global',
-        'retries 3',
-        'option redispatch',
-        'timeout connect 5000',
-        'timeout client 50000',
-        'timeout server 50000',
-    ]
+    opts = []
+    if config['vip']['protocol'] == 'TCP' and (config['vip']['protocol_port'] == 2811 or config['vip']['protocol_port'] == 22):
+        opts = [
+            'log global',
+            'retries 3',
+            'option redispatch',
+            'timeout connect 5000',
+            "timeout client %s" % (1 * 24 * 60 * 60 * 1000),  # 1 day
+            "timeout server %s" % (1 * 24 * 60 * 60 * 1000),
+            "option tcpka",
+        ]
+    else:
+        opts = [
+            'log global',
+            'retries 3',
+            'option redispatch',
+            'timeout connect 5000',
+            'timeout client 50000',
+            'timeout server 50000',
+        ]
 
     return itertools.chain(['defaults'], ('\t' + o for o in opts))

Thanks,
Kevin
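The hard-coded port check in the patch above could instead read the timeout from the load balancer's configuration, which is the per-LB behavior Kevin is asking for. A minimal sketch, assuming hypothetical config keys `session_timeout_ms` and `tcp_keepalive` (these are illustrative names, not part of the real neutron-lbaas schema):

```python
import itertools


def build_defaults(config):
    # Take the client/server timeout (in milliseconds) from the LB
    # config if present, falling back to the stock 50000 ms value.
    timeout_ms = config.get('session_timeout_ms', 50000)
    opts = [
        'log global',
        'retries 3',
        'option redispatch',
        'timeout connect 5000',
        'timeout client %s' % timeout_ms,
        'timeout server %s' % timeout_ms,
    ]
    # Optionally enable TCP keepalives, per the earlier suggestion.
    if config.get('tcp_keepalive'):
        opts.append('option tcpka')
    return list(itertools.chain(['defaults'], ('\t' + o for o in opts)))
```

With no overrides this produces the stock defaults section; passing `{'session_timeout_ms': 86400000, 'tcp_keepalive': True}` yields the one-day ssh-friendly variant from the patch.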

Reedip (reedip-banerjee) wrote :

We can proceed with an option in both v1 and v2, but we need to discuss it with the other members as well.

Changed in neutron:
assignee: nobody → Reedip (reedip-banerjee)
Reedip (reedip-banerjee) wrote :

Note: v1 is deprecated in Liberty and will be removed in a future release,
so the priority of v2 should be a bit higher than v1.

Kevin Fox (kevpn) wrote :

V2 only would be ok I think. We want to get to v2 anyway, and if it's fixed there, we can just move forward.

Thanks,
Kevin

Brandon Logan (brandon-logan) wrote :

Would adding a timeout option on the listener (for v2) be sufficient? This would set the timeout client and timeout server to the same value. Do you think it's worthwhile to have a timeout on the listener for the timeout client option, and a timeout on the pool for the timeout server option?

Kevin Fox (kevpn) wrote :

Not sure. I've never had to have different values, but maybe others will.

Brandon Logan (brandon-logan) wrote :

I think we can go with timeout on listener first and if someone in the future wants to set them to different values then we can add it to the pool.

Akihiro Motoki (amotoki) wrote :

The current discussion sounds reasonable.

Changed in neutron:
importance: Undecided → Wishlist
status: New → Triaged
Henry Gessau (gessau) on 2015-11-24
summary: - lbaas ssh connection timeout
+ [RFE] [LBaaS] ssh connection timeout

Let's make sure we have a better understanding of how you intend to expose these attributes (to make sure they are not particularly tied to a specific backend), but this is a nice to have.

tags: added: rfe-approved
removed: rfe
Changed in neutron:
milestone: none → mitaka-1
Changed in neutron:
milestone: mitaka-1 → mitaka-2
Brandon Logan (brandon-logan) wrote :

A timeout value on the listener should solve this problem, and I'm pretty sure all drivers would support this, though obviously they'd need to modify their drivers to support it. An extension would need to be made that just adds the timeout value to the listener in the RESOURCE_ATTRIBUTE_MAP. The plugin and the octavia driver would then need to support it as well.
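For illustration, the extension Brandon describes might add an entry like the following for the listener resource. The attribute name `timeout_ms`, the validator, and the default are all hypothetical, not the attribute map that eventually merged:

```python
# Hypothetical extension attribute for the listener resource.
# 'timeout_ms' is an illustrative name only.
EXTENDED_ATTRIBUTES_2_0 = {
    'listeners': {
        'timeout_ms': {
            'allow_post': True,   # settable on create
            'allow_put': True,    # settable on update
            'validate': {'type:non_negative': None},
            'default': 50000,     # matches the stock HAProxy value (ms)
            'is_visible': True,
        },
    },
}
```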

Changed in neutron:
milestone: mitaka-2 → mitaka-3

This looks like it's dead. Let's garbage collect it.

Changed in neutron:
milestone: mitaka-3 → none
status: Triaged → Incomplete
assignee: Reedip (reedip-banerjee) → nobody
Kevin Fox (kevpn) wrote :

Still desperately needed. No solution yet. Don't drop the bug please.

Reedip (reedip-banerjee) wrote :

Apologies, I was caught up in other work; I will put up a patch this coming weekend (though it will be a WIP for now).

Changed in neutron:
assignee: nobody → Reedip (reedip-banerjee)

Fix proposed to branch: master
Review: https://review.openstack.org/273896

Changed in neutron:
status: Incomplete → In Progress
Reedip (reedip-banerjee) on 2016-01-29
Changed in python-neutronclient:
assignee: nobody → Reedip (reedip-banerjee)

Fix proposed to branch: master
Review: https://review.openstack.org/273911

Changed in python-neutronclient:
status: New → In Progress
Changed in neutron:
milestone: none → mitaka-3
Changed in neutron:
milestone: mitaka-3 → mitaka-rc1
Changed in neutron:
milestone: mitaka-rc1 → newton-1

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273911
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

What's holding this one back?

Changed in neutron:
milestone: newton-1 → newton-2

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273911
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Yet again on the backburner.

Changed in python-neutronclient:
importance: Undecided → Wishlist

Perhaps its time has passed and we should consider abandoning the attempt to deliver this feature.

Changed in neutron:
assignee: Reedip (reedip-banerjee) → nobody
Changed in python-neutronclient:
assignee: Reedip (reedip-banerjee) → nobody
status: In Progress → Incomplete
Changed in neutron:
status: In Progress → Incomplete
Kevin Fox (kevpn) wrote :

Still a very much needed feature... It will only become more apparent as things like k8s start integrating with LBaaS v2; otherwise they will avoid using it.

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273896
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

@Kevin: sure, if only someone wanted to work on it and get it to the finish line.

Kevin Fox (kevpn) wrote :

Yeah, this is hampering adoption. I haven't been able to consider Octavia yet, or switch away from LBaaS v1, until there's a solution.

I've also been actively looking at other, non-neutron solutions. :/

Changed in neutron:
assignee: nobody → Reedip (reedip-banerjee)
status: Incomplete → In Progress
Changed in neutron:
milestone: newton-2 → newton-3
Changed in neutron:
milestone: newton-3 → newton-rc1
Changed in neutron:
milestone: newton-rc1 → none
status: In Progress → Incomplete
Changed in neutron:
status: Incomplete → In Progress

So it says it's in progress. Where are the patches?

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273911
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273896
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

affects: neutron → octavia
Reedip (reedip-banerjee) on 2017-03-20
affects: python-neutronclient → python-openstackclient
Changed in python-openstackclient:
status: Incomplete → New
assignee: nobody → Reedip (reedip-banerjee)
Michael Johnson (johnsom) wrote :

Reedip,
Our client code is in python-octaviaclient repository and tracked under the octavia project.
This doesn't need to be against python-openstackclient.

Reedip (reedip-banerjee) wrote :

Michael,
I can remove it... but the implementation in octaviaclient is for OSC itself, and besides, there is no project in Launchpad for octaviaclient.

no longer affects: python-openstackclient

Change abandoned by Michael Johnson (<email address hidden>) on branch: master
Review: https://review.openstack.org/412971
Reason: This has not been updated in 4 months.
Abandoning. You can restore the patch later if you feel it is still valuable and you want to continue working on it.
