Add conntrackd support to HA routers in L3 agent

Bug #1365438 reported by Assaf Muller
This bug affects 10 people
Affects: neutron
Status: In Progress
Importance: Medium
Assigned to: Unassigned

Bug Description

Open TCP sessions are discarded during HA router failover. Adding conntrackd support should solve this issue.

Some work has already been done in the following two patches:
https://review.openstack.org/#/c/71586/
https://review.openstack.org/#/c/80332/

Tags: l3-ha
Changed in neutron:
importance: Undecided → Medium
John Schwarz (jschwarz)
Changed in neutron:
assignee: nobody → John Schwarz (jschwarz)
Changed in neutron:
assignee: John Schwarz (jschwarz) → yong sheng gong (gongysh)
John Schwarz (jschwarz) wrote :

Hi,

Is there any progress with this bug?

John.

Qin TianHuan (646543317-j-deactivatedaccount) wrote :

Hi yong sheng,

Any progress on this bug? I plan to work on it to solve our current problem (which means implementing conntrackd in the L3 HA router), so if you don't have time for it, we can solve it together :)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/184474

Changed in neutron:
assignee: yong sheng gong (gongysh) → Qin TianHuan (646543317-j)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by QthCN (<email address hidden>) on branch: master
Review: https://review.openstack.org/184474

Antonio Messina (arcimboldo) wrote :

I'm also interested in this bug, as we are running Neutron L3 agents in HA.

Changed in neutron:
status: In Progress → New
Manjeet Singh Bhatia (manjeet-s-bhatia) wrote :

akamyshnikova, I am interested in doing this. Sorry, it was unassigned, so I assigned it to myself.
Can I take it?

Changed in neutron:
assignee: nobody → Manjeet Singh Bhatia (manjeet-s-bhatia)
Ann Taraday (akamyshnikova) wrote :

@Manjeet Singh Bhatia

Sure!

Hirofumi Ichihara (ichihara-hirofumi) wrote :

@Manjeet Singh Bhatia: I hope this feature will be ready in Mitaka. Do you have a plan for this?

Manjeet Singh Bhatia (manjeet-s-bhatia) wrote :

I cherry-picked the changes and am now making additional modifications. A lot of tests are also failing in the new neutron code base because of this. I'll fix the tests first, then do operational testing on devstack. Once it works, I will push it.

Hirofumi Ichihara (ichihara-hirofumi) wrote :

@Manjeet Singh Bhatia: Do you still have a plan for this bug?

Ann Taraday (akamyshnikova) wrote :

Any updates?

Changed in neutron:
assignee: Manjeet Singh Bhatia (manjeet-s-bhatia) → nobody
Hirofumi Ichihara (ichihara-hirofumi) wrote :

I'll try it.

Changed in neutron:
assignee: nobody → Hirofumi Ichihara (ichihara-hirofumi)
Carl Baldwin (carl-baldwin) wrote :

I'm a bit concerned about the overhead of synchronizing conntrack between routers. In my experience, it is very significant and I actually doubt the benefit somewhat. So, maybe you can help me understand why this is important.

The Linux kernel enables the nf_conntrack_tcp_loose sysctl option by default. When it is enabled, conntrack will pick up existing connections when traffic appears from the "right" direction, meaning the direction that would be allowed to initiate the session in the first place. In most cases, I imagine that this should be sufficient to maintain continuity for connections.

To me, the description of this bug doesn't adequately justify the developer work and the increased overhead that this feature will add. For example, it doesn't cite any kind of real-world experience or testing; it merely states that connections are discarded and doesn't consider the potential mitigation from the tcp_loose option. I'm very concerned.
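
For anyone wanting to verify the tcp_loose setting on a network node, a minimal sketch follows. It assumes the usual procfs path for the net.netfilter.nf_conntrack_tcp_loose sysctl, which is only present when the nf_conntrack module is loaded. The helper names (parse_tcp_loose, read_tcp_loose) are hypothetical and not part of neutron. The value may also differ per network namespace, so for an HA router it would likely need to be read from inside the router's namespace.

```python
from pathlib import Path

# Assumed procfs location of net.netfilter.nf_conntrack_tcp_loose.
# The file is absent when the nf_conntrack module is not loaded.
TCP_LOOSE_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_tcp_loose")


def parse_tcp_loose(raw: str) -> bool:
    """Return True when the raw sysctl value means loose pickup is enabled."""
    return int(raw.strip()) != 0


def read_tcp_loose(path: Path = TCP_LOOSE_PATH):
    """Return True/False for the current setting, or None if it cannot be read."""
    try:
        return parse_tcp_loose(path.read_text())
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    state = read_tcp_loose()
    if state is None:
        print("nf_conntrack_tcp_loose is not readable on this host")
    else:
        print("loose pickup enabled" if state else "loose pickup disabled")
```

Running the script in the relevant namespace only reports the setting. It says nothing about whether loose pickup is sufficient for a given traffic pattern.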

Changed in neutron:
status: New → Incomplete
Changed in neutron:
assignee: Hirofumi Ichihara (ichihara-hirofumi) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Arturo Borrero Gonzalez (arturoborrero) wrote :

This is important for NAT-based connections:
* general SNAT being done by the neutron router on egress
* the floating IP NAT

I don't think TCP connections using NAT in a failover scenario will keep going unless the conntrack info has been synced to the other neutron router.
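
The NAT concern above can be illustrated with a toy model. This is a sketch under stated assumptions, not how the kernel or neutron behaves: ToyNatRouter and its methods are hypothetical, ports are allocated sequentially, and loose pickup and source-port preservation are deliberately ignored. It only shows that a reply arriving at a router with no translation state cannot be mapped back to the instance, while a router that received the state can map it.

```python
class ToyNatRouter:
    """Hypothetical SNAT router that tracks translated connections in a dict."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.next_port = 40000
        # external port -> (internal ip, internal port)
        self.table = {}

    def outbound(self, src_ip, src_port):
        """Translate an outgoing packet, creating state for a new connection."""
        for ext_port, inner in self.table.items():
            if inner == (src_ip, src_port):
                return (self.external_ip, ext_port)
        ext_port = self.next_port
        self.next_port += 1
        self.table[ext_port] = (src_ip, src_port)
        return (self.external_ip, ext_port)

    def inbound(self, ext_port):
        """Reverse-translate a reply; None means there is no matching state."""
        return self.table.get(ext_port)

    def sync_to(self, other):
        """Copy translation state to a peer (the role a state-sync daemon plays)."""
        other.table.update(self.table)
        other.next_port = max(other.next_port, self.next_port)


def demo():
    """Return the reply lookup on the backup before and after a state sync."""
    master = ToyNatRouter("203.0.113.10")
    backup = ToyNatRouter("203.0.113.10")  # same external IP after failover
    _, ext_port = master.outbound("10.0.0.5", 51000)
    before = backup.inbound(ext_port)   # backup has no state yet
    master.sync_to(backup)
    after = backup.inbound(ext_port)    # state was replicated
    return before, after


if __name__ == "__main__":
    print(demo())
```

Whether real conntrack with tcp_loose can rebuild equivalent state from mid-stream packets is exactly the open question in this thread; the toy model does not answer it.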

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/917430

Changed in neutron:
status: Expired → In Progress