Instances on the same compute node unable to connect to each other's ports

Bug #1478925 reported by Ioana-Madalina Patrichi
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   Medium       Kevin Benton

Bug Description

OpenStack version: Icehouse 2014.1.5
Nova version: 2.17.0

I have two instances on the same compute node, both connected to the same virtual network. From one instance I try to connect, over the virtual network, to a port on the other instance on which no process is listening. I expect the kernel to return a 'Connection refused' error.

This works as expected for any two instances on the same virtual network that are located on different compute nodes. However, if the instances are on the same compute node, the connection times out.
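
For anyone who wants to reproduce the difference between the two outcomes, here is a minimal sketch of a probe that separates 'Connection refused' from a timeout. The function name `probe_port` and the 3-second default timeout are illustrative choices, not part of any OpenStack tooling.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The target's kernel answered the SYN with a RST: nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer arrived in time, e.g. because the RST was dropped on the way back.
        return "timeout"
    finally:
        sock.close()
```

Run from one instance against an unused port on the other, the expected result is "refused"; the behaviour reported here would show up as "timeout" when both instances share a compute node.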

I have found a temporary workaround: in the instance's input iptables chain, move the rule that drops packets in the INVALID state so that it comes after the rules that allow traffic from the other instances, as follows:

From:
-A neutron-openvswi-ic05bb97b-2 -m state --state INVALID -j DROP
-A neutron-openvswi-ic05bb97b-2 -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 35357 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 80 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 5000 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp -m multiport --dports 9000:9999 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.41/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.25/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.45/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.17/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.12/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.36/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.43/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.40/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.35/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.3/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.28/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.10/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.22/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.44/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.47/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.44/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.39/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.20/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.26/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.38/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.29/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.48/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.6/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.15/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.24/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.11/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.45/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.54/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.13/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.43/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.33/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.42/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.46/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.42/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.23/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.50/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.12/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.16/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.14/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.37/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.7/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.41/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.46/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.48/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.30/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.21/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.27/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.8/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.5/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.6/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.49/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p icmp -j RETURN

To:
-A neutron-openvswi-ic05bb97b-2 -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 35357 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 80 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp --dport 5000 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p tcp -m tcp -m multiport --dports 9000:9999 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.41/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.25/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.45/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.17/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.12/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.36/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.43/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.40/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.35/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.3/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.28/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.10/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.22/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.44/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.47/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.44/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.39/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.20/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.26/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.38/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.29/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.48/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.6/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.15/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.24/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 4.0.0.11/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.45/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.54/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.13/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.43/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.33/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.42/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.46/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.42/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.23/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.50/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.12/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.16/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.14/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.37/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.7/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.41/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 3.0.0.46/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.48/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.30/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.21/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.27/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.8/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 6.0.0.5/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 5.0.0.6/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -s 9.0.0.49/32 -j RETURN
-A neutron-openvswi-ic05bb97b-2 -p icmp -j RETURN
-A neutron-openvswi-ic05bb97b-2 -m state --state INVALID -j DROP
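
The reordering above can be expressed as a small transformation on the rule list. This is only a sketch of the workaround, assuming the rules are available as `iptables-save`-style strings; `move_invalid_drop_last` is a hypothetical helper name. Note that with the DROP rule last, INVALID packets that match an earlier RETURN rule are let through, so this loosens filtering and is not a proper fix.

```python
def move_invalid_drop_last(rules):
    """Return the chain's rules with any '--state INVALID -j DROP' rule moved to the end.

    `rules` is a list of iptables-save style strings for a single chain.
    The relative order of all other rules is preserved.
    """
    def is_invalid_drop(rule):
        tokens = rule.split()
        return "INVALID" in tokens and tokens[-2:] == ["-j", "DROP"]

    kept = [r for r in rules if not is_invalid_drop(r)]
    moved = [r for r in rules if is_invalid_drop(r)]
    return kept + moved


if __name__ == "__main__":
    chain = "neutron-openvswi-ic05bb97b-2"
    before = [
        f"-A {chain} -m state --state INVALID -j DROP",
        f"-A {chain} -m state --state RELATED,ESTABLISHED -j RETURN",
        f"-A {chain} -p tcp -m tcp --dport 22 -j RETURN",
        f"-A {chain} -p icmp -j RETURN",
    ]
    for rule in move_invalid_drop_last(before):
        print(rule)
```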

description: updated
tags: added: network
Davanum Srinivas (DIMS) (dims-v) wrote :

Is this a Nova issue or Neutron issue? :)

Venkateswarlu Payidimarry (venky-payidimarry) wrote :

Did you enable the n-net service?
Check which services are enabled on the compute node.

Changed in neutron:
status: New → Confirmed
assignee: nobody → Kevin Benton (kevinbenton)
Ioana-Madalina Patrichi (ioana-madalina-patrichi) wrote :

@Davanum: You're right, it should be classified more as a neutron issue probably. Thanks for reclassifying it!

no longer affects: nova
Kevin Benton (kevinbenton) wrote :

I confirmed this. Working on a solution now.

Ioana-Madalina Patrichi (ioana-madalina-patrichi) wrote :

@Payidimarry: I am not using DevStack; I am experiencing this issue on a multi-node OpenStack cluster. I am not sure whether enabling this is an option here.

Kevin Benton (kevinbenton) wrote :

The TCP RST is being dropped on the connection initiator.

Kevin Benton (kevinbenton) wrote :

ouch, this is going to be a bit painful to fix.

The issue is that when two instances are on the same node, a connection between them shares a conntrack entry. As one side sends the RST, conntrack immediately destroys the entry. Then when it comes time to match the entry on the other side, there is no longer an entry so it is dropped as invalid.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/207464

Changed in neutron:
status: Confirmed → In Progress
Kevin Benton (kevinbenton) wrote :

cross-post of my answer on ask.openstack.org about this issue: https://ask.openstack.org/en/question/28300/iptables-invalid-rule-preventing-rst-packets-on-closed-ports-between-vms/

This issue is caused by the iptables setup in the reference OVS implementation in Neutron.

Each VM gets its own filtering bridge, so the path of a packet between two VMs on the same host looks like this:

    VM1 -> bridge1 (iptables filtering) -> OVS -> bridge2 (iptables filtering) -> VM2

In this setup each packet goes through a conntrack lookup twice (once on each bridge), and the conntrack state is shared between the filtering bridges. This is normally not a problem because conntrack keeps track of both sides of the TCP connection. The issue comes with the RST flag.

When conntrack encounters a TCP packet with a RST flag it immediately destroys the conntrack entry for that connection. This means that once the RST packet reaches the second filtering bridge, the conntrack state has already been removed, so the RST packet is marked as INVALID.

    VM1 -> bridge1 (iptables filtering) -> OVS -> bridge2 (iptables filtering) -> VM2
    RST >> conntrack destroys conn. >>>>>>>>> no match, INVALID DROP
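
The sequence in the diagram can be mimicked with a toy model. This is a deliberately simplified sketch, not the kernel's real conntrack state machine: `ToyConntrack`, `send`, and `connect_to_closed_port` are made-up names, entries are just (zone, flow) pairs, and the zone numbers are arbitrary. It shows the RST being judged INVALID at the second crossing when both bridges use the same zone, and surviving when each bridge has its own zone (the approach taken by the fix discussed in this report).

```python
class ToyConntrack:
    """Toy connection tracker: a set of (zone, flow) entries."""

    def __init__(self):
        self.entries = set()

    def process(self, zone, flow, flags):
        key = (zone, flow)
        if "SYN" in flags and "ACK" not in flags:
            # An initial SYN creates (or refreshes) the entry.
            self.entries.add(key)
            return "NEW"
        if key not in self.entries:
            # No state for this packet: iptables would see it as INVALID.
            return "INVALID"
        if "RST" in flags:
            # A RST tears the entry down immediately.
            self.entries.discard(key)
        return "ESTABLISHED"


def send(ct, zones, flow, flags):
    """Pass one packet through each filtering bridge in order; return the verdicts."""
    return [ct.process(zone, flow, flags) for zone in zones]


def connect_to_closed_port(zone_vm1, zone_vm2):
    """VM1 sends a SYN to a closed port on VM2; VM2 answers with a RST.

    Returns the verdicts for the RST at [VM2's bridge, VM1's bridge].
    """
    ct = ToyConntrack()
    # Direction-independent flow identifier (both endpoints of the connection).
    flow = frozenset({("10.0.0.9", 36397), ("10.0.0.10", 99)})
    send(ct, [zone_vm1, zone_vm2], flow, {"SYN"})
    return send(ct, [zone_vm2, zone_vm1], flow, {"RST", "ACK"})


if __name__ == "__main__":
    print("shared zone:   ", connect_to_closed_port(1, 1))
    print("per-port zones:", connect_to_closed_port(1, 2))
```

With a shared zone, the RST is accepted at VM2's bridge and INVALID at VM1's bridge, which matches the RST being dropped on the connection initiator.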

If you run **conntrack -E -o timestamp** while attempting to make a connection that causes a RST, you can see the RST is destroying the state in conntrack:

    ~$ sudo conntrack -E -o timestamp
    [1438290214.284944] [NEW] tcp 6 120 SYN_SENT src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1
    [1438290214.285129] [DESTROY] tcp 6 src=10.0.0.9 dst=10.0.0.10 sport=36397 dport=99 [UNREPLIED] src=10.0.0.10 dst=10.0.0.9 sport=99 dport=36397 zone=1
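
To spot this pattern in a longer capture, the event lines can be paired up by flow. The sketch below assumes the line layout shown in the output above (timestamp, event type, then key=value fields); the helper names `parse_event` and `entry_lifetimes` are illustrative, and other conntrack versions or output options may format lines differently.

```python
import re

# Matches lines such as "[1438290214.284944] [NEW] tcp 6 120 SYN_SENT src=..."
EVENT_RE = re.compile(r"^\[(?P<ts>\d+\.\d+)\]\s+\[(?P<event>[A-Z]+)\]\s+(?P<rest>.*)$")


def parse_event(line):
    """Parse one 'conntrack -E -o timestamp' line, or return None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for token in match.group("rest").split():
        if "=" in token:
            key, value = token.split("=", 1)
            # Keep the first occurrence only, i.e. the original direction.
            fields.setdefault(key, value)
    flow = (fields.get("src"), fields.get("sport"), fields.get("dst"), fields.get("dport"))
    return {
        "ts": float(match.group("ts")),
        "event": match.group("event"),
        "zone": fields.get("zone"),
        "flow": flow,
    }


def entry_lifetimes(lines):
    """Map (zone, flow) to the seconds between its NEW and DESTROY events."""
    started = {}
    lifetimes = {}
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        key = (event["zone"], event["flow"])
        if event["event"] == "NEW":
            started[key] = event["ts"]
        elif event["event"] == "DESTROY" and key in started:
            lifetimes[key] = event["ts"] - started.pop(key)
    return lifetimes
```

An entry destroyed within a fraction of a millisecond of its creation, as in the two lines above, is the signature of the RST removing the state.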

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/207464
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7e9b0e4ac53e83b18dd949564435710e86c7b81e
Submitter: Jenkins
Branch: master

commit 7e9b0e4ac53e83b18dd949564435710e86c7b81e
Author: Kevin Benton <email address hidden>
Date: Thu Jul 30 18:07:03 2015 -0700

    Use a conntrack zone per port in OVS

    Conntrack zones per network are not adequate because VMs
    on the same host communicating with each other cross iptables
    twice. If conntrack is sharing the same zone for each cross,
    the first one can remove the connection from the table on a RST
    and then the second one marks the RST as invalid.

    This patch adjusts the logic to use a conntrack zone per port
    instead of per network. In order to avoid interrupting upgrades
    or restarts, the initial zone map is built from the existing
    iptables rules so existing port->zone mappings are maintained.

    Closes-Bug: #1478925
    Change-Id: Ibe9e49653b2a280ea72cb95c2da64cd94c7739da
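
The idea in the commit message (one zone per port, with the initial map recovered from existing rules) can be sketched roughly as follows. This is not the code from the patch: the `-j CT --zone N` rule layout matched by the regex, the device names, the helper names `build_zone_map` and `zone_for`, and the 16-bit zone limit are all assumptions made for illustration.

```python
import re

# Assumed rule shape (illustrative only):
#   -A <chain> -m physdev --physdev-in <device> -j CT --zone <N>
CT_ZONE_RE = re.compile(r"--physdev-in\s+(?P<dev>\S+)\s.*-j CT --zone\s+(?P<zone>\d+)")


def build_zone_map(raw_rules):
    """Recover device -> zone assignments from existing iptables rule strings."""
    zones = {}
    for rule in raw_rules:
        match = CT_ZONE_RE.search(rule)
        if match:
            zones[match.group("dev")] = int(match.group("zone"))
    return zones


def zone_for(zones, dev, max_zone=65535):
    """Return the zone for `dev`, allocating the lowest unused zone if it has none."""
    if dev not in zones:
        used = set(zones.values())
        zones[dev] = next(z for z in range(1, max_zone + 1) if z not in used)
    return zones[dev]


if __name__ == "__main__":
    existing = [
        "-A example-PREROUTING -m physdev --physdev-in tapaaaa -j CT --zone 1",
        "-A example-PREROUTING -m physdev --physdev-in tapbbbb -j CT --zone 3",
    ]
    zones = build_zone_map(existing)
    print(zone_for(zones, "tapaaaa"))  # existing mapping is kept
    print(zone_for(zones, "tapcccc"))  # new port gets an unused zone
```

Seeding the map from the rules already in place means a port keeps its zone across an agent restart, so established connections stay tracked in the same zone.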

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/218710

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/218710
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2c5f44e1b3bd4ed8a0b7232fd293b576cc8c1c87
Submitter: Jenkins
Branch: feature/pecan

commit f35d1c5c50dccbef1a2e079f967b82f0df0e22e9
Author: Adelina Tuvenie <email address hidden>
Date: Thu Aug 27 02:27:28 2015 -0700

    Fixes wrong neutron Hyper-V Agent name in constants

    Change Id03fb147e11541be309c1cd22ce27e70fadc28b5 moved the
    AGENT_TYPE_HYPERV constant from common.constants to
    plugins.ml2.drivers.hyperv.constants but change the value of the
    constant from 'HyperV agent' to 'hyperv'. This patch changes
    the name back to 'HyperV agent'

    Change-Id: If74b4b2a84811e266c8b12e70bf6bfe74ed4ea21
    Partial-Bug: #1487598

commit de604de334854e2eb6b4312ff57920564cbd4459
Author: OpenStack Proposal Bot <email address hidden>
Date: Sun Aug 30 01:39:06 2015 +0000

    Updated from global requirements

    Change-Id: Ie52aa3b59784722806726e4046bd07f4a4d97328

commit f0415ac20eaf5ab4abb9bd4839bf6d04ceee85d0
Author: armando-migliaccio <email address hidden>
Date: Fri Aug 28 13:53:04 2015 -0700

    Revert "Add support for unaddressed port"

    This implementation may expose a vulnerability where a malicious
    user can sieze the opportunity of a time window where a port
    may land unaddressed on a shared network, thus allowing him/her
    to suck up all the tenant traffic he/she wants....oh the shivers.

    This reverts commit d4c52b7f5a36a103a92bf9dcda7f371959112292.

    Change-Id: I7ebdaa8d3defa80eab90e460fde541a5bdd8864c

commit 013fdcd2a6d45dbe4de5d6e7077e5e9b60985ef9
Author: Assaf Muller <email address hidden>
Date: Fri Aug 28 16:41:07 2015 -0400

    Improve logging upon failure in iptables functional tests

    This will help us nail down a more accurate and efficient logstash
    query.

    Change-Id: Iee4238e358f7b056e373c7be8d6aa3202117a680
    Related-Bug: #1478847

commit 622dea818d851224a43d5276a81d5ce8a6eebb76
Author: Ivar Lazzaro <email address hidden>
Date: Mon Aug 17 17:17:42 2015 -0700

    handle gw_info outside of the db transaction on router creation

    Move the gateway interface creation outside the DB transaction
    to avoid lock timeout.

    Change-Id: I5a78d7f32e8ca912016978105221d5f34618af19
    Closes-bug: 1485809

commit 5b27d290a0a95f6247fc5a0fe6da1e7d905e6b2d
Author: Assaf Muller <email address hidden>
Date: Wed Aug 26 10:07:03 2015 -0400

    Remove ml2 resource extension success logging

    This is the cause of a tremendous amount of logs, for no
    perceivable gain. A normal dvr run in the gate shows this debug
    message around 120K times, which is way too much.

    Closes-Bug: #1489952

    Change-Id: I26fca8515d866a7cc1638d07fa33bc04479ae221

commit 8d3faf549cba2f58c872ef4121b2481e73464010
Author: huangpengtao <email address hidden>
Date: Fri Aug 28 23:20:46 2015 +0800

    Replace "prt" variable by "port"

    the local variable prt is meaningless,
    and port is used popular.

    Change-Id: I20849102cf5b4d84433c46791b4b1e2a22dc4739

commit ee374e7a5f4dea538fcd942f5...

tags: added: in-feature-pecan
Ioana-Madalina Patrichi (ioana-madalina-patrichi) wrote :

Thank you for releasing the fix! I can confirm that after upgrading to Kilo and applying the proposed changes, I get a 'Connection refused' message when trying to connect to an unused port on an instance located on the same compute node.

Changed in neutron:
importance: Undecided → Medium
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-3 → 7.0.0