IPVS incorrectly reverse-NATs traffic to LVS host

Bug #1681847 reported by Nick Moriarty
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Triaged  Medium      Unassigned

Bug Description

We have observed the following behaviour on our LVS systems, which is causing issues with our monitor scripts. The systems are running Ubuntu 14.04.5 LTS, and I've tested with both the stock 3.13.0 kernels (-100 and -116) and the 4.4.0-72 Xenial kernel.

Our systems are set up in direct routing mode for the services they handle. Our monitor scripts for DNS send test queries to our DNS servers using their real IPs. Sporadically, we have seen these checks fail as the DNS answers are coming back from the wrong IP address.
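To illustrate where the failure surfaces: a UDP client learns who sent a reply from the address that recvfrom() fills in, and a monitor that insists the answer comes from the server it queried will reject a reply whose source has been rewritten. The sketch below is a minimal, hypothetical version of such a check (the function names are ours, not taken from the actual monitor scripts); it simply compares the sender against the queried address and port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: build an IPv4 socket address from a dotted-quad
 * string and a port. Returns false if the string is not a valid address. */
bool make_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}

/* Hypothetical monitor check: accept a reply only if the sender reported
 * by recvfrom() matches the server that was queried (address and port). */
bool reply_from_queried_server(const struct sockaddr_in *queried,
                               const struct sockaddr_in *from)
{
    return from->sin_family == AF_INET &&
           from->sin_addr.s_addr == queried->sin_addr.s_addr &&
           from->sin_port == queried->sin_port;
}
```

In a real check, `from` would be filled by something like `recvfrom(sock, buf, sizeof buf, 0, (struct sockaddr *)&from, &fromlen)`. With the behaviour described here, a query sent to a real server such as 144.32.128.227:53 would come back with `from` set to its VIP, 144.32.128.240:53, and the check would fail.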

We debugged this using tcpdump, and found that the response packets coming into the LVS systems were using the correct IPs (i.e. the real IPs on the DNS servers). However, applications see the responses as coming from a VIP instead.

All of this has been established using UDP traffic.

I have tracked this behaviour down to a specific case, which I can only assume is associated with how the kernel handles LVS NAT connections (i.e. masquerade mode):
- If a DNS query is made from the LVS server itself to a DNS VIP, IPVS creates an entry in its connection table keyed on (srcip:sport -> dstip:dport) and associated with (vip:vport); for example, (lvsip:50000 -> dnsip:53) associated with (dnsvip:53)
- If a subsequent DNS query is made from the same UDP port directly to the real server (dnsip:53), the response is correct as seen by tcpdump
- However, when the application receives the response, its source address is wrong (dnsvip:53 instead of dnsip:53)

I infer that, because the response comes back from dnsip:53 and matches the connection table entry (lvsip:50000 -> dnsip:53), IPVS appears to assume NAT is in use and reverse-translates the response using that entry. The application layer therefore sees the response as coming from (dnsvip:53), which is incorrect.
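The inference above can be modelled in a few lines. This is a deliberately simplified user-space sketch, not kernel code: the struct and function names are hypothetical and addresses are host-order integers. In this model (which we believe reflects how IPVS matches replies, but which has not been checked line by line against the kernel source), a reply travelling from the real server back to the client is matched only on (real server ip:port -> client ip:port). The VIP plays no part in the match, so a reply to a direct query from the same client port is caught, and the reverse translation then rewrites its source to the VIP. The LVS host address 192.0.2.10 is a placeholder; the VIP and real server are taken from the IPVS configuration below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets. */
#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified stand-in for an IPVS connection entry. */
struct conn_entry {
    uint32_t caddr; uint16_t cport;   /* client: here the LVS host itself */
    uint32_t vaddr; uint16_t vport;   /* virtual service (the VIP) */
    uint32_t daddr; uint16_t dport;   /* real server chosen by the scheduler */
};

struct udp_pkt {
    uint32_t saddr; uint16_t sport;
    uint32_t daddr; uint16_t dport;
};

/* Reply-direction match (real server -> client). The VIP is not compared,
 * so a reply to a direct query from the same client port also matches. */
bool matches_reply(const struct conn_entry *cp, const struct udp_pkt *p)
{
    return p->saddr == cp->daddr && p->sport == cp->dport &&
           p->daddr == cp->caddr && p->dport == cp->cport;
}

/* Reverse translation as observed in the report: source becomes the VIP. */
void reverse_translate(const struct conn_entry *cp, struct udp_pkt *p)
{
    p->saddr = cp->vaddr;
    p->sport = cp->vport;
}
```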

/proc/version_signature (from both nodes):
  Ubuntu 3.13.0-100.147-generic 3.13.11-ckt39
  Ubuntu 4.4.0-72.93~14.04.1-generic 4.4.49

IPVS Configuration:

root@lvs5:~# ipvsadm -Sn
-A -t 144.32.128.183:25 -s wlc
-a -t 144.32.128.183:25 -r 144.32.129.29:25 -g -w 10
-a -t 144.32.128.183:25 -r 144.32.129.64:25 -g -w 10
-A -t 144.32.128.183:465 -s wlc
-a -t 144.32.128.183:465 -r 144.32.129.29:465 -g -w 10
-a -t 144.32.128.183:465 -r 144.32.129.64:465 -g -w 10
-A -t 144.32.128.183:587 -s wlc
-a -t 144.32.128.183:587 -r 144.32.129.29:587 -g -w 10
-a -t 144.32.128.183:587 -r 144.32.129.64:587 -g -w 10
-A -t 144.32.128.240:53 -s rr
-a -t 144.32.128.240:53 -r 144.32.128.227:53 -g -w 10
-A -t 144.32.128.242:53 -s rr
-a -t 144.32.128.242:53 -r 144.32.128.143:53 -g -w 10
-A -t 144.32.129.39:25 -s wlc
-a -t 144.32.129.39:25 -r 144.32.129.163:25 -g -w 10
-a -t 144.32.129.39:25 -r 144.32.129.164:25 -g -w 10
-A -t 144.32.129.39:465 -s wlc
-a -t 144.32.129.39:465 -r 144.32.129.163:465 -g -w 10
-a -t 144.32.129.39:465 -r 144.32.129.164:465 -g -w 10
-A -t 144.32.129.39:587 -s wlc
-a -t 144.32.129.39:587 -r 144.32.129.163:587 -g -w 10
-a -t 144.32.129.39:587 -r 144.32.129.164:587 -g -w 10
-A -u 144.32.128.240:53 -s rr
-a -u 144.32.128.240:53 -r 144.32.128.227:53 -g -w 10
-A -u 144.32.128.242:53 -s rr
-a -u 144.32.128.242:53 -r 144.32.128.143:53 -g -w 10
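One way to confirm what this configuration says about forwarding methods is to classify each real-server (`-a`) line by its flag: in ipvsadm's saved rules, `-g` means gatewaying (direct routing), `-m` masquerading (NAT) and `-i` IPIP tunnelling. Every real server above uses `-g`, so no NAT service is configured on this host. The helper below is a hypothetical sketch, not part of ipvsadm; it assumes one rule per line, whitespace-separated tokens, and lines shorter than 256 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: classify one line of `ipvsadm -Sn` output.
 * Returns NULL for non-real-server lines (e.g. "-A" service lines),
 * otherwise "droute" (-g), "masq" (-m), "tunnel" (-i) or "unknown". */
const char *fwd_method(const char *line)
{
    char buf[256];
    const char *method = "unknown";

    if (strncmp(line, "-a ", 3) != 0)
        return NULL;

    /* strtok() modifies its input, so work on a bounded copy. */
    snprintf(buf, sizeof buf, "%s", line);
    for (char *tok = strtok(buf, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, "-g") == 0)
            method = "droute";
        else if (strcmp(tok, "-m") == 0)
            method = "masq";
        else if (strcmp(tok, "-i") == 0)
            method = "tunnel";
    }
    return method;
}
```

Fed each line of the `ipvsadm -Sn` output above (for example with fgets() in a loop), this would report "droute" for every real server.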
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 11 13:18 seq
 crw-rw---- 1 root audio 116, 33 Apr 11 13:18 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.23
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=/dev/mapper/lvs5--vg-swap
IwConfig: Error: [Errno 2] No such file or directory
Lsusb:
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R320
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-72-generic root=/dev/mapper/hostname--vg-root ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.4.0-72.93~14.04.1-generic 4.4.49
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-72-generic N/A
 linux-backports-modules-4.4.0-72-generic N/A
 linux-firmware 1.127.23
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 4.4.0-72-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: btcats daemon dba named oinstall yimsora
_MarkForUpload: True
dmi.bios.date: 01/29/2015
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.4.2
dmi.board.name: 0KM5PX
dmi.board.vendor: Dell Inc.
dmi.board.version: A02
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.4.2:bd01/29/2015:svnDellInc.:pnPowerEdgeR320:pvr:rvnDellInc.:rn0KM5PX:rvrA02:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R320
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1681847

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
description: updated
Nick Moriarty (nick-moriarty) wrote : BootDmesg.txt

apport information

tags: added: apport-collected trusty
description: updated
Nick Moriarty (nick-moriarty) wrote : CurrentDmesg.txt

apport information

Nick Moriarty (nick-moriarty) wrote : Lspci.txt

apport information

Nick Moriarty (nick-moriarty) wrote : ProcCpuinfo.txt

apport information

Nick Moriarty (nick-moriarty) wrote : ProcEnviron.txt

apport information

Nick Moriarty (nick-moriarty) wrote : ProcInterrupts.txt

apport information

Nick Moriarty (nick-moriarty) wrote : ProcModules.txt

apport information

Nick Moriarty (nick-moriarty) wrote : UdevDb.txt

apport information

Nick Moriarty (nick-moriarty) wrote : UdevLog.txt

apport information

Nick Moriarty (nick-moriarty) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc6

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
Nick Moriarty (nick-moriarty) wrote :

It's hard to tell whether this was the result of an upgrade - I think this behaviour has always been present in the 14.04 stock kernels, but we noticed it less until recently.

I'll look into testing the latest upstream kernel and get back to you.

If I get a chance, I'll also look through the relevant IPVS kernel source. I expect the problem relates to the code that handles the return path for NAT (masq) services, and to how that code treats entries in the IPVS connection table.

Thanks

Nick Moriarty (nick-moriarty) wrote :

I've tested this with the latest upstream kernel (4.11.0-041100rc6), and the problem is still present.

tags: added: kernel-bug-exists-upstream
Nick Moriarty (nick-moriarty) wrote :

I think I may have tracked this down, but I haven't had a go at patching it yet.

In net/netfilter/ipvs/ip_vs_core.c - in handle_response():
        /* mangle the packet */
        if (pp->snat_handler && !pp->snat_handler(skb, pp, cp, iph))
                goto drop;

This calls the protocol-specific SNAT handler - however, it doesn't first check that (IP_VS_FWD_METHOD(cp) == IP_VS_CONN_F_MASQ), as is done in a couple of the other netfilter modules.

I think this means SNAT is *always* done on response packets, even if the connection information was for a non-NAT connection.
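The effect of the missing check can be shown with a toy model of handle_response(). This is only a sketch of the logic: the structs and functions are hypothetical stand-ins, and only the IP_VS_CONN_F_* values and the shape of IP_VS_FWD_METHOD() are meant to mirror the kernel headers. On a direct-routing connection the unguarded path rewrites the reply's source to the VIP, while the guarded path leaves the real-server address alone. The model says nothing about whether other code paths in the real kernel also translate the packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/ip_vs.h. */
#define IP_VS_CONN_F_FWD_MASK 0x0007
#define IP_VS_CONN_F_MASQ     0x0000
#define IP_VS_CONN_F_DROUTE   0x0003
/* Same shape as the kernel's IP_VS_FWD_METHOD(cp) macro. */
#define IP_VS_FWD_METHOD(cp)  ((cp)->flags & IP_VS_CONN_F_FWD_MASK)

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_conn { unsigned int flags; uint32_t vaddr; };
struct model_pkt  { uint32_t saddr; };

/* Stand-in for pp->snat_handler: rewrite the reply's source to the VIP. */
static bool model_snat_handler(struct model_pkt *pkt, const struct model_conn *cp)
{
    pkt->saddr = cp->vaddr;
    return true;
}

/* As quoted above: SNAT whenever a handler exists (false = "goto drop"). */
bool handle_response_unguarded(struct model_pkt *pkt, const struct model_conn *cp)
{
    return model_snat_handler(pkt, cp);
}

/* With a forwarding-method guard: SNAT only for masquerade (NAT) entries. */
bool handle_response_guarded(struct model_pkt *pkt, const struct model_conn *cp)
{
    if (IP_VS_FWD_METHOD(cp) == IP_VS_CONN_F_MASQ &&
        !model_snat_handler(pkt, cp))
        return false;                      /* corresponds to "goto drop" */
    return true;
}
```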

I'll update the issue once I've had a go at patching this.

Nick Moriarty (nick-moriarty) wrote :

I've attempted to patch this out by adding checks where snat_handler and dnat_handler are called (ip_vs_core.c and ip_vs_xmit.c), with no success. I have to surmise that either:
- My patches aren't being built correctly
- My checks don't work
- This isn't the code that's mangling the packets
- The connection table entries are being marked as MASQ connections even though they're DROUTE

The extra checks take the form (referring to the instance in the previous comment):
        if (pp->snat_handler && (IP_VS_FWD_METHOD(cp) == IP_VS_CONN_F_MASQ) &&
            !pp->snat_handler(skb, pp, cp, iph))
                goto drop;

Similar guards were put on the dnat_handler calls. This should cause the NAT handler to be ignored unless the connection information (cp) is for a MASQ connection.

I should also note, in case it is useful, that include/uapi/linux/ip_vs.h defines IP_VS_CONN_F_MASQ as 0x0000: a MASQ connection is one whose forwarding-method bits in cp->flags (cp->flags & IP_VS_CONN_F_FWD_MASK) are all zero.
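Because IP_VS_CONN_F_MASQ is zero, a masquerade connection cannot be detected with a plain bit test; the forwarding-method field has to be masked out and then compared, which is what the IP_VS_FWD_METHOD(cp) == IP_VS_CONN_F_MASQ form does. A minimal sketch contrasting the two (the function names are hypothetical; the constant values are as in include/uapi/linux/ip_vs.h):

```c
#include <stdbool.h>

/* Values as defined in include/uapi/linux/ip_vs.h. */
#define IP_VS_CONN_F_FWD_MASK 0x0007
#define IP_VS_CONN_F_MASQ     0x0000
#define IP_VS_CONN_F_DROUTE   0x0003

/* Incorrect: MASQ is 0, so (flags & 0) is always 0 and this returns
 * false for every connection, NAT or not. */
bool is_masq_bit_test(unsigned int flags)
{
    return (flags & IP_VS_CONN_F_MASQ) != 0;
}

/* Correct: isolate the forwarding-method bits, then compare the value.
 * Unrelated higher flag bits do not affect the result. */
bool is_masq_field_test(unsigned int flags)
{
    return (flags & IP_VS_CONN_F_FWD_MASK) == IP_VS_CONN_F_MASQ;
}
```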

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
tags: added: kernel-bug-reported-upstream
Nick Moriarty (nick-moriarty) wrote :

This was reported as an upstream kernel bug:
http://archive.linuxvirtualserver.org/html/lvs-devel/2017-04/msg00014.html

An IPVS kernel developer has responded to the issue and a patch has been tested.
