pptp vpn doesn't work with openvswitch firewall

Bug #1833175 reported by Yang Li
This bug affects 1 person
Affects: neutron
Status: In Progress
Importance: Undecided
Assigned to: Yang Li

Bug Description

I have a VM with a PPTP VPN server installed and running, and I use security group rules like these:
# neutron security-group-rule-create --protocol tcp --port-range-min 0 --port-range-max 65535 --remote-ip-prefix 0.0.0.0/0 --direction ingress all

# neutron security-group-rule-create --protocol 47 --remote-ip-prefix 0.0.0.0/0 --direction egress all

# neutron security-group-rule-create --protocol 47 --remote-ip-prefix 0.0.0.0/0 --direction ingress all

Then I create a new VM to connect to the VPN server, but there seems to be a connectivity problem: no reply comes back from the VPN server. I captured traffic on the VPN server's tap device with tcpdump:
16:09:15.486548 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 4, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
16:09:18.490483 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 5, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
16:09:21.494344 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 6, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
16:09:24.498097 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 7, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
16:09:27.501446 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 8, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
16:09:30.504937 fa:16:3e:26:7f:fe > fa:16:3e:e2:bd:f9, ethertype IPv4 (0x0800), length 75: 192.168.111.57 > 192.168.111.45: GREv1, call 0, seq 9, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27

It seems the VPN server does reply, but the reply packets are dropped somewhere. After some investigation, I found the connection's conntrack entry is set to mark=1:
47,orig=(src=192.168.111.57,dst=192.168.111.45,sport=3840,dport=0),reply=(src=192.168.111.45,dst=192.168.111.57,sport=0,dport=3840),zone=2,mark=1

This flow commits the connection with ct_mark=1, and the firewall then drops packets whose connection carries ct_mark=0x1 (see the ct_mark=0x1 drop flow in the table 72 dump below):
 cookie=0x8bbcb4f28e827fee, duration=81.097s, table=82, n_packets=2, n_bytes=158, idle_age=1, priority=40,ct_state=+est,ip,reg5=0x23 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))

After I add these two flows to br-int, connectivity becomes normal:
cookie=0x90b502a419df13a0, duration=0.477s, table=72, n_packets=0, n_bytes=0, idle_age=115, priority=70,ct_state=+est,ip,reg5=0x23,nw_proto=47 actions=resubmit(,73)
cookie=0x90b502a419df13a0, duration=0.450s, table=82, n_packets=0, n_bytes=0, idle_age=115, priority=70,ct_state=+est,ip,reg5=0x23,nw_proto=47 actions=NORMAL

Because PPTP VPN runs over GREv1, I think the problem is with how GREv1 is handled.

BTW, if I create a plain GRE tunnel between VMs, the connectivity between them is fine. That GRE tunnel uses GREv0, so it seems there is no problem with GREv0.

Tags: ovs-fw
Yang Li (yang-li)
description: updated
Yang Li (yang-li)
description: updated
tags: added: ovs-fw
Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

Can you show all the rules in your security group?

Can you provide a dump of all the flows?

Revision history for this message
LIU Yulong (dragon889) wrote :

Hi,
Could you please confirm whether loading the kernel module 'nf_conntrack_proto_gre' fixes the issue or not?

Revision history for this message
LIU Yulong (dragon889) wrote :
Revision history for this message
Yang Li (yang-li) wrote :

Hi Yulong,
Before this test I had already run modprobe for ip_gre, ip_nat_pptp and ip_conntrack_pptp in both the VMs and the hosts, and lsmod shows that nf_conntrack_proto_gre is in use:
# lsmod | grep pptp
nf_nat_pptp 13115 0
nf_nat_proto_gre 13009 1 nf_nat_pptp
nf_conntrack_pptp 19257 1 nf_nat_pptp
nf_conntrack_proto_gre 14434 1 nf_conntrack_pptp
nf_nat 26787 4 nf_nat_proto_gre,nf_nat_ipv4,nf_nat_ipv6,nf_nat_pptp
nf_conntrack 133053 9 nf_conntrack_proto_gre,nf_nat,nf_nat_ipv4,nf_nat_ipv6,nf_nat_pptp,xt_conntrack,nf_conntrack_ipv4,nf_conntrack_ipv6,nf_conntrack_pptp

And the plain GRE tunnel worked fine while the PPTP VPN tunnel was blocked. The difference between them is that the GRE tunnel uses GREv0 and the PPTP VPN uses GREv1, so it seems the connection state of a GREv1 tunnel cannot be tracked correctly.

Revision history for this message
Yang Li (yang-li) wrote :

The rules in my security group are:
# neutron security-group-rule-create --protocol tcp --port-range-min 0 --port-range-max 65535 --remote-ip-prefix 0.0.0.0/0 --direction ingress all

# neutron security-group-rule-create --protocol 47 --remote-ip-prefix 0.0.0.0/0 --direction egress all

# neutron security-group-rule-create --protocol 47 --remote-ip-prefix 0.0.0.0/0 --direction ingress all

Protocol 47 is for the GRE tunnel.

There are too many flows to post them all, so I captured only the table 72 and table 82 flows, excluding the TCP ones:
# ovs-ofctl dump-flows br-int | grep table=72
 cookie=0xbab10b2068622b63, duration=260.878s, table=71, n_packets=0, n_bytes=0, idle_age=461, priority=65,ct_state=-trk,ip,reg5=0x13,in_port=19,dl_src=fa:16:3e:13:63:68,nw_src=192.168.111.17 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xbab10b2068622b63, duration=260.878s, table=71, n_packets=0, n_bytes=0, idle_age=461, priority=65,ct_state=-trk,ipv6,reg5=0x13,in_port=19,dl_src=fa:16:3e:13:63:68,ipv6_src=fe80::f816:3eff:fe13:6368 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xbab10b2068622b63, duration=260.869s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=50,ct_state=+inv+trk actions=drop
 cookie=0xbab10b2068622b63, duration=260.869s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=50,ct_mark=0x1,reg5=0x13 actions=drop
 cookie=0xbab10b2068622b63, duration=260.869s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=50,ct_state=+est-rel+rpl,ct_zone=10,ct_mark=0,reg5=0x13 actions=NORMAL
 cookie=0xbab10b2068622b63, duration=260.868s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=50,ct_state=-new-est+rel-inv,ct_zone=10,ct_mark=0,reg5=0x13 actions=NORMAL
 cookie=0xbab10b2068622b63, duration=260.868s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=40,ct_state=-est,reg5=0x13 actions=drop
 cookie=0xbab10b2068622b63, duration=260.866s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=40,ct_state=+est,ip,reg5=0x13 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
 cookie=0xbab10b2068622b63, duration=260.866s, table=72, n_packets=0, n_bytes=0, idle_age=461, priority=40,ct_state=+est,ipv6,reg5=0x13 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))

# ovs-ofctl dump-flows br-int | grep table=82
 cookie=0xbab10b2068622b63, duration=264.578s, table=81, n_packets=0, n_bytes=0, idle_age=465, priority=90,ct_state=-trk,ip,reg5=0x13 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
 cookie=0xbab10b2068622b63, duration=264.577s, table=81, n_packets=0, n_bytes=0, idle_age=465, priority=90,ct_state=-trk,ipv6,reg5=0x13 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
 cookie=0xbab10b2068622b63, duration=264.567s, table=82, n_packets=0, n_bytes=0, idle_age=465, priority=70,ct_state=+est-rel-rpl,ip,reg5=0x13,nw_proto=47 actions=NORMAL
 cookie=0xbab10b2068622b63, duration=264.567s, table=82, n_packets=0, n_bytes=0, idle_age=465, priority=70,ct_state=+new-est,ip,reg5=0x13,nw_proto=47 actions=ct(commit,zone=NXM_NX_REG6[0..15]),NORMAL
 cookie=0xbab10b2068622b63, duration=264.577s, table=82, n_packets=0, n_bytes=0, idle_age=465, priority=50,ct_state=+inv+trk actions=drop
 cookie=0xbab10b2068622b63, durat...


Revision history for this message
Yang Li (yang-li) wrote :

I have a private patch that solves this problem, but apparently the patch only covers GRE :(

neutron/agent/linux/openvswitch_firewall/rules.py
 def create_accept_flows(flow):
     flow['ct_state'] = CT_STATES[0]
     result = [flow.copy()]
+    if flow.get('nw_proto') and flow['nw_proto'] == '47':
+        gre_flow = flow.copy()
+        gre_flow['ct_state'] = ovsfw_consts.OF_STATE_ESTABLISHED
+        result.append(gre_flow)
     flow['ct_state'] = CT_STATES[1]
     if flow['table'] == ovs_consts.RULES_INGRESS_TABLE:

Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

Netfilter has specific support for GREv1 and even for PPTP control messages.
I suspect its connection state behaves a little differently from what ovs-fw expects.
Have you investigated what the connection tracking info looks like (e.g. "conntrack -L")?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/668569

Changed in neutron:
assignee: nobody → Yang Li (yang-li)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Yang Li (<email address hidden>) on branch: master
Review: https://review.opendev.org/668569
Reason: we should find out the root cause

Revision history for this message
Jesse (jesse-5) wrote :

Hi all,
I checked the kernel conntrack PPTP module code and the OvS conntrack code. It seems OvS conntrack doesn't support GREv1; to be more precise, my understanding is that OvS conntrack does not support the reply (+rpl) state for GREv1.

To support PPTP in the kernel's conntrack there is the nf_conntrack_pptp module, which inspects the PPTP control messages to set up the conntrack record. The following is the GREv1 conntrack record:

# cat /proc/net/nf_conntrack | grep 192.168.111.68
ipv4 2 gre 47 17938 timeout=600, stream_timeout=18000 src=192.168.111.81 dst=192.168.111.68 srckey=0x0 dstkey=0x800 src=192.168.111.68 dst=192.168.111.81 srckey=0x800 dstkey=0x0 [ASSURED] mark=1 zone=4 use=2

This record has a srckey and a dstkey instead of the source and destination ports used for TCP/UDP.

The srckey and dstkey come from the PPTP control messages, like the one below:

14:16:52.336212 fa:16:3e:ce:7e:65 > fa:16:3e:0b:26:9a, ethertype IPv4 (0x0800), length 98: 192.168.111.68.1723 > 192.168.111.81.57468: Flags [P.], seq 157:189, ack 325, win 243, options [nop,nop,TS val 9992152 ecr 762174], length 32: pptp CTRL_MSGTYPE=OCRP CALL_ID(2048) PEER_CALL_ID(0) RESULT_CODE(1) ERR_CODE(0) CAUSE_CODE(0) CONN_SPEED(10000000) RECV_WIN(3) PROC_DELAY(0) PHY_CHAN_ID(0)

The srckey is CALL_ID and dstkey is PEER_CALL_ID.

But a GREv1 data packet carries only a single call ID.

14:16:52.337267 fa:16:3e:0b:26:9a > fa:16:3e:ce:7e:65, ethertype IPv4 (0x0800), length 70: 192.168.111.81 > 192.168.111.68: GREv1, call 2048, seq 1, proto PPP (0x880b), length 36: LCP (0xc021), length 24: LCP, Conf-Request (0x01), id 1, length 22
14:16:52.340382 fa:16:3e:ce:7e:65 > fa:16:3e:0b:26:9a, ethertype IPv4 (0x0800), length 75: 192.168.111.68 > 192.168.111.81: GREv1, call 0, seq 0, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27
14:16:52.340469 fa:16:3e:ce:7e:65 > fa:16:3e:0b:26:9a, ethertype IPv4 (0x0800), length 74: 192.168.111.68 > 192.168.111.81: GREv1, call 0, seq 1, ack 1, proto PPP (0x880b), length 40: LCP (0xc021), length 24: LCP, Conf-Ack (0x02), id 1, length 22
14:16:55.342676 fa:16:3e:ce:7e:65 > fa:16:3e:0b:26:9a, ethertype IPv4 (0x0800), length 75: 192.168.111.68 > 192.168.111.81: GREv1, call 0, seq 2, proto PPP (0x880b), length 41: LCP (0xc021), length 29: LCP, Conf-Request (0x01), id 1, length 27

You can see the client sends with call ID 2048 (0x800) and the server's replies use call ID 0 (0x0).
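
For reference, here is a minimal sketch of where that call ID sits in the enhanced-GRE header (GREv1, RFC 2637). This is just an illustration written for this comment, not code taken from OvS or the kernel:

import struct

def grev1_call_id(gre_header: bytes) -> int:
    # Enhanced GRE header (RFC 2637): flags/version (2 bytes),
    # protocol type 0x880B (2 bytes), payload length (2 bytes),
    # call ID (2 bytes), then optional sequence/ack numbers.
    flags, proto, length, call_id = struct.unpack_from("!HHHH", gre_header)
    if proto != 0x880B:
        raise ValueError("not a GREv1/PPTP payload")
    return call_id

Each direction carries a different key (the receiver's call ID), so nothing in the data packets themselves ties the two directions together; they are only correlated via the control channel.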

You can see the kernel conntrack PPTP module uses these call IDs to match request and reply for GREv1 packets:
https://elixir.bootlin.com/linux/v3.11.5/source/net/netfilter/nf_conntrack_pptp.c#L573

But OvS only checks the source IP, destination IP, source port and destination port to decide whether a packet is a reply:
https://github.com/openvswitch/ovs/blob/e32cd4c6292e81d047bafa882f0a1d1f3e7dc1f0/lib/conntrack.c#L399

So OvS cannot set the reply (+rpl) state on GREv1 reply packets; OF_STATE_ESTABLISHED_REPLY = "+est-rel+rpl" therefore cannot match them, while OF_STATE_NOT_ESTABLISHED = "-est" can.
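
To make that concrete, here is a small, self-contained Python illustration (a toy model written for this comment, not OvS's or the kernel's actual data structures; the addresses and call IDs come from the captures above). It shows why a reply check that only swaps endpoints and keeps the key cannot recognise the server's GREv1 replies, while a call-ID-aware entry like the kernel pptp helper's can:

CLIENT, SERVER = "192.168.111.81", "192.168.111.68"
CLIENT_CALL_ID, SERVER_CALL_ID = 0x0, 0x800  # PEER_CALL_ID / CALL_ID from the OCRP message above

# Tuples are (src_ip, dst_ip, gre_call_id). Each direction carries the
# other peer's call ID, as in the nf_conntrack entry above.
orig_tuple = (CLIENT, SERVER, SERVER_CALL_ID)   # client -> server, call 0x800
reply_tuple = (SERVER, CLIENT, CLIENT_CALL_ID)  # server -> client, call 0x0

def naive_reverse(t):
    # Build the expected reply tuple by swapping endpoints but keeping the key.
    src, dst, key = t
    return (dst, src, key)

# The server's actual reply packet, as seen in the tcpdump above:
reply_packet = (SERVER, CLIENT, CLIENT_CALL_ID)

print(naive_reverse(orig_tuple) == reply_packet)  # False: call IDs differ (0x800 vs 0x0)
print(reply_tuple == reply_packet)                # True: the call-ID-aware tuple matches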

So to fix this bug we could either add code to OvS to support PPTP, which does not look easy, or use this patch: https://review.opendev.org/668569
I'm not a security expert, but it seems there is no security risk for this ...


Revision history for this message
Brian Haley (brian-haley) wrote :

But there is a security risk with the patch you mentioned - it's going to allow GRE into every instance, without a way to disable it. If there's a bug in OVS or other kernel code then we should get it fixed. If there is a way neutron can instantiate a flow to work around it, that would be OK too, but it doesn't look like that is the case?
