Ubuntu Linux, kernel 4.15.0-62, server platform.
This OS is used as an IPsec VPN gateway. It serves up to several hundred concurrent connections.
While attempting to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.
Diagnostics from a server in this state are in the attached leakinfo.txt:
output of free -t
output of /proc/meminfo in the out-of-memory condition
output of slabtop -o -sc
/sys/kernel/debug/page_owner, sorted and aggregated after the server ran for 12 hours and ran out of memory
Patches for 4.15 and 5.4
Highlight from page_owner: the leak is a buffer associated with the IPsec implementation. Each connection leaks 32 KiB of memory via alloc_page with order=3.
Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
 get_page_from_freelist+0xd64/0x1250
 __alloc_pages_nodemask+0x11c/0x2e0
 alloc_pages_current+0x6a/0xe0
 skb_page_frag_refill+0x71/0x100
 esp_output_head+0x265/0x3e0 [esp4]
 esp_output+0xbc/0x180 [esp4]
 xfrm_output_resume+0x179/0x530
 xfrm_output+0x8e/0x230
 xfrm4_output_finish+0x2b/0x30
 __xfrm4_output+0x3a/0x50
 xfrm4_output+0x43/0xc0
 ip_forward_finish+0x51/0x80
 ip_forward+0x38a/0x480
 ip_rcv_finish+0x122/0x410
 ip_rcv+0x292/0x360
 __netif_receive_skb_core+0x815/0xbd0
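For anyone wanting to reproduce the "sorted and aggregated" page_owner view, here is a minimal Python sketch. It is not the script used to produce leakinfo.txt. It assumes 4 KiB pages, records separated by blank lines, and a header line of the form "Page allocated via order N, ..."; the function name aggregate_page_owner is hypothetical.

```python
import re
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB base pages
HEADER = re.compile(r"Page allocated via order (\d+)")


def aggregate_page_owner(text):
    """Group page_owner records by (order, stack) and total the bytes held.

    Returns a list of ((order, stack), bytes) pairs, largest total first.
    """
    totals = Counter()
    for record in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if not match:
            continue  # not a page_owner record
        order = int(match.group(1))
        # Drop the per-page "PFN ..." line so identical stacks collapse together.
        stack = tuple(ln for ln in lines[1:] if not ln.startswith("PFN"))
        # An order-N allocation covers 2**N pages; order 3 is 32 KiB here.
        totals[(order, stack)] += PAGE_SIZE << order
    return totals.most_common()
```

Feeding it the contents of a saved page_owner dump should put the esp_output_head stack at or near the top if the leak is present.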
Patch to fix this issue in 4.15 (tested and verified on the same server exhibiting the above leak):
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 728272f..7842f83 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
 	}
 	xfrm_dev_state_free(x);
 	security_xfrm_state_free(x);
+
+	if(x->xfrag.page)
+		put_page(x->xfrag.page);
+
 	kfree(x);
 }
Patch for master branch (5.4 I believe) from Paul Wouters (<email address hidden>)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c6f3c4a1bd99..f3423562d933 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
 		x->type->destructor(x);
 		xfrm_put_type(x->type);
 	}
+	if (x->xfrag.page)
+		put_page(x->xfrag.page);
 	xfrm_dev_state_free(x);
 	security_xfrm_state_free(x);
 	xfrm_state_free(x);
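Both patches make the same change: drop the reference on x->xfrag.page when the state is destroyed. The toy Python model below is a userspace illustration only, not kernel code. The classes Page and State and the function leaked_pages are hypothetical, and the refcounting is deliberately simplified to one reference per page (it ignores the extra references the kernel takes for in-flight skbs).

```python
class Page:
    """Stand-in for an order-3 page with a reference count."""

    def __init__(self):
        self.refcount = 1  # reference held by the allocator's caller
        self.freed = False

    def put(self):
        # Models put_page(): free the page when the last reference goes away.
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class State:
    """Stand-in for struct xfrm_state; xfrag_page models x->xfrag.page."""

    def __init__(self):
        self.xfrag_page = None

    def output(self):
        # Models the refill on the ESP output path: allocate on first use.
        if self.xfrag_page is None:
            self.xfrag_page = Page()

    def destroy(self, patched):
        page = self.xfrag_page
        if patched and page is not None:
            page.put()  # the step the patches add
        self.xfrag_page = None
        return page


def leaked_pages(connections, patched):
    """Count pages still allocated after every state has been destroyed."""
    leaked = 0
    for _ in range(connections):
        state = State()
        state.output()
        page = state.destroy(patched)
        if page is not None and not page.freed:
            leaked += 1
    return leaked
```

In this model, leaked_pages(n, patched=False) grows one page per destroyed state, matching the per-connection 32 KiB leak reported above, while leaked_pages(n, patched=True) stays at zero.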
Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.