Activity log for bug #1853197

Date Who What changed Old value New value Message
2019-11-19 20:42:15 MIKE OLLIFF bug added bug
2019-11-19 20:42:15 MIKE OLLIFF attachment added additional data and patches https://bugs.launchpad.net/bugs/1853197/+attachment/5306500/+files/leakinfo.txt
2019-11-19 21:00:10 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-11-19 21:13:03 MIKE OLLIFF linux (Ubuntu): status Incomplete Confirmed
2019-11-19 21:40:52 MIKE OLLIFF description
Old value:
    Ubuntu linux distro, 4.15.0-62 kernel, server platform.
    This OS is used as an IPSec VPN gateway. It serves up to several hundred concurrent connections.

    In an attempt to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.

    Attachments from a server machine in this state, in the attached leakinfo.txt:
      output of free -t
      output of /proc/meminfo in out of memory condition
      output of slabtop -o -sc
      /sys/kernel/debug/page_owner sorted and aggregated after server ran for 12 hrs and ran out of memory
      Patches for 4.15 and 5.4

    Highlight from page_owner: we can see the leak is a buffer associated with the ipsec implementation. Each connection leaks 32k of memory via alloc_page with order=3

    Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
     get_page_from_freelist+0xd64/0x1250
     __alloc_pages_nodemask+0x11c/0x2e0
     alloc_pages_current+0x6a/0xe0
     skb_page_frag_refill+0x71/0x100
     esp_output_head+0x265/0x3e0 [esp4]
     esp_output+0xbc/0x180 [esp4]
     xfrm_output_resume+0x179/0x530
     xfrm_output+0x8e/0x230
     xfrm4_output_finish+0x2b/0x30
     __xfrm4_output+0x3a/0x50
     xfrm4_output+0x43/0xc0
     ip_forward_finish+0x51/0x80
     ip_forward+0x38a/0x480
     ip_rcv_finish+0x122/0x410
     ip_rcv+0x292/0x360
     __netif_receive_skb_core+0x815/0xbd0

    Patch to fix this issue in 4.15 (tested and verified on same server exhibiting above leak):

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index 728272f..7842f83 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
             }
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
    +
    +        if(x->xfrag.page)
    +                put_page(x->xfrag.page);
    +
             kfree(x);
     }

    Patch for master branch (5.4 I believe) from Paul Wouters (paul@nohats.ca)

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index c6f3c4a1bd99..f3423562d933 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
                     x->type->destructor(x);
                     xfrm_put_type(x->type);
             }
    +        if (x->xfrag.page)
    +                put_page(x->xfrag.page);
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
             xfrm_state_free(x);

    Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.
New value:
    Ubuntu linux distro, 4.15.0-62 kernel, server platform.
    This OS is used as an IPSec VPN gateway. It serves up to several hundred concurrent connections.

    In an attempt to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.

    Attachments from a server machine in this state, in the attached leakinfo.txt:
      output of free -t
      output of /proc/meminfo in out of memory condition
      output of slabtop -o -sc
      /sys/kernel/debug/page_owner sorted and aggregated after server ran for 12 hrs and ran out of memory
      Patches for 4.15 and 5.4

    Highlight from page_owner: we can see the leak is a buffer associated with the ipsec implementation. Each connection leaks 32k of memory via alloc_page with order=3

    100960 times:
    Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
     get_page_from_freelist+0xd64/0x1250
     __alloc_pages_nodemask+0x11c/0x2e0
     alloc_pages_current+0x6a/0xe0
     skb_page_frag_refill+0x71/0x100
     esp_output_head+0x265/0x3e0 [esp4]
     esp_output+0xbc/0x180 [esp4]
     xfrm_output_resume+0x179/0x530
     xfrm_output+0x8e/0x230
     xfrm4_output_finish+0x2b/0x30
     __xfrm4_output+0x3a/0x50
     xfrm4_output+0x43/0xc0
     ip_forward_finish+0x51/0x80
     ip_forward+0x38a/0x480
     ip_rcv_finish+0x122/0x410
     ip_rcv+0x292/0x360
     __netif_receive_skb_core+0x815/0xbd0

    Patch to fix this issue in 4.15 (tested and verified on same server exhibiting above leak):

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index 728272f..7842f83 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
             }
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
    +
    +        if(x->xfrag.page)
    +                put_page(x->xfrag.page);
    +
             kfree(x);
     }

    Patch for master branch (5.4 I believe) from Paul Wouters (paul@nohats.ca)

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index c6f3c4a1bd99..f3423562d933 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
                     x->type->destructor(x);
                     xfrm_put_type(x->type);
             }
    +        if (x->xfrag.page)
    +                put_page(x->xfrag.page);
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
             xfrm_state_free(x);

    Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.
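Note on the figures in the 2019-11-19 21:40:52 description: an order-3 allocation is 2^3 = 8 contiguous pages, which matches the reported 32k per connection only if pages are 4 KiB. The report does not state the page size, so the 4 KiB value below is an assumption, and the helper names are hypothetical. A minimal sketch of the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; the bug report does not state the page size. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Bytes in one page allocation of the given order (2^order pages). */
uint64_t bytes_per_allocation(unsigned int order)
{
    assert(order < 32); /* keep the shift well inside 64 bits */
    return ASSUMED_PAGE_SIZE << order;
}

/* Total bytes held by `count` outstanding allocations of `order`. */
uint64_t leaked_bytes(uint64_t count, unsigned int order)
{
    return count * bytes_per_allocation(order);
}
```

With the reported figures, leaked_bytes(100960, 3) comes to 3,308,257,280 bytes, roughly 3.08 GiB, under the 4 KiB page assumption.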
2019-11-19 23:51:59 Terry Rudd bug added subscriber Terry Rudd
2019-11-29 11:24:48 Stefan Bader nominated for series Ubuntu Disco
2019-11-29 11:24:48 Stefan Bader bug task added linux (Ubuntu Disco)
2019-11-29 11:24:48 Stefan Bader nominated for series Ubuntu Eoan
2019-11-29 11:24:48 Stefan Bader bug task added linux (Ubuntu Eoan)
2019-11-29 11:24:48 Stefan Bader nominated for series Ubuntu Bionic
2019-11-29 11:24:48 Stefan Bader bug task added linux (Ubuntu Bionic)
2019-11-29 11:25:08 Stefan Bader linux (Ubuntu Bionic): importance Undecided High
2019-11-29 11:25:13 Stefan Bader linux (Ubuntu Disco): importance Undecided High
2019-11-29 11:25:16 Stefan Bader linux (Ubuntu Eoan): importance Undecided High
2019-11-29 11:25:26 Stefan Bader linux (Ubuntu Bionic): status New Triaged
2019-11-29 11:25:30 Stefan Bader linux (Ubuntu Disco): status New Triaged
2019-11-29 11:25:34 Stefan Bader linux (Ubuntu Eoan): status New Triaged
2019-11-29 11:27:37 Stefan Bader linux (Ubuntu): status Confirmed Invalid
2019-11-29 11:32:52 Stefan Bader description
Old value:
    Ubuntu linux distro, 4.15.0-62 kernel, server platform.
    This OS is used as an IPSec VPN gateway. It serves up to several hundred concurrent connections.

    In an attempt to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.

    Attachments from a server machine in this state, in the attached leakinfo.txt:
      output of free -t
      output of /proc/meminfo in out of memory condition
      output of slabtop -o -sc
      /sys/kernel/debug/page_owner sorted and aggregated after server ran for 12 hrs and ran out of memory
      Patches for 4.15 and 5.4

    Highlight from page_owner: we can see the leak is a buffer associated with the ipsec implementation. Each connection leaks 32k of memory via alloc_page with order=3

    100960 times:
    Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
     get_page_from_freelist+0xd64/0x1250
     __alloc_pages_nodemask+0x11c/0x2e0
     alloc_pages_current+0x6a/0xe0
     skb_page_frag_refill+0x71/0x100
     esp_output_head+0x265/0x3e0 [esp4]
     esp_output+0xbc/0x180 [esp4]
     xfrm_output_resume+0x179/0x530
     xfrm_output+0x8e/0x230
     xfrm4_output_finish+0x2b/0x30
     __xfrm4_output+0x3a/0x50
     xfrm4_output+0x43/0xc0
     ip_forward_finish+0x51/0x80
     ip_forward+0x38a/0x480
     ip_rcv_finish+0x122/0x410
     ip_rcv+0x292/0x360
     __netif_receive_skb_core+0x815/0xbd0

    Patch to fix this issue in 4.15 (tested and verified on same server exhibiting above leak):

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index 728272f..7842f83 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
             }
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
    +
    +        if(x->xfrag.page)
    +                put_page(x->xfrag.page);
    +
             kfree(x);
     }

    Patch for master branch (5.4 I believe) from Paul Wouters (paul@nohats.ca)

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index c6f3c4a1bd99..f3423562d933 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
                     x->type->destructor(x);
                     xfrm_put_type(x->type);
             }
    +        if (x->xfrag.page)
    +                put_page(x->xfrag.page);
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
             xfrm_state_free(x);

    Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.
New value:
    [SRU Justification]

    == Impact ==

    An upstream change in v4.11 made xfrm lose memory (8 pages per ipsec connection). This was fixed in v5.4 by: commit 86c6739eda7d "xfrm: Fix memleak on xfrm state destroy"

    == Fix ==

    Pick the upstream fix into all affected series.

    == Testcase ==

    see below

    == Risk of Regression ==

    Low, the change adds a single memory release case in one driver. The effect can be verified.

    ---

    Ubuntu linux distro, 4.15.0-62 kernel, server platform.
    This OS is used as an IPSec VPN gateway. It serves up to several hundred concurrent connections.

    In an attempt to upgrade from the 4.4 kernel to 4.15, the team noticed that VPN gateway VMs were running out of physical memory after 12-48 hours, depending on load.

    Attachments from a server machine in this state, in the attached leakinfo.txt:
      output of free -t
      output of /proc/meminfo in out of memory condition
      output of slabtop -o -sc
      /sys/kernel/debug/page_owner sorted and aggregated after server ran for 12 hrs and ran out of memory
      Patches for 4.15 and 5.4

    Highlight from page_owner: we can see the leak is a buffer associated with the ipsec implementation. Each connection leaks 32k of memory via alloc_page with order=3

    100960 times:
    Page allocated via order 3, mask 0x1085220(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
     get_page_from_freelist+0xd64/0x1250
     __alloc_pages_nodemask+0x11c/0x2e0
     alloc_pages_current+0x6a/0xe0
     skb_page_frag_refill+0x71/0x100
     esp_output_head+0x265/0x3e0 [esp4]
     esp_output+0xbc/0x180 [esp4]
     xfrm_output_resume+0x179/0x530
     xfrm_output+0x8e/0x230
     xfrm4_output_finish+0x2b/0x30
     __xfrm4_output+0x3a/0x50
     xfrm4_output+0x43/0xc0
     ip_forward_finish+0x51/0x80
     ip_forward+0x38a/0x480
     ip_rcv_finish+0x122/0x410
     ip_rcv+0x292/0x360
     __netif_receive_skb_core+0x815/0xbd0

    Patch to fix this issue in 4.15 (tested and verified on same server exhibiting above leak):

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index 728272f..7842f83 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -451,6 +451,10 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
             }
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
    +
    +        if(x->xfrag.page)
    +                put_page(x->xfrag.page);
    +
             kfree(x);
     }

    Patch for master branch (5.4 I believe) from Paul Wouters (paul@nohats.ca)

    diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
    index c6f3c4a1bd99..f3423562d933 100644
    --- a/net/xfrm/xfrm_state.c
    +++ b/net/xfrm/xfrm_state.c
    @@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
                     x->type->destructor(x);
                     xfrm_put_type(x->type);
             }
    +        if (x->xfrag.page)
    +                put_page(x->xfrag.page);
             xfrm_dev_state_free(x);
             security_xfrm_state_free(x);
             xfrm_state_free(x);

    Severity: Critical - we are unable to use any kernel later than 4.11, and are sticking with 4.4 in production.
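Note on the fix quoted in the 2019-11-29 11:32:52 description: both patches drop the reference to the page held in x->xfrag when the state is destroyed. The following is a userspace sketch of that pattern only, not kernel code. Every name in it (mock_page, mock_state, mock_put_page and so on) is a hypothetical stand-in, and the release_frag switch exists only to contrast the behaviour before and after the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_page { int refcount; };
struct mock_state { struct mock_page *frag_page; };

static int live_pages; /* mock pages allocated and not yet freed */

struct mock_page *mock_alloc_page(void)
{
    struct mock_page *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->refcount = 1;
    live_pages++;
    return p;
}

/* Drop one reference; free the page when the last reference goes away. */
void mock_put_page(struct mock_page *p)
{
    assert(p != NULL && p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        live_pages--;
    }
}

/* Output path: attach a fragment page to the state on first use. */
int mock_state_output(struct mock_state *x)
{
    if (x->frag_page == NULL) {
        x->frag_page = mock_alloc_page();
        if (x->frag_page == NULL)
            return -1;
    }
    return 0;
}

/* Destroy path: release_frag != 0 models the patched behaviour,
 * release_frag == 0 models the leak described in the report. */
void mock_state_destroy(struct mock_state *x, int release_frag)
{
    if (release_frag && x->frag_page != NULL)
        mock_put_page(x->frag_page);
    free(x);
}

int mock_live_pages(void)
{
    return live_pages;
}
```

In this model, a state that has sent at least one packet and is destroyed with release_frag set to 0 leaves mock_live_pages() one higher than before, which is the per-connection leak in miniature; with release_frag set to 1 the count returns to its starting value.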
2019-11-29 11:34:24 Stefan Bader linux (Ubuntu Eoan): assignee Stefan Bader (smb)
2019-11-29 11:34:34 Stefan Bader linux (Ubuntu Disco): assignee Stefan Bader (smb)
2019-11-29 11:34:38 Stefan Bader linux (Ubuntu Bionic): assignee Stefan Bader (smb)
2019-11-29 12:30:52 Bernd Schütte bug added subscriber Bernd Schütte
2019-12-02 06:54:18 Khaled El Mously linux (Ubuntu Bionic): status Triaged Fix Committed
2019-12-02 06:54:21 Khaled El Mously linux (Ubuntu Disco): status Triaged Fix Committed
2019-12-02 06:54:23 Khaled El Mously linux (Ubuntu Eoan): status Triaged Fix Committed
2019-12-03 15:43:22 Ubuntu Kernel Bot tags ipsec kernel kernel-bug leak linux memory vpn ipsec kernel kernel-bug leak linux memory verification-needed-disco vpn
2019-12-03 15:44:57 Ubuntu Kernel Bot tags ipsec kernel kernel-bug leak linux memory verification-needed-disco vpn ipsec kernel kernel-bug leak linux memory verification-needed-bionic verification-needed-disco vpn
2019-12-05 11:27:32 Ubuntu Kernel Bot tags ipsec kernel kernel-bug leak linux memory verification-needed-bionic verification-needed-disco vpn ipsec kernel kernel-bug leak linux memory verification-needed-bionic verification-needed-disco verification-needed-eoan vpn
2019-12-09 07:38:05 Bernd Schütte linux (Ubuntu Bionic): status Fix Committed Confirmed
2019-12-10 06:27:14 Stefan Bader linux (Ubuntu Bionic): status Confirmed Fix Committed
2019-12-10 06:27:43 Stefan Bader tags ipsec kernel kernel-bug leak linux memory verification-needed-bionic verification-needed-disco verification-needed-eoan vpn ipsec kernel kernel-bug leak linux memory verification-done-bionic verification-needed-disco verification-needed-eoan vpn
2019-12-18 06:43:01 Aleksei tags ipsec kernel kernel-bug leak linux memory verification-done-bionic verification-needed-disco verification-needed-eoan vpn ipsec kernel kernel-bug leak linux memory verification-done-bionic verification-done-eoan verification-needed-disco vpn
2019-12-19 19:13:35 Khaled El Mously tags ipsec kernel kernel-bug leak linux memory verification-done-bionic verification-done-eoan verification-needed-disco vpn ipsec kernel kernel-bug leak linux memory verification-done-bionic verification-done-disco verification-done-eoan vpn
2020-01-06 12:53:38 Launchpad Janitor linux (Ubuntu Eoan): status Fix Committed Fix Released
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14895
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14896
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14897
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-14901
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-18660
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-19055
2020-01-06 12:53:38 Launchpad Janitor cve linked 2019-19072
2020-01-06 13:12:44 Launchpad Janitor linux (Ubuntu Disco): status Fix Committed Fix Released
2020-01-06 13:12:44 Launchpad Janitor cve linked 2019-2214
2020-01-06 13:26:17 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2020-01-06 13:26:17 Launchpad Janitor cve linked 2019-19083