Activity log for bug #1765980

Date Who What changed Old value New value Message
2018-04-21 16:44:43 Michael Sparmann bug added bug
2018-04-21 17:00:05 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-04-21 18:00:28 Michael Sparmann linux (Ubuntu): status Incomplete Confirmed
2018-04-21 18:13:41 Michael Sparmann description
Old value: I've spent the last few days tracking down an issue where an attempt to shutdown an LXD container after several hours of host uptime on Ubuntu Bionic (4.15.0-15.16-generic) would cause a kworker thread to start spinning on one CPU core and all subsequent container start/stop operations to fail. The underlying issue is that a kworker thread (executing cleanup_net) spins in inet_frags_exit_net, waiting for sum_frag_mem_limit(nf) to become zero, which never happens because it has underflowed to some negative multiple of 64. That kworker thread keeps holding net_mutex and therefore blocks any further container start/stops. That in turn is triggered by receiving a fragmented IPv6 MDNS packet in my instance, but it could probably be triggered by any fragmented IPv6 traffic. The reason for the frag mem limit counter to underflow is nf_ct_frag6_reasm deducting more from it than the sum of all previous nf_ct_frag6_queue calls added, due to pskb_expand_head (called through skb_unclone) adding a multiple of 64 to the SKB's truesize, due to kmalloc_reserve allocating some additional slack space to the buffer. Removing this line: size = SKB_WITH_OVERHEAD(ksize(data)); or making it conditional with nhead or ntail being nonzero works around the issue, but a proper fix for this seems complicated. There is already a comment saying "It is not generally safe to change skb->truesize." right about the offending modification of truesize, but the if statement guarding it apparently doesn't keep out all problematic cases. I'll leave figuring out the proper way to fix this to the maintainers of this area... ;)
New value: I've spent the last few days tracking down an issue where an attempt to shutdown an LXD container after several hours of host uptime on Ubuntu Bionic (4.15.0-15.16-generic) would cause a kworker thread to start spinning on one CPU core and all subsequent container start/stop operations to fail. The underlying issue is that a kworker thread (executing cleanup_net) spins in inet_frags_exit_net, waiting for sum_frag_mem_limit(nf) to become zero, which never happens because it has underflowed to some negative multiple of 64. That kworker thread keeps holding net_mutex and therefore blocks any further container start/stops. That in turn is triggered by receiving a fragmented IPv6 MDNS packet in my instance, but it could probably be triggered by any fragmented IPv6 traffic. The reason for the frag mem limit counter to underflow is nf_ct_frag6_reasm deducting more from it than the sum of all previous nf_ct_frag6_queue calls added, due to pskb_expand_head (called through skb_unclone) adding a multiple of 64 to the SKB's truesize, due to kmalloc_reserve allocating some additional slack space to the buffer. Removing this line: size = SKB_WITH_OVERHEAD(ksize(data)); or making it conditional with nhead or ntail being nonzero works around the issue, but a proper fix for this seems complicated. There is already a comment saying "It is not generally safe to change skb->truesize." right above the offending modification of truesize, but the if statement guarding it apparently doesn't keep out all problematic cases. I'll leave figuring out the proper way to fix this to the maintainers of this area... ;)
2018-04-23 19:57:02 Joseph Salisbury linux (Ubuntu): importance Undecided Medium
2018-04-23 19:57:46 Joseph Salisbury tags bionic kernel-da-key
2018-04-23 19:58:11 Joseph Salisbury nominated for series Ubuntu Bionic
2018-04-23 19:58:11 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-04-23 19:58:20 Joseph Salisbury linux (Ubuntu Bionic): status Confirmed Incomplete
2018-06-24 04:17:51 Launchpad Janitor linux (Ubuntu Bionic): status Incomplete Expired
2018-06-24 04:17:52 Launchpad Janitor linux (Ubuntu): status Incomplete Expired
2019-02-21 18:09:10 Stoiko Ivanov linux (Ubuntu Bionic): status Expired Confirmed
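
The accounting mismatch described in the 2018-04-21 18:13:41 description entry can be illustrated with a small user-space toy model. This is only a sketch and not kernel code: ToySkb, ToyFragCounter, toy_expand_head, simulate and exit_net_would_finish are hypothetical names, the fragment size of 768 is arbitrary, and the single 64-byte increment is an assumption taken from the report's "multiple of 64". It shows how charging a buffer's truesize at queue time and uncharging its (grown) truesize at reassembly time leaves the counter negative, so a loop waiting for zero would never finish.

```python
# Toy model of the fragment memory accounting described in the report.
# NOT kernel code: every name and number below is an illustrative assumption.

SLACK_GRANULE = 64  # assumed growth step, from the report's "multiple of 64"


class ToySkb:
    """Stands in for a socket buffer; only truesize is modelled."""

    def __init__(self, truesize):
        self.truesize = truesize


class ToyFragCounter:
    """Stands in for the per-namespace fragment memory counter."""

    def __init__(self):
        self.mem = 0

    def queue(self, skb):
        # Queue step: charge the buffer's truesize as it is right now.
        self.mem += skb.truesize

    def reasm(self, skbs):
        # Reassembly step: uncharge the truesize the buffers have *now*.
        self.mem -= sum(skb.truesize for skb in skbs)


def toy_expand_head(skb, granules=1):
    # Models a reallocation that silently grows truesize by allocator slack.
    skb.truesize += granules * SLACK_GRANULE


def simulate(grow_after_queue):
    """Queue two fragments, optionally grow one, then reassemble."""
    counter = ToyFragCounter()
    skbs = [ToySkb(768), ToySkb(768)]
    for skb in skbs:
        counter.queue(skb)
    if grow_after_queue:
        toy_expand_head(skbs[0])
    counter.reasm(skbs)
    return counter.mem


def exit_net_would_finish(mem):
    # The report describes cleanup waiting for the counter to reach zero.
    return mem == 0


if __name__ == "__main__":
    print(simulate(False))  # 0: charge and uncharge match
    print(simulate(True))   # -64: more was uncharged than was charged
    print(exit_net_would_finish(simulate(True)))  # False
```

In this model the counter only balances when truesize is unchanged between the queue and reassembly steps, which is consistent with the report's observation that skipping the truesize adjustment works around the problem.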