IPv6 fragments with nf_conntrack_reasm loaded cause net_mutex deadlock upon LXD container shutdown

Bug #1765980 reported by Michael Sparmann
This bug affects 2 people
Affects                 Status      Importance  Assigned to  Milestone
linux (Ubuntu)          Expired     Medium      Unassigned
linux (Ubuntu Bionic)   Confirmed   Medium      Unassigned

Bug Description

I've spent the last few days tracking down an issue where an attempt to shut down an LXD container after several hours of host uptime on Ubuntu Bionic (4.15.0-15.16-generic) would cause a kworker thread to start spinning on one CPU core and all subsequent container start/stop operations to fail.

The underlying issue is that a kworker thread (executing cleanup_net) spins in inet_frags_exit_net, waiting for sum_frag_mem_limit(nf) to become zero, which never happens because it has underflowed to some negative multiple of 64. That kworker thread keeps holding net_mutex and therefore blocks any further container starts/stops. In my case this is triggered by receiving a fragmented IPv6 mDNS packet, but it could probably be triggered by any fragmented IPv6 traffic.
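
For illustration, here is a minimal userspace model of that spin, with a made-up counter value; this is not kernel source, the real loop lives in inet_frags_exit_net() and is reached from cleanup_net() while net_mutex is held:

#include <stdio.h>

/* Stand-in for the per-netns frag memory counter that
 * sum_frag_mem_limit() reads in the kernel. */
static long frag_mem = -128;	/* underflowed to a negative multiple of 64 */

int main(void)
{
	unsigned long passes = 0;

	/* Model of the drain loop in inet_frags_exit_net(): keep evicting
	 * until the counter reaches exactly zero.  With a negative residue
	 * that never happens, so the kworker spins forever, still holding
	 * net_mutex and blocking every later netns setup/teardown. */
	while (frag_mem != 0) {
		/* eviction would run here; nothing is left to free */
		if (++passes == 1000000) {
			printf("frag_mem still %ld after %lu passes - would spin forever\n",
			       frag_mem, passes);
			return 1;
		}
	}
	return 0;
}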

The frag mem limit counter underflows because nf_ct_frag6_reasm deducts more from it than the sum of all previous nf_ct_frag6_queue calls added: pskb_expand_head (called through skb_unclone) adds a multiple of 64 to the SKB's truesize, because kmalloc_reserve allocates some additional slack space for the buffer.
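
A toy model of that imbalance (plain userspace C with made-up sizes, not kernel code): the queue path charges the SKB's truesize as it was at enqueue time, but the reassembly path uncharges the truesize as it is after skb_unclone() has reallocated the head and folded the allocator's slack into it.

#include <stdio.h>

/* Stand-in for ksize(): kmalloc uses power-of-two size classes, so the
 * object handed back is often larger than what was requested. */
static size_t kmalloc_object_size(size_t requested)
{
	size_t sz = 32;
	while (sz < requested)
		sz <<= 1;
	return sz;
}

int main(void)
{
	long frag_mem = 0;		/* per-netns frag memory counter */
	size_t truesize = 1984;		/* SKB truesize when the fragment is queued */

	/* nf_ct_frag6_queue(): charge the truesize seen at enqueue time */
	frag_mem += truesize;

	/* skb_unclone() -> pskb_expand_head(): the head is reallocated and
	 * truesize is recomputed from the new buffer's size, growing by the
	 * slack even though no extra head/tail room was requested. */
	truesize = kmalloc_object_size(truesize);

	/* nf_ct_frag6_reasm(): uncharge the (now larger) truesize */
	frag_mem -= truesize;

	printf("residual frag_mem = %ld (should be 0)\n", frag_mem);
	return 0;
}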

Removing this line:
size = SKB_WITH_OVERHEAD(ksize(data));
or making it conditional on nhead or ntail being nonzero works around the issue, but a proper fix for this seems complicated.
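
For concreteness, the conditional variant would look roughly like the following inside pskb_expand_head() in net/core/skbuff.c; this is only an illustrative sketch of the workaround described above, not a tested or submitted patch, and the surrounding function is elided:

	/*
	 * Workaround sketch (untested): only fold the allocator slack from
	 * ksize(data) into 'size' - and therefore, further down, into
	 * skb->truesize - when the caller actually asked for extra head or
	 * tail room.  In the skb_unclone() path hit by nf_ct_frag6_reasm(),
	 * nhead and ntail are both 0, so truesize stays as charged and the
	 * frag memory accounting remains balanced.
	 */
	if (nhead || ntail)
		size = SKB_WITH_OVERHEAD(ksize(data));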
There is already a comment saying "It is not generally safe to change skb->truesize." right above the offending modification of truesize, but the if statement guarding it apparently doesn't keep out all problematic cases.
I'll leave figuring out the proper way to fix this to the maintainers of this area... ;)

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1765980

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Michael Sparmann (theseven) wrote :

The cause of the issue is already understood, and the machine currently isn't running an unmodified kernel for debugging reasons. Apport logs won't help here.
Contact me if you need specific information.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds. Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc2

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: bionic kernel-da-key
Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
Revision history for this message
Michael Sparmann (theseven) wrote :

So far I have not been able to reproduce it on the mainline kernel linked above.
However, given the intermittent nature of the problem, I'm not convinced that this was actually fixed.
The source code related to the underlying root cause looks unchanged, and the symptoms may well be hidden for my load pattern by unrelated changes resulting in different kmalloc behavior.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Bionic) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Bionic):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Owen Valentine (chessmango) wrote :

This has expired, but it affects me too.
I'm specifically using Proxmox, which uses the Ubuntu kernel as its upstream (currently at 4.15.18), and I see the same symptoms.
Specifically LXC, specifically at container stop, and specifically a kworker using 100% of one core and preventing other containers from starting.
I've found that this isn't present in Proxmox's own 4.15.10-1 kernel.

Revision history for this message
Stoiko Ivanov (siv0) wrote :

While digging into this, I found the following commit, which might contain a fix:
https://github.com/torvalds/linux/commit/ebaf39e6032faf77218220707fc3fa22487784e0

Changed in linux (Ubuntu Bionic):
status: Expired → Confirmed
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

It's in mainline since v4.20-rc6. Have you tried it?
