4.4.0-145-generic Kernel Panic ip6_expire_frag_queue

Bug #1824687 reported by Dirk on 2019-04-14

This bug affects 2 people

	Status	Importance	Assigned to
linux (Ubuntu)	Invalid	High	Unassigned
Xenial	Fix Released	High	Stefan Bader
Cosmic	Invalid	High	Unassigned
Disco	Won't Fix	High	Unassigned

Bug Description

[SRU Justification]

== Impact ==

Since 05c0b86b96 "ipv6: frags: rewrite ip6_expire_frag_queue()", the 16.04/4.4 kernel crashes whenever that function gets called (on busy systems this can be every 3-4 hours). While this potentially affects Cosmic and later, too, the fix differs on later kernels (Bionic is not yet affected, as it does not yet carry the updates to the frags handling).

== Fix ==

For Xenial and Cosmic, the proposed fix would be additional changes to ip6_expire_frag_queue(), taken from follow-up changes to ip_expire().
For Disco, I would hold back, because we have a backlog of stable patches there and, depending on what got backported to 5.0.y, there may be a simpler fix.
For current development kernels, one just needs to ensure that the following upstream change is included: 47d3d7fdb10a "ip6: fix skb leak in ip6frag_expire_frag_queue()".
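Our understanding of the follow-up ip_expire() pattern referred to above is that it pulls the head fragment out of the queue, sends the ICMP error, and then frees the head exactly once with kfree_skb(), rather than taking an extra reference with skb_get() while the queue still owns the head. The C sketch below is a hypothetical userspace model of that pattern, not the actual Xenial patch: the model_* types and functions are invented for illustration, the queue is the plain linked list used by 4.4 (no rbtree), and locking, memory accounting and the icmpv6_send() call itself are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on a 4.4-style fragment list. */
struct model_skb {
    int users;              /* user reference count */
    bool freed;             /* set once the last reference is dropped */
    struct model_skb *next; /* next fragment in the queue */
};

/* Hypothetical stand-in for the fragment queue. */
struct model_frag_queue {
    struct model_skb *fragments;      /* first fragment (the head) */
    struct model_skb *fragments_tail; /* last fragment */
    int double_frees;                 /* frees of an already released skb */
};

/* Like kfree_skb(): drop one reference, release on the last one.
 * A second release is counted here instead of corrupting memory. */
static void model_kfree_skb(struct model_frag_queue *q, struct model_skb *skb)
{
    if (skb->freed) {
        q->double_frees++;
        return;
    }
    if (--skb->users == 0)
        skb->freed = true;
}

/* Like inet_frag_destroy(): free every skb still linked into the queue. */
static void model_frag_destroy(struct model_frag_queue *q)
{
    struct model_skb *skb = q->fragments;
    while (skb != NULL) {
        struct model_skb *next = skb->next;
        model_kfree_skb(q, skb);
        skb = next;
    }
    q->fragments = NULL;
    q->fragments_tail = NULL;
}

/* Assumed fix pattern: unlink the head first, then send, then free it once. */
static void model_expire_fixed(struct model_frag_queue *q)
{
    struct model_skb *head = q->fragments;
    if (head != NULL) {
        q->fragments = head->next;
        if (head == q->fragments_tail)
            q->fragments_tail = NULL;
        head->next = NULL;
        /* icmpv6_send(head, ...) would run here: the queue no longer
         * references head and no extra skb_get() is taken (users == 1). */
        model_kfree_skb(q, head);
    }
    model_frag_destroy(q); /* frees only the remaining fragments */
}
```

Because the head is unlinked before it is released, the later queue teardown (the role of inet_frag_destroy() in the model) only sees the remaining fragments, so every skb is released exactly once.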

== Testcase ==

Unfortunately this could not be recreated locally. However, a test kernel with the proposed fix applied tested well in production (see comments #37 and #38).

== Risk of Regression ==

The modified function is only called in rare cases, and the positive testing in production covers it, so I would consider the risk low.

---

Description: Ubuntu 16.04.6 LTS
Release: 16.04

After upgrading our server to this kernel, we experience frequent kernel panics (see attachment), roughly every 3 hours.
Our machine has a throughput of about 600 Mbit/s.
The panics occur around ip6_expire_frag_queue:

  __pskb_pull_tail
  ip6_dst_lookup_tail
  _decode_session6
  __xfrm_decode_session
  icmpv6_route_lookup
  icmp6_send

It seems similar to a bug report in Debian:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922488

According to the reporter of the above bug, it also started after using a kernel with the change
"ipv6: frags: rewrite ip6_expire_frag_queue()".

Interim workaround: we disabled IPv6 on this machine to avoid further panics.
Please let me know what information is missing. The ubuntu-bug linux report was sent, and I hope it is attached to this report.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-145-generic 4.4.0-145.171
ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
Uname: Linux 4.4.0-145-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
Date: Sun Apr 14 11:40:11 2019
InstallationDate: Installed on 2018-03-18 (391 days ago)
InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
ProcEnviron:
LANGUAGE=en_GB:en
TERM=xterm-256color
PATH=(custom, no user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed
UpgradeStatus: Upgraded to xenial on 2018-10-21 (174 days ago)
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Apr 12 21:04 seq
crw-rw---- 1 root audio 116, 33 Apr 12 21:04 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=/dev/mapper/tor3--vg-swap_1
InstallationDate: Installed on 2018-03-18 (393 days ago)
InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
IwConfig: Error: [Errno 2] No such file or directory
Lsusb:
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
LANGUAGE=en_GB:en
TERM=xterm-256color
PATH=(custom, no user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-145-generic root=/dev/mapper/hostname--vg-root ro
ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
RelatedPackageVersions:
linux-restricted-modules-4.4.0-145-generic N/A
linux-backports-modules-4.4.0-145-generic N/A
linux-firmware 1.157.21
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial xenial
Uname: Linux 4.4.0-145-generic x86_64
UpgradeStatus: Upgraded to xenial on 2018-10-21 (176 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 10/08/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.0c
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
dmi.board.vendor: Supermicro
dmi.board.version: 1.2
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.0c:bd10/08/2012:svnSupermicro:pnX9SRE/X9SRE-3F/X9SRi/X9SRi-3F:pvr0123456789:rvnSupermicro:rnX9SRE/X9SRE-3F/X9SRi/X9SRi-3F:rvr1.2:cvnSupermicro:ct3:cvr0123456789:
dmi.product.name: X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Revision history for this message

Dirk (iggs) wrote on 2019-04-14:

Screenshot_2019-04-12_18-28-33.png (45.2 KiB, image/png)
Dependencies.txt (2.8 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (1.0 KiB, text/plain; charset="utf-8")

Stefan Bader (smb) on 2019-04-15

affects:

linux-signed (Ubuntu) → linux (Ubuntu)

Stefan Bader (smb) wrote on 2019-04-15:

Which kernel version was used before (and did not show this crash)? Can you reproduce the issue on a non-production server (which would allow experimenting with the HWE (4.15) kernel)?

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2019-04-15: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1824687

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
Changed in linux (Ubuntu Xenial):
status:	New → Incomplete

Dirk (iggs) wrote on 2019-04-15: CRDA.txt

CRDA.txt (392 bytes, text/plain)

apport information

tags:	added: apport-collected
description:	updated

Dirk (iggs) wrote on 2019-04-15: CurrentDmesg.txt

CurrentDmesg.txt (63.7 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: Lspci.txt

Lspci.txt (78.5 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: ProcCpuinfo.txt

ProcCpuinfo.txt (4.0 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: ProcCpuinfoMinimal.txt

ProcCpuinfoMinimal.txt (1.0 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: ProcInterrupts.txt

ProcInterrupts.txt (3.7 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: ProcModules.txt

#10

ProcModules.txt (4.9 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: UdevDb.txt

#11

UdevDb.txt (168.2 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15: WifiSyslog.txt

#12

WifiSyslog.txt (73.9 KiB, text/plain)

apport information

Dirk (iggs) wrote on 2019-04-15:

#13

Added the logs of apport-collect 1824687 and changed the status of the bug to 'Confirmed'.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Dirk (iggs) wrote on 2019-04-15:

#14

Regarding #2: I do not know which kernel ran before.

I assume linux-image-4.4.0-143, based on the following apt history logs.
However, I do not know whether we rebooted in between.

I have a different type of server here running Ubuntu. I can try to stress it and see.
But I doubt I can get the same kind of traffic, since the other machine is a Tor exit.

Start-Date: 2019-04-03 06:09:30
Commandline: /usr/bin/unattended-upgrade
Install: linux-modules-extra-4.4.0-145-generic:amd64 (4.4.0-145.171, automatic), linux-modules-4.4.0-145-generic:amd64 (4.4.0-145.171, automatic), linux-headers-4.4.0-145:amd64 (4.4.0-145.171, automatic), linux-image-4.4.0-145-generic:amd64 (4.4.0-145.171, automatic), linux-headers-4.4.0-145-generic:amd64 (4.4.0-145.171, automatic)
Upgrade: linux-headers-generic:amd64 (4.4.0.143.151, 4.4.0.145.153), linux-image-generic:amd64 (4.4.0.143.151, 4.4.0.145.153), linux-generic:amd64 (4.4.0.143.151, 4.4.0.145.153)
End-Date: 2019-04-03 06:10:20

Start-Date: 2019-03-17 06:21:38
Commandline: /usr/bin/unattended-upgrade
Install: linux-modules-4.4.0-143-generic:amd64 (4.4.0-143.169, automatic), linux-headers-4.4.0-143:amd64 (4.4.0-143.169, automatic), linux-image-4.4.0-143-generic:amd64 (4.4.0-143.169, automatic), linux-headers-4.4.0-143-generic:amd64 (4.4.0-143.169, automatic), linux-modules-extra-4.4.0-143-generic:amd64 (4.4.0-143.169, automatic)
Upgrade: linux-headers-generic:amd64 (4.4.0.142.148, 4.4.0.143.151), linux-image-generic:amd64 (4.4.0.142.148, 4.4.0.143.151), linux-generic:amd64 (4.4.0.142.148, 4.4.0.143.151)
End-Date: 2019-03-17 06:22:23

Stefan Bader (smb) wrote on 2019-04-16:

#15

Knowing which was the last good kernel would help to minimize the delta of changes. Note that if you are able to interact with the grub loader at boot, you can go back to at least the kernel that was used before the reboot.
For the trace, it would be good to capture the full message. If the server has IPMI capabilities, you could add a console= argument to the kernel command line so that messages are observable through SOL.

Dirk (iggs) wrote on 2019-04-16:

#16

I spent the better part of 2 hours installing Java on an old Windows machine to satisfy the IPMI needs.
I was able to start SOL, but it is just a black window displaying nothing.
I give up on this; sorry, I do not know how to produce a better screenshot
(yes, I googled). IPMI Viewer produced the same bad results.

Regarding grub, does this help?
root@XXXX:~# update-grub2
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-145-generic
Found initrd image: /boot/initrd.img-4.4.0-145-generic
Found linux image: /boot/vmlinuz-4.4.0-143-generic
Found initrd image: /boot/initrd.img-4.4.0-143-generic
done

Stefan Bader (smb) wrote on 2019-04-17:

#17

The latter means that you still have 4.4.0-143 around and could select that if you had any way of interfacing with the booting server. So you could go back and confirm the regression happened between 143 and 145.

About IPMI: I don't know how one would do that with Windows, but on a Linux box there is a package called ipmitool (that is the name in Ubuntu; it might vary on other distros) which can be used to run the SOL session from a terminal window without any Java. In any case, to see anything you have to figure out which ttyS# on the server is mapped to the SOL session (usually ttyS0 or ttyS1). Then something like "console=ttyS#,115200n8" has to be added to the default arguments in /etc/default/grub to tell the kernel to redirect the console to that serial port.

Just for completeness, the command to initiate a SOL session would be:
ipmitool -Ilanplus -H<ip/name of ipmi interface> -U<ipmi user> -P<ipmi password> sol activate

Heikki Hannikainen (hessu) wrote on 2019-04-29:

#18

I have had this crash, with the ip6_expire_frag_queue stack trace, more than 18 times since 2019-04-16 on more than 10 different servers in 8 different countries. There have been some more crashes, but these are the ones where the panic dump made it out to a remote syslog server, where it's easy to grep. Crash count by kernel version (on both Trusty and Xenial):

2 crashes: 4.4.0-144-generic #170~14.04.1-Ubuntu
8 crashes: 4.4.0-145-generic #171-Ubuntu
8 crashes: 4.4.0-146-generic #172-Ubuntu

Downgrading to 4.4.0-143 now, as that build does not seem to have the "ipv6: frags: rewrite ip6_expire_frag_queue()" change; it first appears in the 4.4.0-144-generic image. By tomorrow it should be clear whether that kernel is stable, as we're now having multiple crashes per day (last crash 50 minutes ago).

These are routers running NAT & firewall & some applications, with substantial IPv6 traffic.

Interestingly the crashes only happen on bare hardware. We have a much
larger number of VMs doing the same thing, most of them now running
4.4.0-146, and none of them have crashed like this. The hardware instances
do have a larger number of CPU cores, the VMs only have 2 or 4.

I am also seeing crashes on 4.15.0-48-generic hwe kernel running on xenial,
but no stack trace to show yet.

Attaching kernel stack trace file containing several crashes on various servers (hessu-ipv6_expire_frag_queue-crashes.txt).

Heikki Hannikainen (hessu) wrote on 2019-04-29:

#19

5 kernel stack traces of crashes on 4.4.0-145 and -146, on 4 different hardware nodes (33.8 KiB, text/plain)

Heikki Hannikainen (hessu) wrote on 2019-04-29:

#20

kernel.org bug ticket, showing similar crashes on 4.9 and 4.19 kernels: https://bugzilla.kernel.org/show_bug.cgi?id=202669

Stefan Bader (smb) wrote on 2019-04-30:

#21

Thanks for the stack traces. Those help a lot to pinpoint the problem. Will be taking a look.

Changed in linux (Ubuntu Xenial):
assignee:	nobody → Stefan Bader (smb)
importance:	Undecided → High
status:	Incomplete → Triaged

Stefan Bader (smb) wrote on 2019-04-30:

#22

The issue is a check that causes an oops/crash when pskb_expand_head() is called on a socket buffer (skb) that is referenced more than once. As mentioned in comment #18, this seems to have been introduced by a series of patches modifying the way fragments are handled.

The networking code is quite complex, so I am not sure whether a detail I found is actually causing these issues (one backport claims to drop some extraneous initialization in IPv6 which was not done in the IPv4 counterpart), but I created a test kernel to see what happens. If someone could give http://people.canonical.com/~smb/lp1824687/ a try and let me know, I would highly appreciate it.

Dirk (iggs) wrote on 2019-04-30:

#23

Thanks for the updated kernel.
Sorry for the late reply. My changes to the IPMI setup were without success.

However, I could install and boot your kernel:
Linux version 4.4.0-144-generic (smb@kathleen) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #170+lp1824687v1 SMP Tue Apr 30 11:18:53 UTC 2019 (Ubuntu 4.4.0-144.170+lp1824687v1-generic 4.4.176)

I activated IPv6 traffic. We will see whether the machine panics.

Dirk (iggs) wrote on 2019-05-01:

#24

OK, tested with the kernel provided. It does not improve the situation.
The machine crashed the first time after about 2 hours, with the same error as always.
I rebooted it; it took 2-3 hours until the next crash.

Stefan Bader (smb) wrote on 2019-05-03:

#25

As a status update: thanks for testing. It is a pity it did not help. So far I have been looking through all related changes in that set but could not find anything that immediately stuck out. Thinking more about the crash stack trace: it is a netfilter conntrack timer expiring which causes a call into ip6_expire_frag_queue(), and that function got rewritten in "ipv6: frags: rewrite ip6_expire_frag_queue()" to use the first entry in the frag list for sending an ICMP message. Before doing that, it calls skb_get(), which increments the user refcount. That might actually be the issue, but it is still done that way in every upstream kernel since v4.18. It could be that nobody is running those under heavy IPv6 traffic yet. Since I am not that familiar with the network stack, I would like to reach out to upstream with that question.
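To make the refcount argument concrete, here is a minimal userspace model in C. It is a hypothetical sketch: model_skb, model_skb_get(), model_skb_shared() and model_expand_head() are invented stand-ins for skb_get(), skb_shared() and pskb_expand_head(), and only the user reference count is modelled. It assumes, as described above, that the fragment queue still holds its reference to the head skb when skb_get() adds a second one, so a header reallocation on the ICMP send path (the trace in the original report goes through __pskb_pull_tail) sees a shared buffer. In the 4.4 kernel, pskb_expand_head() calls BUG() on a shared skb; the model returns -1 instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the user refcount. */
struct model_skb {
    int users;
};

/* Like skb_get(): take an additional user reference. */
static struct model_skb *model_skb_get(struct model_skb *skb)
{
    skb->users++;
    return skb;
}

/* Like skb_shared(): true when more than one user holds the buffer. */
static bool model_skb_shared(const struct model_skb *skb)
{
    return skb->users > 1;
}

/* Stand-in for pskb_expand_head(): the kernel BUG()s on a shared skb;
 * this model reports -1 so the condition can be observed. */
static int model_expand_head(struct model_skb *skb)
{
    if (model_skb_shared(skb))
        return -1; /* the kernel would oops here */
    return 0;
}

/* Expire path as described: the queue owns one reference, skb_get()
 * adds another, the send path tries to expand the header, and the
 * extra reference is dropped afterwards (the role of kfree_skb()). */
static int model_expire_with_skb_get(struct model_skb *head)
{
    model_skb_get(head);
    int rc = model_expand_head(head);
    head->users--;
    return rc;
}
```

With a single owner (users == 1) the expansion succeeds; with the extra skb_get() reference it is refused, which corresponds to the oops in the traces.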

Heikki Hannikainen (hessu) wrote on 2019-05-03:

#26

There is the kernel.org bug ticket which describes similar oopsing through ip6_expire_frag_queue() in 4.9 and 4.19 kernels: https://bugzilla.kernel.org/show_bug.cgi?id=202669

I also saw crashes on 4.15.0-48-generic on a server running the same task; I don't have stack traces to show yet since they didn't get out to the remote syslog server.

Stefan Bader (smb) wrote on 2019-05-03:

#27

From the upstream discussion thread it looks like I was on the right track (https://marc.info/?l=linux-netdev&m=155688404826002&w=2). For confirmation, I am building another set of test kernel packages and, once this is confirmed, will proceed to SRU it into the other series. This seems to have gone unnoticed so far, so anything after 4.18, and all older kernels which have backported those changes, would be affected.

Changed in linux (Ubuntu Cosmic):
importance:	Undecided → High
status:	New → Triaged
Changed in linux (Ubuntu Disco):
importance:	Undecided → High
status:	New → Triaged
Changed in linux (Ubuntu):
importance:	Undecided → High
status:	Confirmed → Triaged

Stefan Bader (smb) wrote on 2019-05-03:

#28

Ok, http://people.canonical.com/~smb/lp1824687/ has been updated with a v2 set which has the upstream patch backported.

Heikki Hannikainen (hessu) wrote on 2019-05-03:

#29

Thank you! I can test this on Monday; the weekend is starting here in 2 minutes and this is not the greatest moment to start testing. :)

Stefan Bader (smb) wrote on 2019-05-07:

#30

Just a reminder about the test kernel. If this can be tested soon, it could make it into the next update cycle, which starts next week. But for that it has to be submitted before the end of Wednesday.

Heikki Hannikainen (hessu) wrote on 2019-05-15:

#31

Sorry for the delay. I'm back in the office now and deploying the test kernel today to a few servers, and to additional ones tomorrow if it's OK on the first ones.

Heikki Hannikainen (hessu) wrote on 2019-05-17:

#32

Stack trace of 2019-05-16 crash with lp1824687v2 test kernel (9.0 KiB, text/plain)

Unfortunately the 4.4.0-144-generic #170+lp1824687v2 testing kernel still crashes. I have 4 hardware instances running it now, and there were 2 panics (Australia, Sweden) within 24 hours. I installed linux-crashdump on them after the first crash to get the panic logs reliably. Attached is a log from the second panic.

Stefan Bader (smb) wrote on 2019-05-17:

#33

Thanks. Glancing at this quickly, it looks different: it now crashes at a different point (when releasing a buffer). I will have to take a closer look, but probably not today.

Stefan Bader (smb) wrote on 2019-05-21:

#34

I spent a little more time on this yesterday. While it is fairly clear that this results from fixing the original issue (it now crashes a little later, when releasing memory), my past experience with network issues like this is that memory dumps are of rather limited use: the causes lie in the past, and by the time the crash happens all the interesting state is already lost.
On the other hand, I would rather avoid running experiments in production environments (if that can be avoided). But I am not sure how much chance there is of that.
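One possible reading of the v2 crash, consistent with the traces in comment #35 (ip6_expire_frag_queue → inet_frag_destroy → kfree_skb), is that once skb_get() is removed, the head skb is freed after the ICMP error is sent while it is still linked into the 4.4 list-based fragment queue, so inet_frag_destroy() frees it a second time. The C sketch below models that hypothesis in userspace; the model_* names are invented, and instead of corrupting memory the model counts the second free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on a 4.4-style fragment list. */
struct model_skb {
    int users;              /* user reference count */
    bool freed;             /* set once the last reference is dropped */
    struct model_skb *next; /* next fragment in the queue */
};

/* Hypothetical stand-in for the fragment queue (list head only). */
struct model_frag_queue {
    struct model_skb *fragments; /* first fragment (the head) */
    int double_frees;            /* frees of an already released skb */
};

/* Like kfree_skb(): drop one reference, release on the last one.
 * A second release is counted instead of corrupting memory. */
static void model_kfree_skb(struct model_frag_queue *q, struct model_skb *skb)
{
    if (skb->freed) {
        q->double_frees++;
        return;
    }
    if (--skb->users == 0)
        skb->freed = true;
}

/* Like inet_frag_destroy(): free every skb still linked into the queue. */
static void model_frag_destroy(struct model_frag_queue *q)
{
    struct model_skb *skb = q->fragments;
    while (skb != NULL) {
        struct model_skb *next = skb->next;
        model_kfree_skb(q, skb);
        skb = next;
    }
    q->fragments = NULL;
}

/* v2-style expire (hypothesis): no skb_get(), so users == 1 while the
 * ICMP error would be sent, but the head is never unlinked from the list. */
static void model_expire_v2(struct model_frag_queue *q)
{
    struct model_skb *head = q->fragments;
    if (head != NULL) {
        /* icmpv6_send(head, ...) would run here */
        model_kfree_skb(q, head);
    }
    model_frag_destroy(q); /* still walks over the already freed head */
}
```

In the model the send path no longer sees a shared skb, but the teardown releases the head a second time, which matches the shift from an oops in the send path to a crash while releasing buffers.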

Heikki Hannikainen (hessu) wrote on 2019-05-21:

#35

I've now got 6 crashes within past 24 hours on the #170+lp1824687v2 testing kernel on a *single* server. It's a production environment, so I'll roll back for now. Two latest backtraces:

[ 6251.834160] Call Trace:
[ 6251.834166] <IRQ>
[ 6251.834174] [<ffffffff8173d130>] skb_release_head_state+0x90/0xb0
[ 6251.834189] [<ffffffff8173dd62>] skb_release_all+0x12/0x30
[ 6251.834203] [<ffffffff8173ddd2>] kfree_skb+0x32/0xa0
[ 6251.834798] [<ffffffff817dca2e>] inet_frag_destroy+0x7e/0x100
[ 6251.835388] [<ffffffffc04b7260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[ 6251.835979] [<ffffffff8182b522>] ip6_expire_frag_queue+0x102/0x110
[ 6251.836562] [<ffffffffc04b727f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[ 6251.837154] [<ffffffff810f3b57>] call_timer_fn+0x37/0x140
[ 6251.837746] [<ffffffffc04b7260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[ 6251.838350] [<ffffffff810f5464>] run_timer_softirq+0x234/0x330
[ 6251.838961] [<ffffffff8108a339>] __do_softirq+0x109/0x2b0
[ 6251.839574] [<ffffffff8108a655>] irq_exit+0xa5/0xb0
[ 6251.840192] [<ffffffff818660c0>] smp_apic_timer_interrupt+0x50/0x70
[ 6251.843313] [<ffffffff8186383c>] apic_timer_interrupt+0xcc/0xe0
[ 6251.843944] <EOI>
[ 6251.843952] [<ffffffff8173cf29>] ? kfree_skbmem+0x59/0x60
[ 6251.845153] [<ffffffff8126047d>] ? __fsnotify_parent+0x5d/0x130
[ 6251.845744] [<ffffffff8121c0ab>] vfs_read+0xfb/0x130
[ 6251.846316] [<ffffffff8121cd85>] SyS_read+0x55/0xc0
[ 6251.846868] [<ffffffff8186281b>] entry_SYSCALL_64_fastpath+0x22/0xcb

[ 1037.665436] Call Trace:
[ 1037.665442] <IRQ>
[ 1037.665452] [<ffffffffc05131d7>] nf_skb_free+0x17/0x20 [nf_defrag_ipv6]
[ 1037.665469] [<ffffffff817dca23>] inet_frag_destroy+0x73/0x100
[ 1037.665484] [<ffffffffc0513260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[ 1037.665501] [<ffffffff8182b522>] ip6_expire_frag_queue+0x102/0x110
[ 1037.665516] [<ffffffffc051327f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[ 1037.665534] [<ffffffff810f3b57>] call_timer_fn+0x37/0x140
[ 1037.665548] [<ffffffffc0513260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[ 1037.665569] [<ffffffff810f5464>] run_timer_softirq+0x234/0x330
[ 1037.665585] [<ffffffff8108a339>] __do_softirq+0x109/0x2b0
[ 1037.665598] [<ffffffff8108a655>] irq_exit+0xa5/0xb0
[ 1037.666290] [<ffffffff818660c0>] smp_apic_timer_interrupt+0x50/0x70
[ 1037.666929] [<ffffffff8186383c>] apic_timer_interrupt+0xcc/0xe0
[ 1037.667566] <EOI>
[ 1037.667578] [<ffffffff813af1e0>] ? audit_unix_sk_addr+0x40/0x40
[ 1037.669394] [<ffffffff817cfc20>] ? inet_recvmsg+0xb0/0xb0
[ 1037.670423] [<ffffffff817cfc42>] ? inet_sendmsg+0x22/0xa0
[ 1037.671441] [<ffffffff81735b7e>] sock_sendmsg+0x3e/0x50
[ 1037.672440] [<ffffffff81735c15>] sock_write_iter+0x85/0xf0
[ 1037.673409] [<ffffffff8121b6bf>] do_iter_readv_writev+0x6f/0xa0
[ 1037.674353] [<ffffffff8121c40f>] do_readv_writev+0x18f/0x230
[ 1037.675273] [<ffffffff8121b8c9>] ? __vfs_read+0x29/0x40
[ 1037.676167] [<ffffffff8121c539>] vfs_writev+0x39/0x50
[ 1037.677035] [<ffffffff8121d269>] SyS_writev+0x59/0xf0
[ 1037.677873] [<ffffffff8186281b>] entry_SYSCALL_64_fastpath+0x22/0xcb

Stefan Bader (smb) wrote on 2019-05-28:

#36

So far I have not been able to trigger the code path which leads to the crashes on my test system. I have, however, been able to extend the patch I had in v2 in a way that makes me a bit more hopeful that it might get us somewhere. It is potentially not the most optimized handling, but that could wait. Part of the problem is that all the changes come from a set of changes where I am not sure upstream really tested the intermediate steps that well. Anyhow, you will find the new debs again at http://people.canonical.com/~smb/lp1824687/
I know it sucks, but I would appreciate it if we could put that under production stress again.

Heikki Hannikainen (hessu) wrote on 2019-05-31:

#37

Thanks, I deployed the v4 debs on one server which was particularly unstable, and it's still up after 1 day and 8 hours now. I'll deploy more widely on Monday and Tuesday.

Heikki Hannikainen (hessu) wrote on 2019-06-05:

#38

I've now got the v4 debs on 5 servers, and not a single crash since they were installed on each. Looks good to me. Thank you!

Stefan Bader (smb) on 2019-06-06

description:

updated

Stefan Bader (smb) on 2019-06-06

Changed in linux (Ubuntu Xenial):
status:	Triaged → Fix Committed
Changed in linux (Ubuntu Cosmic):
status:	Triaged → Fix Committed

Stefan Bader (smb) on 2019-06-11

Changed in linux (Ubuntu Cosmic):
status:	Fix Committed → Incomplete

Stefan Bader (smb) wrote on 2019-06-11:

#39

I reverted the changes to Cosmic because that needs at least a different approach. In that version the rbtree usage is not yet present, the IPv4 expire function does exactly the same thing (increment the refcount of the skb), and we have no hard evidence that this actually causes crashes in the 4.18 kernel. So for now only the Xenial change is kept.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2019-06-18:

#40

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags:

added: verification-needed-xenial

Dirk (iggs) wrote on 2019-06-19:

#41

I tested with this kernel:
Linux tor3 4.4.0-149-generic #175+lp1824687v4 SMP Mon May 27 17:21:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

IPv6 is enabled and the system is under its usual load.
No crashes in 24 hours.
For me this is a clear indication that the problem is fixed. Before, there were crashes every 2-4 hours.

Therefore: verified for Xenial.

How can I add tags?

Dirk (iggs) wrote on 2019-06-19:

#42

Verified fix by reporter.

tags:

added: verification-done-xenial
removed: amd64 apport-bug apport-collected verification-needed-xenial xenial

Heikki Hannikainen (hessu) wrote on 2019-06-20:

#43

I deployed the actual -proposed kernel 4.4.0-152.179 on 4 servers, and it is stable for us. Previously there were multiple crashes per day. Confirming, verification done. Thank you!

tags:

added: amd64 apport-bug apport-collected xenial

Launchpad Janitor (janitor) wrote on 2019-07-24:

#44

This bug was fixed in the package linux - 4.4.0-157.185

---------------
linux (4.4.0-157.185) xenial; urgency=medium

* linux: 4.4.0-157.185 -proposed tracker (LP: #1837476)

* systemd 229-4ubuntu21.22 ADT test failure with linux 4.4.0-156.183 (storage)
    (LP: #1837235)
    - Revert "block/bio: Do not zero user pages"
    - Revert "block: Clear kernel memory before copying to user"
    - Revert "bio_copy_from_iter(): get rid of copying iov_iter"

linux (4.4.0-156.183) xenial; urgency=medium

* linux: 4.4.0-156.183 -proposed tracker (LP: #1836880)

* BCM43602 802.11ac Wireless regression - PCI ID 14e4:43ba (LP: #1836801)
    - brcmfmac: add eth_type_trans back for PCIe full dongle

linux (4.4.0-155.182) xenial; urgency=medium

* linux: 4.4.0-155.182 -proposed tracker (LP: #1834918)

* Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

* Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

* Handle overflow in proc_get_long of sysctl (LP: #1833935)
    - sysctl: handle overflow in proc_get_long

* Xenial update: 4.4.181 upstream stable release (LP: #1832661)
    - x86/speculation/mds: Revert CPU buffer clear on double fault exit
    - x86/speculation/mds: Improve CPU buffer clear documentation
    - ARM: exynos: Fix a leaked reference by adding missing of_node_put
    - crypto: vmx - fix copy-paste error in CTR mode
    - crypto: crct10dif-generic - fix use via crypto_shash_digest()
    - crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
    - ALSA: usb-audio: Fix a memory leak bug
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event
    - ALSA: hda/realtek - EAPD turn on later
    - ASoC: max98090: Fix restore of DAPM Muxes
    - ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
    - mm/mincore.c: make mincore() more conservative
    - ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
    - mfd: da9063: Fix OTP control register names to match datasheets for
      DA9063/63L
    - tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
    - ext4: actually request zeroing of inode table after grow
    - ext4: fix ext4_show_options for file systems w/o journal
    - Btrfs: do not start a transaction at iterate_extent_inodes()
    - bcache: fix a race between cache register and cacheset unregister
    - bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
    - ipmi:ssif: compare block number correctly for multi-part return messages
    - crypto: gcm - Fix error return code in crypto_gcm_create_common()
    - crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
    - crypto: chacha20poly1305 - set cra_name correctly
    - crypto: salsa20 - don't access already-freed walk.iv
    - crypto: arm/aes-neonbs - don't access already-freed walk.iv
    - writeback: synchronize sync(2) against cgroup writeback membership switches
    - fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going
      into workqueue when umount
    - ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
    - KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
    - net: avoid weird emergency message
    - net/mlx4_core: Change the error print to info print
    - ppp: deflate: Fix possible crash in deflate_init
    - tipc: switch order of device registration to fix a crash
    - tipc: fix modprobe tipc failed after switch order of device registration
    - stm class: Fix channel free in stm output free path
    - md: add mddev->pers to avoid potential NULL pointer dereference
    - intel_th: msu: Fix single mode with IOMMU
    - of: fix clang -Wunsequenced for be32_to_cpu()
    - cifs: fix strcat buffer overflow and reduce raciness in
      smb21_set_oplock_level()
    - media: ov6650: Fix sensor possibly not detected on probe
    - NFS4: Fix v4.0 client state corruption when mount
    - clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
    - fuse: fix writepages on 32bit
    - fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
    - iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
    - ceph: flush dirty inodes before proceeding with remount
    - tracing: Fix partial reading of trace event's id file
    - memory: tegra: Fix integer overflow on tick value calculation
    - perf intel-pt: Fix instructions sampling rate
    - perf intel-pt: Fix improved sample timestamp
    - perf intel-pt: Fix sample timestamp wrt non-taken branches
    - fbdev: sm712fb: fix brightness control on reboot, don't set SR30
    - fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
    - fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
    - fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
    - fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping
      VRAM
    - fbdev: sm712fb: fix support for 1024x768-16 mode
    - fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
    - fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
    - PCI: Mark Atheros AR9462 to avoid bus reset
    - dm delay: fix a crash when invalid device is specified
    - xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
    - xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
    - vti4: ipip tunnel deregistration fixes.
    - xfrm4: Fix uninitialized memory read in _decode_session4
    - KVM: arm/arm64: Ensure vcpu target is unset on reset failure
    - power: supply: sysfs: prevent endless uevent loop with
      CONFIG_POWER_SUPPLY_DEBUG
    - ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
    - perf bench numa: Add define for RUSAGE_THREAD if not present
    - Revert "Don't jump to compute_result state from check_result state"
    - md/raid: raid5 preserve the writeback action after the parity check
    - btrfs: Honour FITRIM range constraints during free space trim
    - fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
    - ext4: do not delete unlinked inode from orphan list on failed truncate
    - KVM: x86: fix return value for reserved EFER
    - bio: fix improper use of smp_mb__before_atomic()
    - Revert "scsi: sd: Keep disk read-only when re-reading partition"
    - crypto: vmx - CTR: always increment IV as quadword
    - gfs2: Fix sign extension bug in gfs2_update_stats
    - Btrfs: fix race between ranged fsync and writeback of adjacent ranges
    - btrfs: sysfs: don't leak memory when failing add fsid
    - fbdev: fix divide error in fb_var_to_videomode
    - hugetlb: use same fault hash key for shared and private mappings
    - fbdev: fix WARNING in __alloc_pages_nodemask bug
    - media: cpia2: Fix use-after-free in cpia2_exit
    - media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
    - ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
    - at76c50x-usb: Don't register led_trigger if usb_register_driver failed
    - perf tools: No need to include bitops.h in util.h
    - gfs2: Fix lru_count going negative
    - cxgb4: Fix error path in cxgb4_init_module
    - mmc: core: Verify SD bus width
    - powerpc/boot: Fix missing check of lseek() return value
    - ASoC: imx: fix fiq dependencies
    - spi: pxa2xx: fix SCR (divisor) calculation
    - brcm80211: potential NULL dereference in
      brcmf_cfg80211_vndr_cmds_dcmd_handler()
    - rtc: 88pm860x: prevent use-after-free on device remove
    - w1: fix the resume command API
    - dmaengine: pl330: _stop: clear interrupt status
    - mac80211/cfg80211: update bss channel on channel switch
    - ASoC: fsl_sai: Update is_slave_mode with correct value
    - mwifiex: prevent an array overflow
    - net: cw1200: fix a NULL pointer dereference
    - bcache: return error immediately in bch_journal_replay()
    - bcache: fix failure in journal relplay
    - bcache: add failure check to run_cache_set() for journal replay
    - bcache: avoid clang -Wunintialized warning
    - x86/build: Move _etext to actual end of .text
    - smpboot: Place the __percpu annotation correctly
    - x86/mm: Remove in_nmi() warning from 64-bit implementation of
      vmalloc_fault()
    - mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC
      versions
    - HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
    - pinctrl: pistachio: fix leaked of_node references
    - dmaengine: at_xdmac: remove BUG_ON macro in tasklet
    - media: coda: clear error return value before picture run
    - media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
    - media: au0828: stop video streaming only when last user stops
    - media: ov2659: make S_FMT succeed even if requested format doesn't match
    - audit: fix a memory leak bug
    - media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
    - media: pvrusb2: Prevent a buffer overflow
    - powerpc/numa: improve control of topology updates
    - sched/core: Check quota and period overflow at usec to nsec conversion
    - sched/core: Handle overflow in cpu_shares_write_u64
    - USB: core: Don't unbind interfaces following device reset failure
    - x86/irq/64: Limit IST stack overflow check to #DB stack
    - i40e: don't allow changes to HW VLAN stripping on active port VLANs
    - RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
    - hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
    - hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
    - hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
    - hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
    - hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
    - scsi: libsas: Do discovery on empty PHY to update PHY info
    - mmc_spi: add a status check for spi_sync_locked
    - mmc: sdhci-of-esdhc: add erratum eSDHC5 support
    - mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
    - PM / core: Propagate dev->power.wakeup_path when no callbacks
    - extcon: arizona: Disable mic detect if running when driver is removed
    - s390: cio: fix cio_irb declaration
    - cpufreq: ppc_cbe: fix possible object reference leak
    - cpufreq/pasemi: fix possible object reference leak
    - cpufreq: pmac32: fix possible object reference leak
    - x86/build: Keep local relocations with ld.lld
    - iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
    - iio: hmc5843: fix potential NULL pointer dereferences
    - iio: common: ssp_sensors: Initialize calculated_time in
      ssp_common_process_data
    - rtlwifi: fix a potential NULL pointer dereference
    - brcmfmac: fix missing checks for kmemdup
    - b43: shut up clang -Wuninitialized variable warning
    - brcmfmac: convert dev_init_lock mutex to completion
    - brcmfmac: fix race during disconnect when USB completion is in progress
    - scsi: ufs: Fix regulator load and icc-level configuration
    - scsi: ufs: Avoid configuring regulator with undefined voltage range
    - arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
    - x86/ia32: Fix ia32_restore_sigcontext() AC leak
    - chardev: add additional check for minor range overlap
    - HID: core: move Usage Page concatenation to Main item
    - ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
    - ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
    - cxgb3/l2t: Fix undefined behaviour
    - spi: tegra114: reset controller on probe
    - media: wl128x: prevent two potential buffer overflows
    - virtio_console: initialize vtermno value for ports
    - tty: ipwireless: fix missing checks for ioremap
    - rcutorture: Fix cleanup path for invalid torture_type strings
    - usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
    - scsi: qla4xxx: avoid freeing unallocated dma memory
    - media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
    - media: go7007: avoid clang frame overflow warning with KASAN
    - media: saa7146: avoid high stack usage with clang
    - scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
    - spi : spi-topcliff-pch: Fix to handle empty DMA buffers
    - spi: rspi: Fix sequencer reset during initialization
    - spi: Fix zero length xfer bug
    - ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
    - ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
    - llc: fix skb leak in llc_build_and_send_ui_pkt()
    - net-gro: fix use-after-free read in napi_gro_frags()
    - net: stmmac: fix reset gpio free missing
    - usbnet: fix kernel crash after disconnect
    - tipc: Avoid copying bytes beyond the supplied data
    - bnxt_en: Fix aggregation buffer leak under OOM condition.
    - net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
    - crypto: vmx - ghash: do nosimd fallback manually
    - xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
    - Revert "tipc: fix modprobe tipc failed after switch order of device
      registration"
    - tipc: fix modprobe tipc failed after switch order of device registration -v2
    - sparc64: Fix regression in non-hypervisor TLB flush xcall
    - include/linux/bitops.h: sanitize rotate primitives
    - xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
    - usb: xhci: avoid null pointer deref when bos field is NULL
    - USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
    - USB: sisusbvga: fix oops in error path of sisusb_probe
    - USB: Add LPM quirk for Surface Dock GigE adapter
    - USB: rio500: refuse more than one device at a time
    - USB: rio500: fix memory leak in close after disconnect
    - media: usb: siano: Fix general protection fault in smsusb
    - media: usb: siano: Fix false-positive "uninitialized variable" warning
    - media: smsusb: better handle optional alignment
    - scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
    - scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
    - Btrfs: fix race updating log root item during fsync
    - ALSA: hda/realtek - Set default power save node to 0
    - drm/nouveau/i2c: Disable i2c bus access after ->fini()
    - tty: serial: msm_serial: Fix XON/XOFF
    - tty: max310x: Fix external crystal register setup
    - memcg: make it work on sparse non-0-node systems
    - kernel/signal.c: trace_signal_deliver when signal_group_exit
    - CIFS: cifs_read_allocate_pages: don't iterate through whole page array on
      ENOMEM
    - binder: Replace "%p" with "%pK" for stable
    - binder: replace "%p" with "%pK"
    - brcmfmac: Add length checks on firmware events
    - brcmfmac: screening firmware event packet
    - brcmfmac: revise handling events in receive path
    - brcmfmac: fix incorrect event channel deduction
    - brcmfmac: add length checks in scheduled scan result handler
    - brcmfmac: add subtype check for event handling in data path
    - userfaultfd: don't pin the user memory in userfaultfd_file_create()
    - Revert "x86/build: Move _etext to actual end of .text"
    - net: cdc_ncm: GetNtbFormat endian fix
    - usb: gadget: fix request length error for isoc transfer
    - media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
    - ethtool: fix potential userspace buffer overflow
    - neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
    - net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
    - net: rds: fix memory leak in rds_ib_flush_mr_pool
    - pktgen: do not sleep with the thread lock held.
    - rcu: locking and unlocking need to always be at least barriers
    - parisc: Use implicit space register selection for loading the coherence
      index of I/O pdirs
    - fuse: fallocate: fix return with locked inode
    - MIPS: pistachio: Build uImage.gz by default
    - genwqe: Prevent an integer overflow in the ioctl
    - drm/gma500/cdv: Check vbt config bits when detecting lvds panels
    - fs: stream_open - opener for stream-like files so that read and write can
      run simultaneously without deadlock
    - fuse: Add FOPEN_STREAM to use stream_open()
    - ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
    - ethtool: check the return value of get_regs_len
    - Linux 4.4.181

* CVE-2019-2054
    - arm/ptrace: run seccomp after ptrace

* CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
    - x86/speculation: Remove redundant arch_smt_update() invocation

* Revert x86/vdso linker changes from #1830890 as this causes glibc
    2.29-0ubuntu3 FTBFS on eoan (LP: #1834315)
    - Revert "x86/vdso: Pass --eh-frame-hdr to the linker"
    - Revert "x86: vdso: Use $LD instead of $CC to link"

* CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64 (LP: #1824864)
    - [Config] CONFIG_LOG_BUF_SHIFT=18 on all 64bit arches

* CVE-2019-11833
    - ext4: zero out the unused memory region in the extent tree block

* idle-page oopses when accessing page frames that are out of range
    (LP: #1833410)
    - mm/page_idle.c: fix oops because end_pfn is larger than max_pfn

* Performance degradation when copying from LVM snapshot backed by NVMe disk
    (LP: #1833319)
    - NVMe: Allow request merges

* Bluetooth regressions with Xenial kernel 4.4.0-152.179 (LP: #1833698)
    - Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR
      connections"

* 4.4.0-145-generic Kernel Panic  ip6_expire_frag_queue (LP: #1824687)
    - SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue()

* [Xenial] Customer can not SSH to Linux VM due to "VSC State Unhealthy"
    (LP: #1826416)
    - vmbus: fix missing signaling in hv_signal_on_read()

* Xenial update: 4.4.180 upstream stable release (LP: #1830176)
    - kbuild: simplify ld-option implementation
    - KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
    - cifs: do not attempt cifs operation on smb2+ rename error
    - MIPS: scall64-o32: Fix indirect syscall number load
    - trace: Fix preempt_enable_no_resched() abuse
    - sched/numa: Fix a possible divide-by-zero
    - ceph: ensure d_name stability in ceph_dentry_hash()
    - ceph: fix ci->i_head_snapc leak
    - nfsd: Don't release the callback slot unless it was actually held
    - sunrpc: don't mark uninitialised items as VALID.
    - USB: Add new USB LPM helpers
    - USB: Consolidate LPM checks to avoid enabling LPM twice
    - powerpc/xmon: Add RFI flush related fields to paca dump
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - Revert "UBUNTU: SAUCE: powerpc/64s: Add support for a store forwarding
      barrier at kernel entry/exit"
    - powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
    - powerpc/64s: Add barrier_nospec
    - powerpc/64s: Add support for ori barrier_nospec patching
    - powerpc/64s: Patch barrier_nospec in modules
    - powerpc/64s: Enable barrier_nospec based on firmware settings
    - powerpc/64: Use barrier_nospec in syscall entry
    - powerpc: Use barrier_nospec in copy_from_user()
    - powerpc/64s: Enhance the information in cpu_show_spectre_v1()
    - powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2
    - powerpc/64: Disable the speculation barrier from the command line
    - powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
    - powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
    - powerpc/64: Call setup_barrier_nospec() from setup_arch()
    - powerpc/64: Make meltdown reporting Book3S 64 specific
    - powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
    - powerpc/asm: Add a patch_site macro & helpers for patching instructions
    - powerpc/64s: Add new security feature flags for count cache flush
    - powerpc/64s: Add support for software count cache flush
    - powerpc/pseries: Query hypervisor for count cache flush settings
    - powerpc/powernv: Query firmware for count cache flush settings
    - powerpc: Avoid code patching freed init sections
    - powerpc/fsl: Add infrastructure to fixup branch predictor flush
    - powerpc/fsl: Add macro to flush the branch predictor
    - powerpc/fsl: Fix spectre_v2 mitigations reporting
    - powerpc/fsl: Add nospectre_v2 command line argument
    - powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
    - powerpc/fsl: Update Spectre v2 reporting
    - powerpc/security: Fix spectre_v2 reporting
    - powerpc/fsl: Fix the flush of branch predictor.
    - tipc: handle the err returned from cmd header function
    - slip: make slhc_free() silently accept an error pointer
    - intel_th: gth: Fix an off-by-one in output unassigning
    - fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
    - NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
    - netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
    - tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
    - tipc: check link name with right length in tipc_nl_compat_link_set
    - bpf: reject wrong sized filters earlier
    - Revert "block/loop: Use global lock for ioctl() operation."
    - ipv4: add sanity checks in ipv4_link_failure()
    - team: fix possible recursive locking when add slaves
    - net: stmmac: move stmmac_check_ether_addr() to driver probe
    - ipv4: set the tcp_min_rtt_wlen range from 0 to one day
    - powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
    - powerpc/fsl: Flush branch predictor when entering KVM
    - powerpc/fsl: Emulate SPRN_BUCSR register
    - powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
    - powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
    - powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
    - powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
    - Documentation: Add nospectre_v1 parameter
    - usbnet: ipheth: prevent TX queue timeouts when device not ready
    - usbnet: ipheth: fix potential null pointer dereference in ipheth_carrier_set
    - qlcnic: Avoid potential NULL pointer dereference
    - netfilter: bridge: set skb transport_header before entering
      NF_INET_PRE_ROUTING
    - sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
    - usb: gadget: net2280: Fix overrun of OUT messages
    - usb: gadget: net2280: Fix net2280_dequeue()
    - usb: gadget: net2272: Fix net2272_dequeue()
    - ARM: dts: pfla02: increase phy reset duration
    - net: ks8851: Dequeue RX packets explicitly
    - net: ks8851: Reassert reset pin if chip ID check fails
    - net: ks8851: Delay requesting IRQ until opened
    - net: ks8851: Set initial carrier state to down
    - net: xilinx: fix possible object reference leak
    - net: ibm: fix possible object reference leak
    - net: ethernet: ti: fix possible object reference leak
    - scsi: qla4xxx: fix a potential NULL pointer dereference
    - usb: u132-hcd: fix resource leak
    - ceph: fix use-after-free on symlink traversal
    - scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
    - libata: fix using DMA buffers on stack
    - kconfig/[mn]conf: handle backspace (^H) key
    - ALSA: line6: use dynamic buffers
    - ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
    - ipv6/flowlabel: wait rcu grace period before put_pid()
    - ipv6: invert flowlabel sharing check in process and user mode
    - bnxt_en: Improve multicast address setup logic.
    - packet: validate msg_namelen in send directly
    - USB: yurex: Fix protection fault after device removal
    - USB: w1 ds2490: Fix bug caused by improper use of altsetting array
    - USB: core: Fix unterminated string returned by usb_string()
    - USB: core: Fix bug caused by duplicate interface PM usage counter
    - HID: debug: fix race condition with between rdesc_show() and device removal
    - rtc: sh: Fix invalid alarm warning for non-enabled alarm
    - bonding: show full hw address in sysfs for slave entries
    - jffs2: fix use-after-free on symlink traversal
    - debugfs: fix use-after-free on symlink traversal
    - rtc: da9063: set uie_unsupported when relevant
    - vfio/pci: use correct format characters
    - scsi: storvsc: Fix calculation of sub-channel count
    - net: hns: Use NAPI_POLL_WEIGHT for hns driver
    - net: hns: Fix WARNING when remove HNS driver with SMMU enabled
    - hugetlbfs: fix memory leak for resv_map
    - xsysace: Fix error handling in ace_setup
    - ARM: orion: don't use using 64-bit DMA masks
    - ARM: iop: don't use using 64-bit DMA masks
    - usb: usbip: fix isoc packet num validation in get_pipe
    - staging: iio: adt7316: allow adt751x to use internal vref for all dacs
    - staging: iio: adt7316: fix the dac read calculation
    - staging: iio: adt7316: fix the dac write calculation
    - Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
    - selinux: never allow relabeling on context mounts
    - x86/mce: Improve error message when kernel cannot recover, p2
    - media: v4l2: i2c: ov7670: Fix PLL bypass register values
    - scsi: libsas: fix a race condition when smp task timeout
    - ASoC:soc-pcm:fix a codec fixup issue in TDM case
    - ASoC: cs4270: Set auto-increment bit for register writes
    - ASoC: tlv320aic32x4: Fix Common Pins
    - perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
    - scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
    - iommu/amd: Set exclusion range correctly
    - genirq: Prevent use-after-free and work list corruption
    - usb: dwc3: Fix default lpm_nyet_threshold value
    - scsi: qla2xxx: Fix incorrect region-size setting in optrom SYSFS routines
    - Bluetooth: hidp: fix buffer overflow
    - Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
    - UAS: fix alignment of scatter/gather segments
    - ipv6: fix a potential deadlock in do_ipv6_setsockopt()
    - ASoC: Intel: avoid Oops if DMA setup fails
    - timer/debug: Change /proc/timer_stats from 0644 to 0600
    - netfilter: compat: initialize all fields in xt_init
    - platform/x86: sony-laptop: Fix unintentional fall-through
    - iio: adc: xilinx: fix potential use-after-free on remove
    - HID: input: add mapping for Expose/Overview key
    - HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
    - libnvdimm/btt: Fix a kmemdup failure check
    - s390/dasd: Fix capacity calculation for large volumes
    - s390/3270: fix lockdep false positive on view->lock
    - KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in
      tracing
    - tools lib traceevent: Fix missing equality check for strcmp
    - init: initialize jump labels before command line option parsing
    - ipvs: do not schedule icmp errors from tunnels
    - s390: ctcm: fix ctcm_new_device error return code
    - gpu: ipu-v3: dp: fix CSC handling
    - cw1200: fix missing unlock on error in cw1200_hw_scan()
    - Don't jump to compute_result state from check_result state
    - x86/microcode/intel: Add a helper which gives the microcode revision
    - x86: stop exporting msr-index.h to userland
    - x86/microcode/intel: Check microcode revision before updating sibling
      threads
    - x86/MCE: Save microcode revision in machine check records
    - x86/bugs: Add AMD's variant of SSB_NO
    - x86/bugs: Add AMD's SPEC_CTRL MSR usage
    - x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
    - x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
    - x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
    - x86/microcode: Update the new microcode revision unconditionally
    - x86/mm: Use WRITE_ONCE() when setting PTEs
    - x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
    - x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
    - x86/speculation: Propagate information about RSB filling mitigation to sysfs
    - x86/speculation: Update the TIF_SSBD comment
    - x86/speculation: Clean up spectre_v2_parse_cmdline()
    - x86/speculation: Move STIPB/IBPB string conditionals out of
      cpu_show_common()
    - x86/speculation: Disable STIBP when enhanced IBRS is in use
    - x86/speculation: Rename SSBD update functions
    - x86/speculation: Reorganize speculation control MSRs update
    - x86/Kconfig: Select SCHED_SMT if SMP enabled
    - x86/speculation: Mark string arrays const correctly
    - x86/speculataion: Mark command line parser data __initdata
    - x86/speculation: Add command line control for indirect branch speculation
    - x86/speculation: Prepare for per task indirect branch speculation control
    - x86/process: Consolidate and simplify switch_to_xtra() code
    - x86/speculation: Avoid __switch_to_xtra() calls
    - x86/speculation: Prepare for conditional IBPB in switch_mm()
    - x86/speculation: Split out TIF update
    - x86/speculation: Prepare arch_smt_update() for PRCTL mode
    - x86/speculation: Prevent stale SPEC_CTRL msr content
    - x86/speculation: Add prctl() control for indirect branch speculation
    - x86/speculation: Enable prctl mode for spectre_v2_user
    - x86/speculation: Add seccomp Spectre v2 user space protection mode
    - x86/speculation: Provide IBPB always command line options
    - x86/cpu/bugs: Use __initconst for 'const' init data
    - USB: serial: use variable for status
    - USB: serial: fix unthrottle races
    - bridge: Fix error path for kobject_init_and_add()
    - net: ucc_geth - fix Oops when changing number of buffers in the ring
    - packet: Fix error path in packet_init
    - vlan: disable SIOCSHWTSTAMP in container
    - vrf: sit mtu should not be updated when vrf netdev is the link
    - ipv4: Fix raw socket lookup for local traffic
    - bonding: fix arp_validate toggling in active-backup mode
    - drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
    - drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
    - powerpc/booke64: set RI in default MSR
    - powerpc/lib: fix book3s/32 boot failure due to code patching
    - Linux 4.4.180
    - SAUCE: Clarify IBRS/IBPB runtime state change messages
    - SAUCE: x86/speculation: Move STIBP hunks
    - SAUCE: powerpc/speculation: Support 'mitigations=' cmdline option
    - SAUCE: x86/speculation: Update 'mitigations=' documentation
    - SAUCE: Show 'pti' instead of 'kaiser' in /proc/cpuinfo
    - SAUCE: perf/bench: Drop definition of BIT in numa.c
    - SAUCE: x86/speculation: Fix SSB command line documentation

* CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130 // CVE-2019-11091
    - SAUCE: Synchronize MDS mitigations with upstream
    - Documentation: Correct the possible MDS sysfs values
    - x86/speculation/mds: Fix documentation typo

* CVE-2019-11091
    - x86/mds: Add MDSUM variant to the MDS documentation

-- Stefan Bader <stefan.bader@canonical.com>  Tue, 23 Jul 2019 10:55:25 +0200

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released

Brad Figg (brad-figg) on 2019-07-24

tags:

added: cscc

Terry Rudd (terrykrudd) on 2019-08-14

Changed in linux (Ubuntu Cosmic):
status:	Incomplete → Invalid

Steve Langasek (vorlon) on 2020-07-02

Changed in linux (Ubuntu Disco):
status:	Triaged → Won't Fix

Po-Hsu Lin (cypressyew) on 2021-04-06

Changed in linux (Ubuntu):
status:	Triaged → Invalid

Remote bug watches

linux-kernel-bugs #202669 [NEW]