Soft lockup with "block nbdX: Attempted send on closed socket" spam
- Vivid (15.04)
- Bug #1505564
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Dan Streetman |
Trusty | Fix Released | Undecided | Unassigned |
Vivid | Fix Released | Undecided | Unassigned |
Wily | Fix Released | Undecided | Unassigned |
Bug Description
Some of our nova compute hosts regularly freeze, sometimes for a few hours, with kern.log getting spammed with:
block nbdX: Attempted send on closed socket
and a few "CPU soft lockup" messages (see attached log). This clears up when the queue gets cleared, e.g.:
block nbdX: queue cleared
trusty hosts with kernel version 3.19.0-30-generic.
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Nov 24 12:23 seq
crw-rw---- 1 root audio 116, 33 Nov 24 12:23 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: HP ProLiant DL385 G7
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.127.18
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty uec-images
Uname: Linux 3.19.0-36-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 02/02/2014
dmi.bios.vendor: HP
dmi.bios.version: A18
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:
dmi.product.name: ProLiant DL385 G7
dmi.sys.vendor: HP
Junien F (axino) wrote : BootDmesg.txt | #1 |
tags: | added: apport-collected trusty uec-images |
description: | updated |
Junien F (axino) wrote : CRDA.txt | #2 |
Junien F (axino) wrote : CurrentDmesg.txt | #3 |
Junien F (axino) wrote : Lspci.txt | #4 |
Junien F (axino) wrote : Lsusb.txt | #5 |
Junien F (axino) wrote : ProcCpuinfo.txt | #6 |
Junien F (axino) wrote : ProcInterrupts.txt | #7 |
Junien F (axino) wrote : ProcModules.txt | #8 |
Junien F (axino) wrote : UdevDb.txt | #9 |
Junien F (axino) wrote : UdevLog.txt | #10 |
Junien F (axino) wrote : WifiSyslog.txt | #11 |
Junien F (axino) wrote : | #12 |
Second host now
tags: | added: staging |
description: | updated |
Junien F (axino) wrote : BootDmesg.txt | #13 |
Junien F (axino) wrote : CRDA.txt | #14 |
Junien F (axino) wrote : CurrentDmesg.txt | #15 |
Junien F (axino) wrote : Lspci.txt | #16 |
Junien F (axino) wrote : Lsusb.txt | #17 |
Junien F (axino) wrote : ProcCpuinfo.txt | #18 |
Junien F (axino) wrote : ProcInterrupts.txt | #19 |
Junien F (axino) wrote : ProcModules.txt | #20 |
Junien F (axino) wrote : UdevDb.txt | #21 |
Junien F (axino) wrote : UdevLog.txt | #22 |
Junien F (axino) wrote : WifiSyslog.txt | #23 |
Brad Figg (brad-figg) wrote : Status changed to Confirmed | #24 |
This change was made by a bot.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Junien F (axino) wrote : | #25 |
I think that this may be a duplicate of #1500739, the symptoms certainly look the same.
Changed in linux (Ubuntu): | |
assignee: | nobody → Rafael David Tinoco (inaddy) |
Junien F (axino) wrote : BootDmesg.txt | #26 |
description: | updated |
Junien F (axino) wrote : CRDA.txt | #27 |
Junien F (axino) wrote : CurrentDmesg.txt | #28 |
Junien F (axino) wrote : Lspci.txt | #29 |
Junien F (axino) wrote : Lsusb.txt | #30 |
Junien F (axino) wrote : ProcCpuinfo.txt | #31 |
Junien F (axino) wrote : ProcInterrupts.txt | #32 |
Junien F (axino) wrote : ProcModules.txt | #33 |
Junien F (axino) wrote : UdevDb.txt | #34 |
Junien F (axino) wrote : UdevLog.txt | #35 |
Junien F (axino) wrote : WifiSyslog.txt | #36 |
Junien F (axino) wrote : | #37 |
This issue just hit us again, this time I sent an NMI to the server to get a dump. It's available at https:/
apport information post-reboot is available above.
We've been trying to see if the issue appeared somewhere in the 3.13 series, hence the 3.13.0-29-generic kernel version.
Thanks !
Junien F (axino) wrote : | #38 |
I'm just now realizing that the crashdump above may have been taken too late (when the kernel wasn't locked up anymore), because I could ssh to the server when I took it.
I was seeing the "block nbdX: Attempted send on closed socket" kernel log spam on the serial when I sent the NMI, but _perhaps_ these messages were just earlier messages that the serial was still catching up with.
Anyway, I got 2 new dumps, and these 2 were triggered automatically by the kernel.
Junien F (axino) wrote : | #39 |
First dump + apport (post reboot) below
description: | updated |
Junien F (axino) wrote : BootDmesg.txt | #40 |
Junien F (axino) wrote : CRDA.txt | #41 |
Junien F (axino) wrote : CurrentDmesg.txt | #42 |
Junien F (axino) wrote : Lspci.txt | #43 |
Junien F (axino) wrote : Lsusb.txt | #44 |
Junien F (axino) wrote : ProcCpuinfo.txt | #45 |
Junien F (axino) wrote : ProcInterrupts.txt | #46 |
Junien F (axino) wrote : ProcModules.txt | #47 |
Junien F (axino) wrote : UdevDb.txt | #48 |
Junien F (axino) wrote : UdevLog.txt | #49 |
Junien F (axino) wrote : | #50 |
Junien F (axino) wrote : | #51 |
Second apport+dump below
description: | updated |
Junien F (axino) wrote : BootDmesg.txt | #52 |
Junien F (axino) wrote : CRDA.txt | #53 |
Junien F (axino) wrote : CurrentDmesg.txt | #54 |
Junien F (axino) wrote : Lspci.txt | #55 |
Junien F (axino) wrote : Lsusb.txt | #56 |
Junien F (axino) wrote : ProcCpuinfo.txt | #57 |
Junien F (axino) wrote : ProcInterrupts.txt | #58 |
Junien F (axino) wrote : ProcModules.txt | #59 |
Junien F (axino) wrote : UdevDb.txt | #60 |
Junien F (axino) wrote : UdevLog.txt | #61 |
Junien F (axino) wrote : | #62 |
Junien F (axino) wrote : | #63 |
sha1 sums for all 3 dumps below :
6b63d74566b6df0
3a8cbdd9e51af4f
1ebd57dea13cf65
Junien F (axino) wrote : | #64 |
Upgraded all the kernels to lts-vivid (3.19.0-
description: | updated |
Junien F (axino) wrote : BootDmesg.txt | #65 |
Junien F (axino) wrote : CRDA.txt | #66 |
Junien F (axino) wrote : CurrentDmesg.txt | #67 |
Junien F (axino) wrote : Lspci.txt | #68 |
Junien F (axino) wrote : Lsusb.txt | #69 |
Junien F (axino) wrote : ProcCpuinfo.txt | #70 |
Junien F (axino) wrote : ProcInterrupts.txt | #71 |
Junien F (axino) wrote : ProcModules.txt | #72 |
Junien F (axino) wrote : UdevDb.txt | #73 |
Junien F (axino) wrote : UdevLog.txt | #74 |
Junien F (axino) wrote : WifiSyslog.txt | #75 |
Junien F (axino) wrote : | #76 |
crashdump available at https:/
Junien F (axino) wrote : BootDmesg.txt | #77 |
description: | updated |
Junien F (axino) wrote : CRDA.txt | #78 |
Junien F (axino) wrote : CurrentDmesg.txt | #79 |
Junien F (axino) wrote : Lspci.txt | #80 |
Junien F (axino) wrote : Lsusb.txt | #81 |
Junien F (axino) wrote : ProcCpuinfo.txt | #82 |
Junien F (axino) wrote : ProcInterrupts.txt | #83 |
Junien F (axino) wrote : ProcModules.txt | #84 |
Junien F (axino) wrote : UdevDb.txt | #85 |
Junien F (axino) wrote : UdevLog.txt | #86 |
Junien F (axino) wrote : WifiSyslog.txt | #87 |
Junien F (axino) wrote : | #88 |
Yet another crash, on another node this time (still a 100% Nova compute node). apport information is above, crashdump is at https:/
Thanks !
Rafael David Tinoco (rafaeldtinoco) wrote : | #89 |
Junien, I'm on it right now.. will update here asap.
Rafael David Tinoco (rafaeldtinoco) wrote : | #91 |
I'm attaching the crash tool output from the 3.13 kernel dump.
Most likely related to the situation already found in the following case:
-> https:/
Handled by Chris Arges and me in LKML discussions with Ingo and Linus:
-> http://
FOR NOW, it is LIKELY that I'll rely on already known recommendations for Proliant (including the ones related to X2APIC mode):
-> https:/
So we can TRY TO GUARANTEE that there are no LOST IRQs (IPIs) using the firmware you're using. Hopefully with the proper APIC mode set, like HP recommends, we will not have those IPI problems.
OBS: Whenever IPIs are lost (we've seen this on some nested KVMs and some buggy HW) we can be locked up in the SMP callback state machine. This means that the state machine loses IPI ACKs and loops forever trying to shut down the CPU so that the SMP task queue can continue.
I'll provide SOON a comment with SUGGESTIONS and asking for FEEDBACK.
################
For now, from the 3.13 kernel dump, the most interesting part:
We had 7 CPUs executing the migration kernel thread (for the SMP callback state machine execution):
#### migration tasks (state machine loop)
> 93 2 4 ffff8808147b47d0 RU 0.0 0 0 [migration/4]
> 118 2 9 ffff881814a2c7d0 RU 0.0 0 0 [migration/9]
> 123 2 10 ffff88081404c7d0 RU 0.0 0 0 [migration/10]
> 128 2 11 ffff881814a4c7d0 RU 0.0 0 0 [migration/11]
> 138 2 13 ffff881814a647d0 RU 0.0 0 0 [migration/13]
> 165 2 18 ffff8810149ec7d0 RU 0.0 0 0 [migration/18]
> 195 2 24 ffff881014a647d0 RU 0.0 0 0 [migration/24]
This logic will try to migrate tasks from one CPU to another. In order for that to happen they have to rely on the state machine logic of shutting CPUs down before migrating the tasks (turning off IRQs, etc). The state machine - shutting down the CPUs in phases - relies on the SMP callbacks below.
We had 3 CPUs in a part of the kernel that we have already identified to be problematic under certain conditions and/or HW.
** > 17247 1 23 ffff881007055fc0 RU 1.6 7358428 2192548 qemu-system-x86
PID: 17247 TASK: ffff881007055fc0 CPU: 23 COMMAND: "qemu-system-x86"
#0 [ffff88203eac6e58] crash_nmi_callback at ffffffff8103fb72
#1 [ffff88203eac6e68] nmi_handle at ffffffff8171f188
#2 [ffff88203eac6ec8] do_nmi at ffffffff8171f350
#3 [ffff88203eac6ef0] end_repeat_nmi at ffffffff8171e5f1
[exception RIP: generic_
RIP: ffffffff810db712 RSP: ffff8810ea7c96e0 RFLAGS: 00000202
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000202
RDX: ffff8810ea7c96e0 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff810db712 R8: ffffffff810db712 R9: 0000000000000018
R10: ffff8810ea7c96e0 R11: 0000000000000202 R12: ffffffffffffffff
R13: 0000000000000206 R14: 000000007bc87bc6 R15: ffff8814959f76c0
ORIG_RAX: ffff8814959f76c0 CS: 0010 SS: 0018
--- <NMI exception stack> -...
Rafael David Tinoco (rafaeldtinoco) wrote : | #92 |
Changed in linux (Ubuntu): | |
status: | Confirmed → In Progress |
Rafael David Tinoco (rafaeldtinoco) wrote : Re: [Bug 1505564] Re: Soft lockup with "block nbdX: Attempted send on closed socket" spam | #93 |
Hello Junien,
(recommendations with *)
I'm replying to you and to the LP bug so it gets proper documentation.
Under comment #91:
https:/
You can see my kernel dump analysis, where I am showing you that the
OS is stuck in a "migration thread", possibly because of a lack of
IPI synchronisation (maybe even an IPI being lost). We have already
seen cases like this - especially in nested virtualisation environments
- and this has been discussed on LKML.
Before we move further I need you to follow some kind of "best
practices" for Proliant Servers:
1 - NMIs caused during MWAIT instruction (caused by intel_idle module):
& HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT module loaded
(https:/
(https:/
* Firmware: Configure a maximum of a C3 c-state for CPU savings (CPU C-STATES)
* Firmware: Disable packed CPU c-state
* Firmware: Disable Cooperative Power Management
* Make sure NOT TO LOAD HPWDT kernel module (LP: #1432837 Fix Released
3.13.0-49.81)
2 - Recently discovered NMIs caused by a BUG in Intel microcode
(https:/
* If you have Intel-based Proliant Servers, because of the Intel
microcode issue, use at least 3.13.0-35.61.
3 - X2APIC support for HP Proliant Servers
(https:/
* For Proliant prior to G8 (<= G7) use "nox2apic intremap=off" into grub cmdline
* For Proliant G8 use "intremap=
4 - HP Proliant Latest Firmware
MOST IMPORTANT
Upgrade server firmware to latest version
There were numerous firmware fixes from HP.
---> If we are facing a firmware problem - related to IPIs, the
inter-processor
is reproducible in the latest firmware in order to work together with
HP ROM engineering team.
Summary:
Could you follow all these steps and provide feedback ? I understand
this might take a while if you have a big number of servers and - if so
- I would take a statistical approach here, by changing only half of
the servers and sticking with the first half as the "control group",
for future comparisons.
Is this feasible ? Looking forward to hearing your feedback.
Best Regards
Rafael Tinoco
Sustaining Engineering
Chris Stratford (chris-gondolin) wrote : | #94 |
Hi Rafael,
I've been continuing Junien's investigations into this problem. The machines have had all the BIOS and firmware updates I could find on HP's website (although in the case of a DL385-G7 the latest appears to be February 2014!) One of them only lasted a day before crashing again.
So, step 2 was to add "nox2apic intremap=off" to the DL385-G7s. I added it to only one of them initially. That machine lasted 9 days before we had another kernel panic ("NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [migration/
I've also added "intremap=
I'm tempted to try upgrading them to linux-image-
Rafael David Tinoco (rafaeldtinoco) wrote : | #95 |
Hello Chris,
Could you clarify the following statement:
"""
So, step 2 was to add "nox2apic intremap=off" to the DL385-G7s. I added it to only one of them initially. That machine lasted 9 days before we had another kernel panic ("NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [migration/
"""
So, I'm not sure if you are "panic'ing on hung tasks" (sysctl option). The way I read this is that the machine showed a soft lockup BUT the kernel did not crash and recovered after some time. This might indicate that, after workload was reduced, the kernel could get back on track with migration kthread. Could you clarify this ?
You did right.
< G8 cmdline == "nox2apic intremap=off"
>= G8 cmdline == "intremap=
So, if the kernel (G7) had a soft lockup warning but had no "hard lockups" (race conditions), then we are good. Judging by the G8, it looks like it is still running after the change. Could you clarify if you changed the c-states (min and packing) firmware options ?
I would recommend staying on 3.13 if the machines prove stable after the firmware version/options and cmdline were changed. This way we have a way to "compare" things. As long as they don't have HARD lockups, I think we will be good.
Let me know if you need any other clarification.
Cheers!
Rafael Tinoco
Junien F (axino) wrote : | #96 |
Hi Rafael,
For starters, the server Chris mentioned above didn't panic because the kernel.
Then, we're still running 3.19 (all the nodes got rebooted to 3.19.0-33-generic). Let me know if you wish us to get back to 3.13.
I verified that all the firmwares were the most recent ones, and they were.
I rebooted all the nodes with the proper x2apic kernel options. I also disabled all C-States, and also set everything relevant to "performance". You can see the changes here : http://
Unfortunately, even with all this, we had a G7 that panic'ed and crashdump'ed about 1h after I set it back in the compute pool. You will find the apport and crashdump below.
Let me know what the next steps are.
Thanks !
Junien F (axino) wrote : BootDmesg.txt | #97 |
description: | updated |
Junien F (axino) wrote : CRDA.txt | #98 |
Junien F (axino) wrote : CurrentDmesg.txt | #99 |
Junien F (axino) wrote : Lspci.txt | #100 |
Junien F (axino) wrote : Lsusb.txt | #101 |
Junien F (axino) wrote : ProcCpuinfo.txt | #102 |
Junien F (axino) wrote : ProcInterrupts.txt | #103 |
Junien F (axino) wrote : ProcModules.txt | #104 |
Junien F (axino) wrote : UdevDb.txt | #105 |
Junien F (axino) wrote : UdevLog.txt | #106 |
Junien F (axino) wrote : WifiSyslog.txt | #107 |
Junien F (axino) wrote : | #108 |
apport above, crash dump is at https:/
tags: | added: kernel-key |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
Rafael David Tinoco (rafaeldtinoco) wrote : | #109 |
Thanks Junien, I'm downloading the crash dump (10GB) and will update you as soon as I open it.
Rafael David Tinoco (rafaeldtinoco) wrote : | #110 |
Hello Junien,
After your last crash - similar to previous ones - one thing caught my attention: for the first time we had one CPU RCU stall detected by another CPU. This made me think that the problem wasn't only related to the SMP logic - like I believed - but that the stall also occurred somewhere else.
----
[ 5792.466770] INFO: rcu_sched detected stalls on CPUs/tasks: { 7} (detected by 15, t=15003 jiffies, g=182379, c=182378, q=0)
----
And this stall happened before Async I/O callbacks started to be suppressed:
----
[ 5793.190218] block nbd6: Attempted send on closed socket
[ 5793.190221] blk_update_request: 1154 callbacks suppressed
[ 5793.190223] blk_update_request: I/O error, dev nbd6, sector 125828992
[ 5793.190226] buffer_io_error: 1151 callbacks suppressed
[ 5793.190227] Buffer I/O error on dev nbd6, logical block 125828992, async page read
[ 5793.190235] block nbd6: Attempted send on closed socket
[ 5793.190237] blk_update_request: I/O error, dev nbd6, sector 125828993
[ 5793.190238] Buffer I/O error on dev nbd6, logical block 125828993, async page read
[ 5793.190242] block nbd6: Attempted send on closed socket
[ 5793.190243] blk_update_request: I/O error, dev nbd6, sector 125828994
[ 5793.190245] Buffer I/O error on dev nbd6, logical block 125828994, async page read
[ 5793.190248] block nbd6: Attempted send on closed socket
----
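A log excerpt like the one above can be summarized with a few lines of Python. This is only a sketch: the function name `summarize_nbd_log` and the choice of what to count are illustrative assumptions, and the patterns are written against the dmesg-style lines quoted in this comment, nothing more.

```python
import re
from collections import Counter

# Patterns written against the dmesg-style lines quoted in this comment.
CLOSED_SOCKET = re.compile(r"block (nbd\d+): Attempted send on closed socket")
SUPPRESSED = re.compile(r"(\w+): (\d+) callbacks suppressed")


def summarize_nbd_log(lines):
    """Count closed-socket messages per device and sum suppressed callbacks.

    Hypothetical helper, not part of any existing tool.
    """
    per_device = Counter()
    suppressed = Counter()
    for line in lines:
        m = CLOSED_SOCKET.search(line)
        if m:
            per_device[m.group(1)] += 1
            continue
        m = SUPPRESSED.search(line)
        if m:
            suppressed[m.group(1)] += int(m.group(2))
    return per_device, suppressed


sample = [
    "[ 5793.190218] block nbd6: Attempted send on closed socket",
    "[ 5793.190221] blk_update_request: 1154 callbacks suppressed",
    "[ 5793.190223] blk_update_request: I/O error, dev nbd6, sector 125828992",
    "[ 5793.190226] buffer_io_error: 1151 callbacks suppressed",
    "[ 5793.190235] block nbd6: Attempted send on closed socket",
]
per_device, suppressed = summarize_nbd_log(sample)
print(dict(per_device))
print(dict(suppressed))
```

The "callbacks suppressed" counters matter here because they show how many messages never reached the log at all, so the visible spam understates the real message rate.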
Digging upstream (from 3.13 to HEAD) I could see that there was not a huge number of fixes:
----
$ git log --pretty=oneline v3.13..HEAD -- drivers/block/nbd.c | wc -l
31
----
Among these nbd.c changes I identified an improvement to nbd timeout handling:
----
commit 7e2893a16d3e710
Author: Markus Pargmann <email address hidden>
Date: Mon Aug 17 08:20:00 2015 +0200
nbd: Fix timeout detection
----
This fix is pretty recent (4.3) and it fits the case: a 3.18 kernel facing the same issue.
Later I found out that Debian had a similar bug:
----
https:/
https:/
----
for kernel 3.16, complaining about messages like this:
----
[ 5793.190242] block nbd6: Attempted send on closed socket
----
And the lack of proper timeout for nbd connections (now based on timeout after IO submission).
SO...
The backport should be easy* and I'll probably make a PPA containing a 3.18 kernel (+ this patch) available for you tomorrow.
* 2 out of 12 hunks FAILED -- saving rejects to file drivers/
* Debian has a 3.16 version already
Thank you
Rafael Tinoco
Junien F (axino) wrote : | #111 |
Thanks for your update Rafael. Since nova-compute doesn't do anything useful with qemu-nbd anyway, I'm going to try to "soft-disable" it (divert + symlink to /bin/true), and we'll see if we can repro the crashes. I'll keep you posted.
I'll also try your patched kernel as soon as it's ready, of course :)
Junien F (axino) wrote : | #112 |
Hi Rafael,
With qemu-nbd symlinked to /bin/true, no crash so far...
Rafael David Tinoco (rafaeldtinoco) wrote : | #113 |
Junien,
I faced minor issues with the backport yesterday and today is a holiday in Brazil. I'll get back to this soon. Nevertheless, it is good feedback that this "qemu-nbd" workaround is probably making the system more stable.
I'll get back to you soon.
Thank you
Rafael
description: | updated |
Rafael David Tinoco (rafaeldtinoco) wrote : | #114 |
Rafael David Tinoco (rafaeldtinoco) wrote : | #115 |
tags: | added: patch |
Rafael David Tinoco (rafaeldtinoco) wrote : | #116 |
Testing patches I have attached above:
inaddy@
Formatting './test.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=
inaddy@
[ 34.348125] nbd: registered device at major 43
[ 317.034493] nbd0: unknown partition table
inaddy@
Device Boot Start End Blocks Id System
/dev/nbd0p1 2048 2097151 1047552 83 Linux
inaddy@
inaddy@
inaddy@
inaddy@
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.15586 s, 232 MB/s
Hopefully they won't cause any regression in the PPA to be provided soon.
Rafael David Tinoco (rafaeldtinoco) wrote : | #117 |
Hello Junien,
Based on my previous feedback, I've created the following PPA:
https:/
With a Trusty HWE kernel (vivid) + 2 patches:
nbd: Restructure debugging prints
nbd: Fix timeout detection
For you to use and provide me feedback.
I've done minor tests and it looks like there are no regressions.
Hopefully these patches will address the problem.
If they do, I'll work on fixing Trusty, Vivid, Wily and Xenial.
Cheers
Rafael Tinoco
PS: I'm still finishing kernel compilation and will copy packages
to the PPA as soon as it is ready (it might take a few min/hours).
Rafael David Tinoco (rafaeldtinoco) wrote : | #118 |
Okay,
PPA is ready:
https:/
Please upgrade kernel to:
linux-lts-vivid - 3.19.0-
By doing:
$ sudo add-apt-repository ppa:inaddy/
$ sudo apt-get update
$ sudo apt-get install linux-image-
And make sure packages are being installed from PPA. Then reboot server using the hotfixed kernel.
I'm looking forward to hearing whether this kernel mitigated the issues.
Cheers
Rafael Tinoco
Junien F (axino) wrote : | #119 |
Hi Rafael,
I applied the patch earlier today.
No crash so far, which was nearly impossible before !
This looks very promising, I'll keep you posted tomorrow.
Rafael David Tinoco (rafaeldtinoco) wrote : | #120 |
Junien,
That is good feedback. I also received another request to backport this to 3.13 SO I'll be providing the hotfixed kernel in the same PPA soon (tomorrow morning most likely).
Attaching the 3.13 patches (just for reference since the SRU process requires me to send all those patches to kernel-team mailing list).
Let's see if things continue to go well. If, by any chance, you are able to test this 3.13 kernel - maybe on another server - please provide me feedback also.
Thank you very much
Cheers
Rafael Tinoco
Rafael David Tinoco (rafaeldtinoco) wrote : | #123 |
Note to self:
The commit being backported to 3.19 and 3.13 has to contain this race fix:
commit dcc909d90ccdbb7
Author: Markus Pargmann <email address hidden>
Date: Tue Oct 6 20:03:54 2015 +0200
nbd: Add locking for tasks
The timeout handling introduced in
introduces a race condition which may lead to killing of tasks that are
not in nbd context anymore. This was not observed or reproducable yet.
This patch adds locking to critical use of task_recv and task_send to
avoid killing tasks that already left the NBD thread functions. This
lock is only acquired if a timeout occures or the nbd device
starts/stops.
Reported-by: Ben Hutchings <email address hidden>
Signed-off-by: Markus Pargmann <email address hidden>
Reviewed-by: Ben Hutchings <email address hidden>
Fixes: 7e2893a16d3e ("nbd: Fix timeout detection")
Signed-off-by: Jens Axboe <email address hidden>
Also.
Junien F (axino) wrote : BootDmesg.txt | #124 |
description: | updated |
Junien F (axino) wrote : CRDA.txt | #125 |
Junien F (axino) wrote : CurrentDmesg.txt | #126 |
Junien F (axino) wrote : Lspci.txt | #127 |
Junien F (axino) wrote : Lsusb.txt | #128 |
Junien F (axino) wrote : ProcCpuinfo.txt | #129 |
Junien F (axino) wrote : ProcInterrupts.txt | #130 |
Junien F (axino) wrote : ProcModules.txt | #131 |
Junien F (axino) wrote : UdevDb.txt | #132 |
Junien F (axino) wrote : UdevLog.txt | #133 |
Junien F (axino) wrote : WifiSyslog.txt | #134 |
Junien F (axino) wrote : | #135 |
Unfortunately, one server managed to crashdump, even with your patched kernel. apport is above, crashdump is at https:/
I've diverted qemu-nbd again.
Please let me know the next steps.
Thanks !
Rafael David Tinoco (rafaeldtinoco) wrote : | #136 |
Junien,
Sorry for the delay. After some time dealing with other priorities, I'm coming back to this. I'm downloading the dump and will take a look. Let's see what this bug is related to.
Tks for providing it. Will report something back soon.
Dan Streetman (ddstreet) wrote : | #137 |
I've dl'ed the dump and I'm reviewing it.
Dan Streetman (ddstreet) wrote : | #138 |
Ok, here's my analysis of the latest dump.
There are 3 kernel migrate threads waiting; this is the cause of the softlockup - specifically pid 101 on cpu 13 is where the softlockup (and then panic, due to panic on softlockup enabled) happens, and the other 2 migrate threads (pid 79 and 151) are also waiting. All are waiting for multi_cpu_stop to finish. The way multi_cpu_stop works is: the caller sets up one or more cpus to coordinate stopping; in multi_cpu_stop, the state machine moves from MULTI_STOP_PREPARE through disable irqs, to run (the provided function), to exit when done. However, only the specified cpus (in the cpumask) will run the function. The state machine doesn't proceed to the next step until all cpus have processed the current state.
This is where the problem comes in. In this case, it's a migration of tasks from one numa node to another, via numa rebalancing. In this particular case, there are 3 rebalancing events happening: cpu 3 and cpu 10, cpu 3 and cpu 13, cpu 3 and cpu 20. The migrate threads on cpus 10, 13, and 20 are running multi_cpu_stop, but they are stuck waiting because cpu 3 still has it in its queue.
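The lock-step behaviour described above can be illustrated with a small simulation. This is a simplified model in Python, not the kernel's implementation: the function name `simulate_multi_stop`, the tick-based clock, the `watchdog_ticks` threshold and the example values are all assumptions made for illustration, and only one pair of cpus is modelled instead of the three pairs seen in the dump.

```python
# State names paraphrase the description above; EXIT ends the simulation.
STATES = ["PREPARE", "DISABLE_IRQ", "RUN", "EXIT"]


def simulate_multi_stop(available_at, watchdog_ticks):
    """Simulate a state machine that only advances once every cpu has acked.

    available_at maps cpu -> first tick at which its stopper thread can run.
    Returns (finish_tick, stalled), where stalled lists the cpus that spun
    for longer than watchdog_ticks while waiting for the others.
    """
    tick = 0
    state_idx = 0
    acked = set()
    spin_since = {}   # cpu -> tick at which it entered the state machine
    stalled = set()
    while STATES[state_idx] != "EXIT":
        for cpu, ready in available_at.items():
            if tick < ready:
                continue          # this cpu is still busy elsewhere
            spin_since.setdefault(cpu, tick)
            acked.add(cpu)
            if tick - spin_since[cpu] > watchdog_ticks:
                stalled.add(cpu)  # a soft lockup would be reported here
        if len(acked) == len(available_at):
            state_idx += 1        # everyone processed the current state
            acked.clear()
        tick += 1
    return tick, sorted(stalled)


# cpu 13 is ready immediately, cpu 3 is busy for 50 ticks.
print(simulate_multi_stop({3: 50, 13: 0}, watchdog_ticks=22))
# Both cpus ready immediately: the state machine finishes at once.
print(simulate_multi_stop({3: 0, 13: 0}, watchdog_ticks=22))
```

In the first call cpu 13 does nothing wrong, yet it is the one flagged, which matches the dump: the softlockup is reported on the waiting cpu, not on the cpu that is holding everything up.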
cpu 3 is writing bytes to the serial port, and currently waiting for confirmation that the serial port write completed. This wait is done via checking the serial port register for CTS, then if it's not set delaying for 1us, and trying again. However, this is all inside a held spinlock, with irqs disabled. So while this serial port r/w is being done, nothing else will run on this cpu. But - the code limits this to 1 second, so presumably it shouldn't lock up the cpu for longer than 1 second or so (I haven't dug too far into this, so the function may be called multiple times with the lock held).
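The bounded polling described above can be sketched as follows. This is a Python model under stated assumptions, not the serial driver's code: `wait_for_clear_to_send` and `FakePort` are hypothetical names, and the delay is only accounted for rather than actually spent, so the sketch runs instantly.

```python
def wait_for_clear_to_send(read_cts, timeout_us=1_000_000, delay_us=1):
    """Poll read_cts() until it returns True or timeout_us has elapsed.

    Returns (ok, waited_us). A real driver would busy-wait for delay_us
    here with the lock held and irqs disabled; this model only counts.
    """
    waited = 0
    while not read_cts():
        if waited >= timeout_us:
            return False, waited
        waited += delay_us
    return True, waited


class FakePort:
    """Hypothetical port whose CTS bit becomes set after N polls."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready

    def cts(self):
        if self.polls_left == 0:
            return True
        self.polls_left -= 1
        return False


print(wait_for_clear_to_send(FakePort(3).cts))
print(wait_for_clear_to_send(lambda: False, timeout_us=1000))
```

A single call is bounded by the timeout, but if the function is called once per character with the lock held, as the parenthetical above speculates, the total time with irqs disabled would scale with the number of characters that hit the timeout.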
For whatever reason, that serial port r/w seems to be taking a long time. The migrate threads on the other cpus are waiting for it to finish, so that the migrate thread on cpu 3 can run, and move the multi_cpu_stop state machine along. But that doesn't happen in time to avoid the softlockup detector.
The multi_cpu_stop function could arguably use the addition of touch_nmi_
back on cpu 3 (that the others are waiting on), the way that delay is implemented is using the TSC. Unfortunately, the TSC is a generally unreliable clock source, so it's possible there is a problem in the delay function.
To determine that, can you please boot with the "notsc" parameter, which will change the udelay function to use a simple loop instead of the TSC, and reproduce the softlockup?
Changed in linux (Ubuntu): | |
assignee: | Rafael David Tinoco (inaddy) → Dan Streetman (ddstreet) |
Junien F (axino) wrote : | #139 |
Hi Dan,
Thanks for your investigation. Sorry for the delay, but finally I managed to reboot the compute nodes with the "notsc" kernel parameter. I also disabled the qemu-nbd workaround.
Once that was done, it didn't take long for a node to crash, which would indicate that notsc didn't fix the problem. However, the host got stuck and didn't dump anything. OK then. It happened a second time a few minutes after on a different host, so I thought I'd investigate this more.
It turns out, the kernel booted through kexec fails to boot, probably because of the notsc option : https:/
I'm a bit worried about the following line :
[ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely
which is also displayed during "regular" boots (i.e. not through kexec).
I guess I can remove "notsc" from the kexec command line, but this will take additional time. I thought I'd let you know the current status in the meantime.
Cheers
Dan Streetman (ddstreet) wrote : | #140 |
> It turns out, the kernel booted through kexec fails booting probably because of the notsc option :
> https:/
hmm, that's weird, but if notsc is all that changed i assume it is the problem.
> I'm a bit worried about the following line :
> [ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely
that's normal with notsc, the tsc is still there, it's just not used for the udelay function. but if it doesn't help the problem, no need to keep it.
> I guess I can remove "notsc" from the kexec command line, but this will take additional time.
> I thought I'd let you know the current status in the meantime.
ok thanks. I'll be out next week for the holidays, but will continue looking at this Jan 1.
Junien F (axino) wrote : | #141 |
Re-reading comment #138, Dan, I realize that we may be investigating a symptom and not the root cause.
Whenever the soft-lockup happens, the serial console does get flooded with "block nbdX: Attempted send on closed socket". If the serial console getting flooded causes soft lockups, then it is indeed a concerning issue, but shouldn't we focus, in this bug, on making nbd not flood the console in the first place ?
Dan Streetman (ddstreet) wrote : | #142 |
Well, yes I agree, it does look like the serial port causing the softlockup is probably separate from - but caused by - the nbd closed socket errors. However, the serial port output definitely shouldn't be causing a softlockup - no matter how much data it has to send, the serial port driver in the kernel should be scheduling itself during operation, so that it doesn't hog a single cpu for a long time. It's more likely that the general system "freezing" you are seeing is due to the serial port driver refusing to schedule off its cpu, and not any problem with the nbdX failure.
I'll look into the nbd code also though, to see where that error is coming from and what that problem may be.
Nick Moffitt (nick-moffitt) wrote : | #143 |
This problem has caused more serious damage recently. When nbd dies and printk()s like mad, the serial console is not fast enough to display it.
The kernel keeps allocating buffer space for serial output, which we see as 13G kmalloc-256 or kmalloc-512 kernel threads.
Eventually the OOMkiller tries to free up space, but it can only kill userspace programs so ultimately the system dies altogether.
This is more dire than mere CPU load or lockup warning messages.
Nick Moffitt (nick-moffitt) wrote : | #144 |
This memory leak we have so far only seen on arm64, to be clear.
Dan Streetman (ddstreet) wrote : | #145 |
axino or nick, can either of you attach an sosreport from an affected system? The crashdump doesn't include any userspace data so I can't see what exactly the qemu-nbd userspace program is doing, nor can i see what params it's started with. I'll need that info to be able to debug the qemu-nbd side of this.
Dan Streetman (ddstreet) wrote : | #146 |
Ok, nm about the sosreport - I got the info from some older emails from axino: nova is using qemu-nbd to locally mount images and access the partitions inside them. I was able to trivially reproduce this simply by creating an image, attaching it with qemu-nbd to /dev/nbd0, partitioning it, running mkfs on its p1 and mounting it, then, while copying a file to it, performing qemu-nbd -d to detach it from /dev/nbd0. That causes the spam of "Attempted..." error messages.
So this appears to be a simple case of nova calling qemu-nbd -d while there is still I/O to the image. The right thing to do is simply ratelimit the error messages (and they really should be anyway, as they're printing directly inside a loop). The messages themselves do not indicate any kernel error, simply that the nbd device was removed while being written to.
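The ratelimiting idea can be sketched as a fixed-window limiter. This is a Python illustration only, not the kernel patch from this bug: the class name `RateLimiter` and the interval and burst values are assumptions chosen for the example.

```python
class RateLimiter:
    """Allow at most `burst` messages per `interval` seconds, count the rest."""

    def __init__(self, interval=5.0, burst=10):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.printed = 0
        self.missed = 0   # messages dropped so far

    def allow(self, now):
        # Start a new window on first use or once the interval has passed.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


limiter = RateLimiter(interval=5.0, burst=10)
# 1000 failing requests arriving at (effectively) the same instant.
printed = sum(limiter.allow(now=0.0) for _ in range(1000))
print(printed, limiter.missed)
```

With a limiter like this in the error path, the number of lines reaching the console is bounded by the burst size per interval, regardless of how many requests are still queued when the device goes away.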
Can you try this kernel PPA to see if it fixes the problem? You will still see the error messages, but only a few lines since they'll be ratelimited.
Of course there is still the (probably more serious) problem of the serial port driver hanging a cpu and eating up memory; that probably deserves its own bug, since it's triggered by this one but is a separate issue.
Junien F (axino) wrote : | #147 |
Except that what happens on the compute nodes is that, when creating an instance, nova attaches the image with qemu-nbd (say to /dev/nbd0), and then tries to mount /dev/nbd0 somewhere, except that doesn't work because the image has partitions, and so the root device is actually on /dev/nbd0p1. So the "mount" commands return an error, and nova then detaches the image with qemu-nbd -d.
Overall, as far as nova logs show, there are 0 writes on the nbd device and very few reads (probably just the MBR?). Could that still cause inflight I/O when qemu-nbd -d is run?
I'll happily test your kernel PPA, but as far as I can see, you don't mention where it actually is :)
Thanks !
Dan Streetman (ddstreet) wrote : | #148 |
> Overall, as far as nova logs show, there are 0 writes on the nbd device and very few reads (probably just the MBR?).
> Could that still cause inflight I/O when qemu-nbd -d is run?
"very few" > 0
:-)
and it could be coming from elsewhere...but we don't need to account for where the IO is coming from, as the simple fact that it's there is enough. Also it's not just data IO, it's any "request", including metadata/control requests. Network-backed devices can disappear at any time, and the driver must be able to handle that. Spamming endless messages to the log isn't a good idea in that case.
To clarify the exact code in this situation:
while ((req = blk_fetch_
...
if (unlikely(
...
}
so, as soon as the connection (socket) is gone, there will be an "Attempted..." message printed for every request in the queue, as the queue is cleared.
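To illustrate why ratelimiting helps in a loop like this, here is a minimal userspace sketch in bash. It is not the kernel patch (that is C code in the nbd driver and is not reproduced here). The function name log_ratelimited and the RL_INTERVAL / RL_BURST values are assumptions made purely for illustration: at most RL_BURST messages are printed per RL_INTERVAL seconds, and the rest are only counted.

```bash
#!/bin/bash
# Userspace illustration only; names and values below are hypothetical.
RL_INTERVAL=5        # length of one ratelimit window, in seconds (assumed)
RL_BURST=10          # messages allowed per window (assumed)
rl_window_start=-1   # start of the current window; -1 means "not started"
rl_printed=0         # messages printed in the current window
rl_suppressed=0      # messages dropped so far

log_ratelimited() {
    local now=$SECONDS
    # Open a new window on the first call or once the old one has expired.
    if (( rl_window_start < 0 || now - rl_window_start >= RL_INTERVAL )); then
        rl_window_start=$now
        rl_printed=0
    fi
    if (( rl_printed < RL_BURST )); then
        rl_printed=$(( rl_printed + 1 ))
        echo "$1"
    else
        # Over the burst limit: count the message instead of printing it.
        rl_suppressed=$(( rl_suppressed + 1 ))
    fi
}
```

Calling log_ratelimited once per queued request, instead of echoing unconditionally, keeps the output bounded however many requests are drained when the queue is cleared.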
> I'll happily test your kernel PPA, but as far as I can see, you don't mention where it actually is :)
ha, forgot to paste it in, sorry :-)
Junien F (axino) wrote : | #149 |
I applied the patch, and it saved a reboot twice already, I think. dmesg from one server : http://
I have to stop the tests for the weekend though, I'll resume on Monday.
Junien F (axino) wrote : | #150 |
I resumed the tests on Monday, and so far we're looking good. Your change prevented ~10 locks so far, it would seem.
Dan Streetman (ddstreet) wrote : | #151 |
Great. I'll send the patch upstream, and open a new bug for the serial port hanging issue. Thanks!
tags: | added: canonical-bootstack |
Dan Streetman (ddstreet) wrote : | #152 |
opened bug 1534216 to track the serial port issue.
Launchpad Janitor (janitor) wrote : | #153 |
This bug was fixed in the package linux - 4.4.0-6.21
---------------
linux (4.4.0-6.21) xenial; urgency=low
[ Tim Gardner ]
* Release Tracking Bug
- LP: #1546283
* Naples/Zen, NTB Driver (LP: #1542071)
- [Config] CONFIG_NTB_AMD=m
- NTB: Add support for AMD PCI-Express Non-Transparent Bridge
* [Hyper-V] kernel panic occurs when installing Ubuntu Server x32 (LP: #1495983)
- SAUCE: storvsc: use small sg_tablesize on x86
* Enable arm64 emulation of removed ARMv7 instructions (LP: #1545542)
- [Config] CONFIG_
* Surelock-GA2:kernel panic/ exception @ pcibios_
- powerpc/eeh: Fix stale cached primary bus
* Miscellaneous Ubuntu changes
- SAUCE: fs: Add user namesapace member to struct super_block
- SAUCE: fs: Limit file caps to the user namespace of the super block
- SAUCE: Smack: Add support for unprivileged mounts from user namespaces
- SAUCE: block_dev: Support checking inode permissions in lookup_bdev()
- SAUCE: block_dev: Check permissions towards block device inode when mounting
- SAUCE: fs: Treat foreign mounts as nosuid
- SAUCE: selinux: Add support for unprivileged mounts from user namespaces
- SAUCE: userns: Replace in_userns with current_in_userns
- SAUCE: Smack: Handle labels consistently in untrusted mounts
- SAUCE: fs: Check for invalid i_uid in may_follow_link()
- SAUCE: cred: Reject inodes with invalid ids in set_create_
- SAUCE: fs: Refuse uid/gid changes which don't map into s_user_ns
- SAUCE: fs: Update posix_acl support to handle user namespace mounts
- SAUCE: fs: Ensure the mounter of a filesystem is privileged towards its inodes
- SAUCE: fs: Don't remove suid for CAP_FSETID in s_user_ns
- SAUCE: fs: Allow superblock owner to access do_remount_sb()
- SAUCE: capabilities: Allow privileged user in s_user_ns to set security.* xattrs
- SAUCE: fuse: Add support for pid namespaces
- SAUCE: fuse: Support fuse filesystems outside of init_user_ns
- SAUCE: fuse: Restrict allow_other to the superblock's namespace or a descendant
- SAUCE: fuse: Allow user namespace mounts
- SAUCE: mtd: Check permissions towards mtd block device inode when mounting
- SAUCE: fs: Update i_[ug]id_
- SAUCE: quota: Convert ids relative to s_user_ns
- SAUCE: evm: Translate user/group ids relative to s_user_ns when computing HMAC
- SAUCE: fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
- SAUCE: quota: Treat superblock owner as privilged
- SAUCE: ima/evm: Allow root in s_user_ns to set xattrs
- SAUCE: block_dev: Forbid unprivileged mounting when device is opened for writing
- SAUCE: ext4: Add support for unprivileged mounts from user namespaces
- SAUCE: ext4: Add module parameter to enable user namespace mounts
- SAUCE: fuse: Add module parameter to enable user namespace mounts
* Miscellaneous upstream changes
- megaraid: Fix possible NULL pointer deference in mraid_mm_ioctl
- libahci: Implement the capability to override th...
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Released |
Changed in linux (Ubuntu Trusty): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Vivid): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Wily): | |
status: | New → Fix Committed |
Brad Figg (brad-figg) wrote : | #154 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-trusty |
tags: | added: verification-needed-vivid |
Brad Figg (brad-figg) wrote : | #155 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-wily |
Brad Figg (brad-figg) wrote : | #156 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: |
added: verification-done-trusty removed: verification-needed-trusty |
Dan Streetman (ddstreet) wrote : | #157 |
verification can be done with this script:
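#!/bin/bash
# Load the nbd module and make sure /dev/nbd0 is not already attached.
modprobe nbd
qemu-nbd -d /dev/nbd0
# Create a sparse 20G backing file and attach it to /dev/nbd0.
truncate /tmp/testfile -s 20G
qemu-nbd -c /dev/nbd0 /tmp/testfile
# Start 250 background writers so that requests are queued on the device.
for n in $( seq 1 250 ) ; do
    echo $n
    ( dd if=/dev/zero of=/dev/nbd0 bs=1 & )
done
# Detach while I/O is still in flight; this triggers the error messages.
qemu-nbd -d /dev/nbd0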
after running that, on an unpatched system the dmesg will show a large number (~100 or more) of messages like:
[ 70.408246] block nbd0: Attempted send on closed socket
with a patched kernel, the dmesg will show a ratelimited number (~10) of those messages.
This has been verified on trusty 3.13, vivid 3.19, and wily 4.2
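To compare the patched and unpatched cases without eyeballing dmesg, a small helper along these lines could count the messages. This is only a sketch: count_nbd_spam is a hypothetical name, it reads kernel log text on stdin, and it assumes the message format shown above.

```bash
#!/bin/bash
# Hypothetical helper: count "closed socket" messages for one nbd device.
# Reads kernel log text on stdin; the device name defaults to nbd0.
count_nbd_spam() {
    local dev=${1:-nbd0}
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep the helper usable under "set -e".
    grep -c -F "block ${dev}: Attempted send on closed socket" || true
}
```

For example, running "dmesg | count_nbd_spam nbd0" after the script should give a large number on an unpatched kernel and a small one on a patched kernel.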
tags: |
added: verification-done-vivid verification-done-wily removed: verification-needed-vivid verification-needed-wily |
Junien F (axino) wrote : | #158 |
Thanks !
Launchpad Janitor (janitor) wrote : | #159 |
This bug was fixed in the package linux - 4.2.0-34.39
---------------
linux (4.2.0-34.39) wily; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1555821
[ Florian Westphal ]
* SAUCE: [nf] netfilter: x_tables: check for size overflow
- LP: #1555353
* SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
userspace
- LP: #1555338
linux (4.2.0-33.38) wily; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1554649
[ Upstream Kernel Changes ]
* Revert "drm/radeon: call hpd_irq_event on resume"
- LP: #1554608
* cxl: Fix PSL timebase synchronization detection
- LP: #1532914
linux (4.2.0-32.37) wily; urgency=low
[ Kamal Mostafa ]
* Release Tracking Bug
- LP: #1550045
[ Kamal Mostafa ]
* Merged back Ubuntu-4.2.0-31.36
linux (4.2.0-31.36) wily; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1548579
[ Andy Whitcroft ]
* [Debian] hv: hv_set_ifconfig -- convert to python3
- LP: #1506521
* [Debian] hv: hv_set_ifconfig -- switch to approved indentation
- LP: #1540586
* [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
- LP: #1540586
[ Carol L Soto ]
* SAUCE: IB/IPoIB: Do not set skb truesize since using one linearskb
- LP: #1541326
[ Dan Streetman ]
* SAUCE: nbd: ratelimit error msgs after socket close
- LP: #1505564
[ Tim Gardner ]
* Revert "SAUCE: (noup) cxlflash: Fix to avoid virtual LUN failover
failure"
- LP: #1541635
* Revert "SAUCE: (noup) cxlflash: Fix to escalate LINK_RESET also on port
1"
- LP: #1541635
* [Config] ARMV8_DEPRECATED=y
- LP: #1545542
[ Upstream Kernel Changes ]
* x86/xen/p2m: hint at the last populated P2M entry
- LP: #1542941
* mm: add dma_pool_zalloc() call to DMA API
- LP: #1543737
* sctp: Prevent soft lockup when sctp_accept() is called during a timeout
event
- LP: #1543737
* xen-netback: respect user provided max_queues
- LP: #1543737
* xen-netfront: respect user provided max_queues
- LP: #1543737
* xen-netfront: update num_queues to real created
- LP: #1543737
* iio: adis_buffer: Fix out-of-bounds memory access
- LP: #1543737
* KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
- LP: #1543737
* KVM: PPC: Fix ONE_REG AltiVec support
- LP: #1543737
* x86/irq: Call chip->irq_
- LP: #1543737
* drm/amdgpu: fix tonga smu resume
- LP: #1543737
* perf kvm record/report: 'unprocessable sample' error while
recording/
- LP: #1543737
* hrtimer: Handle remaining time proper for TIME_LOW_RES
- LP: #1543737
* timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
- LP: #1543737
* posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
- LP: #1543737
* itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
- LP: #1543737
* drm/amdgpu: Use drm_calloc_large for VM page_tables array
- LP: #1543737
* drm/amdgpu: fix amdgpu_
- LP: #1543737
* drm/radeon: properly byte swap vce firmware setup
- LP: #1543737
...
Changed in linux (Ubuntu Wily): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #160 |
This bug was fixed in the package linux - 3.19.0-56.62
---------------
linux (3.19.0-56.62) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1555832
[ Florian Westphal ]
* SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
userspace
- LP: #1555338
linux (3.19.0-55.61) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1554708
[ Upstream Kernel Changes ]
* Revert "drm/radeon: call hpd_irq_event on resume"
- LP: #1554608
linux (3.19.0-54.60) vivid; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1552337
[ Upstream Kernel Changes ]
* Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
- LP: #1551419
linux (3.19.0-53.59) vivid; urgency=low
[ Kamal Mostafa ]
* Release Tracking Bug
- LP: #1550576
[ Kamal Mostafa ]
* Merged back 3.19.0-52.58
linux (3.19.0-52.58) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1548548
[ Dan Streetman ]
* SAUCE: nbd: ratelimit error msgs after socket close
- LP: #1505564
[ Upstream Kernel Changes ]
* Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
- LP: #1542457
* Revert "workqueue: make sure delayed work run in local cpu"
- LP: #1546320
* net: ipmr: fix static mfc/dev leaks on table destruction
- LP: #1542457
* drm/nouveau/nv46: Change mc subdev oclass from nv44 to nv4c
- LP: #1542457
* ovl: allow zero size xattr
- LP: #1542457
* ovl: use a minimal buffer in ovl_copy_xattr
- LP: #1542457
* [media] vb2: fix a regression in poll() behavior for output,streams
- LP: #1542457
* [media] gspca: ov534/topro: prevent a division by 0
- LP: #1542457
* [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
- LP: #1542457
* tools lib traceevent: Fix output of %llu for 64 bit values read on 32
bit machines
- LP: #1542457
* KVM: x86: expose MSR_TSC_AUX to userspace
- LP: #1542457
* KVM: x86: correctly print #AC in traces
- LP: #1542457
* drm/radeon: call hpd_irq_event on resume
- LP: #1542457
* xhci: refuse loading if nousb is used
- LP: #1542457
* arm64: Clear out any singlestep state on a ptrace detach operation
- LP: #1542457
* time: Avoid signed overflow in timekeeping_
- LP: #1542457
* ovl: root: copy attr
- LP: #1542457
* Bluetooth: Add support of Toshiba Broadcom based devices
- LP: #1522949, #1542457
* rtlwifi: fix memory leak for USB device
- LP: #1542457
* wlcore/wl12xx: spi: fix oops on firmware load
- LP: #1542457
* ovl: check dentry positiveness in ovl_cleanup_
- LP: #1542457
* EDAC, mc_sysfs: Fix freeing bus' name
- LP: #1542457
* EDAC: Robustify workqueues destruction
- LP: #1542457
* arm64: mm: ensure that the zero page is visible to the page table
walker
- LP: #1542457
* powerpc: Make value-returning atomics fully ordered
- LP: #1542457
* powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
- LP: #1542457
* dm space map metadata: remove unused variable in brb_pop()
- LP: #1542457
* dm thi...
Changed in linux (Ubuntu Vivid): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #161 |
This bug was fixed in the package linux - 3.13.0-83.127
---------------
linux (3.13.0-83.127) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1555839
[ Florian Westphal ]
* SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
userspace
- LP: #1555338
linux (3.13.0-82.126) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1554732
[ Upstream Kernel Changes ]
* Revert "drm/radeon: call hpd_irq_event on resume"
- LP: #1554608
* net: generic dev_disable_lro() stacked device handling
- LP: #1547680
linux (3.13.0-81.125) trusty; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1552316
[ Upstream Kernel Changes ]
* Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
- LP: #1551419
* bcache: Fix a lockdep splat in an error path
- LP: #1551327
linux (3.13.0-80.124) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1548519
[ Andy Whitcroft ]
* [Debian] hv: hv_set_ifconfig -- convert to python3
- LP: #1506521
* [Debian] hv: hv_set_ifconfig -- switch to approved indentation
- LP: #1540586
* [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
- LP: #1540586
[ Dan Streetman ]
* SAUCE: nbd: ratelimit error msgs after socket close
- LP: #1505564
[ Upstream Kernel Changes ]
* Revert "workqueue: make sure delayed work run in local cpu"
- LP: #1546320
* [media] gspca: ov534/topro: prevent a division by 0
- LP: #1542497
* [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
- LP: #1542497
* tools lib traceevent: Fix output of %llu for 64 bit values read on 32
bit machines
- LP: #1542497
* KVM: x86: correctly print #AC in traces
- LP: #1542497
* drm/radeon: call hpd_irq_event on resume
- LP: #1542497
* xhci: refuse loading if nousb is used
- LP: #1542497
* arm64: Clear out any singlestep state on a ptrace detach operation
- LP: #1542497
* time: Avoid signed overflow in timekeeping_
- LP: #1542497
* rtlwifi: fix memory leak for USB device
- LP: #1542497
* wlcore/wl12xx: spi: fix oops on firmware load
- LP: #1542497
* EDAC, mc_sysfs: Fix freeing bus' name
- LP: #1542497
* EDAC: Don't try to cancel workqueue when it's never setup
- LP: #1542497
* EDAC: Robustify workqueues destruction
- LP: #1542497
* powerpc: Make value-returning atomics fully ordered
- LP: #1542497
* powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
- LP: #1542497
* dm space map metadata: remove unused variable in brb_pop()
- LP: #1542497
* dm thin: fix race condition when destroying thin pool workqueue
- LP: #1542497
* futex: Drop refcount if requeue_pi() acquired the rtmutex
- LP: #1542497
* drm/radeon: clean up fujitsu quirks
- LP: #1542497
* mmc: sdio: Fix invalid vdd in voltage switch power cycle
- LP: #1542497
* mmc: sdhci: Fix sdhci_runtime_
- LP: #1542497
* udf: limit the maximum number of indirect extents in a row
- LP: #1542497
* nfs: Fix race in __update_
Changed in linux (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
Paul Gear (paulgear) wrote : | #162 |
For posterity: If https:/
apport information