kernel panic -not syncing: Fatal exception: panic_on_oops

Bug #1708399 reported by bugproxy on 2017-08-03
This bug affects 1 person
Affects                    Importance   Assigned to
Ubuntu on IBM z Systems    High         Canonical Kernel Team
linux (Ubuntu)             High         Skipper Bug Screeners
  Xenial                   High         Stefan Bader
  Zesty                    High         Stefan Bader

Bug Description

SRU justification:

Impact: A race in context flushing is causing a kernel panic on the s390x architecture.

Fix: A set of three patches, all restricted to arch code: one is already upstream and the other two are pending in linux-next. Regression risk should be low (the changes are limited to arch code and have been tested).

Testcase: see below

---

== Comment: #0 - QI YE <email address hidden> - 2017-08-02 04:11:25 ==
---Problem Description---
Ubuntu got kernel panic

---uname output---
#110-Ubuntu SMP Tue Jul 18 12:56:43 UTC 2017 s390x s390x s390x GNU/Linux

---Debugger Data---
PID: 10991 TASK: 19872a0e8 CPU: 2 COMMAND: "hyperkube"
 LOWCORE INFO:
  -psw : 0x0004c00180000000 0x0000000000115fa6
  -function : pcpu_delegate at 115fa6
  -prefix : 0x7fe42000
  -cpu timer: 0x7ffab2827828aa50
  -clock cmp: 0xd2eb8b31445e4200
  -general registers:
     0x0004e00100000000 0x00000000001283b6
     0x0000c00100000000 0x000000008380fcb8
     0x0000000000115f9e 0x000000000056f6e2
     0x0000000000000004 0x0000000000cf9070
     0x00000001f3bfc000 0x0000000000112fd8
     0x00000001c72bb400 0x0000000000000002
     0x000000007fffc000 0x00000000007c9ef0
     0x0000000000115f9e 0x000000008380fc18
  -access registers:
     0x000003ff 0x7ffff910 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
  -control registers:
     0x0000000014066a12 0x000000007e6d81c7
     0x0000000000011140 000000000000000000
     0x0000000000002aef 0x0000000000000400
     0x0000000050000000 0x000000007e6d81c7
     000000000000000000 000000000000000000
     000000000000000000 000000000000000000
     000000000000000000 0x0000000000cfc007
     0x00000000db000000 0x0000000000011280
  -floating point registers:
     0x409c7e2580000000 0x401de4e000000000
     000000000000000000 0x3fd24407ab0e073a
     0x3ff0000000000000 0x3fee666666666666
     0x3fef218f8a7a41a0 0x3fee666666666666
     0x0000000000800000 000000000000000000
     0x000003ff7f800000 0x000002aa4940e9e0
     0x000000000000d401 0x000003ffe81fe110
     000000000000000000 0x000003fff2cfe638

 #0 [8380fc78] smp_find_processor_id at 1160f8
 #1 [8380fc90] machine_kexec at 1135d4
 #2 [8380fcb8] crash_kexec at 1fbb8a
 #3 [8380fd88] panic at 27d0e0
 #4 [8380fe28] die at 1142cc
 #5 [8380fe90] do_low_address at 12215e
 #6 [8380fea8] pgm_check_handler at 7c2ab4
 PSW: 0705200180000000 000002aa267e0e42 (user space)
 GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
       000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
       0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
       000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90

Contact Information = Chee Ye / <email address hidden>

Stack trace output:
 no

Oops output:
 [43200.761465] docker0: port 10(vethb9132e9) entered forwarding state
[50008.560926] hrtimer: interrupt took 1698076 ns
[123483.768984] systemd[1]: apt-daily.timer: Adding 7h 34min 22.582204s random time.
[123483.930058] systemd[1]: apt-daily.timer: Adding 2h 18min 14.857162s random time.
[123484.064879] systemd[1]: apt-daily.timer: Adding 10h 46min 2.301756s random time.
[123484.824760] systemd[1]: apt-daily.timer: Adding 6h 16min 22.178655s random time.
[153113.703126] conntrack: generic helper won't handle protocol 47. Please consider loading the specific helper module.
[477085.704538] Low-address protection: 0004 ilc:2 [#1] SMP
[477085.704551] Modules linked in: xt_physdev veth xt_recent xt_comment xt_mark xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc aufs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 qeth_l2 sha256_s390 qeth sha1_s390 qdio sha_common ccwgroup vmur dasd_eckd_mod dasd_mod
[477085.705522] CPU: 2 PID: 10991 Comm: hyperkube Not tainted 4.4.0-87-generic #110-Ubuntu
[477085.705525] task: 000000019872a0e8 ti: 000000008380c000 task.ti: 000000008380c000
[477085.705529] User PSW : 0705200180000000 000002aa267e0e42
[477085.705532] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 EA:3
                User GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
[477085.705539] 000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
[477085.705553] 0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
[477085.705554] 000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90
[477085.705578] User Code: 000002aa267e0e30: e340f0080004 lg %r4,8(%r15)
                           000002aa267e0e36: e330f0100014 lgf %r3,16(%r15)
                          #000002aa267e0e3c: e36040000014 lgf %r6,0(%r4)
                          >000002aa267e0e42: ba634000 cs %r6,%r3,0(%r4)
                           000002aa267e0e46: a774fffe brc 7,2aa267e0e42
                           000002aa267e0e4a: e360f0180050 sty %r6,24(%r15)
                           000002aa267e0e50: 07fe bcr 15,%r14
                           000002aa267e0e52: 0000 unknown
[477085.705596] Last Breaking-Event-Address:
[477085.705599] [<000002aa26663258>] 0x2aa26663258
[477085.705600]
[477085.705602] Kernel panic - not syncing: Fatal exception: panic_on_oops
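
The "R:0 T:1 IO:1 ..." line in the oops output above is a decoding of the upper PSW word. As a rough sketch (not taken from this report), the following Python reproduces that decoding. The bit positions are an assumption based on the z/Architecture PSW layout with bit 0 as the most significant bit, and the names `PSW_FIELDS`, `decode_psw_mask` and `format_psw` are made up for illustration.

```python
# Assumed layout of the 64-bit PSW mask word; bit 0 is the most significant bit.
# Each entry is (label as printed in the oops, first bit, width in bits).
PSW_FIELDS = [
    ("R", 1, 1),    # PER mask
    ("T", 5, 1),    # DAT mode
    ("IO", 6, 1),   # I/O interruption mask
    ("EX", 7, 1),   # external interruption mask
    ("Key", 8, 4),  # PSW key
    ("M", 13, 1),   # machine-check mask
    ("W", 14, 1),   # wait state
    ("P", 15, 1),   # problem (user) state
    ("AS", 16, 2),  # address-space control
    ("CC", 18, 2),  # condition code
    ("PM", 20, 4),  # program mask
    ("EA", 31, 2),  # extended + basic addressing bits (3 means 64-bit mode)
]


def decode_psw_mask(mask):
    """Split a PSW mask word into the named fields listed in PSW_FIELDS."""
    fields = {}
    for name, start_bit, width in PSW_FIELDS:
        shift = 64 - start_bit - width
        fields[name] = (mask >> shift) & ((1 << width) - 1)
    return fields


def format_psw(mask):
    """Render the fields in the same 'NAME:value' style as the oops line."""
    return " ".join(f"{name}:{value:x}" for name, value in decode_psw_mask(mask).items())


# User PSW from the oops output in this report.
print(format_psw(0x0705200180000000))
# prints: R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 EA:3
```

Applied to the user PSW this gives P:1 (problem state), which is consistent with the "(user space)" note in the debugger data. Applied to the lowcore PSW 0x0004c00180000000 it gives P:0.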

System Dump Location:
 There are 4 vCPUs defined. hyperkube was running on two CPUs when the kernel panicked, so the panic may be related to TLB entry flushing on those two CPUs.

CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"

  RT PRIO_ARRAY: 1ea5a8db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5a8c98
     [no tasks queued]

CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
  RT PRIO_ARRAY: 1ea5b9db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5b9c98
     [120] PID: 23421 TASK: 1c9368af8 COMMAND: "PipelineService"
     [120] PID: 10957 TASK: 1987336d8 COMMAND: "hyperkube"

CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5cadb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5cac98
     [no tasks queued]

CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5dbdb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5dbc98
     [120] PID: 21614 TASK: 1cbee57c0 COMMAND: "IngestServiceCl"
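
To cross-check the observation that hyperkube was current on two CPUs, the run queue dump above can be scanned mechanically. This is only a sketch: the sample text is a trimmed copy of the dump in this report, the regular expressions assume exactly this output layout, and the helper names `current_tasks` and `cpus_running` are invented for the example.

```python
import re

# Trimmed copy of the run queue dump above (CURRENT lines only).
RUNQ_SAMPLE = """\
CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"
CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
"""

CPU_RE = re.compile(r"\s*CPU (\d+) RUNQUEUE:")
CURRENT_RE = re.compile(r'\s*CURRENT: PID: (\d+)\s+TASK: \S+\s+COMMAND: "([^"]*)"')


def current_tasks(runq_text):
    """Map each CPU number to the (pid, command) of its CURRENT task."""
    current = {}
    cpu = None
    for line in runq_text.splitlines():
        cpu_match = CPU_RE.match(line)
        if cpu_match:
            cpu = int(cpu_match.group(1))
            continue
        task_match = CURRENT_RE.match(line)
        if task_match and cpu is not None:
            current[cpu] = (int(task_match.group(1)), task_match.group(2))
    return current


def cpus_running(runq_text, command):
    """Return the sorted CPU numbers whose CURRENT task has the given command name."""
    tasks = current_tasks(runq_text)
    return sorted(cpu for cpu, (_, cmd) in tasks.items() if cmd == command)


print(cpus_running(RUNQ_SAMPLE, "hyperkube"))  # prints: [2, 3]
```

Queued (non-current) tasks such as PID 10957 on CPU 1 are ignored on purpose, since only tasks actually executing at the time of the panic matter for this check.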

== Comment: #1 - QI YE <email address hidden> - 2017-08-02 04:20:02 ==
The problem happens randomly. No pattern has been identified yet.

It also happens on the following kernel levels:
- 4.4.0-78-generic #99
- 4.4.0-83-generic

== Comment: #2 - Heinz-Werner Seeck <email address hidden> - 2017-08-02 08:25:06 ==
@QI YE: Please describe the use case behind this problem report, and attach dumps, dbginfo, and sosreports. It is not clear to me which use case triggers this problem.
Many thanks in advance

== Comment: #3 - QI YE <email address hidden> - 2017-08-02 08:44:01 ==
(In reply to comment #2)
> @QI YE: Please provide the use case of this problem report. And add dumps
> and dbginfo , sosreports as attachment. For me it is not clear which use
> case this problems generates.
> Many thanks in advance

Heinz-Werner, what do you mean by "use case"? Could you elaborate? If you are asking which application triggered the problem: we run machine learning workloads on Ubuntu on the IBM Z community cloud.

The dump file is large. Is there a suggested location to upload it?

== Comment: #4 - QI YE <email address hidden> - 2017-08-02 08:50:32 ==
sosreport

bugproxy (bugproxy) wrote : dbginfo

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-157227 severity-critical targetmilestone-inin16042
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → kernel-package (Ubuntu)

------- Comment From <email address hidden> 2017-08-03 05:36 EDT-------
DUMP attached here: https://ibm.ent.box.com/folder/34249914326

Frank Heimes (fheimes) on 2017-08-03
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

According to the logs, this looks more like a docker issue than a kernel issue:

- docker is used on that system (at least since Aug 1st)
- but I cannot find that docker.io is used - which docker version is in use?
  The dpkg log doesn't show that docker.io got installed

- there seems to be a docker issue that is causing a crash;
  it occurs multiple times per second:
 Aug 2 06:26:51 zml025 dockerd[6150]: time="2017-08-02T06:26:51.327342000-04:00" level=error msg="Handler for GET /containers/18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d/json returned error: No such container: 18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d"

- the kernel log also shows some issues:
  the following is a known docker issue, seems to be caused by privileged containers:
  https://github.com/moby/moby/issues/21081
  https://github.com/kubernetes/kubernetes/issues/27885
 Aug 1 03:17:19 zml025 kernel: [15074.536567] aufs au_opts_verify:1597:dockerd[6723]:
 dirperm1 breaks the protection by the permission bits on the lower branch

- kernel log:
 another issue also known by docker:
 https://github.com/moby/moby/issues/14807
 Aug 1 03:17:19 zml025 kernel: [15074.649870] device vetha553aad entered promiscuous mode
 Aug 1 03:17:19 zml025 kernel: [15074.649937] IPv6: ADDRCONF(NETDEV_UP): vetha553aad: link is not ready
 Aug 1 03:17:19 zml025 kernel: [15074.649939] docker0: port 1(vetha553aad) entered forwarding state
 Aug 1 03:17:19 zml025 kernel: [15074.649943] docker0: port 1(vetha553aad) entered forwarding state
 Aug 1 03:17:19 zml025 kernel: [15074.650259] docker0: port 1(vetha553aad) entered disabled state
 Aug 1 03:17:19 zml025 kernel: [15075.283565] eth0: renamed from vethd76add0
 Aug 1 03:17:20 zml025 kernel: [15075.334494] IPv6: ADDRCONF(NETDEV_CHANGE): vetha553aad: link becomes ready
 Aug 1 03:17:20 zml025 kernel: [15075.334520] docker0: port 1(vetha553aad) entered forwarding state
 Aug 1 03:17:20 zml025 kernel: [15075.334527] docker0: port 1(vetha553aad) entered forwarding state
 Aug 1 03:17:20 zml025 kernel: [15075.334549] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready

- duplicate IPv6 addresses need to be fixed
  Aug 1 03:17:20 zml025 kernel: [15075.611749] IPv6: eth0: IPv6 duplicate address fe80::42:acff:fe11:2 detected!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-03 09:17 EDT-------
This is the docker version:
Docker version 1.12.6, build 78d1802

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-03 09:20 EDT-------
The z/VM version is z/VM 6.3

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-03 09:39 EDT-------
(In reply to comment #14)

Just for your information: we have many servers running the same applications, and several of them have never had a kernel panic. They all run the same docker version and show the same docker issues.

------- Comment on attachment From <email address hidden> 2017-08-11 04:44 EDT-------

The attached test patch fixes a potential race in the kernel that might result in missing TLB flushes.
In addition, it adds a "notlblc" kernel parameter that allows disabling the local TLB clearing optimization.

Note: this is just a test patch to verify whether it solves the observed problem. It should not currently go into an official kernel release.
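
When testing such a kernel, it can help to confirm that the parameter was actually passed at boot. The sketch below checks a kernel command line for a given parameter name. Note that "notlblc" is only known from the test patch described in this comment, and the helper name `has_kernel_param` is hypothetical.

```python
def has_kernel_param(cmdline, name):
    """Return True if the parameter appears as a whole token, with or without '=value'."""
    for token in cmdline.split():
        if token.split("=", 1)[0] == name:
            return True
    return False


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    with open("/proc/cmdline") as f:
        print("notlblc set:", has_kernel_param(f.read(), "notlblc"))
```

Matching whole tokens rather than substrings avoids false positives from longer parameter names that merely contain the same text.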

@Canonical can you please build a test kernel which includes this patch?

The patch is against kernel version 4.4.0-89.112.

Thank you!

Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Stefan Bader (smb) wrote :

Applied the patch and built packages: http://people.canonical.com/~smb/lp1708399/

tags: added: kernel-da-key
Frank Heimes (fheimes) on 2017-08-11
Changed in ubuntu-power-systems:
status: New → In Progress

------- Comment From <email address hidden> 2017-08-17 11:37 EDT-------
Comment on attachment 119988
tlb test patch

We will provide two new patches, since this patch solves only part of the problem. Therefore marking this patch as obsolete.

Frank Heimes (fheimes) on 2017-08-17
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Changed in linux (Ubuntu):
status: New → Incomplete

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:50 EDT-------

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:51 EDT-------

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:52 EDT-------

------- Comment From <email address hidden> 2017-08-18 07:05 EDT-------
I have added three patches to replace the test patch that Heiko already
marked as invalid:

0001-s390-mm-no-local-TLB-flush-for-clearing-by-ASCE-IDTE.patch
0002-s390-mm-fix-local-TLB-flushing-vs.-detach-of-an-mm-a.patch
0003-s390-mm-fix-race-on-mm-context.flush_mm.patch

The first is an upstream patch which removes the code that tries to
use the local flushing option on an IDTE clearing-by-ASCE instruction.
The local flushing option only exists for IDTE invalidation-and-clearing.

Patches #2 and #3 fix race conditions in the architecture-specific TLB
flushing code. I have run my TLB stress tests on a z/VM guest with 4 CPUs
for a few hours with the three patches applied. Nothing untoward happened,
but my TLB stress tests also ran cleanly without these patches. It seems
we need the specific timing of the workload to trigger the problem.

If you could run a test for us with these patches applied and the bug
does not show up again, I would declare these patches the final solution.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-18 07:31 EDT-------
@Canonical can you please build another test kernel which includes the three new patches?

The patches are against kernel version 4.4.0-89.112.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-22 05:08 EDT-------
I haven't seen the test kernel package yet. Any update?

Changed in ubuntu-power-systems:
status: Incomplete → New
Changed in linux (Ubuntu):
status: Incomplete → New
Manoj Iyer (manjo) on 2017-08-25
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
no longer affects: ubuntu-power-systems
Changed in linux (Ubuntu):
importance: Undecided → High
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-29 10:07 EDT-------
Hello,

May I know when we can get the test kernel fix? Thank you!

bugproxy (bugproxy) wrote : dbginfo

Default Comment by Bridge

Stefan Bader (smb) wrote :

While preparing to provide a test kernel I noticed that the backport for patch #1 introduces a test for MACHINE_HAS_TLB_LC which is not present even in linux-next. Martin, is this really correct?

+       /* Reset TLB flush mask */
+       if (MACHINE_HAS_TLB_LC)
+               cpumask_copy(mm_cpumask(mm), &mm->context.cpu_attach_mask);

Stefan Bader (smb) wrote :

In fact hunk #3 of the original patch was also dropped which removed the check in a different location. Yet, the last hunk removes the check for that from the flush functions.

------- Comment From <email address hidden> 2017-08-30 04:53 EDT-------
The upstream version of __tlb_flush_mm has this:

static inline void __tlb_flush_mm(struct mm_struct *mm)
{
        ...
        /* Reset TLB flush mask */
        cpumask_copy(mm_cpumask(mm), &mm->context.cpu_attach_mask);
        ...
}

The difference is because of git commit 64f31d5802af11fd
"s390/mm: simplify the TLB flushing code" which removed the
check for MACHINE_HAS_TLB_LC and simply always does the
copy.

Imho the patch is correct.

Stefan Bader (smb) wrote :

Ah ok, thanks. Will add some info to the commit message and prepare that test kernel.

Stefan Bader (smb) wrote :

Replaced the packages at http://people.canonical.com/~smb/lp1708399/ with the latest kernel and the three suggested patches on top.

Frank Heimes (fheimes) on 2017-09-11
Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-z-systems:
status: New → In Progress
importance: Undecided → High
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-12 03:26 EDT-------
The fix has been tested. The problem no longer occurs with the test kernel.
When will this fix be officially rolled out? Please provide the answer in this Bugzilla. Many thanks

Stefan Bader (smb) on 2017-09-12
description: updated
Changed in linux (Ubuntu Xenial):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu Zesty):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → High
status: New → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-12 07:22 EDT-------
Upstream commits will be provided soon. Target kernel 4.14

Andrew Cloke (andrew-cloke) wrote :

Moving to "incomplete", pending patches landing upstream.

Changed in ubuntu-z-systems:
status: In Progress → Incomplete
Stefan Bader (smb) on 2017-09-12
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Frank Heimes (fheimes) on 2017-09-12
Changed in ubuntu-z-systems:
status: Incomplete → In Progress
Seth Forshee (sforshee) on 2017-09-12
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Juerg Haefliger (juergh) on 2017-09-12
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Frank Heimes (fheimes) on 2017-09-12
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-13 07:28 EDT-------
Upstream git commit ids:

60f07c8ec5fae06c23e9fd7bab67dabce92b3414
"s390/mm: fix race on mm->context.flush_mm"

b3e5dc45fd1ec2aa1de6b80008f9295eb17e0659
"s390/mm: fix local TLB flushing vs. detach of an mm address space"

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-zesty

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-15 02:07 EDT-------
(In reply to comment #72)

I configured the proposed source. Just to double-check: is the fix in kernel linux-image-generic-4.4.0.96.101, and do I only need to install this proposed kernel version? Thanks!

Stefan Bader (smb) wrote :

The version number is that of the meta package (linux-image-generic). But as long as uname -r returns 4.4.0-96-generic the correct kernel is running. And it should have the fixes included.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-15 03:17 EDT-------
(In reply to comment #75)
> The version number is that of the meta package (linux-image-generic). But as
> long as uname -r returns 4.4.0-96-generic the correct kernel is running. And
> it should have the fixes included.

Ok. Thank you for the explanation!

When I installed it today, the version had already changed to 119.

4.4.0-96-generic #119

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-35.39

---------------
linux (4.10.0-35.39) zesty; urgency=low

  * linux: 4.10.0-35.39 -proposed tracker (LP: #1716606)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - SAUCE: s390/mm: fix local TLB flushing vs. detach of an mm address space
    - SAUCE: s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.10.0-34.38) zesty; urgency=low

  * linux: 4.10.0-34.38 -proposed tracker (LP: #1713470)

  * Ubuntu 16.04.03: perf tool does not count pm_run_inst_cmpl with rcode on
    POWER9 DD2.0 (LP: #1709964)
    - powerpc/perf: Fix Power9 test_adder fields

  * HID: multitouch: Support ALPS PTP Stick and Touchpad devices (LP: #1712481)
    - HID: multitouch: Support PTP Stick and Touchpad device
    - SAUCE: HID: multitouch: Support ALPS PTP stick with pid 0x120A

  * igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

  * RPT related fixes missing in Ubuntu 16.04.3 (LP: #1709220)
    - powerpc/mm/radix: Optimise tlbiel flush all case
    - powerpc/mm/radix: Improve _tlbiel_pid to be usable for PWC flushes
    - powerpc/mm/radix: Improve TLB/PWC flushes
    - powerpc/mm/radix: Avoid flushing the PWC on every flush_tlb_range

  * AMD RV platforms with SNPS 3.1 USB controller stop responding (S3 issue)
    (LP: #1711098)
    - usb: xhci: Issue stop EP command only when the EP state is running

  * dma-buf: performance issue when looking up the fence status (LP: #1711096)
    - dma-buf: avoid scheduling on fence status query v2

  * IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

  * Disable CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE (LP: #1709171)
    - [Config] CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n for ppc64el

  * memory-hotplug test needs to be fixed (LP: #1710868)
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

  * Ubuntu 16.04.3: Qemu fails on P9 (LP: #1686019)
    - KVM: PPC: Pass kvm* to kvmppc_find_table()
    - KVM: PPC: Use preregistered memory API to access TCE list
    - KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
    - powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
    - powerpc/powernv/ioda2: Update iommu table base on ownership change
    - powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
    - powerpc/vfio_spapr_tce: Add reference counting to iommu_table
    - powerpc/mmu: Add real mode support for IOMMU preregistered memory
    - KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
    - KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers

  * [SRU][Zesty] [QDF2400] pl011 E44 erratum patch needed for 2.0 firmware and
    1.1 silicon (LP: #1709123)
    - tty: pl011: fix initialization or...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-96.119

---------------
linux (4.4.0-96.119) xenial; urgency=low

  * linux: 4.4.0-96.119 -proposed tracker (LP: #1716613)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - s390/mm: no local TLB flush for clearing-by-ASCE IDTE
    - SAUCE: s390/mm: fix local TLB flushing vs. detach of an mm address space
    - SAUCE: s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.4.0-95.118) xenial; urgency=low

  * linux: 4.4.0-95.118 -proposed tracker (LP: #1715651)

  * Xenial update to 4.4.78 stable release broke Address Sanitizer
    (LP: #1715636)
    - mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes

linux (4.4.0-94.117) xenial; urgency=low

  * linux: 4.4.0-94.117 -proposed tracker (LP: #1713462)

  * mwifiex causes kernel oops when AP mode is enabled (LP: #1712746)
    - SAUCE: net/wireless: do not dereference invalid pointer
    - SAUCE: mwifiex: do not dereference invalid pointer

  * Backport more recent Broadcom bnxt_en driver (LP: #1711056)
    - SAUCE: bnxt_en_bpo: Import bnxt_en driver version 1.8.1
    - SAUCE: bnxt_en_bpo: Drop distro out-of-tree detection logic
    - SAUCE: bnxt_en_bpo: Remove unnecessary compile flags
    - SAUCE: bnxt_en_bpo: Move config settings to Kconfig
    - SAUCE: bnxt_en_bpo: Remove PCI_IDs handled by the regular driver
    - SAUCE: bnxt_en_bpo: Rename the backport driver to bnxt_en_bpo
    - bnxt_en_bpo: [Config] Enable CONFIG_BNXT_BPO=m

  * HID: multitouch: Support ALPS PTP Stick and Touchpad devices (LP: #1712481)
    - HID: multitouch: Support PTP Stick and Touchpad device
    - SAUCE: HID: multitouch: Support ALPS PTP stick with pid 0x120A

  * igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

  * IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

  * accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash
    (LP: #1711401)
    - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

  * memory-hotplug test needs to be fixed (LP: #1710868)
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

  * HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) does not work (LP: #1707643)
    - net: cdc_mbim: apply "NDP to end" quirk to HP lt4132

  * Migrating KSM page causes the VM lock up as the KSM page merging list is too
    large (LP: #1680513)
    - ksm: introduce ksm_max_page_sharing per page deduplication limit
    - ksm: fix use after free with merge_across_nodes = 0
    - ksm: cleanup stable_node chain collapse case
    - ksm: swap the two output parameters of chain/chain_prune
    - ksm: optimize refile of stable_node_dup at the head of the chain

  * sort ABI files with C.UTF-8 locale (LP: #1712345)
    - [Packaging] sort ABI ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released

------- Comment From <email address hidden> 2017-09-18 06:32 EDT-------
IBM bugzilla status -> closed. Fix Released in Zesty/Xenial.

Frank Heimes (fheimes) on 2017-09-18
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released

------- Comment From <email address hidden> 2017-09-18 07:59 EDT-------
I'm confused. It's still under test. Why does it say the fix is released?

Stefan Bader (smb) wrote :

Because we were told these fixes were important, the backports from Martin were tested in a separate kernel, and we included those and the Zesty cherry-picks in re-spins that were made last week. Those kernels moved to updates today.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-18 09:19 EDT-------
(In reply to comment #81)
> Because it was told those were important, the backports from Martin were
> tested in a separate kernel and we included those and the Zesty cherry-picks
> in re-spins that were made last week. And those kernels moved to updates
> today.

I see. I thought I had 5 working days to test and confirm. Just to confirm: the fix has been officially released in 4.4.0-96.119 in Xenial, right?

Stefan Bader (smb) wrote :

Yes, the fix was released with 4.4.0-96.119 (linux-image version) in Xenial/16.04 and 4.10.0-35.39 in Zesty/17.04.
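
The released versions above can be checked against the output of `uname -r` on a given machine. The following is a minimal sketch, assuming that the ABI number in the release string (the 96 in 4.4.0-96-generic) is enough to identify a fixed build, as the earlier `uname -r` advice in this report suggests. The table `FIRST_FIXED_ABI` and the function names are illustrative only.

```python
import platform
import re

# First fixed ABI number per kernel series, taken from the versions named in this report:
# 4.4.0-96.119 (Xenial) and 4.10.0-35.39 (Zesty).
FIRST_FIXED_ABI = {(4, 4, 0): 96, (4, 10, 0): 35}

RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)(?:-|$)")


def parse_release(release):
    """Split a release such as '4.4.0-96-generic' into ((4, 4, 0), 96), or None if unparsable."""
    match = RELEASE_RE.match(release)
    if match is None:
        return None
    major, minor, patch, abi = (int(part) for part in match.groups())
    return (major, minor, patch), abi


def has_fix(release):
    """Return True/False for the two series covered here, None when the series is unknown."""
    parsed = parse_release(release)
    if parsed is None:
        return None
    base, abi = parsed
    fixed_abi = FIRST_FIXED_ABI.get(base)
    if fixed_abi is None:
        return None
    return abi >= fixed_abi


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, "->", has_fix(release))
```

A result of None means the check simply does not apply, for example on a kernel series that is not one of the two covered by this report.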

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-11 06:57 EDT-------
*** Bug 159970 has been marked as a duplicate of this bug. ***
