kernel panic -not syncing: Fatal exception: panic_on_oops

Bug #1708399 reported by bugproxy on 2017-08-03

Affects	Status	Importance	Assigned to
Ubuntu on IBM z Systems	Fix Released	High	Canonical Kernel Team
linux (Ubuntu)	Fix Released	High	Skipper Bug Screeners
Xenial	Fix Released	High	Stefan Bader
Zesty	Fix Released	High	Stefan Bader

Bug Description

SRU justification:

Impact: A race in context flushing is causing a kernel panic on the s390x architecture.

Fix: A set of three patches (all restricted to arch code), one already upstream and the other two pending in linux-next. Regression risk should be low (limited to arch code and tested).

Testcase: see below

---

== Comment: #0 - QI YE <email address hidden> - 2017-08-02 04:11:25 ==
---Problem Description---
Ubuntu got kernel panic

---uname output---
#110-Ubuntu SMP Tue Jul 18 12:56:43 UTC 2017 s390x s390x s390x GNU/Linux

---Debugger Data---
PID: 10991 TASK: 19872a0e8 CPU: 2 COMMAND: "hyperkube"
LOWCORE INFO:
  -psw : 0x0004c00180000000 0x0000000000115fa6
  -function : pcpu_delegate at 115fa6
  -prefix : 0x7fe42000
  -cpu timer: 0x7ffab2827828aa50
  -clock cmp: 0xd2eb8b31445e4200
  -general registers:
     0x0004e00100000000 0x00000000001283b6
     0x0000c00100000000 0x000000008380fcb8
     0x0000000000115f9e 0x000000000056f6e2
     0x0000000000000004 0x0000000000cf9070
     0x00000001f3bfc000 0x0000000000112fd8
     0x00000001c72bb400 0x0000000000000002
     0x000000007fffc000 0x00000000007c9ef0
     0x0000000000115f9e 0x000000008380fc18
  -access registers:
     0x000003ff 0x7ffff910 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
  -control registers:
     0x0000000014066a12 0x000000007e6d81c7
     0x0000000000011140 000000000000000000
     0x0000000000002aef 0x0000000000000400
     0x0000000050000000 0x000000007e6d81c7
     000000000000000000 000000000000000000
     000000000000000000 000000000000000000
     000000000000000000 0x0000000000cfc007
     0x00000000db000000 0x0000000000011280
  -floating point registers:
     0x409c7e2580000000 0x401de4e000000000
     000000000000000000 0x3fd24407ab0e073a
     0x3ff0000000000000 0x3fee666666666666
     0x3fef218f8a7a41a0 0x3fee666666666666
     0x0000000000800000 000000000000000000
     0x000003ff7f800000 0x000002aa4940e9e0
     0x000000000000d401 0x000003ffe81fe110
     000000000000000000 0x000003fff2cfe638

#0 [8380fc78] smp_find_processor_id at 1160f8
#1 [8380fc90] machine_kexec at 1135d4
#2 [8380fcb8] crash_kexec at 1fbb8a
#3 [8380fd88] panic at 27d0e0
#4 [8380fe28] die at 1142cc
#5 [8380fe90] do_low_address at 12215e
#6 [8380fea8] pgm_check_handler at 7c2ab4
PSW: 0705200180000000 000002aa267e0e42 (user space)
GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
       000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
       0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
       000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90

Contact Information = Chee Ye / <email address hidden>

Stack trace output:
no

Oops output:
[43200.761465] docker0: port 10(vethb9132e9) entered forwarding state
[50008.560926] hrtimer: interrupt took 1698076 ns
[123483.768984] systemd[1]: apt-daily.timer: Adding 7h 34min 22.582204s random time.
[123483.930058] systemd[1]: apt-daily.timer: Adding 2h 18min 14.857162s random time.
[123484.064879] systemd[1]: apt-daily.timer: Adding 10h 46min 2.301756s random time.
[123484.824760] systemd[1]: apt-daily.timer: Adding 6h 16min 22.178655s random time.
[153113.703126] conntrack: generic helper won't handle protocol 47. Please consider loading the specific helper module.
[477085.704538] Low-address protection: 0004 ilc:2 [#1] SMP
[477085.704551] Modules linked in: xt_physdev veth xt_recent xt_comment xt_mark xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc aufs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 qeth_l2 sha256_s390 qeth sha1_s390 qdio sha_common ccwgroup vmur dasd_eckd_mod dasd_mod
[477085.705522] CPU: 2 PID: 10991 Comm: hyperkube Not tainted 4.4.0-87-generic #110-Ubuntu
[477085.705525] task: 000000019872a0e8 ti: 000000008380c000 task.ti: 000000008380c000
[477085.705529] User PSW : 0705200180000000 000002aa267e0e42
[477085.705532] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 EA:3
                User GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
[477085.705539] 000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
[477085.705553] 0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
[477085.705554] 000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90
[477085.705578] User Code: 000002aa267e0e30: e340f0080004 lg %r4,8(%r15)
                           000002aa267e0e36: e330f0100014 lgf %r3,16(%r15)
                          #000002aa267e0e3c: e36040000014 lgf %r6,0(%r4)
                          >000002aa267e0e42: ba634000 cs %r6,%r3,0(%r4)
                           000002aa267e0e46: a774fffe brc 7,2aa267e0e42
                           000002aa267e0e4a: e360f0180050 sty %r6,24(%r15)
                           000002aa267e0e50: 07fe bcr 15,%r14
                           000002aa267e0e52: 0000 unknown
[477085.705596] Last Breaking-Event-Address:
[477085.705599] [<000002aa26663258>] 0x2aa26663258
[477085.705600]
[477085.705602] Kernel panic - not syncing: Fatal exception: panic_on_oops
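The "R:0 T:1 IO:1 ..." line in the oops output above is a field-by-field decode of the PSW mask printed just before it. As a rough aid for reading such lines, the sketch below reproduces that decode in Python. The bit positions are taken from the z/Architecture PSW layout as I understand it and are not stated anywhere in this report, so treat them as an assumption; `decode_psw_mask` is a hypothetical helper name.

```python
def decode_psw_mask(mask: int) -> str:
    """Decode a 64-bit s390x PSW mask into the oops-style field summary.

    Shifts assume IBM bit numbering (bit 0 is the most significant bit),
    i.e. PSW bit n sits at shift 63 - n. This layout is an assumption.
    """
    def bits(shift: int, width: int = 1) -> int:
        return (mask >> shift) & ((1 << width) - 1)

    fields = [
        ("R", bits(62)),      # PER mask (PSW bit 1)
        ("T", bits(58)),      # DAT mode (PSW bit 5)
        ("IO", bits(57)),     # I/O interrupt mask (PSW bit 6)
        ("EX", bits(56)),     # external interrupt mask (PSW bit 7)
        ("Key", bits(52, 4)), # PSW key (PSW bits 8-11)
        ("M", bits(50)),      # machine-check mask (PSW bit 13)
        ("W", bits(49)),      # wait state (PSW bit 14)
        ("P", bits(48)),      # problem state, i.e. user mode (PSW bit 15)
        ("AS", bits(46, 2)),  # address-space control (PSW bits 16-17)
        ("CC", bits(44, 2)),  # condition code (PSW bits 18-19)
        ("PM", bits(40, 4)),  # program mask (PSW bits 20-23)
        ("EA", bits(31, 2)),  # addressing mode (PSW bits 31-32)
    ]
    return " ".join(f"{name}:{value:x}" for name, value in fields)


# User PSW mask from the oops output in this report.
print(decode_psw_mask(0x0705200180000000))
```

For the user PSW this yields P:1, which is consistent with the "(user space)" note in the debugger data: the faulting instruction was running in problem state.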

System Dump Location:
There are 4 vCPUs defined. hyperkube was executing on two CPUs when the kernel panicked. It may be related to the TLB entry flush on the two CPUs.

CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"

  RT PRIO_ARRAY: 1ea5a8db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5a8c98
     [no tasks queued]

CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
  RT PRIO_ARRAY: 1ea5b9db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5b9c98
     [120] PID: 23421 TASK: 1c9368af8 COMMAND: "PipelineService"
     [120] PID: 10957 TASK: 1987336d8 COMMAND: "hyperkube"

CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5cadb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5cac98
     [no tasks queued]

CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5dbdb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5dbc98
     [120] PID: 21614 TASK: 1cbee57c0 COMMAND: "IngestServiceCl"
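The observation that hyperkube was current on two CPUs can be read off the run queue listing above. A minimal sketch of extracting that mechanically is shown below; it assumes the listing has exactly the "CPU n RUNQUEUE" / "CURRENT: PID: ..." layout seen in this report, and the names `current_tasks` and `cpus_running` are hypothetical helpers, not part of any crash tooling.

```python
import re

# Trimmed copy of the run queue listing from this report (CURRENT lines only).
RUNQ = """\
CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"
CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
"""

CPU_RE = re.compile(r"^\s*CPU (\d+) RUNQUEUE:")
CURRENT_RE = re.compile(r'^\s*CURRENT: PID: (\d+)\s+TASK: \S+\s+COMMAND: "([^"]*)"')


def current_tasks(runq_text):
    """Map each CPU number to the (pid, command) of its CURRENT task."""
    tasks = {}
    cpu = None
    for line in runq_text.splitlines():
        cpu_match = CPU_RE.match(line)
        if cpu_match:
            cpu = int(cpu_match.group(1))
            continue
        cur_match = CURRENT_RE.match(line)
        if cur_match and cpu is not None:
            tasks[cpu] = (int(cur_match.group(1)), cur_match.group(2))
    return tasks


def cpus_running(runq_text, command):
    """Return the sorted CPU numbers whose CURRENT task has the given command."""
    return sorted(
        cpu for cpu, (_, cmd) in current_tasks(runq_text).items() if cmd == command
    )


print(cpus_running(RUNQ, "hyperkube"))
```

On the listing from this report, the two CPUs found are the one that panicked (CPU 2, PID 10991) and CPU 3 (PID 10975).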

== Comment: #1 - QI YE <email address hidden> - 2017-08-02 04:20:02 ==
The problem happened randomly. No pattern has been figured out yet.

It also happens on the kernel levels below.
- 4.4.0-78-generic #99
- 4.4.0-83-generic

== Comment: #2 - Heinz-Werner Seeck <email address hidden> - 2017-08-02 08:25:06 ==
@QI YE: Please provide the use case for this problem report, and add dumps, dbginfo and sosreports as attachments. It is not clear to me which use case triggers this problem.
Many thanks in advance

== Comment: #3 - QI YE <email address hidden> - 2017-08-02 08:44:01 ==
(In reply to comment #2)
> @QI YE: Please provide the use case of this problem report. And add dumps
> and dbginfo , sosreports as attachment. For me it is not clear which use
> case this problems generates.
> Many thanks in advance

Heinz-Werner, what do you mean by "use case"? Could you elaborate? If you are referring to the application that caused this problem: we have machine learning workloads running on Ubuntu on the IBM Z community cloud.

The dump file is big. Any suggestion where to upload it?

== Comment: #4 - QI YE <email address hidden> - 2017-08-02 08:50:32 ==
sosreport

bugproxy (bugproxy) wrote on 2017-08-03: dbginfo

dbginfo (4.9 MiB, application/gzip)

Default Comment by Bridge

tags:	added: architecture-s39064 bugnameltc-157227 severity-critical targetmilestone-inin16042
Changed in ubuntu:
assignee:	nobody → Skipper Bug Screeners (skipper-screen-team)
affects:	ubuntu → kernel-package (Ubuntu)

bugproxy (bugproxy) wrote on 2017-08-03: Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-03 05:36 EDT-------
DUMP attached here: https://ibm.ent.box.com/folder/34249914326

Frank Heimes (fheimes) on 2017-08-03

affects:	kernel-package (Ubuntu) → linux (Ubuntu)

Frank Heimes (fheimes) wrote on 2017-08-03:

According to the logs, this might be more of a docker issue than a kernel issue:

- docker is used on that system (at least since Aug 1st)
- but I cannot find that docker.io is used - which docker version is in use?
dpkg log doesn't show me that docker.io got installed

- seems to be a docker issue that is causing a crash
occurs multiple times per second
Aug 2 06:26:51 zml025 dockerd[6150]: time="2017-08-02T06:26:51.327342000-04:00" level=error msg="Handler for GET /containers/18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d/json returned error: No such container: 18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d"

- the kernel log also shows some issues:
  the following is a known docker issue, seems to be caused by privileged containers:
  https://github.com/moby/moby/issues/21081
  https://github.com/kubernetes/kubernetes/issues/27885
Aug 1 03:17:19 zml025 kernel: [15074.536567] aufs au_opts_verify:1597:dockerd[6723]:
dirperm1 breaks the protection by the permission bits on the lower branch

- kernel log:
another issue also known by docker:
https://github.com/moby/moby/issues/14807
Aug 1 03:17:19 zml025 kernel: [15074.649870] device vetha553aad entered promiscuous mode
Aug 1 03:17:19 zml025 kernel: [15074.649937] IPv6: ADDRCONF(NETDEV_UP): vetha553aad: link is not ready
Aug 1 03:17:19 zml025 kernel: [15074.649939] docker0: port 1(vetha553aad) entered forwarding state
Aug 1 03:17:19 zml025 kernel: [15074.649943] docker0: port 1(vetha553aad) entered forwarding state
Aug 1 03:17:19 zml025 kernel: [15074.650259] docker0: port 1(vetha553aad) entered disabled state
Aug 1 03:17:19 zml025 kernel: [15075.283565] eth0: renamed from vethd76add0
Aug 1 03:17:20 zml025 kernel: [15075.334494] IPv6: ADDRCONF(NETDEV_CHANGE): vetha553aad: link becomes ready
Aug 1 03:17:20 zml025 kernel: [15075.334520] docker0: port 1(vetha553aad) entered forwarding state
Aug 1 03:17:20 zml025 kernel: [15075.334527] docker0: port 1(vetha553aad) entered forwarding state
Aug 1 03:17:20 zml025 kernel: [15075.334549] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready

- duplicate IPv6 addresses need to be fixed
Aug 1 03:17:20 zml025 kernel: [15075.611749] IPv6: eth0: IPv6 duplicate address fe80::42:acff:fe11:2 detected!

bugproxy (bugproxy) wrote on 2017-08-03:

------- Comment From <email address hidden> 2017-08-03 09:17 EDT-------
This is the docker version:
Docker version 1.12.6, build 78d1802

bugproxy (bugproxy) wrote on 2017-08-03:

------- Comment From <email address hidden> 2017-08-03 09:20 EDT-------
The z/VM version is z/VM 6.3

bugproxy (bugproxy) wrote on 2017-08-03:

------- Comment From <email address hidden> 2017-08-03 09:39 EDT-------
(In reply to comment #14)
> according to the logs it might affect more docker than the kernel:
>
> - docker is used on that system (at least since Aug 1st)
> - but I cannot find that docker.io is used - which docker version is in use?
> dpkg log doesn't show me that docker.io got installed
>
> - seems to be a docker issue that is causing a crash
> occurs multiple times per second
> Aug 2 06:26:51 zml025 dockerd[6150]:
> time="2017-08-02T06:26:51.327342000-04:00" level=error msg="Handler for GET /
> containers/18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d/
> json returned error: No such container:
> 18ffa76eaba65e5f451a3d56821d3f90a58dac74021ea7a5114352a2d6816d0d"
>
> - the kernel log also shows some issues:
> the following is a known docker issue, seems to be caused by privileged
> containers:
> https://github.com/moby/moby/issues/21081
> https://github.com/kubernetes/kubernetes/issues/27885
> Aug 1 03:17:19 zml025 kernel: [15074.536567] aufs
> au_opts_verify:1597:dockerd[6723]:
> dirperm1 breaks the protection by the permission bits on the lower branch
>
> - kernel log:
> another issue also known by docker:
> https://github.com/moby/moby/issues/14807
> Aug 1 03:17:19 zml025 kernel: [15074.649870] device vetha553aad entered
> promiscuous mode
> Aug 1 03:17:19 zml025 kernel: [15074.649937] IPv6: ADDRCONF(NETDEV_UP):
> vetha553aad: link is not ready
> Aug 1 03:17:19 zml025 kernel: [15074.649939] docker0: port 1(vetha553aad)
> entered forwarding state
> Aug 1 03:17:19 zml025 kernel: [15074.649943] docker0: port 1(vetha553aad)
> entered forwarding state
> Aug 1 03:17:19 zml025 kernel: [15074.650259] docker0: port 1(vetha553aad)
> entered disabled state
> Aug 1 03:17:19 zml025 kernel: [15075.283565] eth0: renamed from vethd76add0
> Aug 1 03:17:20 zml025 kernel: [15075.334494] IPv6: ADDRCONF(NETDEV_CHANGE):
> vetha553aad: link becomes ready
> Aug 1 03:17:20 zml025 kernel: [15075.334520] docker0: port 1(vetha553aad)
> entered forwarding state
> Aug 1 03:17:20 zml025 kernel: [15075.334527] docker0: port 1(vetha553aad)
> entered forwarding state
> Aug 1 03:17:20 zml025 kernel: [15075.334549] IPv6: ADDRCONF(NETDEV_CHANGE):
> docker0: link becomes ready
>
> - duplicate IPv6 addresses needs to be fixed
> Aug 1 03:17:20 zml025 kernel: [15075.611749] IPv6: eth0: IPv6 duplicate
> address fe80::42:acff:fe11:2 detected!

Just for your information: we have many servers running the same applications, and several of them have never had a kernel panic. They all run the same docker version and show the same docker issues.

bugproxy (bugproxy) wrote on 2017-08-11: tlb test patch

tlb test patch (8.7 KiB, text/plain)

------- Comment on attachment From <email address hidden> 2017-08-11 04:44 EDT-------

The attached test patch fixes a potential race in the kernel which might result in missing TLB flushes.
In addition, it adds a "notlblc" kernel parameter which allows disabling the local TLB clearing optimization.

Note: this is just a test patch to verify if it solves the seen problem. This patch should currently not go into an official kernel release.

@Canonical can you please build a test kernel which includes this patch?

The patch is against kernel version 4.4.0-89.112.

Thank you!

Andrew Cloke (andrew-cloke) on 2017-08-11

Changed in ubuntu-power-systems:
importance:	Undecided → Critical
assignee:	nobody → Canonical Kernel Team (canonical-kernel-team)

Stefan Bader (smb) wrote on 2017-08-11:

Applied the patch and built packages: http://people.canonical.com/~smb/lp1708399/

Joseph Salisbury (jsalisbury) on 2017-08-11

tags:	added: kernel-da-key

Frank Heimes (fheimes) on 2017-08-11

Changed in ubuntu-power-systems:
status:	New → In Progress

bugproxy (bugproxy) wrote on 2017-08-17: Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-17 11:37 EDT-------
Comment on attachment 119988
tlb test patch

We will provide two new patches, since this patch solves only part of the problem. Therefore marking this patch as obsolete.

Frank Heimes (fheimes) on 2017-08-17

Changed in ubuntu-power-systems:
status:	In Progress → Incomplete
Changed in linux (Ubuntu):
status:	New → Incomplete

bugproxy (bugproxy) wrote on 2017-08-18: Upstream patch that removes local flushing for clearing-by-ASCE

#10

Upstream patch that removes local flushing for clearing-by-ASCE (2.7 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:50 EDT-------

bugproxy (bugproxy) wrote on 2017-08-18: Fix race local TLB flushing vs. context switch

#11

Fix race local TLB flushing vs. context switch (3.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:51 EDT-------

bugproxy (bugproxy) wrote on 2017-08-18: Fix race on mm->context.flush_mm

#12

Fix race on mm->context.flush_mm (2.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:52 EDT-------

bugproxy (bugproxy) wrote on 2017-08-18: Comment bridged from LTC Bugzilla

#13

------- Comment From <email address hidden> 2017-08-18 07:05 EDT-------
I have added three patches to replace the test patch that Heiko already
marked as invalid:

0001-s390-mm-no-local-TLB-flush-for-clearing-by-ASCE-IDTE.patch
0002-s390-mm-fix-local-TLB-flushing-vs.-detach-of-an-mm-a.patch
0003-s390-mm-fix-race-on-mm-context.flush_mm.patch

The first is an upstream patch which removes the code that tries to
use the local flushing option on an IDTE clearing-by-ASCE instruction.
The local flushing option only exists for IDTE invalidation-and-clearing.

Patches #2 and #3 fix race conditions in the architecture-specific TLB
flushing code. I have run my TLB stress tests on a z/VM guest with 4 CPUs
for a few hours with the three patches applied. Nothing undue happened,
but my TLB stress tests also ran cleanly without these patches. It seems
we need the specific timing of the workload to trigger the problem.

Now, if you could run a test for us with these patches applied and the bug
does not show up again, I would declare these patches the final solution.
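The comment above does not describe the exact interleavings behind the two races, so the following is only a toy model of how a deferred-flush flag can lose a request when timing is unlucky. It is not the kernel code and not necessarily what the patches change. The names `ToyMM`, `flush_pending`, `stale_entries` and `replay` are hypothetical, and the "flush, then clear the flag" ordering is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyMM:
    flush_pending: bool = False  # stands in for a "flush needed" flag
    stale_entries: int = 0       # translations that still need flushing


def replay(order):
    """Replay one interleaving and report whether a flush request was lost.

    Steps "flush" and "clear" belong to a lazy flusher on one CPU;
    step "request" is another CPU asking for a new flush.
    """
    mm = ToyMM(flush_pending=True, stale_entries=1)  # one flush already requested

    def flusher_flush():
        mm.stale_entries = 0

    def flusher_clear():
        mm.flush_pending = False

    def other_cpu_request():
        mm.stale_entries += 1
        mm.flush_pending = True

    steps = {
        "flush": flusher_flush,
        "clear": flusher_clear,
        "request": other_cpu_request,
    }
    for name in order:
        steps[name]()
    # Lost request: stale entries remain, but nothing marks them as pending.
    return mm.stale_entries > 0 and not mm.flush_pending


# A request arriving between "flush" and "clear" is silently dropped.
print(replay(["flush", "request", "clear"]))
# Clearing the flag before flushing keeps the late request visible.
print(replay(["clear", "request", "flush"]))
```

In this model only one of the possible orderings loses the request, which fits the remark that a generic stress test may run for hours without ever hitting the window.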

bugproxy (bugproxy) wrote on 2017-08-18:

#14

------- Comment From <email address hidden> 2017-08-18 07:31 EDT-------
@Canonical can you please build another test kernel which includes the three new patches?

The patches are against kernel version 4.4.0-89.112.

bugproxy (bugproxy) wrote on 2017-08-22:

#15

------- Comment From <email address hidden> 2017-08-22 05:08 EDT-------
I haven't seen the test kernel package yet. Any update?

Andrew Cloke (andrew-cloke) on 2017-08-22

Changed in ubuntu-power-systems:
status:	Incomplete → New
Changed in linux (Ubuntu):
status:	Incomplete → New

Manoj Iyer (manjo) on 2017-08-25

Changed in ubuntu-z-systems:
assignee:	nobody → Canonical Kernel Team (canonical-kernel-team)
no longer affects:	ubuntu-power-systems
Changed in linux (Ubuntu):
importance:	Undecided → High

bugproxy (bugproxy) wrote on 2017-08-29:

#16

------- Comment From <email address hidden> 2017-08-29 10:07 EDT-------
Hello,

May I know when we can get the test kernel fix? Thank you!

bugproxy (bugproxy) wrote on 2017-08-30: dbginfo

#17

dbginfo (4.9 MiB, application/gzip)

Default Comment by Bridge

Stefan Bader (smb) wrote on 2017-08-30:

#18

While preparing to provide a test kernel I noticed that the backport for patch #1 introduces a test for MACHINE_HAS_TLB_LC which is not present even in linux-next. Martin, is this really correct?

+	/* Reset TLB flush mask */
+	if (MACHINE_HAS_TLB_LC)
+		cpumask_copy(mm_cpumask(mm), &mm->context.cpu_attach_mask);

Stefan Bader (smb) wrote on 2017-08-30:

#19

In fact hunk #3 of the original patch was also dropped which removed the check in a different location. Yet, the last hunk removes the check for that from the flush functions.

bugproxy (bugproxy) wrote on 2017-08-30: Comment bridged from LTC Bugzilla

#20

------- Comment From <email address hidden> 2017-08-30 04:53 EDT-------
The upstream version of __tlb_flush_mm has this:

static inline void __tlb_flush_mm(struct mm_struct *mm)
{
	...
	/* Reset TLB flush mask */
	cpumask_copy(mm_cpumask(mm), &mm->context.cpu_attach_mask);
	...
}

The difference is because of git commit 64f31d5802af11fd
"s390/mm: simplify the TLB flushing code" which removed the
check for MACHINE_HAS_TLB_LC and simply always does the
copy.

Imho the patch is correct.

Stefan Bader (smb) wrote on 2017-08-30:

#21

Ah ok, thanks. Will add some info to the commit message and prepare that test kernel.

Stefan Bader (smb) wrote on 2017-08-30:

#22

Replaced the packages at http://people.canonical.com/~smb/lp1708399/ with the latest kernel and the three suggested patches on top.

Frank Heimes (fheimes) on 2017-09-11

Changed in linux (Ubuntu):
status:	New → In Progress
Changed in ubuntu-z-systems:
status:	New → In Progress
importance:	Undecided → High

bugproxy (bugproxy) wrote on 2017-09-12:

#23

------- Comment From <email address hidden> 2017-09-12 03:26 EDT-------
The fix is tested. The problem no longer occurs with the test kernel.
When will this fix be officially rolled out? Please provide the answer in this bugzilla. Many thanks.

Stefan Bader (smb) on 2017-09-12

description:	updated
Changed in linux (Ubuntu Xenial):
assignee:	nobody → Stefan Bader (smb)
importance:	Undecided → High
status:	New → In Progress
Changed in linux (Ubuntu Zesty):
assignee:	nobody → Stefan Bader (smb)
importance:	Undecided → High
status:	New → In Progress

bugproxy (bugproxy) wrote on 2017-09-12:

#24

------- Comment From <email address hidden> 2017-09-12 07:22 EDT-------
Upstream commits will be provided soon. Target kernel 4.14

Andrew Cloke (andrew-cloke) wrote on 2017-09-12:

#25

Moving to "incomplete", pending patches landing upstream.

Changed in ubuntu-z-systems:
status:	In Progress → Incomplete

Stefan Bader (smb) on 2017-09-12

Changed in linux (Ubuntu Xenial):
status:	In Progress → Fix Committed

Frank Heimes (fheimes) on 2017-09-12

Changed in ubuntu-z-systems:
status:	Incomplete → In Progress

Seth Forshee (sforshee) on 2017-09-12

Changed in linux (Ubuntu):
status:	In Progress → Fix Committed

Juerg Haefliger (juergh) on 2017-09-12

Changed in linux (Ubuntu Zesty):
status:	In Progress → Fix Committed

Frank Heimes (fheimes) on 2017-09-12

Changed in ubuntu-z-systems:
status:	In Progress → Fix Committed

bugproxy (bugproxy) wrote on 2017-09-13:

#26

------- Comment From <email address hidden> 2017-09-13 07:28 EDT-------
Upstream git commit ids:

60f07c8ec5fae06c23e9fd7bab67dabce92b3414
"s390/mm: fix race on mm->context.flush_mm"

b3e5dc45fd1ec2aa1de6b80008f9295eb17e0659
"s390/mm: fix local TLB flushing vs. detach of an mm address space"

Kleber Sacilotto de Souza (kleber-souza) wrote on 2017-09-14:

#27

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:	added: verification-needed-xenial
tags:	added: verification-needed-zesty

Kleber Sacilotto de Souza (kleber-souza) wrote on 2017-09-14:

#28

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

bugproxy (bugproxy) wrote on 2017-09-15:

#29

------- Comment From <email address hidden> 2017-09-15 02:07 EDT-------
(In reply to comment #72)
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-xenial' to
> 'verification-done-xenial'. If the problem still exists, change the tag
> 'verification-needed-xenial' to 'verification-failed-xenial'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!
>
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-zesty' to
> 'verification-done-zesty'. If the problem still exists, change the tag
> 'verification-needed-zesty' to 'verification-failed-zesty'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!

I configured the -proposed source. Just to double-check: is the fix in kernel linux-image-generic-4.4.0.96.101, and do I only need to install this proposed kernel version? Thanks!

Stefan Bader (smb) wrote on 2017-09-15:

#30

The version number is that of the meta package (linux-image-generic). But as long as uname -r returns 4.4.0-96-generic the correct kernel is running. And it should have the fixes included.
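To make the relation between the two version strings concrete, here is a small sketch that derives the expected `uname -r` string from the meta package version and compares it with the running kernel. It assumes the meta package version is laid out as kernel version, then ABI number, then meta upload number (as "4.4.0.96.101" suggests). That layout is my reading, not something stated in this comment, and both function names are hypothetical.

```python
import platform


def release_from_meta_version(meta_version, flavour="generic"):
    """Turn a meta package version such as '4.4.0.96.101' into '4.4.0-96-generic'.

    Assumption: the first three components are the kernel version and the
    fourth is the ABI number; anything after that is ignored.
    """
    parts = meta_version.split(".")
    if len(parts) < 4:
        raise ValueError("expected at least four dot-separated components")
    kernel_version = ".".join(parts[:3])
    abi = parts[3]
    return f"{kernel_version}-{abi}-{flavour}"


def running_expected_kernel(expected_release, running=None):
    """Compare the expected release with the running one (uname -r by default)."""
    if running is None:
        running = platform.release()
    return running == expected_release


expected = release_from_meta_version("4.4.0.96.101")
print(expected)
print(running_expected_kernel(expected))
```

On a machine that has booted into the proposed kernel, the second line printed should be True. A False result after installation usually just means the system has not been rebooted into the new kernel yet.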

bugproxy (bugproxy) wrote on 2017-09-15:

#31

------- Comment From <email address hidden> 2017-09-15 03:17 EDT-------
(In reply to comment #75)
> The version number is that of the meta package (linux-image-generic). But as
> long as uname -r returns 4.4.0-96-generic the correct kernel is running. And
> it should have the fixes included.

Ok. Thank you for the explanation!

When I installed it today, the version has changed to 119 already.

4.4.0-96-generic #119

Launchpad Janitor (janitor) wrote on 2017-09-18:

#32

This bug was fixed in the package linux - 4.10.0-35.39

---------------
linux (4.10.0-35.39) zesty; urgency=low

  * linux: 4.10.0-35.39 -proposed tracker (LP: #1716606)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - SAUCE: s390/mm: fix local TLB flushing vs. detach of an mm address space
    - SAUCE: s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.10.0-34.38) zesty; urgency=low

  * linux: 4.10.0-34.38 -proposed tracker (LP: #1713470)

  * Ubuntu 16.04.03: perf tool does not count pm_run_inst_cmpl with rcode on
    POWER9 DD2.0 (LP: #1709964)
    - powerpc/perf: Fix Power9 test_adder fields

  * HID: multitouch: Support ALPS PTP Stick and Touchpad devices (LP: #1712481)
    - HID: multitouch: Support PTP Stick and Touchpad device
    - SAUCE: HID: multitouch: Support ALPS PTP stick with pid 0x120A

  * igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

  * RPT related fixes missing in Ubuntu 16.04.3 (LP: #1709220)
    - powerpc/mm/radix: Optimise tlbiel flush all case
    - powerpc/mm/radix: Improve _tlbiel_pid to be usable for PWC flushes
    - powerpc/mm/radix: Improve TLB/PWC flushes
    - powerpc/mm/radix: Avoid flushing the PWC on every flush_tlb_range

  * AMD RV platforms with SNPS 3.1 USB controller stop responding (S3 issue)
    (LP: #1711098)
    - usb: xhci: Issue stop EP command only when the EP state is running

  * dma-buf: performance issue when looking up the fence status (LP: #1711096)
    - dma-buf: avoid scheduling on fence status query v2

  * IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

  * Disable CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE (LP: #1709171)
    - [Config] CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n for ppc64el

  * memory-hotplug test needs to be fixed (LP: #1710868)
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

  * Ubuntu 16.04.3: Qemu fails on P9 (LP: #1686019)
    - KVM: PPC: Pass kvm* to kvmppc_find_table()
    - KVM: PPC: Use preregistered memory API to access TCE list
    - KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
    - powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
    - powerpc/powernv/ioda2: Update iommu table base on ownership change
    - powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
    - powerpc/vfio_spapr_tce: Add reference counting to iommu_table
    - powerpc/mmu: Add real mode support for IOMMU preregistered memory
    - KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
    - KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers

  * [SRU][Zesty] [QDF2400] pl011 E44 erratum patch needed for 2.0 firmware and
    1.1 silicon (LP: #1709123)
    - tty: pl011: fix initialization or...

This bug was fixed in the package linux - 4.10.0-35.39

---------------
linux (4.10.0-35.39) zesty; urgency=low

* linux: 4.10.0-35.39 -proposed tracker (LP: #1716606)

* CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.10.0-34.38) zesty; urgency=low

* linux: 4.10.0-34.38 -proposed tracker (LP: #1713470)

* Ubuntu 16.04.03: perf tool does not count pm_run_inst_cmpl with rcode on
    POWER9 DD2.0 (LP: #1709964)
    - powerpc/perf: Fix Power9 test_adder fields

* igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

* AMD RV platforms with SNPS 3.1 USB controller stop responding (S3 issue)
    (LP: #1711098)
    - usb: xhci: Issue stop EP command only when the EP state is running

* dma-buf: performance issue when looking up the fence status (LP: #1711096)
    - dma-buf: avoid scheduling on fence status query v2

* IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

* Disable CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE (LP: #1709171)
    - [Config] CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n for ppc64el

* [SRU][Zesty] [QDF2400] pl011 E44 erratum patch needed for 2.0 firmware and
    1.1 silicon (LP: #1709123)
    - tty: pl011: fix initialization order of QDF2400 E44

* Docker hangs with xfs using aufs storage driver (LP: #1709749)
    - SAUCE: aufs: for v4.5, use vfs_clone_file_range() in copy-up
    - SAUCE: aufs: bugfix, for v4.10, copy-up on XFS branch

* ACPI ID for Hip07/08 I2C controller has typo (LP: #1711182)
    - ACPI: APD: Fix HID for Hisilicon Hip07/08

* Avoid spurious PMU interrupts after idle (LP: #1709352)
    - powerpc/perf: Avoid spurious PMU interrupts after idle

* [SRU][ZESTY]kernel BUG at
    /build/linux-H5UzH8/linux-4.10.0/drivers/nvme/host/pci.c:567! (LP: #1709073)
    - block: fix bio_will_gap() for first bvec with offset

* CVE-2017-7541
    - brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

* sort ABI files with C.UTF-8 locale (LP: #1712345)
    - [Packaging] sort ABI files with C.UTF-8 locale

* Please only recommend or suggest initramfs-tools | linux-initramfs-tool for
    kernels able to boot without initramfs (LP: #1700972)
    - [Debian] Don't depend on initramfs-tools

-- Juerg Haefliger <juerg.haefliger@canonical.com>  Wed, 13 Sep 2017 08:15:17 +0200

Changed in linux (Ubuntu Zesty):
status:	Fix Committed → Fix Released

bugproxy (bugproxy) wrote on 2017-09-18: Fix race on mm->context.flush_mm

#33

Fix race on mm->context.flush_mm (2.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:52 EDT-------

Launchpad Janitor (janitor) wrote on 2017-09-18:

#34

This bug was fixed in the package linux - 4.4.0-96.119

---------------
linux (4.4.0-96.119) xenial; urgency=low

  * linux: 4.4.0-96.119 -proposed tracker (LP: #1716613)

  * kernel panic -not syncing: Fatal exception: panic_on_oops (LP: #1708399)
    - s390/mm: no local TLB flush for clearing-by-ASCE IDTE
    - SAUCE: s390/mm: fix local TLB flushing vs. detach of an mm address space
    - SAUCE: s390/mm: fix race on mm->context.flush_mm

  * CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.4.0-95.118) xenial; urgency=low

  * linux: 4.4.0-95.118 -proposed tracker (LP: #1715651)

  * Xenial update to 4.4.78 stable release broke Address Sanitizer
    (LP: #1715636)
    - mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes

linux (4.4.0-94.117) xenial; urgency=low

  * linux: 4.4.0-94.117 -proposed tracker (LP: #1713462)

  * mwifiex causes kernel oops when AP mode is enabled (LP: #1712746)
    - SAUCE: net/wireless: do not dereference invalid pointer
    - SAUCE: mwifiex: do not dereference invalid pointer

  * Backport more recent Broadcom bnxt_en driver (LP: #1711056)
    - SAUCE: bnxt_en_bpo: Import bnxt_en driver version 1.8.1
    - SAUCE: bnxt_en_bpo: Drop distro out-of-tree detection logic
    - SAUCE: bnxt_en_bpo: Remove unnecessary compile flags
    - SAUCE: bnxt_en_bpo: Move config settings to Kconfig
    - SAUCE: bnxt_en_bpo: Remove PCI_IDs handled by the regular driver
    - SAUCE: bnxt_en_bpo: Rename the backport driver to bnxt_en_bpo
    - bnxt_en_bpo: [Config] Enable CONFIG_BNXT_BPO=m

  * igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

  * IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

  * accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash
    (LP: #1711401)
    - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

  * HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) does not work (LP: #1707643)
    - net: cdc_mbim: apply "NDP to end" quirk to HP lt4132

  * Migrating KSM page causes the VM lock up as the KSM page merging list is too
    large (LP: #1680513)
    - ksm: introduce ksm_max_page_sharing per page deduplication limit
    - ksm: fix use after free with merge_across_nodes = 0
    - ksm: cleanup stable_node chain collapse case
    - ksm: swap the two output parameters of chain/chain_prune
    - ksm: optimize refile of stable_node_dup at the head of the chain

  * sort ABI files with C.UTF-8 locale (LP: #1712345)
    - [Packaging] sort ABI ...

This bug was fixed in the package linux - 4.4.0-96.119

---------------
linux (4.4.0-96.119) xenial; urgency=low

* linux: 4.4.0-96.119 -proposed tracker (LP: #1716613)

* CVE-2017-1000251
    - Bluetooth: Properly check L2CAP config option output buffer length

linux (4.4.0-95.118) xenial; urgency=low

* linux: 4.4.0-95.118 -proposed tracker (LP: #1715651)

* Xenial update to 4.4.78 stable release broke Address Sanitizer
    (LP: #1715636)
    - mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes

linux (4.4.0-94.117) xenial; urgency=low

* linux: 4.4.0-94.117 -proposed tracker (LP: #1713462)

* mwifiex causes kernel oops when AP mode is enabled (LP: #1712746)
    - SAUCE: net/wireless: do not dereference invalid pointer
    - SAUCE: mwifiex: do not dereference invalid pointer

* igb: Support using Broadcom 54616 as PHY (LP: #1712024)
    - SAUCE: igb: add support for using Broadcom 54616 as PHY

* IPR driver causes multipath to fail paths/stuck IO on Medium Errors
    (LP: #1682644)
    - scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

* accessing /dev/hvc1 with stress-ng on Ubuntu xenial causes crash
    (LP: #1711401)
    - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles

* HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) does not work (LP: #1707643)
    - net: cdc_mbim: apply "NDP to end" quirk to HP lt4132

* sort ABI files with C.UTF-8 locale (LP: #1712345)
    - [Packaging] sort ABI files with C.UTF-8 locale

* Include Broadcom GPL modules in Xenial Kernel (LP: #1665783)
    - [Config] OpenNSL Kconfig/Makefile
    - Import OpenNSL v3.1.0.17
    - [Config] CONFIG_OPENNSL=y for amd64
    - OpenNSL: Enable Kconfig and build
    - SAUCE: opennsl: add proper CFLAGS

* Xenial update to 4.4.83 stable release (LP: #1711557)
    - cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
    - mm: ratelimit PFNs busy info message
    - iscsi-target: fix memory leak in iscsit_setup_text_cmd()
    - iscsi-target: Fix iscsi_np reset hung task during parallel delete
    - fuse: initialize the flock flag in fuse_file on allocation
    - nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
    - USB: serial: option: add D-Link DWM-222 device ID
    - USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
    - USB: serial: pl2303: add new ATEN device id
    - usb: musb: fix tx fifo flush handling again
    - USB: hcd: Mark secondary HCD as dead if the primary one died
    - staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
    - iio: accel: bmc150: Always restore device to normal mode after suspend-
      resume
    - iio: light: tsl2563: use correct event code
    - uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
    - USB: Check for dropped connection before switching to full speed
    - usb: core: unlink urbs from the tail of the endpoint's urb_list
    - usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
    - usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
    - iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
    - pnfs/blocklayout: require 64-bit sector_t
    - pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
    - pinctrl: samsung: Remove bogus irq_[un]mask from resource management
    - Linux 4.4.83

* Xenial update to 4.4.82 stable release (LP: #1711535)
    - tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
    - net: fix keepalive code vs TCP_FASTOPEN_CONNECT
    - bpf, s390: fix jit branch offset related to ldimm64
    - net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
    - tcp: fastopen: tcp_connect() must refresh the route
    - net: avoid skb_warn_bad_offload false positives on UFO
    - sparc64: Prevent perf from running during super critical sections
    - KVM: arm/arm64: Handle hva aging while destroying the vm
    - mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
    - Linux 4.4.82

* Xenial update to 4.4.81 stable release (LP: #1711526)
    - libata: array underflow in ata_find_dev()
    - workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
    - ALSA: hda - Fix speaker output from VAIO VPCL14M1R
    - ASoC: do not close shared backend dailink
    - KVM: async_pf: make rcu irq exit if not triggered from idle task
    - mm/page_alloc: Remove kernel address exposure in free_reserved_area()
    - ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
    - ext4: fix overflow caused by missing cast in ext4_resize_fs()
    - ARM: dts: armada-38x: Fix irq type for pca955
    - media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS
      ioctl
    - target: Avoid mappedlun symlink creation during lun shutdown
    - iscsi-target: Always wait for kthread_should_stop() before kthread exit
    - iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
    - iscsi-target: Fix initial login PDU asynchronous socket close OOPs
    - iscsi-target: Fix delayed logout processing greater than
      SECONDS_FOR_LOGOUT_COMP
    - iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
    - mm, mprotect: flush TLB if potentially racing with a parallel reclaim
      leaving stale TLB entries
    - media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
    - f2fs: sanity check checkpoint segno and blkoff
    - drm: rcar-du: fix backport bug
    - saa7164: fix double fetch PCIe access condition
    - ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
    - net: Zero terminate ifr_name in dev_ifname().
    - ipv6: avoid overflow of offset in ip6_find_1stfragopt
    - ipv4: initialize fib_trie prior to register_netdev_notifier call.
    - rtnetlink: allocate more memory for dev_set_mac_address()
    - mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
    - openvswitch: fix potential out of bound access in parse_ct
    - packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
    - ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
    - net: ethernet: nb8800: Handle all 4 RGMII modes identically
    - dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
    - dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
    - dccp: fix a memleak for dccp_feat_init err process
    - sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
    - sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
    - net/mlx5: Fix command bad flow on command entry allocation failure
    - net: phy: Correctly process PHY_HALTED in phy_stop_machine()
    - net: phy: Fix PHY unbind crash
    - xen-netback: correctly schedule rate-limited queues
    - sparc64: Measure receiver forward progress to avoid send mondo timeout
    - wext: handle NULL extra data in iwe_stream_add_point better
    - sh_eth: R8A7740 supports packet shecksumming
    - net: phy: dp83867: fix irq generation
    - tg3: Fix race condition in tg3_get_stats64().
    - x86/boot: Add missing declaration of string functions
    - phy state machine: failsafe leave invalid RUNNING state
    - scsi: qla2xxx: Get mutex lock before checking optrom_state
    - drm/virtio: fix framebuffer sparse warning
    - virtio_blk: fix panic in initialization error path
    - ARM: 8632/1: ftrace: fix syscall name matching
    - mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
    - lib/Kconfig.debug: fix frv build failure
    - signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
    - mm: don't dereference struct page fields of invalid pages
    - workqueue: implicit ordered attribute should be overridable
    - Linux 4.4.81

* Xenial update to 4.4.80 stable release (LP: #1710646)
    - af_key: Add lock to key dump
    - pstore: Make spinlock per zone instead of global
    - powerpc/pseries: Fix of_node_put() underflow during reconfig remove
    - crypto: authencesn - Fix digest_null crash
    - md/raid5: add thread_group worker async_tx_issue_pending_all
    - drm/vmwgfx: Fix gcc-7.1.1 warning
    - drm/nouveau/bar/gf100: fix access to upper half of BAR2
    - KVM: PPC: Book3S HV: Context-switch EBB registers properly
    - KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
    - KVM: PPC: Book3S HV: Reload HTM registers explicitly
    - KVM: PPC: Book3S HV: Save/restore host values of debug registers
    - Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
    - Staging: comedi: comedi_fops: Avoid orphaned proc entry
    - drm/rcar: Nuke preclose hook
    - drm: rcar-du: Perform initialization/cleanup at probe/remove time
    - drm: rcar-du: Simplify and fix probe error handling
    - perf intel-pt: Fix ip compression
    - perf intel-pt: Fix last_ip usage
    - perf intel-pt: Use FUP always when scanning for an IP
    - perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
    - xfs: don't BUG() on mixed direct and mapped I/O
    - nfc: fdp: fix NULL pointer dereference
    - net: phy: Do not perform software reset for Generic PHY
    - isdn: Fix a sleep-in-atomic bug
    - isdn/i4l: fix buffer overflow
    - ath10k: fix null deref on wmi-tlv when trying spectral scan
    - wil6210: fix deadlock when using fw_no_recovery option
    - mailbox: always wait in mbox_send_message for blocking Tx mode
    - mailbox: skip complete wait event if timer expired
    - mailbox: handle empty message in tx_tick
    - mpt3sas: Don't overreach ioc->reply_post[] during initialization
    - kaweth: fix firmware download
    - kaweth: fix oops upon failed memory allocation
    - sched/cgroup: Move sched_online_group() back into css_online() to fix crash
    - PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if
      present
    - RDMA/uverbs: Fix the check for port number
    - libnvdimm, btt: fix btt_rw_page not returning errors
    - ipmi/watchdog: fix watchdog timeout set on reboot
    - v4l: s5c73m3: fix negation operator
    - pstore: Allow prz to control need for locking
    - pstore: Correctly initialize spinlock and flags
    - pstore: Use dynamic spinlock initializer
    - net: skb_needs_check() accepts CHECKSUM_NONE for tx
    - sched/cputime: Fix prev steal time accouting during CPU hotplug
    - xen/blkback: don't free be structure too early
    - xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
    - tpm: fix a kernel memory leak in tpm-sysfs.c
    - tpm: Replace device number bitmap with IDR
    - x86/mce/AMD: Make the init code more robust
    - r8169: add support for RTL8168 series add-on card.
    - ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
    - net/mlx4: Remove BUG_ON from ICM allocation routine
    - drm/msm: Ensure that the hardware write pointer is valid
    - drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
    - vfio-pci: use 32-bit comparisons for register address for gcc-4.5
    - irqchip/keystone: Fix "scheduling while atomic" on rt
    - ASoC: tlv320aic3x: Mark the RESET register as volatile
    - spi: dw: Make debugfs name unique between instances
    - ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
    - irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
    - openrisc: Add _text symbol to fix ksym build error
    - dmaengine: ioatdma: Add Skylake PCI Dev ID
    - dmaengine: ioatdma: workaround SKX ioatdma version
    - dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
    - ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
    - ARM64: zynqmp: Fix i2c node's compatible string
    - ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
    - ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
    - usb: gadget: Fix copy/pasted error message
    - Btrfs: adjust outstanding_extents counter properly when dio write is split
    - tools lib traceevent: Fix prev/next_prio for deadline tasks
    - xfrm: Don't use sk_family for socket policy lookups
    - perf tools: Install tools/lib/traceevent plugins with install-bin
    - perf symbols: Robustify reading of build-id from sysfs
    - video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
    - vfio-pci: Handle error from pci_iomap
    - arm64: mm: fix show_pte KERN_CONT fallout
    - nvmem: imx-ocotp: Fix wrong register size
    - sh_eth: enable RX descriptor word 0 shift on SH7734
    - ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
    - HID: ignore Petzl USB headlamp
    - scsi: fnic: Avoid sending reset to firmware when another reset is in
      progress
    - scsi: snic: Return error code on memory allocation failure
    - ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
    - Linux 4.4.80

* Please only recommend or suggest initramfs-tools | linux-initramfs-tool for
    kernels able to boot without initramfs (LP: #1700972)
    - [Debian] Don't depend on initramfs-tools

-- Stefan Bader <stefan.bader@canonical.com>  Tue, 12 Sep 2017 15:40:01 +0200

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released

bugproxy (bugproxy) wrote on 2017-09-18: Fix race local TLB flushing vs. context switch

#35

Fix race local TLB flushing vs. context switch (3.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:51 EDT-------

bugproxy (bugproxy) wrote on 2017-09-18: Comment bridged from LTC Bugzilla

#36

------- Comment From <email address hidden> 2017-09-18 06:32 EDT-------
IBM Bugzilla status -> closed. Fix Released in Zesty/Xenial.

bugproxy (bugproxy) wrote on 2017-09-18: Fix race local TLB flushing vs. context switch

#37

Fix race local TLB flushing vs. context switch (3.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:51 EDT-------

bugproxy (bugproxy) wrote on 2017-09-18: Fix race on mm->context.flush_mm

#38

Fix race on mm->context.flush_mm (2.8 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2017-08-18 06:52 EDT-------

Frank Heimes (fheimes) on 2017-09-18

Changed in linux (Ubuntu):
status:	Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status:	Fix Committed → Fix Released

bugproxy (bugproxy) wrote on 2017-09-18: Comment bridged from LTC Bugzilla

#39

------- Comment From <email address hidden> 2017-09-18 07:59 EDT-------
I'm confused. It's still under test. Why does it say the fix is released?

Stefan Bader (smb) wrote on 2017-09-18:

#40

Because we were told those were important, the backports from Martin were tested in a separate kernel, and we included those and the Zesty cherry-picks in the re-spins that were made last week. Those kernels moved to updates today.

bugproxy (bugproxy) wrote on 2017-09-18:

#41

------- Comment From <email address hidden> 2017-09-18 09:19 EDT-------
(In reply to comment #81)
> Because it was told those were important, the backports from Martin were
> tested in a separate kernel and we included those and the Zesty cherry-picks
> in re-spins that were made last week. And those kernels moved to updates
> today.

I see. I thought I had 5 working days to test and confirm. So, just to confirm: the fix has been officially released in 4.4.0-96.119 in Xenial, right?

Stefan Bader (smb) wrote on 2017-09-18:

#42

Yes, the fix was released with 4.4.0-96.119 (linux-image version) in Xenial/16.04 and 4.10.0-35.39 in Zesty/17.04.
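For anyone who wants to check a running system against these versions, here is a rough Python sketch. It assumes the package version can be rebuilt from `uname -r` (e.g. 4.4.0-96-generic) and the upload number at the start of `uname -v` (e.g. #119-Ubuntu ...), and it uses a plain numeric comparison rather than full Debian version ordering. All names are hypothetical.

```python
import re
from typing import Tuple

# Fixed versions as stated in this bug (series -> linux package version).
FIXED = {"xenial": "4.4.0-96.119", "zesty": "4.10.0-35.39"}


def package_version(release: str, version: str) -> str:
    """Combine `uname -r` and `uname -v` output into e.g. '4.4.0-96.119'."""
    rel = re.match(r"(\d+\.\d+\.\d+-\d+)", release)
    upload = re.match(r"#(\d+)", version)
    if rel is None or upload is None:
        raise ValueError("unrecognised uname output")
    return f"{rel.group(1)}.{upload.group(1)}"


def version_key(pkg_version: str) -> Tuple[int, ...]:
    """Split '4.4.0-96.119' into (4, 4, 0, 96, 119) for a numeric comparison."""
    return tuple(int(n) for n in re.split(r"[.-]", pkg_version))


def has_fix(release: str, version: str, series: str) -> bool:
    """True if the running kernel is at or above the fixed version for series."""
    running = version_key(package_version(release, version))
    return running >= version_key(FIXED[series])
```

For example, has_fix("4.4.0-96-generic", "#119-Ubuntu SMP ...", "xenial") evaluates to True, matching the "4.4.0-96-generic #119" reported in the earlier comment. The comparison is only meaningful within one series, and suffixed versions (such as backport kernels) would need real Debian version comparison.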

bugproxy (bugproxy) wrote on 2017-10-11:

#43

------- Comment From <email address hidden> 2017-10-11 06:57 EDT-------
*** Bug 159970 has been marked as a duplicate of this bug. ***

Remote bug watches

auto-github-kubernetes-kubernetes #27885
[closed priority/important-soon sig/node]
auto-github-moby-moby #14807
[closed]
auto-github-moby-moby #21081
[closed area/storage/aufs area/kernel version/1.10]
