Comment 0 for bug 1708399

bugproxy (bugproxy) wrote :

== Comment: #0 - QI YE <email address hidden> - 2017-08-02 04:11:25 ==
---Problem Description---
Ubuntu hit a kernel panic.

---uname output---
#110-Ubuntu SMP Tue Jul 18 12:56:43 UTC 2017 s390x s390x s390x GNU/Linux

---Debugger Data---
PID: 10991 TASK: 19872a0e8 CPU: 2 COMMAND: "hyperkube"
 LOWCORE INFO:
  -psw : 0x0004c00180000000 0x0000000000115fa6
  -function : pcpu_delegate at 115fa6
  -prefix : 0x7fe42000
  -cpu timer: 0x7ffab2827828aa50
  -clock cmp: 0xd2eb8b31445e4200
  -general registers:
     0x0004e00100000000 0x00000000001283b6
     0x0000c00100000000 0x000000008380fcb8
     0x0000000000115f9e 0x000000000056f6e2
     0x0000000000000004 0x0000000000cf9070
     0x00000001f3bfc000 0x0000000000112fd8
     0x00000001c72bb400 0x0000000000000002
     0x000000007fffc000 0x00000000007c9ef0
     0x0000000000115f9e 0x000000008380fc18
  -access registers:
     0x000003ff 0x7ffff910 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
     0000000000 0000000000 0000000000 0000000000
  -control registers:
     0x0000000014066a12 0x000000007e6d81c7
     0x0000000000011140 000000000000000000
     0x0000000000002aef 0x0000000000000400
     0x0000000050000000 0x000000007e6d81c7
     000000000000000000 000000000000000000
     000000000000000000 000000000000000000
     000000000000000000 0x0000000000cfc007
     0x00000000db000000 0x0000000000011280
  -floating point registers:
     0x409c7e2580000000 0x401de4e000000000
     000000000000000000 0x3fd24407ab0e073a
     0x3ff0000000000000 0x3fee666666666666
     0x3fef218f8a7a41a0 0x3fee666666666666
     0x0000000000800000 000000000000000000
     0x000003ff7f800000 0x000002aa4940e9e0
     0x000000000000d401 0x000003ffe81fe110
     000000000000000000 0x000003fff2cfe638

 #0 [8380fc78] smp_find_processor_id at 1160f8
 #1 [8380fc90] machine_kexec at 1135d4
 #2 [8380fcb8] crash_kexec at 1fbb8a
 #3 [8380fd88] panic at 27d0e0
 #4 [8380fe28] die at 1142cc
 #5 [8380fe90] do_low_address at 12215e
 #6 [8380fea8] pgm_check_handler at 7c2ab4
 PSW: 0705200180000000 000002aa267e0e42 (user space)
 GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
       000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
       0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
       000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90
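
For readers who want to cross-check the user-space PSW mask above (0705200180000000) by hand, the following is a minimal sketch that splits the mask into the fields the kernel prints on its "R:.. T:.. IO:.." line. The function names are hypothetical, and the bit positions are an assumption based on the z/Architecture PSW layout (bit 0 is the most significant bit); this is not code taken from the kernel.

```python
def decode_psw_mask(mask):
    """Split a 64-bit s390x PSW mask into named fields.

    Assumption: z/Architecture bit numbering, where bit 0 is the
    most significant bit of the 64-bit mask.
    """
    def bits(start, width=1):
        shift = 64 - start - width
        return (mask >> shift) & ((1 << width) - 1)

    return {
        "R": bits(1),        # PER mask
        "T": bits(5),        # DAT mode
        "IO": bits(6),       # I/O interruption mask
        "EX": bits(7),       # external interruption mask
        "Key": bits(8, 4),   # PSW key
        "M": bits(13),       # machine-check mask
        "W": bits(14),       # wait state
        "P": bits(15),       # problem (user) state
        "AS": bits(16, 2),   # address-space control
        "CC": bits(18, 2),   # condition code
        "PM": bits(20, 4),   # program mask
        "EA": bits(31, 2),   # extended + basic addressing mode
    }


def format_psw_mask(mask):
    """Render the fields in the same order as the oops output."""
    return " ".join(f"{name}:{value}" for name, value in decode_psw_mask(mask).items())


if __name__ == "__main__":
    print(format_psw_mask(0x0705200180000000))
```

With the mask from this report, the result matches the field line in the oops output: P:1 (problem state) and EA:3 (64-bit addressing), which agrees with the "(user space)" annotation.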

Contact Information = Chee Ye / <email address hidden>

Stack trace output:
 no

Oops output:
 [43200.761465] docker0: port 10(vethb9132e9) entered forwarding state
[50008.560926] hrtimer: interrupt took 1698076 ns
[123483.768984] systemd[1]: apt-daily.timer: Adding 7h 34min 22.582204s random time.
[123483.930058] systemd[1]: apt-daily.timer: Adding 2h 18min 14.857162s random time.
[123484.064879] systemd[1]: apt-daily.timer: Adding 10h 46min 2.301756s random time.
[123484.824760] systemd[1]: apt-daily.timer: Adding 6h 16min 22.178655s random time.
[153113.703126] conntrack: generic helper won't handle protocol 47. Please consider loading the specific helper module.
[477085.704538] Low-address protection: 0004 ilc:2 [#1] SMP
[477085.704551] Modules linked in: xt_physdev veth xt_recent xt_comment xt_mark xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc aufs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 qeth_l2 sha256_s390 qeth sha1_s390 qdio sha_common ccwgroup vmur dasd_eckd_mod dasd_mod
[477085.705522] CPU: 2 PID: 10991 Comm: hyperkube Not tainted 4.4.0-87-generic #110-Ubuntu
[477085.705525] task: 000000019872a0e8 ti: 000000008380c000 task.ti: 000000008380c000
[477085.705529] User PSW : 0705200180000000 000002aa267e0e42
[477085.705532] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 EA:3
                User GPRS: 0000000000000000 0000000000000000 000002aa2c4fd690 0000000000000001
[477085.705539] 000002aa2c4fd690 000003ff7fffee38 0000000000000000 0000000000000002
[477085.705553] 0000000000029c0f 000000c42001ea00 0000000000000001 0000000000000001
[477085.705554] 000000c42001c5c8 000000c42082c1a0 000002aa2666325e 000003ff7fffed90
[477085.705578] User Code: 000002aa267e0e30: e340f0080004 lg %r4,8(%r15)
                           000002aa267e0e36: e330f0100014 lgf %r3,16(%r15)
                          #000002aa267e0e3c: e36040000014 lgf %r6,0(%r4)
                          >000002aa267e0e42: ba634000 cs %r6,%r3,0(%r4)
                           000002aa267e0e46: a774fffe brc 7,2aa267e0e42
                           000002aa267e0e4a: e360f0180050 sty %r6,24(%r15)
                           000002aa267e0e50: 07fe bcr 15,%r14
                           000002aa267e0e52: 0000 unknown
[477085.705596] Last Breaking-Event-Address:
[477085.705599] [<000002aa26663258>] 0x2aa26663258
[477085.705600]
[477085.705602] Kernel panic - not syncing: Fatal exception: panic_on_oops
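
The "User Code" disassembly above appears consistent with an atomic exchange built as a compare-and-swap retry loop: load the current word, try `cs`, and branch back (`brc 7`) until the swap succeeds. The sketch below only illustrates that control flow. The `Word` class and `exchange` function are hypothetical names, and a lock stands in for the atomicity that the hardware `cs` instruction provides; it is not the code that hyperkube actually ran.

```python
import threading


class Word:
    """A memory word with an emulated compare-and-swap (hypothetical helper)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Emulate CS: store `new` only if the word still equals `expected`.

        Returns (swapped, value_seen), mirroring how CS reloads the
        first operand register when the comparison fails.
        """
        with self._lock:
            if self._value == expected:
                self._value = new
                return True, expected
            return False, self._value


def exchange(word, new):
    """Atomically replace the word with `new` and return the old value."""
    old = word.load()                                # lgf %r6,0(%r4)
    while True:
        swapped, seen = word.compare_and_swap(old, new)  # cs %r6,%r3,0(%r4)
        if swapped:
            return old                               # sty %r6,24(%r15)
        old = seen                                   # brc 7: retry with the fresh value


if __name__ == "__main__":
    w = Word(5)
    print(exchange(w, 1), w.load())
```

In the register dump, %r4 (the address operand of `cs`) holds 000002aa2c4fd690 and %r3 (the new value) holds 1.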

System Dump Location:
 There are 4 vCPUs defined. From the run queues below, hyperkube was the current task on two CPUs (CPU 2 and CPU 3) when the kernel panicked. The panic may be related to the TLB entry flush on those two CPUs.

CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"

  RT PRIO_ARRAY: 1ea5a8db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5a8c98
     [no tasks queued]

CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
  RT PRIO_ARRAY: 1ea5b9db0
     [no tasks queued]
  CFS RB_ROOT: 1ea5b9c98
     [120] PID: 23421 TASK: 1c9368af8 COMMAND: "PipelineService"
     [120] PID: 10957 TASK: 1987336d8 COMMAND: "hyperkube"

CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5cadb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5cac98
     [no tasks queued]

CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
  RT PRIO_ARRAY: 1ea5dbdb0
     [no tasks queued]
  CFS RB_ROOT: 1ea5dbc98
     [120] PID: 21614 TASK: 1cbee57c0 COMMAND: "IngestServiceCl"
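
The observation that hyperkube was current on two CPUs can be checked mechanically from the run queue listing. The sketch below is a minimal parser under the assumption that the text has exactly the layout shown in this report; the function names are hypothetical, and the sample is an abridged copy of the listing above, not a general parser for the debugger's output.

```python
import re

# Abridged copy of the run queue listing from this report.
RUNQ_SAMPLE = """\
CPU 0 RUNQUEUE: 1ea5a8c00
  CURRENT: PID: 0 TASK: bb1528 COMMAND: "swapper/0"
CPU 1 RUNQUEUE: 1ea5b9c00
  CURRENT: PID: 0 TASK: 1e94162b8 COMMAND: "swapper/1"
CPU 2 RUNQUEUE: 1ea5cac00
  CURRENT: PID: 10991 TASK: 19872a0e8 COMMAND: "hyperkube"
CPU 3 RUNQUEUE: 1ea5dbc00
  CURRENT: PID: 10975 TASK: 198a30000 COMMAND: "hyperkube"
"""

CPU_RE = re.compile(r"\s*CPU (\d+) RUNQUEUE:")
CURRENT_RE = re.compile(r'\s*CURRENT: PID: (\d+)\s+TASK: \S+\s+COMMAND: "([^"]*)"')


def current_tasks(runq_text):
    """Map each CPU number to the (pid, command) of its CURRENT task."""
    result = {}
    cpu = None
    for line in runq_text.splitlines():
        m = CPU_RE.match(line)
        if m:
            cpu = int(m.group(1))
            continue
        m = CURRENT_RE.match(line)
        if m and cpu is not None:
            result[cpu] = (int(m.group(1)), m.group(2))
    return result


def cpus_running(runq_text, command):
    """Return the sorted CPU numbers whose CURRENT task has this command name."""
    tasks = current_tasks(runq_text)
    return sorted(cpu for cpu, (_, cmd) in tasks.items() if cmd == command)


if __name__ == "__main__":
    print(cpus_running(RUNQ_SAMPLE, "hyperkube"))
```

Only the CURRENT lines are considered, so the queued hyperkube task on CPU 1 (PID 10957) is deliberately not counted as running.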

== Comment: #1 - QI YE <email address hidden> - 2017-08-02 04:20:02 ==
The problem happens randomly. No pattern has been identified yet.

It also happens on the following kernel levels:
- 4.4.0-78-generic #99
- 4.4.0-83-generic

== Comment: #2 - Heinz-Werner Seeck <email address hidden> - 2017-08-02 08:25:06 ==
@QI YE: Please describe the use case behind this problem report, and attach the dumps, dbginfo output, and sosreports. It is not clear to me which use case triggers this problem.
Many thanks in advance

== Comment: #3 - QI YE <email address hidden> - 2017-08-02 08:44:01 ==
(In reply to comment #2)
> @QI YE: Please describe the use case behind this problem report, and attach
> the dumps, dbginfo output, and sosreports. It is not clear to me which use
> case triggers this problem.
> Many thanks in advance

Heinz-Werner, what do you mean by "use case"? Could you elaborate? If you are asking which application triggered the problem, we run machine learning workloads on Ubuntu on the IBM Z community cloud.

The dump file is large. Can you suggest a location where I can upload it?

== Comment: #4 - QI YE <email address hidden> - 2017-08-02 08:50:32 ==
sosreport