CPU lockups divide error: 0000 [#1] SMP

Bug #1606098 reported by Benjamin Kaehne
This bug affects 2 people
Affects: linux-lts-xenial (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I noticed the following kernel error shortly before an exponential increase in server load. ps listings stopped returning, hanging on processes that were only in an "S" state and seemed unresponsive to signals. (The process was a ceph-osd, if it matters.)

Jul 25 00:32:58 SERVER kernel: [1529921.423169] divide error: 0000 [#1] SMP
Jul 25 00:32:58 SERVER kernel: [1529921.423196] Modules linked in: ip6table_raw ip6table_mangle nf_conntrack_ipv6 xt_CT xt_connmark xt_mac xt_comment xt_physdev br_netfilter xt_multiport xt_set ip_set_hash_net ip_set nfnetlink veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 iptable_raw nf_defrag_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp vport_gre ip_gre libiscsi_tcp ip_tunnel libiscsi gre scsi_transport_iscsi openvswitch nf_defrag_ipv6 nf_conntrack dm_crypt bonding ipmi_ssif ipmi_devintf dcdbas intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_multipath irqbypass sb_edac mei_me mei edac_core ipmi_si lpc_ich ipmi_msghandler 8250_fintek acpi_power_meter shpchp mac_hid xfs libcrc32c btrfs xor raid6_pq bcache crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ixgbe igb vxlan ip6_udp_tunnel dca udp_tunnel ptp pps_core megaraid_sas i2c_algo_bit mdio wmi fjes
Jul 25 00:32:58 SERVER kernel: [1529921.423919] CPU: 12 PID: 2300042 Comm: ms_pipe_read Not tainted 4.4.0-28-generic #47~14.04.1-Ubuntu
Jul 25 00:32:58 SERVER kernel: [1529921.423942] Hardware name: Dell Inc. PowerEdge R730xd/0H21J3, BIOS 1.0.4 08/28/2014
Jul 25 00:32:58 SERVER kernel: [1529921.423965] task: ffff881e7baba940 ti: ffff880103fcc000 task.ti: ffff880103fcc000
Jul 25 00:32:58 SERVER kernel: [1529921.424013] RIP: 0010:[<ffffffff810aff78>] [<ffffffff810aff78>] task_numa_find_cpu+0x238/0x700
Jul 25 00:32:58 SERVER kernel: [1529921.424087] RSP: 0000:ffff880103fcfbb0 EFLAGS: 00010257
Jul 25 00:32:58 SERVER kernel: [1529921.424126] RAX: 0000000000000000 RBX: ffff880103fcfc50 RCX: 0000000000000000
Jul 25 00:32:58 SERVER kernel: [1529921.424191] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff881ffed96d70
Jul 25 00:32:58 SERVER kernel: [1529921.424256] RBP: ffff880103fcfc18 R08: 0000000116cb13e1 R09: 0000000000000375
Jul 25 00:32:58 SERVER kernel: [1529921.424321] R10: 000000000001e8f9 R11: 0000000000000072 R12: ffff881e11913700
Jul 25 00:32:58 SERVER kernel: [1529921.424386] R13: 0000000000000001 R14: 0000000000000000 R15: fffffffffffffd68
Jul 25 00:32:58 SERVER kernel: [1529921.424451] FS: 00007fec1582c700(0000) GS:ffff881ffed80000(0000) knlGS:0000000000000000
Jul 25 00:32:58 SERVER kernel: [1529921.424519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 25 00:32:58 SERVER kernel: [1529921.424559] CR2: 0000558c2fe89ff0 CR3: 0000003a7798b000 CR4: 00000000001406e0
Jul 25 00:32:58 SERVER kernel: [1529921.424624] Stack:
Jul 25 00:32:58 SERVER kernel: [1529921.424655] ffff880103fcfbb0 ffff880103fcfbb0 ffff881ffedd6d70 ffff881e7baba940
Jul 25 00:32:58 SERVER kernel: [1529921.424737] 000000000000006b 00000000000000c3 0000000000016d00 000000000000006b
Jul 25 00:32:58 SERVER kernel: [1529921.424818] ffff881e7baba940 00000000000001be ffff880103fcfc50 0000000000000192
Jul 25 00:32:58 SERVER kernel: [1529921.424899] Call Trace:
Jul 25 00:32:58 SERVER kernel: [1529921.424934] [<ffffffff810b08e0>] task_numa_migrate+0x4a0/0x930
Jul 25 00:32:58 SERVER kernel: [1529921.424976] [<ffffffff810b0de9>] numa_migrate_preferred+0x79/0x80
Jul 25 00:32:58 SERVER kernel: [1529921.425018] [<ffffffff810b563d>] task_numa_fault+0x91d/0xcc0
Jul 25 00:32:58 SERVER kernel: [1529921.425062] [<ffffffff811d406e>] ? mpol_misplaced+0x14e/0x190
Jul 25 00:32:58 SERVER kernel: [1529921.425104] [<ffffffff811b12c6>] handle_pte_fault+0x5a6/0x1470
Jul 25 00:32:58 SERVER kernel: [1529921.425150] [<ffffffff810fb2b2>] ? do_futex+0xa2/0x520
Jul 25 00:32:58 SERVER kernel: [1529921.425192] [<ffffffff811b3000>] handle_mm_fault+0x250/0x540
Jul 25 00:32:58 SERVER kernel: [1529921.425236] [<ffffffff81067c0a>] __do_page_fault+0x19a/0x430
Jul 25 00:32:58 SERVER kernel: [1529921.425279] [<ffffffff810fb7a1>] ? SyS_futex+0x71/0x150
Jul 25 00:32:58 SERVER kernel: [1529921.425320] [<ffffffff81067ec2>] do_page_fault+0x22/0x30
Jul 25 00:32:58 SERVER kernel: [1529921.425362] [<ffffffff817f2fb8>] page_fault+0x28/0x30
Jul 25 00:32:58 SERVER kernel: [1529921.425402] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
Jul 25 00:32:58 SERVER kernel: [1529921.425790] RIP [<ffffffff810aff78>] task_numa_find_cpu+0x238/0x700
Jul 25 00:32:58 SERVER kernel: [1529921.425836] RSP <ffff880103fcfbb0>
Jul 25 00:32:58 SERVER kernel: [1529921.426417] ---[ end trace 6e3f67e365a57c9f ]---
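
For anyone triaging this, the trace can be cross-checked mechanically. The sketch below is only a rough aid, not a full disassembler. It takes a trimmed excerpt of the oops above (syslog prefixes dropped), reads the opcode bytes starting at the `<48>` marker in the Code: line, and checks whether they encode a 64-bit register-form DIV whose divisor register is zero in the register dump. The helper names (`parse_registers`, `faulting_bytes`, `decode_div`) are made up for this example, and only the low eight registers are handled.

```python
import re

# Trimmed excerpt of the oops above (syslog prefixes removed).
OOPS = """\
divide error: 0000 [#1] SMP
RIP: 0010:[<ffffffff810aff78>] [<ffffffff810aff78>] task_numa_find_cpu+0x238/0x700
RAX: 0000000000000000 RBX: ffff880103fcfc50 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff881ffed96d70
Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
"""

# x86-64 register numbering used by the ModRM r/m field (without REX.B).
REG_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Collect 'Rxx: <16 hex digits>' pairs from the register dump."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text, count=3):
    """Return `count` opcode bytes starting at the <xx> marker in the Code: line."""
    code = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    return [int(b.strip("<>"), 16) for b in code[start:start + count]]


def decode_div(insn):
    """Return the divisor register name for 'REX.W F7 /6' (div r64), else None."""
    rex, opcode, modrm = insn
    is_rex_w = (rex & 0xF0) == 0x40 and (rex & 0x08) != 0
    if not is_rex_w or opcode != 0xF7:
        return None
    if rex & 0x01:
        return None  # REX.B selects r8-r15, which this sketch does not handle
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg != 6:  # mod 3 = register operand, /6 = DIV
        return None
    return REG_NAMES[rm]


regs = parse_registers(OOPS)
divisor = decode_div(faulting_bytes(OOPS))
if divisor is not None:
    print(f"faulting instruction: div %{divisor.lower()}, {divisor} = {regs[divisor]:#x}")
```

On this excerpt the bytes `48 f7 f1` decode as a divide by RCX, and the dump shows RCX as zero, which is consistent with the "divide error" reported at task_numa_find_cpu+0x238. This says nothing about why the divisor ended up zero.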

Linux SERVER 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-xenial (Ubuntu):
status: New → Confirmed