Elasticsearch 2.4.6 causes an Oops: 0010 [#37] SMP NOPTI

Bug #1745123 reported by macg on 2018-01-24
This bug affects 3 people
Affects                   Importance   Assigned to
elasticsearch (Ubuntu)    Undecided    Unassigned
  Artful                  Undecided    Unassigned
linux (Ubuntu)            High         Unassigned
  Artful                  High         Unassigned

Bug Description

Elasticsearch 2.4.6 can't start since the latest kernel upgrade (4.13.0-31-generic #34~16.04.1-Ubuntu).

The kernel log contains:

[ 4308.595429] Oops: 0010 [#37] SMP NOPTI
[ 4308.595430] Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c bnep pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bluetooth ecdh_generic binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel eeepc_wmi kvm_amd asus_wmi snd_hda_codec sparse_keymap snd_hda_core video wmi_bmof kvm input_leds joydev snd_hwdep snd_pcm irqbypass crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi pcbc nvidia_uvm(POE) snd_seq aesni_intel snd_seq_device snd_timer snd aes_x86_64 crypto_simd soundcore ccp glue_helper cryptd shpchp i2c_piix4 8250_dw wmi mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 4308.595462] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 ahci mii libahci gpio_amdpt gpio_generic
[ 4308.595468] CPU: 0 PID: 9314 Comm: java Tainted: P D OE 4.13.0-31-generic #34~16.04.1-Ubuntu
[ 4308.595470] Hardware name: System manufacturer System Product Name/PRIME B350M-A, BIOS 0502 02/24/2017
[ 4308.595471] task: ffff8dfa07b945c0 task.stack: ffffb2b6033f4000
[ 4308.595472] RIP: 0010:0x6987877
[ 4308.595473] RSP: 0018:ffffb2b6033f7f50 EFLAGS: 00010202
[ 4308.595474] RAX: 00000000000003e7 RBX: 0000000000000000 RCX: 00007f20e3d314d9
[ 4308.595475] RDX: 00007f20dc6686c0 RSI: 000000004d73aa93 RDI: 0000000000000000
[ 4308.595476] RBP: 0000000000000000 R08: 00007f20e319caa4 R09: 000000000000000c
[ 4308.595477] R10: 0000000006987877 R11: ffff8dfa07b945c0 R12: 0000000000000000
[ 4308.595478] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4308.595479] FS: 00007f20e4823700(0000) GS:ffff8dfac6600000(0000) knlGS:0000000000000000
[ 4308.595480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4308.595481] CR2: 0000000006987877 CR3: 00000003b723e000 CR4: 00000000003406f0
[ 4308.595482] Call Trace:
[ 4308.595487] ? entry_SYSCALL_64_fastpath+0x33/0xa3
[ 4308.595489] Code: Bad RIP value.
[ 4308.595490] RIP: 0x6987877 RSP: ffffb2b6033f7f50
[ 4308.595491] CR2: 0000000006987877
[ 4308.595493] ---[ end trace ba92d9f3b2090708 ]---

It is related to the new kernel update: Elasticsearch works when booting an older kernel.
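
One way to read the register dump above: on x86-64 the syscall number is passed in RAX, and here RAX is 0x3e7, which is 999 in decimal. R10 also holds the same value as the faulting address in CR2. The following is only a minimal sketch for pulling such values out of a saved oops line; the helper name oops_reg is hypothetical and not part of any kernel tooling.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "NAME: <hex>" in one line of an oops dump
 * and parse the hexadecimal value that follows.
 * Returns 1 on success, 0 if NAME does not appear on the line.
 * Not suitable for the "RIP: 0010:0x..." line, which carries a
 * segment prefix before the address. */
int oops_reg(const char *line, const char *name, unsigned long long *out)
{
    char key[16];
    const char *p;

    snprintf(key, sizeof key, "%s: ", name);
    p = strstr(line, key);
    if (p == NULL)
        return 0;
    *out = strtoull(p + strlen(key), NULL, 16);
    return 1;
}
```

Applied to the RAX line of the trace, oops_reg(line, "RAX", &v) stores 999 in v. Applied to the R10 and CR2 lines, it returns the same address for both.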

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1745123

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key pti
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux (Ubuntu Artful):
status: New → Triaged
importance: Undecided → High
Jeffrey Bouter (kyentei) wrote :

Elasticsearch doesn't log anything itself. There's just this in journalctl -xe:

-- Unit elasticsearch.service has begun starting up.
Jan 24 18:34:26 logness systemd[1]: Started Elasticsearch.
-- Subject: Unit elasticsearch.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit elasticsearch.service has finished starting up.
--
-- The start-up result is done.
Jan 24 18:34:27 logness kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 111)
Jan 24 18:34:27 logness kernel: BUG: unable to handle kernel paging request at 00007f8429f767a0
Jan 24 18:34:27 logness kernel: IP: 0x7f8429f767a0
Jan 24 18:34:27 logness kernel: PGD 8000000136028067
Jan 24 18:34:27 logness kernel: P4D 8000000136028067
Jan 24 18:34:27 logness kernel: PUD 139ea5067
Jan 24 18:34:27 logness kernel: PMD 139f07067
Jan 24 18:34:27 logness kernel: PTE 8000000109a6f867
Jan 24 18:34:27 logness kernel:
Jan 24 18:34:27 logness kernel: Oops: 0011 [#9] SMP PTI
Jan 24 18:34:27 logness kernel: Modules linked in: ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep input_leds snd_pcm joydev serio_raw snd_timer snd i2c_piix4 soundcore
Jan 24 18:34:27 logness kernel: linear hid_generic usbhid hid qxl ttm drm_kms_helper syscopyarea sysfillrect psmouse virtio_blk virtio_net sysimgblt fb_sys_fops drm pata_acpi floppy
Jan 24 18:34:27 logness kernel: CPU: 0 PID: 2305 Comm: java Tainted: G D 4.13.0-31-generic #34~16.04.1-Ubuntu
Jan 24 18:34:27 logness kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Jan 24 18:34:27 logness kernel: task: ffff984ff604d800 task.stack: ffffac7b41dfc000
Jan 24 18:34:27 logness kernel: RIP: 0010:0x7f8429f767a0
Jan 24 18:34:27 logness kernel: RSP: 0018:ffffac7b41dfff50 EFLAGS: 00010202
Jan 24 18:34:27 logness kernel: RAX: 00000000000003e7 RBX: 0000000000000000 RCX: 00007f84298884d9
Jan 24 18:34:27 logness kernel: RDX: 00007f8429f76f50 RSI: 00007f8429f77030 RDI: 0000000000000000
Jan 24 18:34:27 logness kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000c
Jan 24 18:34:27 logness kernel: R10: 00007f8429f767a0 R11: ffff984ff604d800 R12: 0000000000000000
Jan 24 18:34:27 logness kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 24 18:34:27 logness kernel: FS: 00007f8429f78700(0000) GS:ffff984fffc00000(0000) knlGS:0000000000000000
Jan 24 18:34:27 logness kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 18:34:27 logness kernel: CR2: 00007f8429f767a0 CR3: 0000000136022000 CR4: 00000000000006f0
Jan 24 18:34:27 logness kernel: Call Trace:
Jan 24 18:34:27 logness kernel: ? entry_SYSCALL_64_fastpath+0x33/0xa3
Jan 24 18:34:27 logness kernel: Code: Bad RIP value.
Jan 24 18:34:27 logness kernel: RIP: 0x7f8429f767a0 RSP: ffffac7b41dfff50
Jan 24 18:34:27 logness kernel: CR2: 00007f8429f767a0
Jan 24 18:34:27 logness kernel: ---[ end trace 2b5dfc9417bb8d5f ]---
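
The number after "Oops:" is the x86 page-fault error code, printed in hexadecimal. This trace shows 0011 where the original report shows 0010. The two differ only in bit 0, which says whether the page was present, and a present page fits the "NX-protected page" message and the populated PTE line above. The decoder below is a sketch: the function name describe_oops_code is hypothetical, and the bit meanings follow the x86 page-fault error code layout.

```c
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "Oops:". */
enum {
    PF_PROT  = 1 << 0, /* 1: protection violation, 0: page not present */
    PF_WRITE = 1 << 1, /* 1: write access */
    PF_USER  = 1 << 2, /* 1: fault raised in user mode, 0: kernel mode */
    PF_RSVD  = 1 << 3, /* 1: reserved bit set in a page-table entry */
    PF_INSTR = 1 << 4  /* 1: fault on an instruction fetch */
};

/* Hypothetical helper: write a one-line description of the code to buf. */
void describe_oops_code(unsigned int code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                       : "read";

    snprintf(buf, len, "%s, %s, %s mode%s",
             access,
             (code & PF_PROT) ? "protection violation" : "page not present",
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_RSVD) ? ", reserved bit set" : "");
}
```

With this sketch, 0x11 decodes to "instruction fetch, protection violation, kernel mode" and 0x10 to "instruction fetch, page not present, kernel mode".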

Marcos BL (marcosbl) wrote :

Just confirming this bug with the same exact versions in Linux Mint:

Elasticsearch 2.4.6 + kernel 4.13.0-31-generic #34~16.04.1-Ubuntu

============================
syslog
============================

Jan 25 00:42:34 DevExMachina systemd[1]: Starting Elasticsearch...
Jan 25 00:42:34 DevExMachina systemd[1]: Started Elasticsearch.
Jan 25 00:42:35 DevExMachina kernel: [120549.349622] BUG: unable to handle kernel paging request at 0000000006987877
Jan 25 00:42:35 DevExMachina kernel: [120549.350743] IP: 0x6987877
Jan 25 00:42:35 DevExMachina kernel: [120549.351867] PGD 80000000b25e1067
Jan 25 00:42:35 DevExMachina kernel: [120549.351867] P4D 80000000b25e1067
Jan 25 00:42:35 DevExMachina kernel: [120549.351867] PUD 789bc067
Jan 25 00:42:35 DevExMachina kernel: [120549.351868] PMD 0
Jan 25 00:42:35 DevExMachina kernel: [120549.351868]
Jan 25 00:42:35 DevExMachina kernel: [120549.351869] Oops: 0010 [#12] SMP PTI
Jan 25 00:42:35 DevExMachina kernel: [120549.351871] Modules linked in: binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_hda_codec kvm_intel snd_hda_core kvm snd_hwdep irqbypass snd_pcm crct10dif_pclmul gpio_ich snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi pcbc snd_seq aesni_intel ipmi_si aes_x86_64 crypto_simd glue_helper cryptd ipmi_devintf snd_seq_device snd_timer intel_cstate hp_wmi input_leds joydev ipmi_msghandler intel_rapl_perf serio_raw sparse_keymap wmi_bmof snd soundcore mei_me mei shpchp lpc_ich tpm_infineon mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq hid_generic usbhid hid dm_mirror dm_region_hash dm_log i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect e1000e
Jan 25 00:42:35 DevExMachina kernel: [120549.351895] sysimgblt ahci fb_sys_fops psmouse drm libahci ptp pps_core wmi
Jan 25 00:42:35 DevExMachina kernel: [120549.351899] CPU: 1 PID: 15290 Comm: java Tainted: G D 4.13.0-31-generic #34~16.04.1-Ubuntu
Jan 25 00:42:35 DevExMachina kernel: [120549.351900] Hardware name: Hewlett-Packard HP Compaq 8200 Elite SFF PC/1495, BIOS J01 v02.15 11/10/2011
Jan 25 00:42:35 DevExMachina kernel: [120549.351900] task: ffff9d249c8bdd00 task.stack: ffffb51c04064000
Jan 25 00:42:35 DevExMachina kernel: [120549.351901] RIP: 0010:0x6987877
Jan 25 00:42:35 DevExMachina kernel: [120549.351902] RSP: 0018:ffffb51c04067f50 EFLAGS: 00010202
Jan 25 00:42:35 DevExMachina kernel: [120549.351902] RAX: 00000000000003e7 RBX: 0000000000000000 RCX: 00007fd744b314d9
Jan 25 00:42:35 DevExMachina kernel: [120549.351903] RDX: 00007fd73c5e9830 RSI: 000000004d73aa93 RDI: 0000000000000000
Jan 25 00:42:35 DevExMachina kernel: [120549.351903] RBP: 0000000000000000 R08: 00007fd743f97b64 R09: 000000000000000c
Jan 25 00:42:35 DevExMachina kernel: [120549.351904] R10: 0000000006987877 R11: ffff9d249c8bdd00 R12: 0000000000000000
Jan 25 00:42:35 DevExMachina kernel: [120549.351904] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 25 00:42:35 DevExMachina kernel: [120549.351905] FS: 00007fd74562c700(0000) GS:ffff9d256e240000(0000) knlGS:0000000000000000
Jan 25 00:42:35 D...


Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in elasticsearch (Ubuntu Artful):
status: New → Confirmed
Changed in elasticsearch (Ubuntu):
status: New → Confirmed

The kernel oops is easily reproducible with the following C program on kernel 4.13.0-31-generic:

```
#include <unistd.h>
#include <sys/syscall.h> /* For SYS_xxx definitions */

int main(void) {
  /* 999 is not an assigned syscall number, so the call should simply fail. */
  syscall(999);
  return 0;
}
```

$ gcc test.c -o test
$ ./test
Killed

$ dmesg
```
[ 1554.968904] Oops: 0011 [#3] SMP PTI
[ 1554.969159] Modules linked in: ppdev joydev input_leds serio_raw parport_pc parport vboxguest video ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd psmouse ahci libahci e1000
[ 1554.971175] CPU: 0 PID: 5255 Comm: test Tainted: G D 4.13.0-31-generic #34~16.04.1-Ubuntu
[ 1554.971723] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 1554.972261] task: ffff9eb519219740 task.stack: ffffb08685950000
[ 1554.972578] RIP: 0010:0x4005b0
[ 1554.972823] RSP: 0018:ffffb08685953f50 EFLAGS: 00010202
[ 1554.973114] RAX: 00000000000003e7 RBX: 0000000000000000 RCX: 00007fa18e0204d9
[ 1554.973445] RDX: 0000000000000000 RSI: 00007ffca2252c48 RDI: 00007ffca2252c38
[ 1554.973776] RBP: 0000000000000000 R08: 00007fa18e2f9ab0 R09: 0000000000400540
[ 1554.974107] R10: 00000000004005b0 R11: ffff9eb519219740 R12: 0000000000000000
[ 1554.974438] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1554.974797] FS: 00007fa18e505700(0000) GS:ffff9eb51fc00000(0000) knlGS:0000000000000000
[ 1554.975296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1554.975621] CR2: 00000000004005b0 CR3: 0000000117264000 CR4: 00000000000406f0
[ 1554.975955] Call Trace:
[ 1554.976206] ? entry_SYSCALL_64_fastpath+0x33/0xa3
[ 1554.976508] Code: Bad RIP value.
[ 1554.976762] RIP: 0x4005b0 RSP: ffffb08685953f50
[ 1554.977039] CR2: 00000000004005b0
[ 1554.977293] ---[ end trace 34538f23cc948433 ]---
```
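
For contrast, on an unaffected kernel the same call is expected to fail cleanly: syscall() returns -1 and sets errno to ENOSYS. The variant below is only a sketch. The function name probe_unassigned_syscall is hypothetical, and 999 is assumed to be unassigned, as in the program above. On the affected kernel the calling process is killed before the function returns.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical probe: invoke an unassigned syscall number and return
 * the errno it leaves behind (0 if the call unexpectedly succeeds). */
int probe_unassigned_syscall(void)
{
    long ret;

    errno = 0;
    ret = syscall(999);
    return (ret == -1) ? errno : 0;
}
```

A caller can compare the result with ENOSYS. Sandboxed environments that filter syscalls may report a different error, such as EPERM.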

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4
