java Corrupted page table

Bug #1787127 reported by Martin Schröder on 2018-08-15
This bug affects 84 people
Affects           Importance  Assigned to
linux (Ubuntu)    Critical    Canonical Kernel Team
Trusty            Critical    Canonical Kernel Team

Bug Description

Since upgrading to kernel 3.13.0-155, my Java programs no longer start; instead the kernel reports:

Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.972732] java: Corrupted page table at address 7ff1cbe38100
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.973329] PGD 8000000392599067 PUD 34996a067 PMD 349846067 PTE 80003ffffe17c225
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.974090] Bad pagetable: 000d [#2] SMP
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.974481] Modules linked in: hidp nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables dm_crypt x86_pkg_temp_thermal kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel thinkpad_acpi aes_x86_64 nvram snd_hda_codec_realtek lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_seq_midi snd_hwdep snd_seq_midi_event uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_pcm snd_rawmidi arc4 cdc_mbim cdc_wdm snd_page_alloc cdc_ncm usbnet mii snd_seq cdc_acm iwlmvm mac80211 rfcomm bnep btusb bluetooth rtsx_pci_ms iwlwifi snd_seq_device cfg80211 joydev memstick snd_timer lpc_ich serio_raw shpchp snd mei_me mei parport_pc soundcore wmi mac_hid ppdev coretemp lp parport rpcsec_gss_krb5 nfsd binfmt_misc auth_rpcgss nfs_acl nfs lockd sunrpc fscache btrfs libcrc32c raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid i915_bdw rtsx_pci_sdmmc ahci e1000e intel_ips ptp i2c_algo_bit drm_kms_helper psmouse rtsx_pci drm libahci pps_core video
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.987143] CPU: 1 PID: 11001 Comm: java Tainted: G B D 3.13.0-155-generic #205-Ubuntu
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.988009] Hardware name: LENOVO 20CK0000GE/20CK0000GE, BIOS N11ET30W (1.06 ) 02/03/2015
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.988822] task: ffff880441f1c800 ti: ffff880349a3c000 task.ti: ffff880349a3c000
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.989565] RIP: 0033:[<00007ff1b5114f5e>] [<00007ff1b5114f5e>] 0x7ff1b5114f5e
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.990293] RSP: 002b:00007ff1cbe0c2b8 EFLAGS: 00010202
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.990815] RAX: 000000076f0a9ea0 RBX: 00007ff1b404f480 RCX: 000000076f0a9ea0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.991522] RDX: 00007ff1b50082bd RSI: 0000000010000020 RDI: 000000076f0a9ea0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.992230] RBP: 00007ff1cbe0c310 R08: 0000000000000000 R09: 000000000000001c
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.992937] R10: 00007ff1cb1ebc20 R11: 00007ff1b5114f40 R12: 0000000000000000
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.993643] R13: 00007ff1cbe0c2c8 R14: 00007ff1cbe0c328 R15: 00007ff1c4008800
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.994351] FS: 00007ff1cbe10700(0000) GS:ffff88045dc40000(0000) knlGS:0000000000000000
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.995154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.995720] CR2: 00007ff1cbe38100 CR3: 0000000366030000 CR4: 0000000000360770
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.996427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.997134] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.997840]
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.997974] RIP [<00007ff1b5114f5e>] 0x7ff1b5114f5e
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.998481] RSP <00007ff1cbe0c2b8>
Aug 15 10:49:10 dtm2573lapli kernel: [ 1012.998819] ---[ end trace b06169e683385857 ]---
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000936] BUG: Bad page map in process java pte:80003ffffe17c225 pmd:349846067
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000941] addr:00007ff1cbe38000 vm_flags:08000071 anon_vma: (null) mapping: (null) index:7ff1cbe38
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000945] CPU: 0 PID: 11010 Comm: java Tainted: G B D 3.13.0-155-generic #205-Ubuntu
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000946] Hardware name: LENOVO 20CK0000GE/20CK0000GE, BIOS N11ET30W (1.06 ) 02/03/2015
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000948] 0000000000000000 ffff8803eb479a98 ffffffff8173983f 00007ff1cbe38000
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000954] ffff880442622300 ffff8803eb479ae8 ffffffff8117e374 80003ffffe17c225
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000957] 0000000349846067 00000007ff1cbe38 ffff8803498461c0 ffff8803eb479c58
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000960] Call Trace:
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000972] [<ffffffff8173983f>] dump_stack+0x64/0x80
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000978] [<ffffffff8117e374>] print_bad_pte+0x1a4/0x250
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000981] [<ffffffff8117f6ae>] vm_normal_page+0x6e/0x80
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000984] [<ffffffff8117faa6>] unmap_page_range+0x3e6/0x830
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000987] [<ffffffff8117ff71>] unmap_single_vma+0x81/0xf0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000990] [<ffffffff81181019>] unmap_vmas+0x49/0x90
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000993] [<ffffffff8118a05c>] exit_mmap+0x9c/0x170
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.000997] [<ffffffff81118ee3>] ? __delayacct_add_tsk+0x153/0x170
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001002] [<ffffffff8106a43c>] mmput+0x5c/0x120
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001006] [<ffffffff8106fda4>] do_exit+0x264/0xa60
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001008] [<ffffffff8109553a>] ? hrtimer_cancel+0x1a/0x30
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001013] [<ffffffff810e0bb2>] ? futex_wait+0x1b2/0x290
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001016] [<ffffffff8107061f>] do_group_exit+0x3f/0xb0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001019] [<ffffffff81080ba0>] get_signal_to_deliver+0x1d0/0x700
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001026] [<ffffffff81014458>] do_signal+0x48/0xa30
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001031] [<ffffffff810a7335>] ? set_next_entity+0x95/0xb0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001035] [<ffffffff810137c0>] ? __switch_to+0x350/0x530
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001038] [<ffffffff81014ea9>] do_notify_resume+0x69/0xb0
Aug 15 10:49:10 dtm2573lapli kernel: [ 1013.001041] [<ffffffff8174ad70>] int_signal+0x12/0x17
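
For readers who want to unpack the oops above: the 000d in "Bad pagetable: 000d" is an x86 page-fault error code, and the PTE value is a 64-bit page-table entry. The sketch below decodes both using the architectural bit layout. The helper names are made up for illustration, and any conclusion about the CPU's physical address width is an interpretation, not something the log states.

```python
# Decode the page-fault error code and the PTE printed in the oops.
# Bit positions follow the x86-64 architecture; function names are
# hypothetical helpers, not kernel APIs.

PF_ERROR_BITS = [
    (0, "P"),     # 1 = fault on a present page (not a missing page)
    (1, "W/R"),   # 1 = write access, 0 = read access
    (2, "U/S"),   # 1 = fault happened in user mode
    (3, "RSVD"),  # 1 = a reserved bit was set in a paging structure
    (4, "I/D"),   # 1 = instruction fetch
]

# Bits 12..51 of a 4 KiB PTE hold the physical frame address.
PTE_FRAME_MASK = 0x000FFFFFFFFFF000


def decode_error_code(code):
    """Return the names of the error-code bits that are set."""
    return [name for bit, name in PF_ERROR_BITS if (code >> bit) & 1]


def decode_pte(pte):
    """Split a 64-bit PTE into its main architectural fields."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "accessed": bool(pte & (1 << 5)),
        "dirty": bool(pte & (1 << 6)),
        "nx": bool(pte & (1 << 63)),
        "frame_address": pte & PTE_FRAME_MASK,
    }


if __name__ == "__main__":
    print(decode_error_code(0x000D))
    fields = decode_pte(0x80003FFFFE17C225)
    print(fields)
    print("frame address needs", fields["frame_address"].bit_length(), "bits")
```

With the values from the log, the error code has P, U/S and RSVD set (a user-mode read that tripped over a reserved bit), and the frame field of the PTE needs 46 bits. That would fit a PTE whose frame bits lie beyond what this CPU supports, though the report does not show the CPU's actual physical address width.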

Java is
> java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-2~14.04-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

> lsb_release -a
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

> uname -a
Linux dtm2573lapli 3.13.0-155-generic #205-Ubuntu SMP Fri Aug 10 15:53:26 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


This is a bug in 3.13.0-155; I have no problems when I reboot into the previous kernel:
Linux dtm2573lapli 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1787127

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty

Sorry, but this is Kubuntu 14.04 and apport-collect does not work due to bug 1439784.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Trusty):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between Ubuntu 3.13.0-153 and Ubuntu 3.13.0-155. The kernel bisect will require testing of about 6-8 test kernels.

I built the first test kernel, up to the following commit:
6f4b6df5cb10508e0c1c81c3884ca1afca98c8e2

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1787127

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
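
As a rough illustration of where an estimate like "6-8 test kernels" comes from: a bisect halves the candidate range with every test, so the number of builds grows with the base-2 logarithm of the number of commits. The sketch below assumes a linear history whose last commit is known bad; the commit counts are invented for the example, since the thread does not say how many commits separate the two kernels.

```python
# Minimal bisection sketch. The commit list and the is_bad() predicate
# are stand-ins for "build a test kernel and ask reporters to try it".


def max_bisect_steps(n_commits):
    """Upper bound on tests needed to find the first bad commit among n."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (n_commits - 1).bit_length()


def first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest to newest and the last one is bad.
    Returns (first_bad_commit, number_of_tests_performed).
    """
    lo, hi = 0, len(commits) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo], tests


if __name__ == "__main__":
    for n in (64, 100, 256):
        print(n, "commits ->", max_bisect_steps(n), "tests at most")
    print(first_bad(list(range(100)), lambda c: c >= 37))
```

Under these assumptions, 6 to 8 tests corresponds to a range of roughly 64 to 256 commits.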

Simon Déziel (sdeziel) wrote :

3.13.0-155-generic #205~lp1787127Commit6f4b6df5cb1 is bad here

SWick (swick) wrote :

Installed the first test kernel (3.13.0-155-generic #205~lp1787127Commit6f4b6df5cb1), but tomcat7 still won't start.

Simon Déziel (sdeziel) wrote :

Not sure if that helps with the bisection but booting with l1tf=off doesn't help.

Cloter M Filho (cloter) wrote :

Reporting the same problem, and the test kernel ( 3.13.0-155-generic #205~lp1787127Commit6f4b6df5cb1 ) didn't solve it here.

Phonon (stephanstrauss) wrote :

Reporting the same problem on my machine (3.13.0-155-lowlatency) with OpenJDK and Oracle Java 1.7 and 1.8. Is there a workaround for the moment? Thanks for any help.

Simon Déziel (sdeziel) wrote :

@Phonon, you can either revert to 3.13.0-153-lowlatency or use the Xenial backported kernel (linux-lowlatency-lts-xenial) that isn't affected. Both solutions worked in our case.
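
For anyone scripting a check across several machines, here is a minimal sketch that flags hosts running the affected build. It only looks at the kernel release string (as printed by uname -r); the constant and function name are hypothetical, and it treats every 3.13.0-155 flavour (generic, lowlatency) as affected, which matches the reports in this thread but is otherwise an assumption.

```python
import platform

# ABI version of the kernel build reported as broken in this bug.
AFFECTED_ABI = "3.13.0-155"


def is_affected(release):
    """Return True if a `uname -r` style string names the affected build."""
    return release == AFFECTED_ABI or release.startswith(AFFECTED_ABI + "-")


if __name__ == "__main__":
    running = platform.release()
    print(running, "affected" if is_affected(running) else "not affected")
```

The trailing-hyphen check keeps a release such as 3.13.0-1550 (should one ever exist) from matching by accident.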

Dan Streetman (ddstreet) wrote :

It looks like the bad commit is:

# first bad commit: [870ebe727a6035987b8b6fc779486a63d646ffac] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

Changed in linux (Ubuntu):
importance: High → Critical
Changed in linux (Ubuntu Trusty):
importance: High → Critical
tags: added: kernel-key
Dan Streetman (ddstreet) wrote :

I built a kernel with commit 870ebe727a6035987b8b6fc779486a63d646ffac and another with commit 870ebe727a6035987b8b6fc779486a63d646ffac^ (which is commit 35fa18d):

http://people.canonical.com/~ddstreet/lp1787127/

commit 35fa18d kernel is the last 'good' kernel:
http://people.canonical.com/~ddstreet/lp1787127/linux-image-3.13.0-155-generic_3.13.0-155.205+hf1787127v20180815b1h35fa18d_amd64.deb

commit 870ebe7 is the first 'bad' kernel:
http://people.canonical.com/~ddstreet/lp1787127/linux-image-3.13.0-155-generic_3.13.0-155.205+hf1787127v20180815b2h870ebe7_amd64.deb

Verified that the 35fa18d kernel does not cause the problem, and the 870ebe7 kernel does cause the problem.

Jim Browne (jbrowne) wrote :

Just a note that someone should probably re-spin the AWS AMIs for Trusty as soon as this is resolved.

Cloter M Filho (cloter) wrote :

My Java applications are working again with the 35fa18d kernel. DKMS did not compile a new module for VirtualBox; I will need the correct source/headers for that. But this is definitely moving in the right direction.

SWick (swick) wrote :

verified: 35fa18d kernel works for me too

another verified ok for 35fa18d kernel

On Thu, 16 Aug 2018 at 00:05, Dan Streetman
<email address hidden> wrote:
> commit 35fa18d kernel is the last 'good' kernel:
> http://people.canonical.com/~ddstreet/lp1787127/linux-image-3.13.0-155-generic_3.13.0-155.205+hf1787127v20180815b1h35fa18d_amd64.deb

That didn't work here at all, it probably misses some graphics stuff
my T550 needs.

GGrandes (ggrandes) wrote :

We have the same problem: 3.13.0-155.205 (amd64) is broken on Ubuntu 14.04 on AWS, and our autoscaling instances (with kernel.panic=5) are rebooting over and over again. With the previous kernel (3.13.0-153.203) everything was fine.

Norbert (nrbrtx) wrote :

With 3.13.0-155-generic Linux kernel
On fully upgraded Ubuntu 14.04 LTS
Scilab 5.5.0 consumes all CPU resources and does not start at all.

Rebooting with 3.13.0-153-generic helps. Installing `linux-image-generic-lts-xenial` (4.4.0-133-generic) helps too.

35fa18d worked on VMs, but on physical nodes I now run into all kinds of odd errors as well.

Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Trusty):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)

@ggrandes: I experienced that reboot loop on one of my AWS machines (with 3.13.0-155-generic) today too. Stopping the machine from the web UI and bringing it back up stopped the loop, and the machine stayed up, if that helps at all.

I've experienced this as well (production Zimbra servers - running 3.13.0-155-generic Linux kernel on fully updated Ubuntu 14.04 LTS - depend on Java, and this kernel pulled the rug out right from under them)...

Brad Figg (brad-figg) wrote :

We are actively working on this bug. We will have a kernel with a temporary fix out as soon as possible.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-156.206

---------------
linux (3.13.0-156.206) trusty; urgency=medium

  * linux: 3.13.0-156.206 -proposed tracker (LP: #1787187)

  * java Corrupted page table (LP: #1787127)
    - [Config] disable NUMA_BALANCING

  * java Corrupted page table (LP: #1787127) // CVE-2018-3620 // CVE-2018-3646
    - x86/mm: Simplify p[g4um]d_page() macros

  * 3.13.0-155.205 Kernel Panic - divide by zero (LP: #1787258)
    - x86/topology: Handle CPUID bogosity gracefully

 -- Kamal Mostafa <email address hidden> Thu, 16 Aug 2018 13:59:37 -0700

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Released
Brad Figg (brad-figg) wrote :

We've just released a new Trusty kernel (3.13.0-156.206) which should address this issue.
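
To confirm that a machine has picked up the fix, one option is to compare its installed kernel package version against 3.13.0-156.206. The sketch below is a deliberately simplified comparison that looks only at the numeric fields; it is not full Debian version ordering (no epochs or tilde handling) and is only meant for versions in the Trusty 3.13.0 series. The function names are illustrative.

```python
import re

# First package version reported as fixed in this bug.
FIXED_VERSION = "3.13.0-156.206"


def version_key(version):
    """Turn '3.13.0-156.206' into (3, 13, 0, 156, 206) for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def has_fix(package_version):
    """Return True if the version is at or beyond the fixed release.

    Simplified numeric comparison; an assumption that holds for plain
    Trusty 3.13.0-N.M versions, not for arbitrary Debian versions.
    """
    return version_key(package_version) >= version_key(FIXED_VERSION)


if __name__ == "__main__":
    for v in ("3.13.0-153.203", "3.13.0-155.205", "3.13.0-156.206"):
        print(v, has_fix(v))
```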

Jim Browne (jbrowne) wrote :

Any news on when new AMIs might be built and published?

Simon Déziel (sdeziel) wrote :

I'm happy to report that 3.13.0-156.206 fixes the regression for us on Trusty. Many thanks to all that were involved in testing/fixing this bug!

Brad Figg (brad-figg) wrote :

@jbrowne, Just as soon as the kernels hit -updates we started the work to produce new AMIs. We'll get them out ASAP.

Jeff Rivett (jrweb02) wrote :

3.13.0-156.206 did the trick here as well. My Minecraft server is up and running again.

Norbert (nrbrtx) wrote :

Scilab 5.5.0 works again with the new kernel 3.13.0-156.206. Thank you!

Gerson Zaragocín (gerson-e) wrote :

Java-dependent applications are running normally again with the kernel 3.13.0-156-generic update. Great effort. Thanks.

I actually got kernel panics with 3.13.0-155. With 3.13.0-156 everything's fine again.

SWick (swick) wrote :

3.13.0-156-generic works here as well. Thank you

Changed in linux (Ubuntu):
status: In Progress → Fix Released