Crash due to BUG: Bad page map in process X & BUG: Bad rss-counter state X

Bug #1787191 reported by timeodonovan
This bug report is a duplicate of: Bug #1787127: java Corrupted page table.
This bug affects 6 people
Affects           Status        Importance   Assigned to        Milestone
linux (Ubuntu)    In Progress   High         Joseph Salisbury
Trusty            In Progress   High         Joseph Salisbury

Bug Description

Multiple Supermicro-based servers running 14.04 have been experiencing continual kernel errors since upgrading to 3.13.0-155, and these errors quickly lead to the system becoming unresponsive. The errors start immediately after boot.

Small excerpt from the attached kern.log:

Aug 15 09:54:15 server kernel: [ 0.000000] CPU0 microcode updated early to revision 0x713, date = 2018-01-26
...
Aug 15 09:54:17 server kernel: [ 14.381553] ipmi device interface
Aug 15 09:54:17 server kernel: [ 14.610493] NFS: Registering the id_resolver key type
Aug 15 09:54:17 server kernel: [ 14.610504] Key type id_resolver registered
Aug 15 09:54:17 server kernel: [ 14.610505] Key type id_legacy registered
Aug 15 09:54:26 server kernel: [ 23.412042] BUG: Bad page map in process plymouthd pte:8000000860a3d966 pmd:465c17067
Aug 15 09:54:26 server kernel: [ 23.442867] addr:00007fb8cc137000 vm_flags:08100073 anon_vma:ffff880866695ab0 mapping:ffff88086543a870 index:7
Aug 15 09:54:26 server kernel: [ 23.472375] vma->vm_ops->fault: filemap_fault+0x0/0x400
Aug 15 09:54:26 server kernel: [ 23.484454] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
Aug 15 09:54:26 server kernel: [ 23.496669] CPU: 4 PID: 523 Comm: plymouthd Tainted: G B 3.13.0-155-generic #205-Ubuntu
Aug 15 09:54:26 server kernel: [ 23.496670] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
Aug 15 09:54:26 server kernel: [ 23.496671] 0000000000000000 ffff880465f37d00 ffffffff8173983f 00007fb8cc137000
Aug 15 09:54:26 server kernel: [ 23.496675] ffff8808652c63c0 ffff880465f37d50 ffffffff8117e374 8000000860a3d966
Aug 15 09:54:26 server kernel: [ 23.496678] 0000000465c17067 0000000000000007 ffff880465c179b8 8000000860a3d966
Aug 15 09:54:26 server kernel: [ 23.496681] Call Trace:
Aug 15 09:54:26 server kernel: [ 23.496684] [<ffffffff8173983f>] dump_stack+0x64/0x80
Aug 15 09:54:26 server kernel: [ 23.496687] [<ffffffff8117e374>] print_bad_pte+0x1a4/0x250
Aug 15 09:54:26 server kernel: [ 23.496690] [<ffffffff8117f6ae>] vm_normal_page+0x6e/0x80
Aug 15 09:54:26 server kernel: [ 23.496701] [<ffffffff8118ae5f>] change_protection_range+0x55f/0x720
Aug 15 09:54:26 server kernel: [ 23.496706] [<ffffffff8118b085>] change_protection+0x65/0xb0
Aug 15 09:54:26 server kernel: [ 23.496709] [<ffffffff811a164b>] change_prot_numa+0x1b/0x40
Aug 15 09:54:26 server kernel: [ 23.496712] [<ffffffff810a60c2>] task_numa_work+0x1d2/0x300
Aug 15 09:54:26 server kernel: [ 23.496714] [<ffffffff8108ef8f>] task_work_run+0xaf/0xd0
Aug 15 09:54:26 server kernel: [ 23.496717] [<ffffffff81014ed7>] do_notify_resume+0x97/0xb0
Aug 15 09:54:26 server kernel: [ 23.496720] [<ffffffff8174ad70>] int_signal+0x12/0x17
...
Aug 15 09:54:54 server kernel: [ 50.902769] BUG: Bad rss-counter state mm:ffff880466869880 idx:1 val:338
Aug 15 09:55:25 server kernel: [ 82.513872] BUG: Bad rss-counter state mm:ffff880866d6b800 idx:1 val:249
...
Aug 15 09:56:30 server kernel: [ 144.954186] CPU: 18 PID: 4139 Comm: php-cgi Tainted: G B 3.13.0-155-generic #205-Ubuntu
Aug 15 09:56:30 server kernel: [ 144.954189] 0000000000000000 ffff880467cefd00 ffffffff8173983f 0000000002d2e000
Aug 15 09:56:30 server kernel: [ 144.954193] 0000000468950067 0000000000002d2e ffff880468950970 800000042a764966
Aug 15 09:56:30 server kernel: [ 144.954195] [<ffffffff8173983f>] dump_stack+0x64/0x80
Aug 15 09:56:30 server kernel: [ 144.954199] [<ffffffff8117f6ae>] vm_normal_page+0x6e/0x80
Aug 15 09:56:30 server kernel: [ 144.954203] [<ffffffff8118b085>] change_protection+0x65/0xb0
Aug 15 09:56:30 server kernel: [ 144.954207] [<ffffffff811a164b>] change_prot_numa+0x1b/0x40
Aug 15 09:56:30 server kernel: [ 144.954211] [<ffffffff8108ef8f>] task_work_run+0xaf/0xd0
Aug 15 09:56:30 server kernel: [ 144.954214] [<ffffffff817421b2>] retint_signal+0x48/0x86
Aug 15 09:56:30 server kernel: [ 144.954216] addr:0000000002d2f000 vm_flags:08100073 anon_vma:ffff880467651c18 mapping: (null) index:2d2f
Aug 15 09:56:30 server kernel: [ 144.954218] CPU: 18 PID: 4139 Comm: php-cgi Tainted: G B 3.13.0-155-generic #205-Ubuntu
Aug 15 09:56:30 server kernel: [ 144.954218] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
Aug 15 09:56:30 server kernel: [ 144.954221] 0000000000000000 ffff880467cefd00 ffffffff8173983f 0000000002d2f000
Aug 15 09:56:30 server kernel: [ 144.954223] ffff880868111800 ffff880467cefd50 ffffffff8117e374 800000042a765966
Aug 15 09:56:30 server kernel: [ 144.954225] 0000000468950067 0000000000002d2f ffff880468950978 800000042a765966
Aug 15 09:56:30 server kernel: [ 144.954225] Call Trace:
Aug 15 09:56:30 server kernel: [ 144.954227] [<ffffffff8173983f>] dump_stack+0x64/0x80
Aug 15 09:56:30 server kernel: [ 144.954229] [<ffffffff8117e374>] print_bad_pte+0x1a4/0x250
Aug 15 09:56:30 server kernel: [ 144.954231] [<ffffffff8117f6ae>] vm_normal_page+0x6e/0x80
Aug 15 09:56:30 server kernel: [ 144.954233] [<ffffffff8118ae5f>] change_protection_range+0x55f/0x720
Aug 15 09:56:30 server kernel: [ 144.954235] [<ffffffff8118b085>] change_protection+0x65/0xb0
Aug 15 09:56:30 server kernel: [ 144.954237] [<ffffffff81742655>] ? error_entry+0x115/0x179
Aug 15 09:56:30 server kernel: [ 144.954239] [<ffffffff811a164b>] change_prot_numa+0x1b/0x40
Aug 15 09:56:30 server kernel: [ 144.954241] [<ffffffff810a60c2>] task_numa_work+0x1d2/0x300
Aug 15 09:56:30 server kernel: [ 144.954242] [<ffffffff8108ef8f>] task_work_run+0xaf/0xd0
Aug 15 09:56:30 server kernel: [ 144.954244] [<ffffffff81014ed7>] do_notify_resume+0x97/0xb0
Aug 15 09:56:30 server kernel: [ 144.954246] [<ffffffff817421b2>] retint_signal+0x48/0x86
Aug 15 09:56:30 server kernel: [ 147.373769] addr:0000000000ec8000 vm_flags:08100073 anon_vma:ffff8808674497e0 mapping: (null) index:ec8
Aug 15 09:56:30 server kernel: [ 147.404530] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
Aug 15 09:56:30 server kernel: [ 147.404545] ffff88086062c780 ffff880462bcfd50 ffffffff8117e374 80000004292ce966
Aug 15 09:56:30 server kernel: [ 147.404552] Call Trace:
Aug 15 09:56:30 server kernel: [ 147.404569] [<ffffffff8117e374>] print_bad_pte+0x1a4/0x250
Aug 15 09:56:30 server kernel: [ 147.404594] [<ffffffff8118ae5f>] change_protection_range+0x55f/0x720
Aug 15 09:56:30 server kernel: [ 147.404597] [<ffffffff8118b085>] change_protection+0x65/0xb0
Aug 15 09:56:30 server kernel: [ 147.404607] [<ffffffff811a164b>] change_prot_numa+0x1b/0x40
Aug 15 09:56:30 server kernel: [ 147.404618] [<ffffffff8108ef8f>] task_work_run+0xaf/0xd0
Aug 15 09:56:30 server kernel: [ 147.404627] [<ffffffff817421b2>] retint_signal+0x48/0x86
Aug 15 09:56:36 server kernel: [ 152.965390] BUG: Bad rss-counter state mm:ffff8808651b4980 idx:1 val:272
Aug 15 09:56:38 server kernel: [ 155.091856] BUG: Bad rss-counter state mm:ffff880866d6aa00 idx:0 val:23
Aug 15 09:56:38 server kernel: [ 155.459289] BUG: Bad rss-counter state mm:ffff8808651b2300 idx:0 val:23
Aug 15 09:56:38 server kernel: [ 155.472278] BUG: Bad rss-counter state mm:ffff8808651b2300 idx:1 val:793
Aug 15 09:56:38 server kernel: [ 155.613023] BUG: Bad rss-counter state mm:ffff8804669e0700 idx:1 val:657
Aug 15 09:56:42 server kernel: [ 159.472398] BUG: Bad rss-counter state mm:ffff880867d3e580 idx:0 val:2
Aug 15 09:56:42 server kernel: [ 159.483401] BUG: Bad rss-counter state mm:ffff880867d3e580 idx:1 val:1740
Aug 15 09:56:44 server kernel: [ 161.445747] BUG: Bad rss-counter state mm:ffff8804669e1180 idx:1 val:8655
Aug 15 09:56:54 server kernel: [ 171.619129] BUG: Bad rss-counter state mm:ffff880466baa680 idx:1 val:7075
Aug 15 09:56:57 server kernel: [ 174.185697] BUG: Bad rss-counter state mm:ffff880865bc7a80 idx:1 val:508
Aug 15 09:56:58 server kernel: [ 175.442721] BUG: Bad rss-counter state mm:ffff880866d69f80 idx:0 val:23
Aug 15 09:56:58 server kernel: [ 175.450734] BUG: Bad rss-counter state mm:ffff880866d69f80 idx:1 val:511
...

The system becomes unresponsive at this point.
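
The pte values printed in the 'Bad page map' lines above can be broken down into their flag bits and physical frame. The following is a minimal sketch, assuming the standard x86-64 4-level paging layout from the Intel SDM; the helper name decode_pte is made up for illustration. The flag names are the architectural ones only: when the present bit is clear the hardware ignores the remaining bits and the kernel is free to reuse them in software, so this does not by itself say what the kernel intended.

```python
# Architectural x86-64 page-table-entry flag bits (4-level paging).
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",
    7: "PAT",
    8: "global",
    63: "no-execute",
}

# Bits 12..51 hold the physical frame address.
PHYS_MASK = ((1 << 52) - 1) & ~0xFFF


def decode_pte(value):
    """Return (names of set flag bits, physical frame address) for a raw PTE."""
    flags = [name for bit, name in sorted(PTE_FLAGS.items()) if (value >> bit) & 1]
    return flags, value & PHYS_MASK


# pte values copied from the log lines in this report.
for raw in (0x8000000860A3D966, 0x800000042A765966, 0x80000007AB3AA966):
    flags, frame = decode_pte(raw)
    print(f"{raw:#018x}: frame={frame:#x} flags={','.join(flags)}")
```

For all three values the low twelve bits are 0x966, so the present bit (bit 0) is clear while the no-execute bit (bit 63) is set.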

The 'Bad page map' error occurs many times for some processes.
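
To see which processes are hit and how often, the log can be tallied with a short script. This is only a sketch: the function name summarize and the SAMPLE list are illustrative, the sample lines are copied from the excerpt above, and the regular expressions assume the message formats shown in this report.

```python
import re
from collections import Counter

BAD_MAP = re.compile(r"BUG: Bad page map in process (\S+)")
BAD_RSS = re.compile(r"BUG: Bad rss-counter state mm:(\w+) idx:(\d+) val:(-?\d+)")


def summarize(lines):
    """Count 'Bad page map' reports per process and 'Bad rss-counter' reports per idx."""
    per_process = Counter()
    per_idx = Counter()
    for line in lines:
        match = BAD_MAP.search(line)
        if match:
            per_process[match.group(1)] += 1
            continue
        match = BAD_RSS.search(line)
        if match:
            per_idx[int(match.group(2))] += 1
    return per_process, per_idx


# A few lines copied from the kern.log excerpt in this report.
SAMPLE = [
    "Aug 15 09:54:26 server kernel: [ 23.412042] BUG: Bad page map in process plymouthd pte:8000000860a3d966 pmd:465c17067",
    "Aug 15 09:54:54 server kernel: [ 50.902769] BUG: Bad rss-counter state mm:ffff880466869880 idx:1 val:338",
    "Aug 15 09:56:38 server kernel: [ 155.091856] BUG: Bad rss-counter state mm:ffff880866d6aa00 idx:0 val:23",
    "Aug 15 09:54:17 server kernel: [ 14.381553] ipmi device interface",
]

processes, indexes = summarize(SAMPLE)
print(processes.most_common())
print(sorted(indexes.items()))
```

On a real system the same function can be fed an open file object, for example the attached kern.log, since it only iterates over lines.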

The issue is not present when reverting to 3.13.0-153.

Unable to provide output from `ubuntu-bug linux` due to system instability.

# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04

# apt-cache policy linux-image-3.13.0-155-generic
linux-image-3.13.0-155-generic:
  Installed: 3.13.0-155.205
  Candidate: 3.13.0-155.205
  Version table:
 *** 3.13.0-155.205 0
        500 http://gb.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
        100 /var/lib/dpkg/status

Tags: cscc trusty
timeodonovan (timeodonovan) wrote :

Attaching lspci information.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: trusty
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
status: Incomplete → Triaged
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Trusty):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between Ubuntu 3.13.0-153 and Ubuntu 3.13.0-155. The kernel bisect will require testing of about 6-8 test kernels.
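
For reference, the estimate of 6-8 test kernels follows from bisection halving the candidate range at each step. The sketch below shows the arithmetic only; the commit counts are illustrative assumptions, not the actual number of commits between the two releases, and bisect_steps is a made-up helper name.

```python
def bisect_steps(n_commits):
    """Approximate number of test builds needed to bisect n_commits candidates."""
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    # Each test halves the remaining range, so the count is ceil(log2(n)).
    return (n_commits - 1).bit_length()


# Illustrative commit counts only.
for n in (64, 128, 256):
    print(f"{n} commits -> about {bisect_steps(n)} test kernels")
```

So a range needing 6 to 8 test kernels corresponds to somewhere between roughly 64 and 256 candidate commits.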

I built the first test kernel, up to the following commit:
6f4b6df5cb10508e0c1c81c3884ca1afca98c8e2

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1787191

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Ben Schmitz (bschmitz) wrote :

We are experiencing similar issues with this kernel in AWS.

Simon Déziel (sdeziel) wrote :

3.13.0-155-generic #205~lp1787192Commit6f4b6df5cb1 is bad here

Ben Schmitz (bschmitz) wrote :

We encounter this issue when building a c3-class instance in AWS.

[ 0.092982] ftrace: allocating 28746 entries in 113 pages
[ 0.132089] divide error: 0000 [#1] SMP
[ 0.135169] Modules linked in:
[ 0.136000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-155-generic #205-Ubuntu
[ 0.136000] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 0.136000] task: ffff8800e9ef8000 ti: ffff8800e9f70000 task.ti: ffff8800e9f70000
[ 0.136000] RIP: 0010:[<ffffffff81d4b9f2>] [<ffffffff81d4b9f2>] smp_store_boot_cpu_info+0x58/0x191
[ 0.136000] RSP: 0000:ffff8800e9f71e98 EFLAGS: 00010286
[ 0.136000] RAX: 000000000000000e RBX: ffffffff81d18980 RCX: 0000000000000000
[ 0.136000] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffff8800efc13380
[ 0.136000] RBP: ffff8800e9f71ec0 R08: ffffffff81d18988 R09: 0000000000000004
[ 0.136000] R10: ffffffff8180b6c0 R11: 0001f8ecf7bca282 R12: 0000000000013280
[ 0.136000] R13: 00000000ffffffff R14: 0000000000000100 R15: 000000000000d088
[ 0.136000] FS: 0000000000000000(0000) GS:ffff8800efc00000(0000) knlGS:0000000000000000
[ 0.136000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.136000] CR2: ffff8800effff000 CR3: 0000000001c0e000 CR4: 0000000000160670
[ 0.136000] Stack:
[ 0.136000] ffffffff81d18980 0000000000013280 0000000000000246 0000000000000100
[ 0.136000] 0000000000000000 ffff8800e9f71ef0 ffffffff81d4bb82 ffffffff81e5df18
[ 0.136000] ffff8800e9ef8650 0000000000000246 0000000000000001 ffff8800e9f71f00
[ 0.136000] Call Trace:
[ 0.136000] [<ffffffff81d4bb82>] native_smp_prepare_cpus+0x57/0x3e0
[ 0.136000] [<ffffffff81d404e1>] xen_hvm_smp_prepare_cpus+0x9/0x2e
[ 0.136000] [<ffffffff81d3a01b>] kernel_init_freeable+0xa7/0x1eb
[ 0.136000] [<ffffffff81727500>] ? rest_init+0x80/0x80
[ 0.136000] [<ffffffff8172750e>] kernel_init+0xe/0x130
[ 0.136000] [<ffffffff8174a88e>] ret_from_fork+0x6e/0xa0
[ 0.136000] [<ffffffff81727500>] ? rest_init+0x80/0x80
[ 0.136000] Code: 48 89 c7 41 83 cd ff 41 54 53 f3 a5 66 c7 80 da 00 00 00 00 00 be d0 00 00 00 0f b7 0d 20 a9 fc ff 8b 05 e2 c5 28 00 8d 44 01 ff <f7> f1 31 d2 89 05 c8 c2 fc ff 8d 81 ff 7f 00 00 f7 f1 89 c3 89
[ 0.136000] RIP [<ffffffff81d4b9f2>] smp_store_boot_cpu_info+0x58/0x191
[ 0.136000] RSP <ffff8800e9f71e98>
[ 0.268006] ---[ end trace b76376c23f194273 ]---
[ 0.270564] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.270564]
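
The faulting instruction in the trace above can be read from the Code: line, where the oops marks the byte at RIP with angle brackets. The sketch below decodes only that one case (opcode F7 with a register operand, per the Intel SDM group-3 encoding); the helper names are made up for illustration and CODE is a shortened copy of the Code: line from this trace.

```python
# Group-3 opcode 0xF7 selects its operation through the ModRM reg field.
F7_GROUP3 = {0: "test", 1: "test", 2: "not", 3: "neg",
             4: "mul", 5: "imul", 6: "div", 7: "idiv"}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte the oops marks with <..>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_f7(opcode, modrm):
    """Decode an F7 instruction whose operand is a register (ModRM mod == 3)."""
    if opcode != 0xF7 or (modrm >> 6) != 3:
        raise ValueError("only register-form F7 instructions are handled")
    return F7_GROUP3[(modrm >> 3) & 7], REG32[modrm & 7]


# Tail of the Code: line from the trace, shortened for the example.
CODE = "8b 05 e2 c5 28 00 8d 44 01 ff <f7> f1 31 d2 89 05"
print(decode_f7(*faulting_bytes(CODE)))
```

The marked bytes f7 f1 decode to a div by ecx, and the register dump shows RCX as 0000000000000000, which matches the reported divide error.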

Ben Schmitz (bschmitz) wrote :

For the record, c4 instances do not have this problem.

Donald Babcock (donaldbabcock) wrote :

I have four similar Trusty virtual machines running 3.13.0-155 on the same Hyper-V 2012 R2 host on Intel E5s.

Three of the four were rebooted into the 155 kernel without issue (so far). The four were built from the same base install, but their configurations have diverged over the last ~4 years.

One is exhibiting the behavior described in the original post: after reboot, bad page maps are reported for various running processes (clamd, apache, mysql, etc.). Rolling back to the disk checkpoint taken before the reboot (3.13.0-153 running state) returns the machine to an operational state.

Dragos L (dragoslupsoiu) wrote :

I'm using Ubuntu 14.04.5 LTS. After patching the linux-generic kernel to 3.13.0.155 and the Intel microcode to 3.20180425.1~ubuntu0.14.04.2, I rebooted to apply the patches and the system froze after booting. The CPU is an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz. Is this a kernel bug or a microcode bug? After rolling back to the older version, the system returned to normal.
Do the newest available versions of the kernel and the Intel microcode fix these bugs?

Aug 16 04:47:14 lw24 kernel: [ 310.096591] Call Trace:
Aug 16 04:47:14 lw24 kernel: [ 310.096594] [<ffffffff8173983f>] dump_stack+0x64/0x80
Aug 16 04:47:14 lw24 kernel: [ 310.096598] [<ffffffff8117e374>] print_bad_pte+0x1a4/0x250
Aug 16 04:47:14 lw24 kernel: [ 310.096601] [<ffffffff8117f6ae>] vm_normal_page+0x6e/0x80
Aug 16 04:47:14 lw24 kernel: [ 310.096604] [<ffffffff8118ae5f>] change_protection_range+0x55f/0x720
Aug 16 04:47:14 lw24 kernel: [ 310.096607] [<ffffffff8118b085>] change_protection+0x65/0xb0
Aug 16 04:47:14 lw24 kernel: [ 310.096619] [<ffffffff811a164b>] change_prot_numa+0x1b/0x40
Aug 16 04:47:14 lw24 kernel: [ 310.096623] [<ffffffff810a60c2>] task_numa_work+0x1d2/0x300
Aug 16 04:47:14 lw24 kernel: [ 310.096626] [<ffffffff8108ef8f>] task_work_run+0xaf/0xd0
Aug 16 04:47:14 lw24 kernel: [ 310.096632] [<ffffffff81014ed7>] do_notify_resume+0x97/0xb0
Aug 16 04:47:14 lw24 kernel: [ 310.096636] [<ffffffff817421b2>] retint_signal+0x48/0x86
Aug 16 04:47:14 lw24 kernel: [ 310.096638] BUG: Bad page map in process mysqld pte:80000007ab3aa966 pmd:851dfb067
Aug 16 04:47:14 lw24 kernel: [ 310.098066] addr:00007f0034a1b000 vm_flags:08200073 anon_vma:ffff881051458ca8 mapping: (null) index:7f0034a1b
Aug 16 04:47:14 lw24 kernel: [ 310.099504] CPU: 5 PID: 2799 Comm: mysqld Tainted: G B 3.13.0-155-generic #205-Ubuntu
Aug 16 04:47:14 lw24 kernel: [ 310.099505] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 1.1b 09/11/2015
Aug 16 04:47:14 lw24 kernel: [ 310.099506] 0000000000000000 ffff88105174bd00 ffffffff8173983f 00007f0034a1b000
Aug 16 04:47:14 lw24 kernel: [ 310.099509] ffff88084fb79bc0 ffff88105174bd50 ffffffff8117e374 80000007ab3aa966
Aug 16 04:47:14 lw24 kernel: [ 310.099512] 0000000851dfb067 00000007f0034a1b ffff880851dfb0d8 80000007ab3aa966

timeodonovan (timeodonovan) wrote :

Just to confirm, the 3.13.0-156 build from duplicate bug 1787127 resolved this. Thanks!

Ben Schmitz (bschmitz) wrote :

Our issue with this kernel was also addressed by 3.13.0-156.

smzhou (shimingzhou1980) wrote :

I built a custom kernel from the Ubuntu kernel source 3.13.0-158 with the bfq patch, and the bug still happens. The error message is:
[35952.358708] BUG: Bad page map in process java pte:01e67320 pmd:16efdd067

Brad Figg (brad-figg)
tags: added: cscc