Kernel Oops - unable to handle kernel NULL pointer dereference at (null); Call Trace: [<ffffffff810fb39b>] ? audit_compare_dname_path+0x2b/0xa0

Bug #1450442 reported by Alex Tomlins
This bug affects 14 people
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Trusty           Fix Released   Critical     Chris J Arges
Utopic           Fix Released   Critical     Chris J Arges

Bug Description

[Impact]
Ubuntu VMware instances running 3.13.0-51 will crash with the following backtrace:

[ 12.357276] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 12.357886] IP: [<ffffffff8136cb80>] strlen+0x0/0x30
[ 12.358457] PGD 230fe9067 PUD 230d5c067 PMD 0
[ 12.359034] Oops: 0000 [#1] SMP
[ 12.359590] Modules linked in: tcp_diag inet_diag vmw_vsock_vmci_transport vsock ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul iptable_filter crc32_pclmul ip_tables ghash_clmulni_intel aesni_intel aes_x86_64 ppdev lrw x_tables gf128mul vmwgfx glue_helper ablk_helper cryptd ttm drm vmw_balloon serio_raw shpchp parport_pc lp i2c_piix4 parport mac_hid vmw_vmci psmouse mptspi vmw_pvscsi e1000 mptscsih floppy vmxnet3 mptbase
[ 12.364773] CPU: 2 PID: 1718 Comm: fail2ban-server Not tainted 3.13.0-51-generic #84-Ubuntu
[ 12.365587] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
[ 12.367276] task: ffff880230fc3000 ti: ffff8802308c4000 task.ti: ffff8802308c4000
[ 12.368159] RIP: 0010:[<ffffffff8136cb80>] [<ffffffff8136cb80>] strlen+0x0/0x30
[ 12.369073] RSP: 0018:ffff8802308c5d60 EFLAGS: 00010212
[ 12.369963] RAX: 000000000000000d RBX: 000000000000000d RCX: 0000000000002df0
[ 12.370973] RDX: 0000000000000012 RSI: 0000000000000000 RDI: 0000000000000000
[ 12.372005] RBP: ffff8802308c5d90 R08: ffff8800b9218648 R09: ffff8802308c5d60
[ 12.372988] R10: 0000000000000002 R11: ffff88023082e180 R12: 0000000000000012
[ 12.373901] R13: 0000000000000000 R14: ffff880231f1b3f8 R15: ffff8800b9218460
[ 12.374827] FS: 00007f196f84c740(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[ 12.375752] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.376667] CR2: 0000000000000000 CR3: 0000000230872000 CR4: 00000000000407e0
[ 12.377684] Stack:
[ 12.378612] ffffffff810fb39b 0000000000000000 0000000000000004 ffff88022ff74838
[ 12.379559] ffff8800b9218400 ffff8800b9218460 ffff8802308c5df8 ffffffff810fdb36
[ 12.380516] ffffffff811d56e0 000000042ff74838 ffff880231f1b3c0 ffff88022febecf8
[ 12.381506] Call Trace:
[ 12.382630] [<ffffffff810fb39b>] ? audit_compare_dname_path+0x2b/0xa0
[ 12.383784] [<ffffffff810fdb36>] __audit_inode_child+0xb6/0x330
[ 12.384912] [<ffffffff811d56e0>] ? d_instantiate+0x50/0x70
[ 12.386013] [<ffffffff811ca060>] vfs_mknod+0x110/0x160
[ 12.387145] [<ffffffff816bf475>] unix_bind+0x2a5/0x360
[ 12.388207] [<ffffffff810ff142>] ? __audit_sockaddr+0x42/0x80
[ 12.389250] [<ffffffff8160d4c0>] SYSC_bind+0xe0/0x120
[ 12.390297] [<ffffffff8172e9fa>] ? do_page_fault+0x1a/0x70
[ 12.391303] [<ffffffff8160e4de>] SyS_bind+0xe/0x10
[ 12.392426] [<ffffffff817330bd>] system_call_fastpath+0x1a/0x1f
[ 12.393581] Code: 89 f8 48 89 e5 f6 82 40 c7 84 81 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 40 c7 84 81 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80
[ 12.396831] RIP [<ffffffff8136cb80>] strlen+0x0/0x30
[ 12.397812] RSP <ffff8802308c5d60>
[ 12.398769] CR2: 0000000000000000
[ 12.399743] ---[ end trace 2c5a33d31a03347e ]---

We've also seen this on our precise machines that are running the backported trusty kernel.

When reverting to kernel 3.13.0-49 this no longer occurs.

[Test Case]
1) Run an Ubuntu VMware instance with the affected kernel.

apt-get install auditd
echo "-w /etc/test" >>/etc/audit/audit.rules
/etc/init.d/auditd restart
apt-get install linux-headers-3.13.0-51 linux-headers-3.13.0-51-generic linux-image-3.13.0-51-generic
reboot
Attempt to log in or ssh into the host; you'll get a similar stack trace.
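
When triaging, a captured kernel log can be checked against the signature shared by the traces in this report: a NULL pointer dereference faulting in strlen, with audit functions in the call trace. The following is a minimal Python sketch and not part of the original report; the function name looks_like_this_oops and the matching heuristic are assumptions made for illustration.

```python
import re

# Matches trace entries such as "[<ffffffff8136cb80>] strlen+0x0/0x30",
# with or without the "?" that marks an unreliable frame.
SYMBOL_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def looks_like_this_oops(log_text):
    """Heuristic (assumption): NULL dereference in strlen with audit frames in the trace."""
    if "unable to handle kernel NULL pointer dereference" not in log_text:
        return False
    faulting_symbol = None
    in_trace = False
    audit_frames = []
    for line in log_text.splitlines():
        match = SYMBOL_RE.search(line)
        if "Call Trace:" in line:
            in_trace = True
        elif "Code:" in line:
            # The "Code:" line follows the last call trace entry.
            in_trace = False
        elif match and "] IP: " in line:
            faulting_symbol = match.group(1)
        elif match and in_trace and "audit" in match.group(1):
            audit_frames.append(match.group(1))
    return faulting_symbol == "strlen" and bool(audit_frames)
```

Passing it the text of a saved console log or dmesg output should flag traces of the kind shown above. It is only a pattern match on the log format and says nothing about other audit-related crashes.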

[Fix]
commit fcf22d8267ad2601fe9b6c549d1be96401c23e0b upstream

--

uname -a:
Linux search-2 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:08:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/version_signature:
Ubuntu 3.13.0-51.84-generic 3.13.11-ckt18

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: trusty
Revision history for this message
Chris J Arges (arges) wrote :

I suspect commit 4a92843601ad0f5067f441d2f0dca55bbe18c076.

Commit fcf22d8267ad2601fe9b6c549d1be96401c23e0b is also needed to fix that. I'll build a test kernel with it so you can verify.

Changed in linux (Ubuntu Trusty):
assignee: nobody → Chris J Arges (arges)
status: New → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → New
Chris J Arges (arges)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu Trusty):
status: Confirmed → In Progress
importance: Undecided → Medium
Changed in linux (Ubuntu Utopic):
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Brad Figg (brad-figg) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Chris J Arges (arges) wrote :

Alex,
Can you test the following build to see if it fixes your issue?
http://people.canonical.com/~arges/lp1450442/

Thanks

Chris J Arges (arges)
Changed in linux (Ubuntu Trusty):
importance: Medium → Critical
Changed in linux (Ubuntu Utopic):
importance: Medium → Critical
description: updated
Revision history for this message
Pete Cheslock (pete-cheslock) wrote :

I've tested the build from http://people.canonical.com/~arges/lp1450442/ and I'm no longer able to replicate this issue. This looks like it works for me.

penalvch (penalvch)
tags: added: regression-update
Chris J Arges (arges)
description: updated
Revision history for this message
Chris J Arges (arges) wrote :

Sent patches for 3.13/3.16 to kernel team ML for review.

Revision history for this message
Alex Tomlins (alex-tomlins) wrote :

Hi Chris, thanks for the speedy response to this.

To add another confirmation: I've tested your build on a couple of our servers, and I'm no longer seeing the Oops, so this looks to have addressed the issue.

thanks,
Alex

Chris J Arges (arges)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Philipp Kern (pkern) wrote :

What's the ETA for trusty?

Revision history for this message
Chris J Arges (arges) wrote :

The fix is currently in the -proposed kernel. (3.13.0-52.85)

Revision history for this message
Roman Fiedler (roman-fiedler-deactivatedaccount) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1451360 is marked as a duplicate. The fix from here changes the behaviour of the duplicate (SSH login now works again, but there is still a kernel oops).

So if both have a common cause (very likely), then 3.13.0-52.85 is only an incomplete fix.

Revision history for this message
Alex Tomlins (alex-tomlins) wrote :

Scratch that last comment...

I see (from http://kernel.ubuntu.com/git/ubuntu/ubuntu-trusty.git/log/) that the fix is not in Ubuntu 3.13.0-52.85. We'll need to wait for 3.13.0-52.86...

Revision history for this message
Janne Snabb (snabb) wrote :

I encountered this issue on a Hetzner VPS. It is a KVM-based virtual server, not VMware. After rebooting I was unable to log in through ssh. Accessing the system from the console was possible, though running many commands resulted in "Killed". The kernel stack trace was the same as in the original bug report.

There is no quick and dirty workaround documented yet on this bug report, so I am adding one.

Do the following to get your system quickly back to a usable state while waiting for the patched kernel:

1) disable starting "auditd" at boot (for example "chmod 000 /etc/init.d/auditd" is an easy and ugly way to do it)

2) reboot the system (in my case the "reboot" command did not work, I had to hard-reset the system)

Done.

Revision history for this message
Jinn Ko (jinnko) wrote :

Janne, good point. There's another possible workaround in certain circumstances. You can also clear the auditd rules which should allow you to continue working on a running system. This would be done by issuing an "auditctl -D", after which you should be able to use the running system, albeit without any auditing.

Revision history for this message
Jeroen Pulles (jeroen-pulles) wrote :

> Janne, good point. There's another possible workaround in certain circumstances. You can also clear the auditd rules which should allow you to continue working on a running system. This would be done by issuing an "auditctl -D", after which you should be able to use the running system, albeit without any auditing.

I have various systems with auditing that trigger the NULL dereference without audit rules on the specific paths involved. For example, `chmod 0755 /run/foobar` hangs even though the system only has a filesystem write rule for /etc/something, so I am not sure that clearing the rules is enough. Like you said: "possible" and "certain circumstances".

Revision history for this message
David Andruczyk (david-andruczyk) wrote :

3.13.0-52.85 still has the same panic related to the audit subsystem....

Revision history for this message
Adam Conrad (adconrad) wrote :

Right, the fix is in 3.13.0-52.86, not 3.13.0-52.85.
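
To avoid mixing up the two uploads, a host's /proc/version_signature can be compared against 3.13.0-52.86. The following is a minimal Python sketch, assuming the signature format shown in the bug description ("Ubuntu 3.13.0-51.84-generic 3.13.11-ckt18"). The helper names parse_signature and includes_fix are hypothetical, and the check only covers the Trusty 3.13.0 series. It does not model the fact that earlier kernels such as 3.13.0-49 were reported as not crashing.

```python
import re

# First Trusty upload reported in this bug to contain the fix.
FIXED = (3, 13, 0, 52, 86)


def parse_signature(signature):
    """Extract (major, minor, patch, abi, upload) from a version signature string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature)
    if match is None:
        raise ValueError("unrecognised version signature: %r" % signature)
    return tuple(int(part) for part in match.groups())


def includes_fix(signature):
    """Return True if a 3.13.0 kernel is at or after 3.13.0-52.86."""
    version = parse_signature(signature)
    if version[:3] != FIXED[:3]:
        raise ValueError("this sketch only covers the 3.13.0 Trusty series")
    return version >= FIXED
```

For example, includes_fix(open("/proc/version_signature").read()) would report whether the running kernel is at least 3.13.0-52.86. Note that `uname -a` shows only the upload number (#85 or #86), which is why the two were easy to confuse.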

Revision history for this message
Chris J Arges (arges) wrote :

This kernel has the patches that fix the issue:
https://launchpad.net/ubuntu/+source/linux/3.13.0-52.86

If you can, please verify this kernel and post the results to this bug.
Thanks,

Revision history for this message
David Andruczyk (david-andruczyk) wrote :

3.13.0-52.86 DOES work and no longer exhibits the crash/oops when booted.

Revision history for this message
Chris J Arges (arges) wrote :

David,
Woo hoo. Thanks and sorry about the confusion regarding kernel versions earlier.

tags: added: verification-done
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-52.86

---------------
linux (3.13.0-52.86) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1451288

  [ Upstream Kernel Changes ]

  * audit: create private file name copies when auditing inodes
    - LP: #1450442

 -- Brad Figg <email address hidden> Sun, 03 May 2015 18:36:19 -0700

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-37.51

---------------
linux (3.16.0-37.51) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1451489

  [ Upstream Kernel Changes ]

  * Fix a broken backport causing boot failure on gen8 Intel
    - LP: #1449401

 -- Brad Figg <email address hidden> Mon, 04 May 2015 09:42:43 -0700

Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Released
Revision history for this message
Pete Cheslock (pete-cheslock) wrote :

I'm still able to recreate this issue with kernel version 3.13.0-52-generic #85-Ubuntu SMP Wed Apr 29 16:44:17 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

It looks like a different set of audit rules causes the same issue.

To replicate:
Install 3.13.0-52-generic kernel
apt-get install auditd

in /etc/audit/audit.rules
---
-D
-b 5000
-f 0
-r 15000
-a exit,always -F arch=b64 -S execve -S exit -S exit_group -S fork -S clone -S vfork -S accept -S accept4 -S connect -S bind -S listen
---

restart auditd
below stacktrace happens.

Stacktrace:

[ 186.897309] BUG: unable to handle kernel NULL pointer dereference at 0000000000000690
[ 186.897322] IP: [<ffffffff8136cbb0>] strlen+0x0/0x30
[ 186.897331] PGD 0
[ 186.897334] Oops: 0000 [#1] SMP
[ 186.897339] Modules linked in: dm_crypt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel isofs aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd
[ 186.897357] CPU: 0 PID: 2206 Comm: sudo Not tainted 3.13.0-52-generic #85-Ubuntu
[ 186.897363] task: ffff880003286000 ti: ffff880002a04000 task.ti: ffff880002a04000
[ 186.897368] RIP: e030:[<ffffffff8136cbb0>] [<ffffffff8136cbb0>] strlen+0x0/0x30
[ 186.897375] RSP: e02b:ffff880002a05df0 EFLAGS: 00010286
[ 186.897379] RAX: ffff880002a05d40 RBX: 0000000000000690 RCX: 0000000000000000
[ 186.897382] RDX: 0000000000000036 RSI: 0000000000000690 RDI: 0000000000000690
[ 186.897385] RBP: ffff880002a05e08 R08: 0000000000000000 R09: 000000000000fffe
[ 186.897389] R10: 0000000000000000 R11: ffff880002a05c06 R12: ffff8801d298f340
[ 186.897393] R13: 0000000000000000 R14: ffff8801d0fa2000 R15: 0000000000000000
[ 186.897401] FS: 00007f4a94370840(0000) GS:ffff8801dee00000(0000) knlGS:0000000000000000
[ 186.897408] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 186.897412] CR2: 0000000000000690 CR3: 00000000031f5000 CR4: 0000000000002660
[ 186.897418] Stack:
[ 186.897420] ffffffff810f7fda ffff8801d298f340 ffff8801d0fa2060 ffff880002a05e78
[ 186.897425] ffffffff810f9581 ffffffff8172a480 ffffffff81c55740 ffff880002a05e60
[ 186.897430] ffffffff8172a480 ffff880002a05ef0 ffff880002a05e60 ffffffff810f6b93
[ 186.897435] Call Trace:
[ 186.897441] [<ffffffff810f7fda>] ? audit_log_untrustedstring+0x1a/0x30
[ 186.897445] [<ffffffff810f9581>] audit_log_name+0x281/0x320
[ 186.897451] [<ffffffff8172a480>] ? _raw_spin_unlock_irqrestore+0x20/0x40
[ 186.897455] [<ffffffff8172a480>] ? _raw_spin_unlock_irqrestore+0x20/0x40
[ 186.897459] [<ffffffff810f6b93>] ? audit_buffer_free+0x73/0xa0
[ 186.897463] [<ffffffff810fbe37>] audit_log_exit+0x3d7/0xb90
[ 186.897467] [<ffffffff810fe5bf>] __audit_syscall_exit+0x27f/0x2e0
[ 186.897472] [<ffffffff81733224>] sysret_audit+0x17/0x21
[ 186.897474] Code: 89 f8 48 89 e5 f6 82 40 c7 84 81 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 40 c7 84 81 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80
[ 186.897508] RIP [<ffffffff8136cbb0>] strlen+0x0/0x30
[ 186.897511] RSP <ffff880002a05df0>
[ 186.897513] CR2: 0000000000000690
[ 186.897516] ---[ end trace 2626030fc35ecb54 ]---

Revision history for this message
David Andruczyk (david-andruczyk) wrote : RE: [Bug 1450442] Re: Kernel Oops - unable to handle kernel NULL pointer dereference at (null); Call Trace: [<ffffffff810fb39b>] ? audit_compare_dname_path+0x2b/0xa0

The problem was resolved in #86, not #85

--
David J. Andruczyk
Systems Administrator
University IT - Enterprise Applications
44 Celebration Drive, Suite 3-100
Rochester, NY 14627
E-mail: <email address hidden>
Office: 585-275-9106


Revision history for this message
Simon Déziel (sdeziel) wrote :

On 05/15/2015 11:55 AM, Pete Cheslock wrote:
> I'm still able to recreate this issue with kernel version
> 3.13.0-52-generic #85-Ubuntu SMP Wed Apr 29 16:44:17 UTC 2015 x86_64
> x86_64 x86_64 GNU/Linux

The fix landed in the kernel (#86) right after the one you are running
(#85).

Revision history for this message
Pete Cheslock (pete-cheslock) wrote :

Ah - crap - sorry about that. You are right. Thanks!
