Ubuntu server should not suppress console output

Bug #1245808 reported by Munehisa Kamata on 2013-10-29
This bug affects 1 person
Affects Status Importance Assigned to Milestone
procps (Ubuntu)
Undecided
Unassigned

Bug Description

Hello,

Ubuntu server (12.04 LTS, 12.10, 13.04 and 13.10) currently uses the following /proc/sys/kernel/printk configuration by default:

ubuntu@ip-10-120-14-14:~$ cat /proc/sys/kernel/printk
4 4 1 7

This configuration suppresses the register dump and stack trace that the lockup detector and khungtaskd would otherwise print to the console.

For example:

lockup detector:
[41812904.107136] BUG: soft lockup - CPU#1 stuck for 22s! [insmod:1003]
[41812904.107213] Stack:
[41812904.107230] Call Trace:
[41812904.107269] Code: 90 41 89 fe 65 44 8b 2c 25 10 da 00 00 66 66 90 0f ae e8 e8 f9 58 d0 ff 66 90 41 89 c4 eb 11 66 90 f3 90 65 8b 1c 25 10 da 00 00 <41> 39 dd 75 20 66 66 90 0f ae e8 e8 d6 58 d0 ff 66 90 89 c2 44
[41812908.227029] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)

khungtaskd:
[41818362.469083] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818362.469096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[41818482.409149] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818482.409161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

These messages lack the most important information for debugging a lockup or hung task, which makes such kernel issues difficult to debug.
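To make the effect of the first field concrete, here is a minimal Python sketch of how the four values are read. The field names follow the kernel's sysctl documentation for kernel.printk; the helper names (parse_printk, reaches_console) are hypothetical and exist only for this illustration. It assumes the register dump and stack trace lines are logged at the default message level, which is consistent with the truncated output above but is not verified here against the kernel source.

```python
from typing import NamedTuple, Optional


class PrintkLevels(NamedTuple):
    """The four values of /proc/sys/kernel/printk, in file order."""
    console_loglevel: int          # messages with a lower number reach the console
    default_message_loglevel: int  # level given to messages logged without one
    minimum_console_loglevel: int  # lowest value console_loglevel may be set to
    default_console_loglevel: int  # boot-time default for console_loglevel


def parse_printk(text: str) -> PrintkLevels:
    """Parse the contents of /proc/sys/kernel/printk, e.g. '4 4 1 7'."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return PrintkLevels(*(int(f) for f in fields))


def reaches_console(levels: PrintkLevels, message_level: Optional[int] = None) -> bool:
    """Return True if a message at message_level is printed on the console.

    A message is shown only when its level is numerically lower (more
    severe) than console_loglevel. Messages logged without an explicit
    level are treated as default_message_loglevel.
    """
    if message_level is None:
        message_level = levels.default_message_loglevel
    return message_level < levels.console_loglevel


ubuntu_default = parse_printk("4 4 1 7")
proposed = parse_printk("7 4 1 7")

print(reaches_console(ubuntu_default))  # default-level message with "4 4 1 7"
print(reaches_console(proposed))        # default-level message with "7 4 1 7"
```

With "4 4 1 7" a default-level message (level 4) is not lower than console_loglevel (4), so it is dropped from the console, while errors at level 3 and below still appear. With "7 4 1 7" the same message passes the check.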

In contrast, other distros generally use a different default, as shown below. I believe that Ubuntu server should use it instead of the current one.

[ec2-user@ip-10-132-147-86 ~]$ cat /proc/sys/kernel/printk
7 4 1 7

This setting does not suppress such important console output, so we can see at a glance where a problem happened.

For example:

[41814106.447145] BUG: soft lockup - CPU#1 stuck for 23s! [insmod:1066]
[41814106.447160] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447174] CPU 1
[41814106.447178] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447191]
[41814106.447198] Pid: 1066, comm: insmod Tainted: G O 3.2.0-52-virtual #78-Ubuntu
[41814106.447209] RIP: e030:[<ffffffff8101c216>] [<ffffffff8101c216>] native_read_tsc+0x6/0x20
[41814106.447225] RSP: e02b:ffff8801d0b4dea8 EFLAGS: 00000246
[41814106.447230] RAX: 000000004c5a957c RBX: 0000000000000001 RCX: 000000004c5a954c
...snip...
[41814106.447296] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[41814106.447304] Process insmod (pid: 1066, threadinfo ffff8801d0b4c000, task ffff8801d09b96e0)
[41814106.447311] Stack:
[41814106.447315] ffff8801d0b4ded8 ffffffff8131693a 0000000000001bbd 0000000000000000
[41814106.447326] ffffffffa000c000 0000000001351010 ffff8801d0b4dee8 ffffffff8131686c
[41814106.447336] ffff8801d0b4df08 ffffffffa000702a 0000000000000000 ffffffffa0009020
[41814106.447347] Call Trace:
[41814106.447357] [<ffffffff8131693a>] delay_tsc+0x4a/0x80
[41814106.447365] [<ffffffffa000c000>] ? 0xffffffffa000bfff
[41814106.447371] [<ffffffff8131686c>] __const_udelay+0x2c/0x30
[41814106.447380] [<ffffffffa000702a>] stall_timeout+0x2a/0x38 [stallmod]
[41814106.447387] [<ffffffffa000c02e>] init_module+0x2e/0x1000 [stallmod]
[41814106.447397] [<ffffffff81002040>] do_one_initcall+0x40/0x180
...snip...
[41814106.447541] [<ffffffff81661b42>] system_call_fastpath+0x16/0x1b
[41814112.951787] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
[41814112.951801] sending NMI to all CPUs:
[41814123.412289] sched: RT throttling activated

On Ubuntu server, the default configuration is in /etc/sysctl.d/10-console-messages.conf, which is owned by the procps package.

ubuntu@ip-10-120-14-14:~$ cat /etc/sysctl.d/10-console-messages.conf

# the following stops low-level messages on console
kernel.printk = 4 4 1 7
ubuntu@ip-10-120-14-14:~$

Here is a possible fix.

--- /etc/sysctl.d/10-console-messages.conf.orig 2013-09-12 07:36:09.101819575 +0000
+++ /etc/sysctl.d/10-console-messages.conf 2013-09-12 07:40:16.517819543 +0000
@@ -1,3 +1,6 @@

-# the following stops low-level messages on console
-kernel.printk = 4 4 1 7
+# To stop low-level messages on console, use the following configuration instead.
+#
+# kernel.printk = 4 4 1 7
+#
+kernel.printk = 7 4 1 7

As far as I can see, this setting was originally brought over from Debian[1], but I don't think the point of the original report still holds today. I believe that suppressing the register dump and stack trace on the console by default is not a good idea, especially on a server-class OS.
Of course, end users can easily configure this themselves, but most of them will never touch such a setting unless someone asks them to after a problem has happened.
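For anyone who does want to change the setting locally, the edit amounts to rewriting one line of the sysctl.d file. The following Python sketch shows that rewrite on the file's text only. The function name set_printk is hypothetical, and the sketch deliberately does not touch the real file or the running kernel.

```python
import re

# Matches an active (uncommented) "kernel.printk = ..." line.
# [ \t]* is used instead of \s* so the match never spans a newline.
PRINTK_LINE = re.compile(r"^([ \t]*kernel\.printk[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_printk(conf_text: str, levels: str = "7 4 1 7") -> str:
    """Return conf_text with every active kernel.printk line set to levels.

    Commented-out lines are left alone. If there is no active line,
    one is appended at the end.
    """
    new_text, count = PRINTK_LINE.subn(lambda m: m.group(1) + levels, conf_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"kernel.printk = {levels}\n"
    return new_text


# Contents of 10-console-messages.conf as shown in this report.
original = (
    "\n"
    "# the following stops low-level messages on console\n"
    "kernel.printk = 4 4 1 7\n"
)
updated = set_printk(original)
print(updated)
```

To apply the value to a running system without editing any file, `sysctl -w kernel.printk="7 4 1 7"` (as root) sets it until the next reboot, and `sysctl -p /etc/sysctl.d/10-console-messages.conf` reloads the file after it has been edited.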

Do you have any concerns about this? Any comments would be appreciated.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=292834
