mkdir/rm/sleep/ls causes kernel 'BUG: unable to handle kernel paging request'
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Expired | Undecided | Unassigned | | |
Bug Description
I have a network of stock 16.04.2 LTS (Xenial Xerus) servers running the entirely unmodified "4.4.0-62-generic #83-Ubuntu" kernel on a private network; they run telemetry programs, mostly sh/php out of crontab, with very light user interaction for configuration via apache and extremely occasional administrator ssh access. They are all on the same hardware: same motherboard, same amount of RAM, and very similar, very small SATA SSD disks.
A recent fault made us examine the logs, and we see that since 2017 about half a dozen servers have been reporting kernel bugs about once a month.
BUG: unable to handle kernel paging request at ffff88032fc00062
CPU: 0 PID: 26071 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu
The details vary. The most common command is mkdir, but rm, head, basename, ls and sleep also appear. (There are every-minute cron jobs, sh scripts, which run these commands.)
About half of the logs show tainted (G, D) and half show untainted.
I have found no pattern with time of day, uptime, load (0.16, 0.22, 0.25 for following report), day of week.
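For anyone who wants to repeat the tally of commands, taint state and time of day across several syslog files, a minimal sketch follows. It assumes the standard syslog timestamp prefix and the `CPU: ... PID: ... Comm: ...` line format shown in this report; the function name `summarize` and the regular expressions are illustrative, not an existing tool.

```python
import re
from collections import Counter

# Matches the oops summary line, e.g.
#   CPU: 1 PID: 10730 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu
# The taint field is either "Not tainted" or "Tainted: " followed by flag letters.
OOPS_RE = re.compile(
    r"CPU: \d+ PID: \d+ Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted: [A-Z ]+?)\s+\d+\.\d+"
)

# Assumed syslog prefix: "Jan 29 19:50:17 hostname kernel: ..."
HOUR_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s")


def summarize(lines):
    """Count oops reports by command name, taint flags and hour of day."""
    by_comm, by_taint, by_hour = Counter(), Counter(), Counter()
    for line in lines:
        m = OOPS_RE.search(line)
        if m is None:
            continue
        by_comm[m.group("comm")] += 1
        taint = m.group("taint")
        # "Tainted: G      D" becomes "GD"; untainted reports are counted as "none".
        flags = "none" if taint == "Not tainted" else "".join(taint.split()[1:])
        by_taint[flags] += 1
        h = HOUR_RE.match(line)
        if h is not None:
            by_hour[int(h.group(1))] += 1
    return by_comm, by_taint, by_hour
```

Calling `summarize(open("/var/log/syslog", errors="replace"))` on each host and merging the returned counters gives one table per dimension; the path is only an example.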
This is a typical syslog entry, from 2021-01-29; the same machine had the same issue in March and May (Comm: mkdir, but tainted G D).
Jan 29 19:50:17 hostname kernel: [2315584.884470] BUG: unable to handle kernel paging request at ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.884500] IP: [<ffffffff811af
Jan 29 19:50:17 hostname kernel: [2315584.884524] PGD 220b067 PUD 0
Jan 29 19:50:17 hostname kernel: [2315584.884538] Oops: 0002 [#1] SMP
Jan 29 19:50:17 hostname kernel: [2315584.884552] Modules linked in: ppdev snd_hda_
Jan 29 19:50:17 hostname kernel: [2315584.884744] CPU: 1 PID: 10730 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu
Jan 29 19:50:17 hostname kernel: [2315584.884760] Hardware name: /PD11TI, BIOS MTCDT10N.
Jan 29 19:50:17 hostname kernel: [2315584.884779] task: ffff880034c13fc0 ti: ffff8800c8838000 task.ti: ffff8800c8838000
Jan 29 19:50:17 hostname kernel: [2315584.884795] RIP: 0010:[<
Jan 29 19:50:17 hostname kernel: [2315584.884816] RSP: 0000:ffff8800c8
Jan 29 19:50:17 hostname kernel: [2315584.884842] RAX: 0000000000000001 RBX: ffffea000285d540 RCX: 00000002ffffffff
Jan 29 19:50:17 hostname kernel: [2315584.884878] RDX: 0000000300000062 RSI: 0000000000000021 RDI: ffffea000285d540
Jan 29 19:50:17 hostname kernel: [2315584.884915] RBP: ffff8800c883bc28 R08: ffffffff81cd2dc4 R09: ffffffff81cd2db3
Jan 29 19:50:17 hostname kernel: [2315584.884951] R10: 0000000000000000 R11: ffffffff81cd2da2 R12: ffff88012fff7f80
Jan 29 19:50:17 hostname kernel: [2315584.884987] R13: 0000000000800000 R14: ffffea000285d500 R15: ffff88012fff77c0
Jan 29 19:50:17 hostname kernel: [2315584.885027] FS: 00007fa2813a180
Jan 29 19:50:17 hostname kernel: [2315584.885065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 19:50:17 hostname kernel: [2315584.885088] CR2: ffff88042fc80062 CR3: 0000000034f10000 CR4: 00000000000006e0
Jan 29 19:50:17 hostname kernel: [2315584.885125] Stack:
Jan 29 19:50:17 hostname kernel: [2315584.885144] ffff8800c883bcf0 ffffffff811af98d ffff88012fff96c0 00000001df4d6b62
Jan 29 19:50:17 hostname kernel: [2315584.885186] ffff880035327a10 ffff880035327a00 ffff8800c97a7628 ffff8800c97a7628
Jan 29 19:50:17 hostname kernel: [2315584.885227] 00000000df4d6b62 ffff880035327a00 ffff88012fff96d0 0000000000000000
Jan 29 19:50:17 hostname kernel: [2315584.885269] Call Trace:
Jan 29 19:50:17 hostname kernel: [2315584.885296] [<ffffffff811af
Jan 29 19:50:17 hostname kernel: [2315584.885324] [<ffffffff81198
Jan 29 19:50:17 hostname kernel: [2315584.885355] [<ffffffff811e3
Jan 29 19:50:17 hostname kernel: [2315584.885383] [<ffffffff811c1
Jan 29 19:50:17 hostname kernel: [2315584.885410] [<ffffffff811c9
Jan 29 19:50:17 hostname kernel: [2315584.885436] [<ffffffff811ad
Jan 29 19:50:17 hostname kernel: [2315584.885464] [<ffffffff8106b
Jan 29 19:50:17 hostname kernel: [2315584.885490] [<ffffffff8106b
Jan 29 19:50:17 hostname kernel: [2315584.885517] [<ffffffff8183a
Jan 29 19:50:17 hostname kernel: [2315584.885541] Code: 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 4f 58 89 f6 55 b8 01 00 00 00 48 89 e5 48 8d 54 31 42 <65> 0f c0 02 83 c0 01 65 8a 49 41 38 c8 7f 02 5d c3 d0 f9 0f be
Jan 29 19:50:17 hostname kernel: [2315584.885736] RIP [<ffffffff811af
Jan 29 19:50:17 hostname kernel: [2315584.885763] RSP <ffff8800c883bc28>
Jan 29 19:50:17 hostname kernel: [2315584.885783] CR2: ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.886024] ---[ end trace f32f2db37ef9c9df ]---
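A rough cross-check of the register dump above may help triage. It rests on one assumption: that the bytes `48 8d 54 31 42 <65> 0f c0 02` in the Code: line decode as `lea rdx,[rcx+rsi+0x42]` followed by a `%gs`-relative `xadd` to the address in RDX. Under that reading, RDX should equal RCX + RSI + 0x42, and CR2 minus RDX gives the segment base that was added to it. The helper names below are illustrative only.

```python
MASK64 = (1 << 64) - 1

# Values copied from the oops above.
RCX = 0x00000002FFFFFFFF
RSI = 0x0000000000000021
RDX = 0x0000000300000062
CR2 = 0xFFFF88042FC80062


def effective_offset(rcx, rsi, disp=0x42):
    """Offset computed by the assumed `lea rdx,[rcx+rsi+0x42]`."""
    return (rcx + rsi + disp) & MASK64


def implied_segment_base(cr2, rdx):
    """Base that must have been added to RDX to reach the faulting address."""
    return (cr2 - rdx) & MASK64
```

With these values, `effective_offset(RCX, RSI)` equals RDX, and `hex(implied_segment_base(CR2, RDX))` is `0xffff88012fc80000`. If the decoding assumption holds, the faulting address comes from the unusual value in RCX (0x00000002ffffffff) rather than from the segment base, which may be worth comparing across the other oops reports.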
I have several years of logs on multiple machines for this and will be happy to supply whatever information is necessary. As these machines are in service, it is difficult to do experiments such as running different kernels, but I'll consider any requests carefully.
Please ask for whatever logs/information might be helpful.
Thanks in advance for any help.
Jonathan
result of ubuntu-bug linux
Redacted to remove hostname and private details from syslog