mkdir/rm/sleep/ls causes kernel 'BUG: unable to handle kernel paging request'

Bug #1928405 reported by Jonathan L
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Expired   Undecided    Unassigned

Bug Description

I have a network of stock 16.04.2 LTS (Xenial Xerus) servers with an entirely unmodified "4.4.0-62-generic #83-Ubuntu" kernel running on a private network; they run telemetry programs, mostly sh/php out of crontab, with very light user interaction for configuration via apache and extremely occasional administrator ssh access. They are all on the same hardware: same motherboard, same amount of RAM, very similar, very small SATA SSD disks.

A recent fault made us examine the logs, and we see that since 2017 about half a dozen servers have been reporting kernel bugs about once a month.

 BUG: unable to handle kernel paging request at ffff88032fc00062
 CPU: 0 PID: 26071 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu

The details vary. The most common command is mkdir, but rm, head, basename, ls and sleep also appear. (Cron jobs run sh scripts every minute that call these commands.)

About half of the logs show tainted (G, D) and have untainted.

I have found no pattern with time of day, uptime, load (0.16, 0.22, 0.25 for following report), day of week.
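For anyone who wants to look for a pattern across hosts, here is a minimal sketch of how the reports could be tallied by command, taint state and faulting address. It assumes the two line formats quoted above ("BUG: unable to handle kernel paging request at ..." followed by a "CPU: ... Comm: ..." line); the function name `summarise_oopses` and the `SAMPLE` list are hypothetical, and the regexes have only been checked against the lines in this report.

```python
import re
from collections import Counter

# Patterns modelled on the two header lines quoted in this report.
BUG_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
COMM_RE = re.compile(r"CPU: (\d+) PID: \d+ Comm: (\S+) (Not tainted|Tainted:)")


def summarise_oopses(lines):
    """Count paging-request oopses by command, taint state and address."""
    commands, taint, addresses = Counter(), Counter(), Counter()
    awaiting_comm = False  # only count a Comm line that follows a BUG line
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            addresses[bug.group(1)] += 1
            awaiting_comm = True
            continue
        comm = COMM_RE.search(line)
        if comm and awaiting_comm:
            commands[comm.group(2)] += 1
            state = "untainted" if comm.group(3) == "Not tainted" else "tainted"
            taint[state] += 1
            awaiting_comm = False
    return commands, taint, addresses


# The two header pairs that appear in this report, used as sample input.
SAMPLE = [
    "BUG: unable to handle kernel paging request at ffff88032fc00062",
    "CPU: 0 PID: 26071 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu",
    "Jan 29 19:50:17 hostname kernel: [2315584.884470] BUG: unable to handle kernel paging request at ffff88042fc80062",
    "Jan 29 19:50:17 hostname kernel: [2315584.884744] CPU: 1 PID: 10730 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu",
]

if __name__ == "__main__":
    for counter in summarise_oopses(SAMPLE):
        print(dict(counter))
```

In practice the input would be the lines of each host's syslog rather than `SAMPLE`; the CR2 line is deliberately not matched, so each oops contributes one address.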

This is a typical syslog entry, from 2021-01-29; the same issue recurred in March and May (Comm: mkdir, but tainted G D).

Jan 29 19:50:17 hostname kernel: [2315584.884470] BUG: unable to handle kernel paging request at ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.884500] IP: [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.884524] PGD 220b067 PUD 0
Jan 29 19:50:17 hostname kernel: [2315584.884538] Oops: 0002 [#1] SMP
Jan 29 19:50:17 hostname kernel: [2315584.884552] Modules linked in: ppdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel coretemp snd_hda_codec serio_raw snd_hda_core snd_hwdep snd_pcm snd_timer snd lpc_ich shpchp soundcore parport_pc mac_hid 8250_fintek parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse ahci e1000e libahci ptp pps_core video fjes
Jan 29 19:50:17 hostname kernel: [2315584.884744] CPU: 1 PID: 10730 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu
Jan 29 19:50:17 hostname kernel: [2315584.884760] Hardware name: /PD11TI, BIOS MTCDT10N.85T.0201.2014.1209.1030 12/09/2014
Jan 29 19:50:17 hostname kernel: [2315584.884779] task: ffff880034c13fc0 ti: ffff8800c8838000 task.ti: ffff8800c8838000
Jan 29 19:50:17 hostname kernel: [2315584.884795] RIP: 0010:[<ffffffff811af629>] [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.884816] RSP: 0000:ffff8800c883bc28 EFLAGS: 00010203
Jan 29 19:50:17 hostname kernel: [2315584.884842] RAX: 0000000000000001 RBX: ffffea000285d540 RCX: 00000002ffffffff
Jan 29 19:50:17 hostname kernel: [2315584.884878] RDX: 0000000300000062 RSI: 0000000000000021 RDI: ffffea000285d540
Jan 29 19:50:17 hostname kernel: [2315584.884915] RBP: ffff8800c883bc28 R08: ffffffff81cd2dc4 R09: ffffffff81cd2db3
Jan 29 19:50:17 hostname kernel: [2315584.884951] R10: 0000000000000000 R11: ffffffff81cd2da2 R12: ffff88012fff7f80
Jan 29 19:50:17 hostname kernel: [2315584.884987] R13: 0000000000800000 R14: ffffea000285d500 R15: ffff88012fff77c0
Jan 29 19:50:17 hostname kernel: [2315584.885027] FS: 00007fa2813a1800(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
Jan 29 19:50:17 hostname kernel: [2315584.885065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 19:50:17 hostname kernel: [2315584.885088] CR2: ffff88042fc80062 CR3: 0000000034f10000 CR4: 00000000000006e0
Jan 29 19:50:17 hostname kernel: [2315584.885125] Stack:
Jan 29 19:50:17 hostname kernel: [2315584.885144] ffff8800c883bcf0 ffffffff811af98d ffff88012fff96c0 00000001df4d6b62
Jan 29 19:50:17 hostname kernel: [2315584.885186] ffff880035327a10 ffff880035327a00 ffff8800c97a7628 ffff8800c97a7628
Jan 29 19:50:17 hostname kernel: [2315584.885227] 00000000df4d6b62 ffff880035327a00 ffff88012fff96d0 0000000000000000
Jan 29 19:50:17 hostname kernel: [2315584.885269] Call Trace:
Jan 29 19:50:17 hostname kernel: [2315584.885296] [<ffffffff811af98d>] zone_statistics+0x5d/0xa0
Jan 29 19:50:17 hostname kernel: [2315584.885324] [<ffffffff81198d29>] __alloc_pages_nodemask+0x159/0x2a0
Jan 29 19:50:17 hostname kernel: [2315584.885355] [<ffffffff811e3f5e>] alloc_pages_vma+0xbe/0x240
Jan 29 19:50:17 hostname kernel: [2315584.885383] [<ffffffff811c1e11>] handle_mm_fault+0x1491/0x1820
Jan 29 19:50:17 hostname kernel: [2315584.885410] [<ffffffff811c9563>] ? do_mmap+0x333/0x420
Jan 29 19:50:17 hostname kernel: [2315584.885436] [<ffffffff811adc2b>] ? vm_mmap_pgoff+0xbb/0xe0
Jan 29 19:50:17 hostname kernel: [2315584.885464] [<ffffffff8106b4f7>] __do_page_fault+0x197/0x400
Jan 29 19:50:17 hostname kernel: [2315584.885490] [<ffffffff8106b782>] do_page_fault+0x22/0x30
Jan 29 19:50:17 hostname kernel: [2315584.885517] [<ffffffff8183a778>] page_fault+0x28/0x30
Jan 29 19:50:17 hostname kernel: [2315584.885541] Code: 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 4f 58 89 f6 55 b8 01 00 00 00 48 89 e5 48 8d 54 31 42 <65> 0f c0 02 83 c0 01 65 8a 49 41 38 c8 7f 02 5d c3 d0 f9 0f be
Jan 29 19:50:17 hostname kernel: [2315584.885736] RIP [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.885763] RSP <ffff8800c883bc28>
Jan 29 19:50:17 hostname kernel: [2315584.885783] CR2: ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.886024] ---[ end trace f32f2db37ef9c9df ]---
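The register dump above is internally consistent, and the arithmetic is easy to check. This is a small sketch, not a diagnosis: it assumes my reading of the Code: bytes is right, namely that `48 8b 4f 58` is `mov 0x58(%rdi),%rcx`, `48 8d 54 31 42` is `lea 0x42(%rcx,%rsi,1),%rdx`, and the marked `<65> 0f c0 02` is `xadd %al,%gs:(%rdx)`. Under that assumption the faulting address in CR2 should be the GS base plus RDX, and RDX should be RCX + RSI + 0x42. The constant names and the helper `gs_relative_address` are mine; the values are copied from the log.

```python
# Register values copied from the oops above.
RCX = 0x00000002FFFFFFFF      # loaded by: mov 0x58(%rdi),%rcx (assumed decoding)
RSI = 0x0000000000000021
RDX = 0x0000000300000062
GS_BASE = 0xFFFF88012FC80000  # "GS:" field of the oops
CR2 = 0xFFFF88042FC80062      # faulting address

MASK64 = 0xFFFFFFFFFFFFFFFF


def gs_relative_address(gs_base, offset):
    """Linear address of a %gs:(reg) access, wrapped to 64 bits."""
    return (gs_base + offset) & MASK64


# lea 0x42(%rcx,%rsi,1),%rdx
rdx_from_lea = (RCX + RSI + 0x42) & MASK64

if __name__ == "__main__":
    print(hex(rdx_from_lea), rdx_from_lea == RDX)
    print(hex(gs_relative_address(GS_BASE, RDX)),
          gs_relative_address(GS_BASE, RDX) == CR2)
```

Both comparisons hold for the values in this log, so the unmapped address follows directly from the value that was loaded into RCX. I have not checked whether the other reports decompose the same way.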

I have several years of logs on multiple machines for this and will be happy to supply whatever information is necessary. As these machines are in service it is difficult to do experiments like run different kernels, but I'll consider any requests carefully.

Please ask for whatever logs/information might be helpful.

Thanks in advance for any help.

Jonathan

Jonathan L (7-jonathan) wrote :

result of ubuntu-bug linux

Redacted to remove hostname and private details from syslog

Jonathan L (7-jonathan) wrote :

Typo: should be:

About half of the logs show tainted (G, D) and *HALF* untainted.

These computers are rebooted perhaps once a month, typically by power failure.

Jonathan L (7-jonathan) wrote :

Apologies for horribly formatted syslog output in description, file attached.

Chris Guiver (guiverc) wrote :

Thank you for reporting this bug to Ubuntu.

Ubuntu 16.04 (xenial) reached end-of-life on April 29, 2021.

See this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We appreciate that this bug may be old and you might not be interested in discussing it any more. But if you are then please upgrade to the latest Ubuntu version and re-test. If you then find the bug is still present in the newer Ubuntu version, please add a comment here telling us which new version it is in.

FYI: You're also referring to unpatched and outdated kernels for xenial/16.04. If you tried a patched kernel the issue may no longer have existed (fyi: last xenial kernel was 4.4.0.210.216 available using standard support which has now ended).

Changed in linux (Ubuntu):
status: New → Incomplete
Jonathan L (7-jonathan) wrote :

> Ubuntu 16.04 (xenial) reached end-of-life on April 29, 2021.

It did indeed!

I'll rebuild one or two of the servers with 20.04 LTS and report back.

As it only infrequently manifests, it will take a while to see whether that fixes the issue.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired