Ubuntu
apport package

systemd-journald, udev, logind crashed with SIGABRT -- get killed by 1 min watchdog timeout on longer kernel lockups

Bug #1433320 reported by Fila Kolodny on 2015-03-17

510

This bug affects 41 people

Affects		Status	Importance	Assigned to	Milestone
	apport (Ubuntu)	Fix Released	High	Martin Pitt
	systemd (Ubuntu)	Won't Fix	Medium	Unassigned

Bug Description

Errors Bucket
-------------
https://errors.ubuntu.com/bucket/?id=/lib/systemd/systemd-journald%3A6%3A__epoll_wait_nocancel%3Asd_event_wait%3Asd_event_run%3Amain

Sometimes userspace get locked up hard: A QEMU instance sees long hangs while the host gets suspended, USB devices/drivers lock up for some time (like in this bug's dmesg), or Google hangouts in bug 1495585.
The kernel detects that some process(es) get hard-locked up, the hardware watchdog (which systemd configures for some crucial services to auto-restart them via "WatchdogSec=1min") picks this up and restarts the process.

In that case, these crash reports only describe the symptom, not the cause. They should be suppressed since they provide no information that can be used to diagnose the actual root problem.

ProblemType: CrashDistroRelease: Ubuntu 15.04
Package: systemd 219-4ubuntu5 [modified: usr/share/dbus-1/system-services/org.freedesktop.systemd1.service]
ProcVersionSignature: Ubuntu 3.19.0-8.8-generic 3.19.1
Uname: Linux 3.19.0-8-generic x86_64
ApportVersion: 2.16.2-0ubuntu3
Architecture: amd64
Date: Tue Mar 17 17:18:08 2015
ExecutablePath: /lib/systemd/systemd-journald
InstallationDate: Installed on 2014-02-24 (386 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
Bus 002 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: VMware, Inc. VMware Virtual Platform
ProcCmdline: /lib/systemd/systemd-journald
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-8-generic root=UUID=7f776823-8240-460b-ab2d-45c8a2b2f2b6 ro quiet splash
Signal: 6SourcePackage: systemd
StacktraceTop:
__epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
?? ()
?? ()
__libc_start_main (main=0x7f94ef54c760, argc=1, argv=0x7ffe7c6e2fa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe7c6e2f98) at libc-start.c:287
?? ()
Title: systemd-journald crashed with SIGABRT in __epoll_wait_nocancel()
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 05/20/2014
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd05/20/2014:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

See original description

Tags:

Related branches

lp:~apport-hackers/apport/trunk

CVE References

2015-1338

Revision history for this message

Fila Kolodny (fila) wrote on 2015-03-17:

CurrentDmesg.txt Edit (119.8 KiB, text/plain; charset="utf-8")
Dependencies.txt Edit (3.1 KiB, text/plain; charset="utf-8")
Disassembly.txt Edit (991 bytes, text/plain; charset="utf-8")
Lspci.txt Edit (99.0 KiB, text/plain; charset="utf-8")
ProcCpuinfo.txt Edit (1.7 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt Edit (3.8 KiB, text/plain; charset="utf-8")
ProcMaps.txt Edit (6.1 KiB, text/plain; charset="utf-8")
ProcModules.txt Edit (3.1 KiB, text/plain; charset="utf-8")
ProcStatus.txt Edit (799 bytes, text/plain; charset="utf-8")
Registers.txt Edit (816 bytes, text/plain; charset="utf-8")
Stacktrace.txt Edit (882 bytes, text/plain; charset="utf-8")
SystemdDelta.txt Edit (894 bytes, text/plain; charset="utf-8")
SystemdFailedUnits.txt Edit (1.2 KiB, text/plain; charset="utf-8")
ThreadStacktrace.txt Edit (928 bytes, text/plain; charset="utf-8")
UdevDb.txt Edit (221.5 KiB, text/plain; charset="utf-8")
UdevLog.txt Edit (544.0 KiB, text/plain; charset="utf-8")

information type:

Private → Public

Revision history for this message

Apport retracing service (apport) wrote on 2015-03-17:

StacktraceTop:
__epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
sd_event_wait (e=0x7f94f0fc5230, timeout=<optimized out>) at ../src/libsystemd/sd-event/sd-event.c:2373
sd_event_run (timeout=<optimized out>, e=<optimized out>) at ../src/libsystemd/sd-event/sd-event.c:2500
main (argc=8, argv=0x7ffe7c6e2850) at ../src/journal/journald.c:109

Revision history for this message

Apport retracing service (apport) wrote on 2015-03-17: Stacktrace.txt

Stacktrace.txt Edit (2.8 KiB, text/plain)

Revision history for this message

Apport retracing service (apport) wrote on 2015-03-17: StacktraceSource.txt

StacktraceSource.txt Edit (721 bytes, text/plain)

Revision history for this message

Apport retracing service (apport) wrote on 2015-03-17: ThreadStacktrace.txt

ThreadStacktrace.txt Edit (2.9 KiB, text/plain)

Changed in systemd (Ubuntu):
importance:	Undecided → Medium
tags:	removed: need-amd64-retrace

Martin Pitt (pitti) on 2015-03-26

summary:

- systemd-journald crashed with SIGABRT in __epoll_wait_nocancel()
+ systemd-journald crashed with SIGABRT in sd_event_wait()

Revision history for this message

Launchpad Janitor (janitor) wrote on 2015-03-29: Re: systemd-journald crashed with SIGABRT in sd_event_wait()

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status:	New → Confirmed

Brian Murray (brian-murray) on 2015-04-20

description:

updated

Revision history for this message

Martin Pitt (pitti) wrote on 2015-05-06:

Fixed in http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=experimental&id=894d0f2d21d8e

Changed in systemd (Ubuntu):
status:	Confirmed → Fix Committed

Revision history for this message

Martin Pitt (pitti) wrote on 2015-05-06:

Sorry, wrong bug. This *might* be a duplicate of bug 1437896 due to memory corruption, but it's too early to tell as none of the duplicates here have a reproducer. We'll see if we will stop getting new reports once the fix for bug 1437896 lands.

Changed in systemd (Ubuntu):
status:	Fix Committed → Confirmed

Apport retracing service (apport) on 2015-09-11

tags:

added: wily

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-09-11:

For me this happens only on a QEMU-based virtual machine, not on real iron.

Martin Pitt (pitti) on 2015-09-14

summary:

- systemd-journald crashed with SIGABRT in sd_event_wait()
+ systemd-journald, udev, logind get killed by 1 min watchdog timeout

Revision history for this message

Martin Pitt (pitti) wrote on 2015-09-14: Re: systemd-journald, udev, logind get killed by 1 min watchdog timeout

#10

There is nothing specific to epoll_wait(), that's just the function that these daemons wait in for practially all the time. The SIGABRT is due to systemd killing/restarting the processes as they don't respond withing 1 minute (watchdog). In Fila's case the kernel reports hung tasks (in a page fault, i. e. slow/broken swap?), but in Till's reports there is no other apparent error message.

This is not just limited to QEMU, e. g. duplicate bug 1438805 is a real machine with a hardware watchdog. So we still only merely know the symptoms -- something stalls these QEMU or real laptop processes, but we don't know yet how. Does anyone have some observations/recipes how to reproduce this? I. e. did you start a lot of QEMU instances in parallel or something else what could make these slow? Maybe a running SSD TRIM in the background?

Changed in systemd (Ubuntu):
status:	Confirmed → Incomplete

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-09-14:

#11

My situation is more or less the following: On my laptop I have several virtual machines (usually 1 or 2 running at the same time) to have different versions of Ubuntu handy and also to test network printing use cases using only one physical machine, both for development of the printing part of Ubuntu.

Now to move the laptop from on spot to the other or over night I simply close the lid which does a suspend-to-RAM. stopping the system but keeping its state. When I resume by opening the lid everything gets running again from the point where I have left off, but with the clock beinbg on the current hour then and probably suggesting to certain processes that hours have passed waiting for an answer and causing timeouts. This couls probably be the cause of the crashes.

Revision history for this message

Martin Pitt (pitti) wrote on 2015-09-15:

#12

Till> ah, so this involves suspending the VMs? I figure QEMU's watchdog then doesn't handle this very well. Do other submitters also experience this during suspend?

Martin Pitt (pitti) on 2015-09-17

summary:

- systemd-journald, udev, logind get killed by 1 min watchdog timeout
+ systemd-journald, udev, logind crashed with SIGABRT in
+ __epoll_wait_nocancel() -- get killed by 1 min watchdog timeout

Revision history for this message

Martin Pitt (pitti) wrote on 2015-09-17: Re: systemd-journald, udev, logind crashed with SIGABRT in __epoll_wait_nocancel() -- get killed by 1 min watchdog timeout

#13

Steve analyzed this in the duplicate bug 1495585. I'll adjust the description here to make it easier to find/understand.

Changed in systemd (Ubuntu):
status:	Incomplete → Won't Fix
Changed in apport (Ubuntu):
status:	New → Triaged
importance:	Undecided → High
assignee:	nobody → Martin Pitt (pitti)
description:	updated
summary:	- systemd-journald, udev, logind crashed with SIGABRT in - __epoll_wait_nocancel() -- get killed by 1 min watchdog timeout + systemd-journald, udev, logind crashed with SIGABRT -- get killed by 1 + min watchdog timeout on longer kernel lockups
Changed in apport (Ubuntu):
status:	Triaged → In Progress

Martin Pitt (pitti) on 2015-09-17

Changed in systemd (Ubuntu):
status:	Won't Fix → Triaged

Martin Pitt (pitti) on 2015-09-22

Changed in apport (Ubuntu):
status:	In Progress → Fix Committed
Changed in systemd (Ubuntu):
status:	Triaged → Won't Fix

Revision history for this message

Launchpad Janitor (janitor) wrote on 2015-09-25:

#14

This bug was fixed in the package apport - 2.19-0ubuntu1

---------------
apport (2.19-0ubuntu1) wily; urgency=medium

  * New upstream release:
    - apport: Drop re-nicing. This might decrease the time a user has to wait
      for apport to finish the core dump for a crashed/hanging foreground
      process. (See LP #1278780)
    - kernel_crashdump: Enforce that the log/dmesg files are not a symlink.
      This prevents normal users from pre-creating a symlink to the
      predictable .crash file, and thus triggering a "fill up disk" DoS attack
      when the .crash report tries to include itself. Thanks to halfdog for
      discovering this! (CVE-2015-1338, part of LP #1492570)
    - SECURITY FIX: Fix all writers of report files (package_hook,
      kernel_crashdump, and similar) to open the report file exclusively,
      i. e. fail if they already exist. This prevents privilege escalation
      through symlink attacks. Note that this will also prevent overwriting
      previous reports with the same same. Thanks to halfdog for discovering
      this! (CVE-2015-1338, LP: #1492570)
    - apport: Ignore process restarts from systemd's watchdog. Their traces
      are usually useless as they don't have any information about the actual
      reasaon why processes hang (like VM suspends or kernel lockups with bad
      hardware) (LP: #1433320)

-- Martin Pitt <email address hidden> Thu, 24 Sep 2015 14:41:54 +0200

Changed in apport (Ubuntu):
status:	Fix Committed → Fix Released

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-10-06:

#15

For my QEMU-based virtual machines this fix seems not to have worked out. I still get the same amount of crash reports even with the fixed Apport installed.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-10-06:

#16

See my duplicate bug report bug 1500968 as an example for a crash report which occured with the fixed Apport installed.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-10-08:

#17

pitti, the problem seems not to be fixed yet, today I got this one: bug 1504167

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2015-10-16:

#18

pitti, seems still not to be fixed, I started my QEMU virtual machine with Wily and got: bug 1506796, bug 1506797, bug 1506798, bug 1506799, bug 1506800, and bug 1506801.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2016-03-30:

#19

pitti, I still get the same crashes as I get all the time. Seems still not be fixed.

Revision history for this message

Martin Pitt (pitti) wrote on 2016-03-30:

#20

Well, the "crash" itself can't be fixed in userspace -- your kernel causes very long hangs in userspace processes for some reason, and the watchdog timer sees this and kills/restarts the hung processes.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2016-03-30:

#21

pitti, the problem occurs only on a virtual machine which is on a laptop which gets frequently suspended (suspend to RAM). So processing gets stopped at a point of time and when it carries on after waking up from suspend the watchdog timers have all timed out. They should be stopped during the suspend, too.

Revision history for this message

Yuan Song (ysong) wrote on 2016-09-04:

#22

I got this bug pretty frequently on my 16.04 desktop(4.4.0-36-generic #55)

Program received signal SIGABRT, Aborted.
0x00007f8b6afa160a in timerfd_settime () at ../sysdeps/unix/syscall-template.S:84
#0 0x00007f8b6afa160a in timerfd_settime () at ../sysdeps/unix/syscall-template.S:84
#1 0x000055d36e6c416e in ?? ()
#2 0x000055d36e69f4ea in ?? ()
#3 0x00007f8b6aeba830 in __libc_start_main (main=0x55d36e69e540, argc=1, argv=0x7fffecd24e58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffecd24e48)
at ../csu/libc-start.c:291
#4 0x000055d36e69f8c9 in ?? ()
A debugging session is active.

Revision history for this message

Joe Marty (josephmarty+ubuntu) wrote on 2018-08-20:

#23

I'm having this same error occur on a new Ubuntu 18.04 VM, on VMWare Player. I don't understand why it was marked as "won't fix" for systemd... ?

This is happening when I suspend my host machine (windows 10). Upon resuming, the VM has crashed and goes through the startup process, then reports this error.

Revision history for this message

Joe Marty (josephmarty+ubuntu) wrote on 2018-08-20:

#24

VMWare log after crash and restart Edit (262.0 KiB, text/plain)

I don't know if this will help, but here's the vmware log:

Revision history for this message

Joe Marty (josephmarty+ubuntu) wrote on 2018-08-28:

#25

Incidentally, I only get this bug in a virtual machine that I had "copied" from one computer to another. I was able to re-create the virtual machine from scratch (using the same install image, and installing the same software, etc) on the new computer, and did not get the same issue in the new VM.

Revision history for this message

Joe Marty (josephmarty+ubuntu) wrote on 2018-08-30:

#26

I spoke too soon. I'm now getting the same error in the new VM, which was created from scratch.

Revision history for this message

itszn (itszn) wrote on 2018-12-11:

#27

I have just started using an Ubuntu 18.04.1 LTS server image in VMWare Workstation 15 this week.
When ever my host suspends, the systemd's watchdog times out many services including logind. This causes the x server to crash. (Which is really annoying since it kills all my terminals)

This is reproducible every time I suspend my host.

Systemd logs the following:
ubuntu systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
ubuntu systemd[1]: systemd-logind.service: Killing process 1122 (systemd-logind) with signal SIGABRT.

Increasing the watchdog time does not seem to help as it still is killed after the longer time.

I'm going to try removing the WatchdogSec option but that seems like a hack.

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1436169
Bug #1437895
Bug #1438805
Bug #1440011
Bug #1440682
Bug #1445057
Bug #1494785
Bug #1494946
Bug #1494950
Bug #1494952
Bug #1494954
Bug #1495258
Bug #1495259
Bug #1495260
Bug #1495261
Bug #1495262
Bug #1495263
Bug #1495585
Bug #1495589
Bug #1496596
Bug #1496598
Bug #1496599
Bug #1497723
Bug #1497724
Bug #1497725
Bug #1500968
Bug #1504167
Bug #1506796
Bug #1506797
Bug #1506799
Bug #1506800
Bug #1506801
Bug #1523856
Bug #1552973
Bug #1555281
Bug #1555283
Bug #1561693
Bug #1561754
Bug #1562316
Bug #1562569
Bug #1562570
Bug #1562571
Bug #1562572
Bug #1562573
Bug #1563292
Bug #1570799

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntuapport package

systemd-journald, udev, logind crashed with SIGABRT -- get killed by 1 min watchdog timeout on longer kernel lockups

Bug Description

Related branches

CVE References

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
apport package