[lucid] breaks apport: core dumps get aborted even if core_pattern is a pipe

Bug #498525 reported by Martin Pitt
This bug affects 2 people
Affects         Status        Importance  Assigned to     Milestone
linux (Ubuntu)  Fix Released  High        Andy Whitcroft
  Lucid         Fix Released  High        Andy Whitcroft

Bug Description

The current lucid kernel does not work with apport crash detection any more. Previous kernels ignored RLIMIT_CORE if /proc/sys/kernel/core_pattern was a pipe instead of a file, i.e. crashes could be intercepted by Apport on a default system.

Now this does not seem to work any more.

$ bash -c 'kill -SEGV $$'
Segmentation fault

Notice no "(core dumped)", and dmesg says:

Dec 19 16:02:45 tick kernel: [54640.971006] Process 19685(bash) has RLIMIT_CORE set to 0
Dec 19 16:02:45 tick kernel: [54640.971013] Aborting core
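
The two settings that interact here can be inspected from user space. A minimal sketch, assuming a Linux system with /proc mounted; the helper names `is_pipe_pattern` and `describe_core_setup` are made up for illustration:

```python
import resource

CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"


def is_pipe_pattern(pattern):
    """A core_pattern starting with '|' names a helper program, not a file."""
    return pattern.startswith("|")


def describe_core_setup():
    """Return (core_pattern, soft RLIMIT_CORE); the pattern is None if unreadable."""
    try:
        with open(CORE_PATTERN_PATH) as f:
            pattern = f.read().strip()
    except OSError:
        pattern = None
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return pattern, soft


if __name__ == "__main__":
    pattern, soft = describe_core_setup()
    print("core_pattern:", pattern)
    print("pipe handler:", bool(pattern) and is_pipe_pattern(pattern))
    print("soft RLIMIT_CORE:", soft)
```

With the affected kernel, a pipe pattern combined with a soft limit of 0 is the combination that produces the "Aborting core" message above.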

An alternative reproducer is calling the kernel core dump crash detection part of apport's test suite:

$ /usr/share/apport/testsuite/crash
* empty core dumps do not generate a report
* check test process creation/killing with apport
Traceback (most recent call last):
  File "/usr/share/apport/testsuite/crash", line 150, in <module>
    check_crash()
  File "/usr/share/apport/testsuite/crash", line 58, in check_crash
    assert os.WCOREDUMP(result), 'result: 0x%02X should include WCOREDUMP' % result
AssertionError: result: 0x0B should include WCOREDUMP
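
To read the failing status: 0x0B is signal 11 (SIGSEGV) without the core-dump flag. A small sketch of decoding such a wait status with the standard `os` helpers, assuming the usual Linux status layout; `describe_wait_status` is a name made up for illustration:

```python
import os


def describe_wait_status(status):
    """Return (signal number, core-dumped flag) for a signal-terminated status."""
    if not os.WIFSIGNALED(status):
        raise ValueError("process was not killed by a signal")
    return os.WTERMSIG(status), os.WCOREDUMP(status)


# 0x0B: killed by signal 11 (SIGSEGV), core flag clear, as in the failure above.
print(describe_wait_status(0x0B))
# 0x8B: the same signal with the core flag (0x80) set, which the test expects.
print(describe_wait_status(0x8B))
```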

ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: martin 6192 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xefebc000 irq 21'
   Mixer name : 'SigmaTel STAC9200'
   Components : 'HDA:83847690,10280201,00102201'
   Controls : 7
   Simple ctrls : 5
Date: Sat Dec 19 16:04:20 2009
DistroRelease: Ubuntu 10.04
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=96523246-f56d-4385-a46f-292cefc7a970
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20091209)
MachineType: Dell Inc. Latitude D430
Package: linux-image-2.6.32-8-generic 2.6.32-8.12
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-8-generic root=UUID=c58ab6de-7f75-4e41-9888-2a9338bd55c6 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-8.12-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.28
Reproducible: Yes
SourcePackage: linux
Tags: lucid needs-upstream-testing regression-potential
TestedUpstream: No
Uname: Linux 2.6.32-8-generic x86_64
dmi.bios.date: 05/21/2007
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A00
dmi.board.name: 0HU754
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA00:bd05/21/2007:svnDellInc.:pnLatitudeD430:pvr:rvnDellInc.:rn0HU754:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude D430
dmi.sys.vendor: Dell Inc.

Revision history for this message
Martin Pitt (pitti) wrote :

I take the liberty to bump priority on this, since it stops us from getting important crash notifications throughout the distro.

Changed in linux (Ubuntu Lucid):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
milestone: none → lucid-alpha-2
Changed in linux (Ubuntu Lucid):
status: New → Triaged
Andy Whitcroft (apw)
Changed in linux (Ubuntu Lucid):
assignee: Canonical Kernel Team (canonical-kernel-team) → Andy Whitcroft (apw)
Andy Whitcroft (apw)
Changed in linux (Ubuntu Lucid):
status: Triaged → In Progress
Revision history for this message
Andy Whitcroft (apw) wrote :

Ok, this has changed recently. It is part of a new mechanism for detecting recursive core dumps, i.e. dumps in the core dump handler. This changed in the commit below:

  commit 725eae32df7754044809973034429a47e6035158
  Author: Neil Horman <email address hidden>
  Date: Wed Sep 23 15:56:54 2009 -0700

    exec: make do_coredump() more resilient to recursive crashes

    Change how we detect recursive dumps.

    Currently we have a mechanism by which we try to compare pathnames of the
    crashing process to the core_pattern path. This is broken for a dozen
    reasons, and just doesn't work in any sort of robust way.

    I'm replacing it with the use of a 0 RLIMIT_CORE value. Since helper apps
    set RLIMIT_CORE to zero, we don't write out core files for any process
    with that particular limit set. It the core_pattern is a pipe, any
    non-zero limit is translated to RLIM_INFINITY.

    This allows complete dumps to be captured, but prevents infinite recursion
    in the event that the core_pattern process itself crashes.

    [<email address hidden>: coding-style fixes]
    Signed-off-by: Neil Horman <email address hidden>
    Reported-by: Earl Chew <email address hidden>
    Cc: Oleg Nesterov <email address hidden>
    Cc: Andi Kleen <email address hidden>
    Cc: Alan Cox <email address hidden>
    Signed-off-by: Andrew Morton <email address hidden>
    Signed-off-by: Linus Torvalds <email address hidden>

The practical upshot of which seems to be that setting the limit to 0 stops core dumps even for pipes. However, setting it to a very low value, say 1, will restore the original behaviour without allowing a real dump to occur where pipes are not in use.
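
The rule described in the commit message can be modelled in a few lines. This is only a sketch of the behaviour as the message states it, not the kernel source; `effective_limit_after_commit` is a hypothetical name, and the file case ignores any minimum-size checks the kernel may apply:

```python
import resource

UNLIMITED = resource.RLIM_INFINITY


def effective_limit_after_commit(limit, pattern_is_pipe):
    """Model of the rule in the quoted commit message; None means 'no dump'."""
    if limit == 0:
        # A zero limit now aborts the dump, whether or not a pipe is in use.
        return None
    if pattern_is_pipe:
        # Any non-zero limit is translated to RLIM_INFINITY for a pipe.
        return UNLIMITED
    return limit
```

Under this model a process started with a limit of 0 never reaches the pipe handler, which matches the regression reported here.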

Revision history for this message
Andy Whitcroft (apw) wrote :

Having discussed this on IRC, it seems we need this set as early as possible, so it would have to be changed either in the kernel or in the initramfs/upstart. I will look at how hard it would be to change cleanly in the kernel.

Andy Whitcroft (apw)
Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Committed
Revision history for this message
Kees Cook (kees) wrote :

"0" is a valid RLIMIT_CORE value and should not be overloaded to mean "running pipe handler". Instead, the pipe handler should get an RLIMIT_CORE of "1" and leave the rest of the system able to run handlers when processes have already been started with RLIMIT_CORE 0.

Preferably, RLIMIT_CORE values should not be overloaded at all, and the kernel should know which process is the handler, and ignore crashes of that pid.

Kees Cook (kees)
Changed in linux (Ubuntu Lucid):
status: Fix Committed → In Progress
Revision history for this message
Andy Whitcroft (apw) wrote :

Ok, the first stab at this involved updating the system default core limit to 1, which would then give us the semantics we desire. However, this does not work, as pam will zap all of the user processes to a core limit of 0 and they will again be ignored. Will try out switching the recursion detection to 1. Note that apport does not set its limit at all currently, and that will need fixing either way.
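
If the handler is to mark itself, one way it might do so is sketched below. This is an assumption for illustration, not apport's actual code: `mark_as_dump_handler` and `HANDLER_MARKER` are hypothetical names, and the value 1 is simply the one proposed in this thread.

```python
import resource

HANDLER_MARKER = 1  # assumed marker value, taken from the proposal in this thread


def mark_as_dump_handler(marker=HANDLER_MARKER):
    """Set the soft RLIMIT_CORE to `marker`; return False if the hard limit forbids it."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if hard != resource.RLIM_INFINITY and hard < marker:
        # The soft limit may not exceed the hard limit; leave things untouched.
        return False
    resource.setrlimit(resource.RLIMIT_CORE, (marker, hard))
    return True
```

A handler would call this once at startup, before doing any work that could itself crash.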

Revision history for this message
Andy Whitcroft (apw) wrote :

Ok, new patch to move this marker to an RLIMIT_CORE of 1 byte:

$ cat killme
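#!/usr/bin/python
# Set the soft RLIMIT_CORE to the value given as argv[1], then crash.

import resource
import os
import signal
import sys

# Soft limit from the command line; -1 (RLIM_INFINITY) as the hard limit.
resource.setrlimit(resource.RLIMIT_CORE, (int(sys.argv[1]), -1))
# Send SIGSEGV (signal 11) to ourselves to trigger the core dump path.
os.kill(os.getpid(), signal.SIGSEGV)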

$ echo "core" | sudo tee -a /proc/sys/kernel/core_pattern

$ ./killme 0
Segmentation fault
$ ./killme 1
Segmentation fault
$ ./killme 2
Segmentation fault
$ ./killme 1024
Segmentation fault
$ ./killme 4096
Segmentation fault (core dumped)

$ echo "|/usr/share/apport/apport %p %s %c" | sudo tee -a /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c

$ ./killme 0
Segmentation fault (core dumped)
$ ./killme 1
Segmentation fault
$ ./killme 2
Segmentation fault (core dumped)
$ ./killme 1024
Segmentation fault (core dumped)
$ ./killme 4096
Segmentation fault (core dumped)
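
The results above can be tabulated and checked against the intended rule for pipes. A sketch only: the table copies the transcript, `pipe_dump_expected` is a hypothetical name, and no rule is modelled for the file case, since the size threshold between 1024 and 4096 is not stated here.

```python
# (core_pattern kind, RLIMIT_CORE soft limit) -> was "(core dumped)" reported?
OBSERVED = {
    ("file", 0): False,
    ("file", 1): False,
    ("file", 2): False,
    ("file", 1024): False,
    ("file", 4096): True,
    ("pipe", 0): True,
    ("pipe", 1): False,
    ("pipe", 2): True,
    ("pipe", 1024): True,
    ("pipe", 4096): True,
}


def pipe_dump_expected(limit):
    """With the marker moved to 1, only a limit of exactly 1 suppresses a piped dump."""
    return limit != 1


# Collect any pipe results that disagree with the rule.
mismatches = [key for key, dumped in OBSERVED.items()
              if key[0] == "pipe" and pipe_dump_expected(key[1]) != dumped]
print("pipe mismatches:", mismatches)
```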

Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-10.14

---------------
linux (2.6.32-10.14) lucid; urgency=low

  [ Alex Deucher ]

  * SAUCE: drm/radeon/kms: fix LVDS setup on r4xx
    - LP: #493795

  [ Andy Whitcroft ]

  * Revert "(pre-stable) acpi: Use the ARB_DISABLE for the CPU which model
    id is less than 0x0f."
  * config-check -- ensure the checks get run at build time
  * config-check -- check the processed config during updateconfigs
  * config-check -- CONFIG_SECCOMP may not be present
  * TUN is now built in ignore
  * SAUCE: acpi battery -- delay first lookup of the battery until first
    use
  * SAUCE: async_populate_rootfs: move rootfs init earlier
  * ubuntu: AppArmor -- update to mainline 2010-01-06
  * SAUCE: move RLIMIT_CORE pipe dumper marker to 1
    - LP: #498525

  [ Dave Airlie ]

  * (pre-stable) drm/radeon/kms: fix crtc vblank update for r600

  [ Leann Ogasawara ]

  * Add asix to nic-usb-modules file
    - LP: #499785

  [ Peter Zijlstra ]

  * (pre-stable) sched: Fix balance vs hotplug race

  [ Tim Gardner ]

  * [Config] Enable CONFIG_FUNCTION_TRACER
    - LP: #497989
  * [Config] Drop lpia from getabis
  * [Config] Build in TUN/TAP driver
    - LP: #499491
  * [Config] DH_COMPAT=5

  [ Upstream Kernel Changes ]

  * Revert "(pre-stable) drm/i915: Avoid NULL dereference with
    component_only tv_modes"
  * Revert "(pre-stable) drm/i915: Fix sync to vblank when VGA output is
    turned off"
  * USB: usb-storage: fix bug in fill_inquiry
  * USB: option: add pid for ZTE
  * firewire: ohci: handle receive packets with a data length of zero
  * rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling
    of ->completed counter
  * rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed
    counter
  * rcu: Fix note_new_gpnum() uses of ->gpnum
  * rcu: Remove inline from forward-referenced functions
  * perf_event: Fix invalid type in ioctl definition
  * perf_event: Initialize data.period in perf_swevent_hrtimer()
  * perf: Don't free perf_mmap_data until work has been done
  * PM / Runtime: Fix lockdep warning in __pm_runtime_set_status()
  * sched: Check for an idle shared cache in select_task_rq_fair()
  * sched: Fix affinity logic in select_task_rq_fair()
  * sched: Rate-limit newidle
  * sched: Fix and clean up rate-limit newidle code
  * x86/amd-iommu: attach devices to pre-allocated domains early
  * x86/amd-iommu: un__init iommu_setup_msi
  * x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking
    up the PCI tree
  * x86: Fix iommu=nodac parameter handling
  * x86: GART: pci-gart_64.c: Use correct length in strncmp
  * x86: ASUS P4S800 reboot=bios quirk
    - LP: #366682
  * x86, apic: Enable lapic nmi watchdog on AMD Family 11h
  * ssb: Fix range check in sprom write
  * ath5k: allow setting txpower to 0
  * ath5k: enable EEPROM checksum check
  * hrtimer: Fix /proc/timer_list regression
  * ALSA: hrtimer - Fix lock-up
  * ALSA: hda - Terradici HDA controllers does not support 64-bit mode
  * KVM: x86 emulator: limit instructions to 15 bytes
  * KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
  * KVM: s390: Make psw available on all exits...


Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released