ceph-osd early start interferes with kdump-tools during kernel dump

Bug #1461429 reported by Louis Bouchard
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
ceph (Ubuntu)    Invalid        Medium       Unassigned
Trusty           Fix Released   Medium       Unassigned

Bug Description

When a kernel crash occurs on a system with kdump-tools configured and enabled, kexec triggers a reboot of the server, which then starts kdump-tools to capture the kernel dump.

On systems running Ceph OSDs, the configured OSDs start even if kdump-tools is set up to start very early in the boot phase. Even replacing the kdump-tools sysVinit script with an upstart job that runs before the runlevel signal is not sufficient.

The /etc/init/ceph-osd.conf job still runs while kdump is capturing the dump, triggering the OOM killer.

Louis Bouchard (louis)
Changed in ceph (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Louis Bouchard (louis-bouchard)
Louis Bouchard (louis) wrote :

Here is an example of a captured session:

[ 399.597207] SysRq : Trigger a crash
[ 399.599050] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 399.600421] IP: [<ffffffff81457fc6>] sysrq_handle_crash+0x16/0x20
[ 399.600745] PGD 3758e067 PUD 3d616067 PMD 0
[ 399.600745] Oops: 0002 [#1] SMP
...
 * Starting enable remaining boot-time encrypted block devices [ OK ]
Cloud-init v. 0.7.5 running 'init' at Wed, 03 Jun 2015 07:33:31 +0000. Up 8.83 seconds.
ci-info: ++++++++++++++++++++++++Net device info++++++++++++++++++++++++
ci-info: +--------+------+-----------+-------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+-----------+-------------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | 10.5.1.39 | 255.255.0.0 | fa:16:3e:09:27:0b |
ci-info: +--------+------+-----------+-------------+-------------------+
ci-info: +++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++
ci-info: +-------+-------------+----------+-------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+----------+-------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.5.0.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.5.0.0 | 0.0.0.0 | 255.255.0.0 | eth0 | U |
ci-info: +-------+-------------+----------+-------------+-----------+-------+
 * Starting Ceph OSD [ OK ] <<<<<<<<<<<<<<<
 * Starting Signal sysvinit that local filesystems are mounted [ OK ]
 * Starting configure network device security [ OK ]
 * Starting flush early job output to logs [ OK ]
...
 * Starting Ceph OSD (start all instances) [ OK ]
 * Starting regular background program processing daemon [ OK ]
 * Starting deferred execution scheduler [ OK ]
Starting kdump-tools: * Starting Ceph MON (start all instances) [ OK ]
 * Stopping save kernel messages [ OK ]
 * Starting Ceph MON [ OK ]
 * Starting automatic crash report generation [ OK ]
 * Stopping CPU interrupts balancing daemon [ OK ]
 * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201506030733/dump-incomplete
 * Stopping Ceph MON (start all instances) [ OK ]
 * Starting Ceph monitor (all instances) [ OK ]
The kernel version is not supported.
The created dumpfile may be incomplete.
cyclic buffer size has been changed: 32767 => 32640
Excluding unnecessary pages : [ 0.0 %] /
* Starting Create Ceph client.admin key when possible [ OK ]
Excluding unnecessary pages : [100.0 %] |
Excluding unnecessary pages : [100.0 %] \
Excluding unnecessary pages : [100.0 %] -
Excluding unnecessary pages : [100.0 %] /
Excluding unnecessary pages : [100.0 %] |
 * Starting OpenSSH server [ OK ]
Copying data : [ 9.4 %] \
Copying data : [ 25.1 %] -
Copying data : [ 47.2 %] /
Copying data : [...


Changed in ceph (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Louis Bouchard (louis-bouchard)
Changed in ceph (Ubuntu):
assignee: Louis Bouchard (louis-bouchard) → nobody
status: Confirmed → Invalid
Louis Bouchard (louis) wrote :

Marking the development release as Invalid, since Wily uses systemd.

Chris J Arges (arges)
summary: - ceph-osd early start intefere with kdump-tools during kernel dump
+ ceph-osd early start interferes with kdump-tools during kernel dump
Louis Bouchard (louis) wrote :

This is caused by the fact that Ceph OSDs are started by the following udev rule: /lib/udev/rules.d/95-ceph-osd.rules. This rule emits the ceph-osd signal that triggers the job to start.

The problem is not with ceph but with how kdump-tools relies on sysVinit script ordering to run early.
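
For illustration only, here is a minimal sketch of how a boot-time script could detect that it is running in the crash kernel and decline to start heavy services. It is not the fix that was released for this bug. It relies on /proc/vmcore being present in the crash kernel, which is the file makedumpfile reads in the captured session above. The function names in_kdump_kernel and should_start_osd, and the optional path argument, are hypothetical and exist only to make the sketch testable.

```shell
#!/bin/sh
# Hypothetical helpers; these names are not part of ceph or kdump-tools.

# Succeeds when the given vmcore path exists. /proc/vmcore is only
# present when the system was booted into the crash (kdump) kernel.
in_kdump_kernel() {
    [ -e "${1:-/proc/vmcore}" ]
}

# Returns 0 when it looks safe to start OSDs, 1 when a dump capture
# is expected to be in progress.
should_start_osd() {
    if in_kdump_kernel "${1:-/proc/vmcore}"; then
        echo "crash kernel detected, not starting OSD" >&2
        return 1
    fi
    return 0
}
```

A job could call should_start_osd from its pre-start step and stop early when it returns non-zero. Whether that is appropriate for a given service is a design decision outside the scope of this sketch.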

Louis Bouchard (louis)
Changed in ceph (Ubuntu Trusty):
status: Confirmed → Fix Released
assignee: Louis Bouchard (louis-bouchard) → nobody