test fails randomly

Bug #1583979 reported by Louis Bouchard
This bug affects 1 person
Affects: autopkgtest (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

When two Tests: blocks are used in the debian/tests/control file of the kdump-tools DEP8 tests, the test-crashdump test fails on random occasions.

Here is the content of the control file:

#Tests: install test-crashdump
Tests: install
Depends: linux-crashdump
Restrictions: needs-root, isolation-machine, allow-stderr

Tests: test-crashdump
Depends: linux-crashdump
Restrictions: needs-root, isolation-machine, allow-stderr

When the following d/t/control file is used, the test-crashdump test works every time:

Tests: install test-crashdump
Depends: linux-crashdump
Restrictions: needs-root, isolation-machine, allow-stderr

Description of the DEP8 tests:
==============================
The 'install' test verifies that kdump-tools and makedumpfile have been installed and configured correctly through the test's 'linux-crashdump' dependency. It then calls /tmp/autopkgtest-reboot with the enable-crashkernel marker.

The second test, 'test-crashdump', sets ADT_REBOOT_MARK to local and triggers a kernel crash dump. This is the test that fails when two Tests: blocks are present. Most of the time it fails because the root file system cannot be mounted and the system drops to rescue mode.

If only one Tests: block is used, the kernel crash dump runs kdump-tools, which triggers makedumpfile and reboots to complete the test.
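
The reboot handshake described above can be sketched as a test function that branches on the reboot mark. This is only an illustration and not the actual kdump-tools test script: the function name install_test and the REBOOT_CMD and CMDLINE_FILE overrides are hypothetical, added so the flow can be dry-run without rebooting, and the check for crashkernel= on the kernel command line is an assumption about what the post-reboot verification might look like.

```shell
#!/bin/sh
# Hypothetical sketch of a reboot-aware DEP8 test; not the real kdump-tools script.
# REBOOT_CMD and CMDLINE_FILE are overridable only so the flow can be dry-run.
REBOOT_CMD="${REBOOT_CMD:-/tmp/autopkgtest-reboot}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

install_test() {
    case "${ADT_REBOOT_MARK:-}" in
        "")
            # First pass: no mark is set yet, so request a reboot with a marker.
            echo "first pass: requesting reboot"
            "$REBOOT_CMD" enable-crashkernel
            ;;
        enable-crashkernel)
            # Second pass: the test is re-run after the reboot with the mark set.
            # Assumption: success means the kernel was booted with crashkernel=.
            if grep -q 'crashkernel=' "$CMDLINE_FILE"; then
                echo "crashkernel parameter present"
            else
                echo "crashkernel parameter missing" >&2
                return 1
            fi
            ;;
        *)
            echo "unexpected reboot mark: $ADT_REBOOT_MARK" >&2
            return 1
            ;;
    esac
}
```

A real test script would call install_test as its last statement, so that its return status becomes the result of the test.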

How to reproduce the problem:
=============================
$ dget https://launchpad.net/~louis-bouchard/+archive/ubuntu/kdump-dep8-failure/+files/makedumpfile_1.5.9-6~dep8.dsc
$ adt-buildvm-ubuntu-cloud
$ cd makedumpfile-1.5.9
$ adt-run --unbuilt-tree $(pwd) --- qemu --show-boot ../../adt-yakkety-amd64-cloud.img

Here is a capture of a failure:

[ OK ] Mounted Huge Pages File System. [15/1787]
[ 2.625656] systemd[1]: Mounted Debug File System.
[ OK ] Mounted Debug File System.
[ 2.627135] systemd[1]: Started Journal Service.
[ OK ] Started Journal Service.
[FAILED] Failed to start Remount Root and Kernel File Systems.
See 'systemctl status systemd-remount-fs.service' for details.
[ OK ] Started Create list of required sta...ce nodes for the current kernel.
[ OK ] Started Set console scheme.
...
[FAILED] Failed to start Create Volatile Files and Directories.
See 'systemctl status systemd-tmpfiles-setup.service' for details.
[ OK ] Started Set console font and keymap.
[ OK ] Started Tell Plymouth To Write Out Runtime Data.
         Starting Update UTMP about System Boot/Shutdown...
[FAILED] Failed to start Network Time Synchronization.
See 'systemctl status systemd-timesyncd.service' for details.

The test will eventually time out.

It is worth mentioning that, while debugging the issue, I have often seen the test succeed even with two Tests: blocks when I removed --show-boot and attached with minicom to the ttyS0 console of the second QEMU process, the one that performs the kernel crash.

It may also be useful to mention that, when two Tests: blocks are used, two qemu processes are executed.

While writing up this bug, I got one successful test run with two Tests: blocks by attaching to the console, and two failures without the console attached, with or without --show-boot.
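
Because the failure is intermittent, a small loop that repeats the run and counts failures may help put a number on it. The helper below is a hypothetical sketch: count_failures and the LOG_DIR variable are names introduced here for illustration, and it assumes the command under test exits with a non-zero status when a test fails.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and report how often it fails.
# Usage: count_failures N command [args...]
# Logs go to $LOG_DIR (defaults to the current directory).
count_failures() {
    runs="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Keep each run's output in its own log: run-1.log, run-2.log, ...
        if ! "$@" > "${LOG_DIR:-.}/run-$i.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}
```

It could be invoked with the adt-run command from the reproduction steps as its arguments, for example count_failures 10 followed by that command, and the per-run logs compared afterwards.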

Louis Bouchard (louis) wrote :

For some reason, I am not able to reproduce the issue when -d is used. This supports my suspicion that it is a timing issue within QEMU that does not happen if:

1) a second qemu process is not started (only one Tests: block)
2) the second qemu process has enough time to settle (-d slowing down execution)

Just a guess

Louis Bouchard (louis) wrote :

Hello again,

Just so you know, the issue IS NOT related to one or two Tests: blocks. I'm seeing the same behavior with only one block when running the same tests on an i386 xenial image.

Martin Pitt (pitti)
summary: - test fails randomly if two Tests: blocks are used in control files
+ test fails randomly
Changed in autopkgtest (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for autopkgtest (Ubuntu) because there has been no activity for 60 days.]

Changed in autopkgtest (Ubuntu):
status: Incomplete → Expired