ReplayKernel.test_x86_64_pc fails intermittently
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
QEMU | Expired | Undecided | Unassigned |
Bug Description
Even though this acceptance test is already skipped on GitLab CI, the intermittent failures also occur in other environments.
The record phase works fine, but during the replay phase the guest fails to finish booting the kernel (it never reaches the expected point in the boot log):
16:34:47 DEBUG| [ 0.034498] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
16:34:47 DEBUG| [ 0.034790] Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline
16:34:47 DEBUG| [ 0.035093] Spectre V2 : Mitigation: Full generic retpoline
16:34:47 DEBUG| [ 0.035347] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
16:34:47 DEBUG| [ 0.035667]
16:36:02 ERROR|
16:36:02 ERROR| Reproduced traceback from: /home/cleber/
16:36:02 ERROR| Traceback (most recent call last):
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| self.run_
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| False, shift, args, replay_path)
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| self.wait_
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| vm=vm)
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| _console_
16:36:02 ERROR| File "/var/lib/
16:36:02 ERROR| msg = console.
16:36:02 ERROR| File "/usr/lib64/
16:36:02 ERROR| def readinto(self, b):
16:36:02 ERROR| File "/home/
16:36:02 ERROR| raise RuntimeError("Test interrupted by SIGTERM")
16:36:02 ERROR| RuntimeError: Test interrupted by SIGTERM
16:36:02 ERROR|
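For readers unfamiliar with QEMU's record/replay mode: the record and replay runs use the same command line and differ only in the `rr=` sub-option of `-icount`. The record run writes non-deterministic events to the file named by `rrfile=`, and the replay run reads them back. The sketch below is minimal and rests on some assumptions. The test is assumed to drive QEMU roughly this way. The helper `replay_argv`, the kernel path, the kernel command line, the `shift` value and the `replay.bin` filename are hypothetical placeholders, not the values `ReplayKernel` actually uses.

```python
import shlex


def replay_argv(kernel_path, kernel_cmdline, record, shift=7,
                replay_path="replay.bin", qemu="qemu-system-x86_64"):
    """Build a QEMU command line for one phase of a record/replay run.

    Hypothetical helper: the record phase (record=True) writes events to
    replay_path, and the replay phase (record=False) reads them back.
    """
    mode = "record" if record else "replay"
    return [
        qemu,
        "-display", "none",
        "-serial", "stdio",  # guest console on stdout (illustrative choice)
        "-icount", f"shift={shift},rr={mode},rrfile={replay_path}",
        "-kernel", kernel_path,
        "-append", kernel_cmdline,
        "-net", "none",
        "-no-reboot",
    ]


if __name__ == "__main__":
    # Print both phases; the paths and arguments are placeholders.
    for record in (True, False):
        print(shlex.join(replay_argv("vmlinuz", "console=ttyS0", record)))
```

Because both phases must see identical guest-visible inputs, a replay that stalls partway through boot (as in the log above) points to a divergence between what was recorded and what is replayed, rather than to the kernel itself.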
On my workstation, I can reproduce the failure roughly once every 50 runs.
I can raise the reproduction rate to ~90% by running 8 of these tests simultaneously (on an 8-core system).
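The reproduction approach described above can be scripted. It means repeating the test many times, optionally with several copies running in parallel to load the host. This is a minimal sketch with two assumptions. First, the caller supplies the test-runner command, since the exact invocation is not given in this report. Second, the helper names `run_once` and `failure_rate` are hypothetical. A per-run timeout is used so that a replay that hangs, as in the log above, is counted as a failure instead of blocking the script.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, timeout=None):
    """Run cmd once; return True on exit status 0, False on failure or timeout."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising; treat a hang as a failure.
        return False
    return result.returncode == 0


def failure_rate(cmd, runs, parallel=1, timeout=None):
    """Run cmd `runs` times, at most `parallel` at once; return the failed fraction."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(lambda _: run_once(cmd, timeout), range(runs)))
    return results.count(False) / runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python repro.py <runner> <args...>
    cmd = sys.argv[1:]
    print(f"serial:   {failure_rate(cmd, 50):.0%} failed")
    print(f"parallel: {failure_rate(cmd, 50, parallel=8):.0%} failed")
```

Running the same count once serially and once with `parallel=8` mirrors the comparison in this report. Raising host load is a common way to expose timing-dependent bugs. Note that on timeout only the direct child process is killed, so a runner that spawns QEMU itself may leave stray guests behind.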