Bootspeed completes 16-17 runs and then fails to connect to machine

Bug #1099519 reported by Max Brustkern on 2013-01-14
This bug affects 1 person
Affects: UTAH
Importance: High
Assigned to: Max Brustkern

Bug Description

Sometimes the bootspeed process completes 16 or 17 runs' worth of data gathering, but when the machine then reboots, our attempt to reconnect to it times out after 10 minutes.

Changed in utah:
status: New → Triaged
importance: Undecided → High
Max Brustkern (nuclearbob) wrote :

I got a good look at this today. The syslog on the machine has no entries for 17 minutes. I guess if we had tried for 20 minutes instead of 10, we would have been able to continue. I'm going to look into raising this timeout to 30 minutes to be safe.

Max Brustkern (nuclearbob) wrote :

I raised the timeout to 30 minutes, so this should be fixed.
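
For illustration only, here is a minimal sketch of what a reconnect wait with a configurable timeout could look like. It is not the actual UTAH code: the function name `wait_for_ssh`, the choice of TCP port 22, and the 10-second polling interval are all assumptions made for the example.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=30 * 60, interval=10,
                 connect=socket.create_connection,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper: returns True once the machine accepts a connection,
    False if the deadline passes first. `connect`, `clock` and `sleep` are
    injectable so the loop can be exercised without a real machine.
    """
    deadline = clock() + timeout
    while True:
        try:
            conn = connect((host, port), timeout=interval)
        except OSError:
            # Machine is still down or rebooting; give up only after the deadline.
            if clock() >= deadline:
                return False
            sleep(interval)
        else:
            conn.close()
            return True
```

With the default of 30 minutes, a machine that stays silent for 17 minutes, as in the syslog above, would still be picked up once it comes back.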

Changed in utah:
status: Triaged → Fix Released
Max Brustkern (nuclearbob) wrote :

I saw the process fail after a single collection attempt here:
http://10.97.0.1:8080/job/bootspeed-backfill-desktop-i386-acer-veriton-01/29/console
If this continues to occur, we can reopen the bug and attempt to reproduce it.

Changed in utah:
assignee: nobody → Max Brustkern (nuclearbob)
Gema Gomez (gema) wrote :

Could this be due to CPU thermal protection? If it happens again, can we check with Rick in the lab whether the CPU is refusing to restart because of thermal issues, or whether PSU thermal protection is kicking in?

We could set up lm-sensors on the bootspeed test setup to monitor the CPU temperature, and wait for the CPU to cool down if it overheats:
https://help.ubuntu.com/community/SensorInstallHowto

Gema Gomez (gema) wrote :

Here is an example of how the output looks on my machine:
10:33:47 gema-pc:~ % sensors
atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage: +0.86 V (min = +0.80 V, max = +1.60 V)
 +3.3 Voltage: +3.47 V (min = +2.97 V, max = +3.63 V)
 +5 Voltage: +5.09 V (min = +4.50 V, max = +5.50 V)
 +12 Voltage: +12.32 V (min = +10.20 V, max = +13.80 V)
CPU FAN Speed: 1095 RPM (min = 600 RPM)
CHASSIS1 FAN Speed: 0 RPM (min = 600 RPM)
CHASSIS2 FAN Speed: 0 RPM (min = 600 RPM)
POWER FAN Speed: 0 RPM (min = 600 RPM)
CPU Temperature: +41.5°C (high = +60.0°C, crit = +95.0°C)
MB Temperature: +37.0°C (high = +45.0°C, crit = +75.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +36.0°C (high = +83.0°C, crit = +99.0°C)
Core 1: +32.0°C (high = +83.0°C, crit = +99.0°C)
Core 2: +35.0°C (high = +83.0°C, crit = +99.0°C)
Core 3: +31.0°C (high = +83.0°C, crit = +99.0°C)

So if any of the values approaches its max or high limit on the test machine, we should insert a wait time in the tests.
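
As a rough sketch of the cool-down wait, the temperature lines in `sensors` output could be parsed and compared against their "high" limits. Everything named here (`hot_sensors`, `wait_until_cool`, the 5°C margin, the 60-second interval, the 30-minute cap) is an assumption for illustration, not existing UTAH code, and the regular expression only covers lines shaped like the ones shown in this comment.

```python
import re
import subprocess
import time

# Matches temperature lines such as
#   "Core 0: +36.0°C (high = +83.0°C, crit = +99.0°C)"
# Voltage and fan lines do not match because they lack the "°C" unit.
TEMP_LINE = re.compile(
    r"^\s*(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r"\s*\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C"
)


def hot_sensors(sensors_output, margin=5.0):
    """Return {label: (temp, high)} for readings within `margin` °C of high."""
    hot = {}
    for line in sensors_output.splitlines():
        match = TEMP_LINE.match(line)
        if match is None:
            continue
        temp = float(match.group("temp"))
        high = float(match.group("high"))
        if temp >= high - margin:
            hot[match.group("label").strip()] = (temp, high)
    return hot


def read_sensors():
    """Run the lm-sensors `sensors` command and return its text output."""
    result = subprocess.run(["sensors"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def wait_until_cool(read=read_sensors, margin=5.0, interval=60,
                    max_wait=30 * 60, sleep=time.sleep):
    """Sleep until no reading is near its high limit.

    Returns True once everything is cool, False if `max_wait` seconds
    of sleeping were not enough.
    """
    waited = 0
    while hot_sensors(read(), margin):
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
    return True
```

A test run could call `wait_until_cool()` before each reboot. The margin matters: in the sample output the motherboard reading is only 8°C below its high limit, so a margin of 10°C would already trigger a wait on that machine.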

Gema Gomez (gema) wrote :

This could be related to the X hang in bug 1096943.

Changed in utah:
status: Fix Released → In Progress
Max Brustkern (nuclearbob) wrote :

This still happens occasionally, but less since we've moved back to desktop.

Max Brustkern (nuclearbob) wrote :

The new bootspeed process doesn't have this issue. Possibly underlying utah improvements fixed it.

Changed in utah:
status: In Progress → Invalid