bootchart sometimes doesn't generate all times when annotating

Bug #1091708 reported by Max Brustkern
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu Boot Speed Project
New
High
Unassigned

Bug Description

We run bootchart many times, over and over, to gather bootspeed data in the QA lab. Sometimes, it doesn't generate a bootchart at all, which is already filed as lp:598096 Sometimes, however, it generates a bootchart, but when we ask for annotation to get timings for the specific segments of the boot, some of the entries in the times file are empty. I'm attaching all the files we've gathered for a run where that occurred, and I can continue running the same process to generate more failures.

Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :

negativedesktop.tgz contains all of the other files. In the future, I'll just upload tarballs, but I wanted to have the individual files for one example easily accessible.

Changed in ubuntu-boot-speed:
assignee: nobody → Steve Langasek (vorlon)
Revision history for this message
Steve Langasek (vorlon) wrote :

I understand bootchart is being called with the following commandline:

bootchart --quiet --format=$FORMAT --crop-after=gnome-panel,mutter,unity-panel-service --annotate=ureadahead,mountall,hostname,hwclock --annotate=Xorg --annotate=gnome-session --annotate-file=$DIR/times --output=$DIR/bootchart.$FORMAT $DIR/bootchart.tgz

For the failed bootchart in question, running with --verbose instead of --quiet produces the following output:

parsing 'bootchart.tgz'
parsing 'header'
  0.0 seconds
parsing 'proc_stat.log'
  0.1 seconds
parsing 'proc_diskstats.log'
  0.34 seconds
parsing 'proc_ps.log'
warning: no parent for pid '2' with ppid '0'
1751 samples, avg. sample length 4.000000
process list size: 719
  11.07 seconds
no selected proc in list
merged 0 logger processes
pruned 252 process, 0 exploders, 128 threads, and 18 runs
False
bootchart written to 'bootchart.png'

Crucial is the line, 'no selected proc in list'. This means that as far as bootchart is concerned, the data collection ended *before* the system startup actually completed, where system startup is defined as "one of gnome-panel, mutter, or unity-panel-service has started".

So we need to figure out why this definition of "startup" is failing here. It could be a timing thing, or it could be that the relevant process names have changed.

Revision history for this message
Steve Langasek (vorlon) wrote :

Or, it *could* be a problem with the desktop startup itself. Looking at a chart for a good boot, we see that unity-panel-service is started over dbus by gtk-window-decorator, with gtk-window-decorator itself being launched by some part of compiz after a call to /usr/lib/nux/unity_support_test. In the bad boot, we see unity_support_test called, using a bit of CPU, and exiting. So I don't think this is just a timing thing, I think the desktop hasn't started correctly.

Could someone from the desktop team have a look at this?

Changed in ubuntu-boot-speed:
assignee: Steve Langasek (vorlon) → Canonical Desktop Team (canonical-desktop-team)
Revision history for this message
Sebastien Bacher (seb128) wrote :

Is it easy to reproduce the issue? Could we have the .xsession-errors file corresponding to a boot with a broken chart?

Revision history for this message
Max Brustkern (nuclearbob) wrote :

It occurs intermittently, but I can do some runs and try to recreate it. Will it be in the home directory of the user running the bootchart command?

Revision history for this message
Sebastien Bacher (seb128) wrote :

it would be in the directory of the user running the unity session

Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

That's weird, unity-panel-service isn't started by gtk-window-decorator at all (it doesn't know about it), but by the compiz process itself (unity is starting it).

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here are xsession error files from a case where this is occurring on every reboot. The machine is available to look at, but it won't be for long since we have other jobs we need to run there. I'll try to recreate it on another machine.

Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Gema Gomez (gema)
Changed in ubuntu-boot-speed:
importance: Undecided → High
Revision history for this message
Sebastien Bacher (seb128) wrote :

the xsession-errors log seems to indicate xorg is going away under the session feet, could hit a segfault or something, can you attach Xorg.0.log and Xorg.0.log.old to the bug after getting the issue?

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's a Xorg log.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's another.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's the bootchart times file.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's the Xorg log

Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's the bootchart tarball.

Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Steve Langasek (vorlon) wrote :

The latest boot chart shows a run affected by bug #1124330. The delayed lightdm startup results in the system not fully booting to the desktop by the time the boot charting job gets cut off, making it impossible to calculate full timings.

Bug #1124330 has been worked around from tomorrow's images on, so shouldn't be an ongoing problem. I don't know whether it makes sense to let the boot charting run longer to handle these situations, or if such runs should just be explicitly discarded as failures.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

We were seeing this pretty regularly for a few days, and we haven't seen it at all today or yesterday. If we don't see it tomorrow, I'll be fairly confident that the workaround put in place by the foundations team for correctly detecting the end of the boot process is working.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I've got another machine where this is occurring. I'm attaching logs.

Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :
Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here is the latest bootchart tgz from when I was gathering logs to file the bug.

Revision history for this message
Steve Langasek (vorlon) wrote :

In this latest boot, unity-panel-service is never started. This seems to be because compiz's opengl plugin fails to load; from xsession-errors:

[...]
compiz (core) - Info: Loading plugin: workarounds
compiz (core) - Info: Starting plugin: workarounds
compiz (core) - Error: Plugin 'opengl' not loaded.
[...]

No mention of any of the unity plugins, just a pair of references to 'opengl' not loading.

So once again, the bug here is *not* that bootchart doesn't output timing data, the bug is that you don't have a valid desktop boot to generate a bootchart against. You will need to debug this on the hardware / install in question.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I was on the machine, and the unity panel was available and working, so I'll have to take a look once we can recreate this again. From an end-user perspective, the desktop always appears to be working correctly when this bug occurs. I can get some more information on the specific processes running once I can recreate it again.

Changed in ubuntu-boot-speed:
assignee: Canonical Desktop Team (canonical-desktop-team) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.