suspend/cycle_resolutions_after_suspend can not be executed properly

Bug #1382321 reported by Jerry Kao
This bug affects 3 people
Affects: Checkbox Provider - Base
Status: Fix Released
Importance: High
Assigned to: Sylvain Pineau

Bug Description

When running suspend/cycle_resolutions_after_suspend on an HP Z820-2, plainbox takes a screenshot and then keeps running, as shown in the attached picture. The process never ends.
After restarting the system and rerunning the test 3 times, the result is the same.
All resolutions can be switched manually without problems.

BIOS J63 v03.69
plainbox version
ii plainbox 0.13.1~ppa~ubuntu14.04.1 all toolkit for software and hardware integration testing
ii plainbox-glmark2-es2-meta 0.8~ppa~ubuntu14.04.1 amd64 metapackage to selectively install glmark2-es2
ii plainbox-insecure-policy 0.13.1~ppa~ubuntu14.04.1 all policykit policy required to use plainbox (insecure version)
ii plainbox-provider-certification-client 0.8~ppa~ubuntu14.04.1 all Client Certification
ii plainbox-provider-checkbox 0.12~ppa2~ubuntu14.04.1 amd64 CheckBox provider for PlainBox
ii plainbox-provider-oem 0.1ubuntu28 all plainbox oem provider
ii plainbox-provider-oem-stella 0.1ubuntu28 all plainbox oem provider for stella
ii plainbox-provider-resource-generic 0.10~ppa~ubuntu14.04.1 amd64 CheckBox generic resource jobs provider
ii python3-plainbox 0.13.1~ppa~ubuntu14.04.1 all toolkit for software and hardware testing (python3 module)

Revision history for this message
Jerry Kao (jerry.kao) wrote :
tags: added: ce-qa-concern
Revision history for this message
Jerry Kao (jerry.kao) wrote :
description: updated
Zygmunt Krynicki (zyga)
affects: checkbox-support → checkbox
Zygmunt Krynicki (zyga)
Changed in checkbox:
assignee: nobody → Zygmunt Krynicki (zkrynicki)
Jerry Kao (jerry.kao)
Changed in checkbox:
status: New → Confirmed
importance: Undecided → High
description: updated
Revision history for this message
Pierre Equoy (pieq) wrote :

A similar issue happens when running the test graphics/1_cycle_resolution_ on another laptop (this is the first time I have seen this issue; on other laptops this test passes without problems).

$ dpkg -l | grep plainbox
ii plainbox 0.16~ppa~ubuntu14.04.1 all toolkit for software and hardware integration testing
ii plainbox-glmark2-es2-meta 0.11~ppa~ubuntu14.04.1 amd64 metapackage to selectively install glmark2-es2
ii plainbox-insecure-policy 0.16~ppa~ubuntu14.04.1 all policykit policy required to use plainbox (insecure version)
ii plainbox-provider-certification-client 0.11~ppa~ubuntu14.04.1 all Client Certification
ii plainbox-provider-checkbox 0.15~ppa2~ubuntu14.04.1 amd64 CheckBox provider for PlainBox
ii plainbox-provider-oem 0.1ubuntu32 all plainbox oem provider
ii plainbox-provider-oem-stella 0.1ubuntu32 all plainbox oem provider for stella
ii plainbox-provider-resource-generic 0.13~ppa~ubuntu14.04.1 amd64 CheckBox generic resource jobs provider
ii python3-plainbox 0.16~ppa~ubuntu14.04.1 all toolkit for software and hardware testing (python3 module)

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

Would it be possible to access one of those systems (from yantok) to help debugging this issue?

Zygmunt Krynicki (zyga)
Changed in checkbox:
status: Confirmed → In Progress
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

So with the debugging we've done with Pierre Equoy (on site) I know that:

- It's not a bug in the framework. The programs we run via the job just hang
- It's not something I can reproduce here. It could be ubuntu version difference, driver difference or something entirely different
- The test procedure requires the tester to log in as guest once. This may have important ramifications on the state of the machine after that happens. We didn't manage to re-test this issue (we'll try tomorrow) after a clean boot. In general, we should really know this and understand how it affects the testing stack.
- The failure *is* reproducible on that machine with 'plainbox dev script 2013.com.canonical.certification::suspend/cycle_resolutions_after_suspend' (which doesn't run dependencies and can be tried quickly)

I'm currently blocked on access to that machine (it's on a network not available externally) so I cannot proceed.

Oh, and also, given this information, I've moved it to the provider. Any fix we can do is probably related to the provider's test itself. The core could only add a feature to time out and fail tests after a given amount of time (which is now possible with the new glib test runner, even if the machine itself is suspended!).
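
For illustration only, the time-out idea above could look roughly like the sketch below. This is not plainbox code: run_with_timeout and its parameters are hypothetical names, and it relies only on Python's standard subprocess module rather than on the glib test runner.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv and return its exit code, or None if it timed out.

    Hypothetical helper for illustration; not part of plainbox.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child before re-raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A command that sleeps longer than the limit is reported as None.
    hang = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hang, timeout_seconds=1))
```

Note that this only kills the direct child. A helper process started by the test command (shutter, in this report) could survive, so a real implementation would probably need to deal with the whole process group.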

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I've got access to the affected hardware. I'll debug the issue tomorrow.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I've confirmed that the test command itself just hangs. Probably, there's an interactive prompt on screen. Trying to confirm that now. To reproduce this issue, run:

DISPLAY=:0 PLAINBOX_PROVIDER_DATA=/usr/share/2013.com.canonical.certification\:checkbox/data /usr/lib/2013.com.canonical.certification\:checkbox/bin/xrandr_cycle --keyword=after_suspend --screenshot-dir .

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Testing via VNC shows nothing suspicious except that there's no actual screen size change as shutter is still blocking (it doesn't quit). I'll try to reproduce that on 14.04 locally.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This just hangs, with shutter still active as a process (sleeping)

u@<hostname>:~$ PLAINBOX_PROVIDER_DATA=/usr/share/2013.com.canonical.certification\:checkbox/data /usr/lib/2013.com.canonical.certification\:checkbox/bin/xrandr_cycle --keyword=after_suspend --screenshot-dir .
defined(@array) is deprecated at /usr/bin/shutter line 3736.
 (Maybe you should just omit the defined()?)
defined(@array) is deprecated at /usr/bin/shutter line 3747.
 (Maybe you should just omit the defined()?)
WARNING: gnome-web-photo is missing --> screenshots of websites will be disabled!

WARNING: Net::DBus::GLib is missing --> Ubuntu One support will be disabled!

WARNING: Image::ExifTool is missing --> writing Exif information will be disabled!

*** unhandled exception in callback:
*** Permission denied at /usr/share/perl5/Shutter/App/Autostart.pm line 59.
*** ignoring at /usr/bin/shutter line 2893.
WARNING: XFIXES extension not found - using a default cursor image
*** unhandled exception in callback:
*** Permission denied at /usr/share/perl5/Shutter/App/Autostart.pm line 59.
*** ignoring at /usr/bin/shutter line 2893.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Regarding the perl exception that we see. On that machine the test user is a member of the following groups:

uid=1001(u) gid=1001(u) groups=1001(u),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),110(lpadmin),125(sambashare)

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

$ apt-cache policy shutter
shutter:
  Installed: 0.90.1-0ubuntu1.1
  Candidate: 0.90.1-0ubuntu1.1
  Version table:
 *** 0.90.1-0ubuntu1.1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
        100 /var/lib/dpkg/status
     0.90.1-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

Looking at the shutter source code, I see that it's trying to open "$dir/shutter.desktop" for writing (I think, my perl is rusty) and dies. I wonder why that happens in the first place and what $dir is. Looking deeper.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Ok. I've traced this to ~/.config/autostart being owned by root.

I'm going to INVALID this bug. This is not something we can fix.

In general, if the tester is using root to run the whole test toolkit for *ANY* reason they are responsible to chown stuff back to the normal user after testing is finished. Otherwise stuff like that will happen again and again, each time costing us a lot of resources to trace.

I'm inclined to add a sanity check validation that will check if the user is running as root and if so, create a special file ~/.local/share/plainbox/was-started-as-root-in-the-past. Seeing that file plainbox will enable costly validation on each start-up, to bail out if there are *any* files owned by root in $HOME.
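
A minimal sketch of what such a scan could look like, assuming a plain walk over $HOME is acceptable. The function name find_files_owned_by is hypothetical and nothing like it exists in plainbox as far as this report shows; the was-started-as-root-in-the-past marker file is left out.

```python
import os


def find_files_owned_by(top, uid=0):
    """Return the paths below top that are owned by uid (0 is root).

    Hypothetical helper for illustration. The top directory itself is
    not checked, and symbolic links are examined but not followed.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    found.append(path)
            except OSError:
                # The entry vanished or cannot be inspected; skip it.
                continue
    return found


if __name__ == "__main__":
    for path in find_files_owned_by(os.path.expanduser("~")):
        print("owned by root:", path)
```

On the affected machine a scan like this would be expected to list ~/.config/autostart. Walking a large home directory on every start-up is what makes the check costly.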

Changed in checkbox:
status: In Progress → Invalid
Zygmunt Krynicki (zyga)
Changed in checkbox:
assignee: Zygmunt Krynicki (zkrynicki) → nobody
Revision history for this message
Pierre Equoy (pieq) wrote :

I'm running into this problem again.

After checking, ~/.config/autostart is owned by root, but I did the install myself and I'm sure I didn't run any command with sudo, so I don't know why it's owned by root... This leads to the same problem when running the test that uses shutter to take several screenshots, as already described by Jerry above.

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

After grepping for autostart in the checkbox provider, I found that the pm_test script (always started as root in our job commands) has the following code:

        default_config_directory = os.path.expanduser('~{0}/.config'
                                                      .format(username))
        config_directory = os.getenv('XDG_CONFIG_HOME',
                                     default_config_directory)
        autostart_directory = os.path.join(config_directory, 'autostart')
        if not os.path.exists(autostart_directory):
            os.makedirs(autostart_directory)

So if the SUT does not have an autostart folder, this script will create it with root as owner.

@Pierre, could you confirm that you run some power management stress tests before the cycle_resolutions?

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

We have to change ownership of .config/autostart back to the normal user.

2 use cases:

- the script is run by sudo
- the script is run by pkexec
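
A rough sketch of how both cases could be handled, not checked against the fix that was actually committed. It assumes sudo exports SUDO_UID and SUDO_GID and pkexec exports PKEXEC_UID; invoking_user_ids and make_autostart_directory are hypothetical names, not functions from pm_test.

```python
import os
import pwd


def invoking_user_ids(environ=os.environ):
    """Return (uid, gid) of the user behind sudo or pkexec, or None."""
    if "PKEXEC_UID" in environ:
        uid = int(environ["PKEXEC_UID"])
        try:
            # pkexec only exports the uid; look up the primary group.
            gid = pwd.getpwuid(uid).pw_gid
        except KeyError:
            gid = -1  # unknown user: leave the group unchanged
        return uid, gid
    if "SUDO_UID" in environ:
        uid = int(environ["SUDO_UID"])
        gid = int(environ.get("SUDO_GID", -1))
        return uid, gid
    return None


def make_autostart_directory(config_directory, environ=os.environ):
    """Create <config_directory>/autostart and hand it back to the user."""
    autostart_directory = os.path.join(config_directory, 'autostart')
    if not os.path.exists(autostart_directory):
        os.makedirs(autostart_directory)
        ids = invoking_user_ids(environ)
        if ids is not None:
            # Only the new leaf directory is chowned in this sketch.
            os.chown(autostart_directory, *ids)
    return autostart_directory
```

If os.makedirs also had to create the parent .config directory, that parent would still be owned by root here, so a real fix would need to cover every directory it creates.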

Changed in checkbox:
status: Invalid → Triaged
affects: checkbox → plainbox-provider-checkbox
Changed in plainbox-provider-checkbox:
milestone: none → 0.16
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

From http://www.freedesktop.org/software/polkit/docs/0.105/pkexec.1.html:

    "In addition the PKEXEC_UID environment variable is set to the user id of the process invoking pkexec."

Changed in plainbox-provider-checkbox:
assignee: nobody → Sylvain Pineau (sylvain-pineau)
status: Triaged → In Progress
Zygmunt Krynicki (zyga)
Changed in plainbox-provider-checkbox:
status: In Progress → Fix Committed
Changed in plainbox-provider-checkbox:
status: Fix Committed → Fix Released
Revision history for this message
Roxanne Fan (matrixf) wrote :

@Sylvain,
Did anyone verify the fix before the bug was marked Fix Released?

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

@Roxanne

I reproduced the bug and identified the root cause. In the end it was checkbox's fault, as another job was preventing
suspend/cycle_resolutions_after_suspend from working properly. Gavin confirmed that he did a run without problems before releasing in W49.
