[0.49.0rc2] systemd stuck at unknown process during booting up

Bug #1841874 reported by Ray Chen
This bug affects 1 person
Affects: Next Generation Checkbox (CLI)
Status: Fix Released
Importance: Critical
Assigned to: Sylvain Pineau

Bug Description

[Summary]
The system gets stuck during boot, waiting indefinitely for some process to finish. The issue appears to occur during Reboot/Poweroff stress testing.

After a forced shutdown and reboot, I specified the default target at the GRUB menu:

  systemd.unit=multi-user.target

With that, I was able to boot to a text console, collect the systemd log and run QAbro.

[Additional information]
CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (8x)
GPU: 00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
kernel-version: 4.15.0-1042-oem
system-product-name: Inspiron 5594
Image: somerville-bionic-amd64-iso-hybrid-20180608-47
bios-version: 0.5.2
system-manufacturer: Dell Inc.

Ray Chen (ray.chen) wrote :

Automatically attached (four comments, each with an automatic attachment)

Ray Chen (ray.chen)
summary: - systemd stuck at unknow process after reboot
+ [0.49.0rc2] systemd stuck at unknown process during booting up process
Ray Chen (ray.chen) wrote : Re: [0.49.0rc2] systemd stuck at unknown process during booting up process
description: updated
summary: - [0.49.0rc2] systemd stuck at unknown process during booting up process
+ [0.49.0rc2] systemd stuck at unknown process during booting up
Ray Chen (ray.chen)
description: updated
Ray Chen (ray.chen) wrote :

After resuming in text mode and launching checkbox-cli again, the job remaining to resume was reboot-30. I tried to rerun it but could not, so I force-quit checkbox-cli with Ctrl-C, deleted the remaining session and rebooted. The system then booted up normally.

I can now confirm that the reboot stress test may cause the issue during boot.

Note: this platform is I+A.

Ray Chen (ray.chen) wrote :

I did some verification with Checkbox 0.48, running Power management stress 30 (automated).

The issue does not occur with the stable 0.48 version.

Changed in plainbox-provider-checkbox:
status: New → Confirmed
Ray Chen (ray.chen) wrote :

This issue also occurs on an Intel UMA system.

Changed in plainbox-provider-checkbox:
importance: Undecided → Critical
Ray Chen (ray.chen) wrote :

journalctl log for the I+A system.

Changed in plainbox-provider-checkbox:
assignee: nobody → Sylvain Pineau (sylvain-pineau)
milestone: none → 0.49.0
Sylvain Pineau (sylvain-pineau) wrote :

@Ray, I've run the reboot 30 stress tests on my I+N system with no issues. Could you please record a video on yours, from test plan selection up to the hang?

Changed in plainbox-provider-checkbox:
status: Confirmed → Incomplete
assignee: Sylvain Pineau (sylvain-pineau) → Ray Chen (ray.chen)
Ray Chen (ray.chen) wrote :

Per our discussion on September 10th, here is the time-lapse video:
https://usercontent.irccloud-cdn.com/file/tnlbdKtU/IMG_2565.MP4

Reassigning to Sylvain.

Changed in plainbox-provider-checkbox:
assignee: Ray Chen (ray.chen) → Sylvain Pineau (sylvain-pineau)
Changed in plainbox-provider-checkbox:
status: Incomplete → In Progress
Sylvain Pineau (sylvain-pineau) wrote :

This could once again be the plymouth bug, see:

https://bugs.launchpad.net/plainbox-provider-checkbox/+bug/1829808/comments/16

Scott confirmed he was running the tests with plymouth 0.9.3-1ubuntu7.18.04.1.

The fix for that issue was delivered in 0.9.3-1ubuntu7.18.04.2.

Ray Chen (ray.chen) wrote :

Hi,

I asked HWE for help in finding out what happened on this issue.

It seems /etc/gdm3/custom.conf was modified as follows:

  [daemon]
  AutomaticLoginEnable = true
  AutomaticLogin = root

After commenting out the last two lines above, I was able to log in to the system.

According to https://fedoraproject.org/wiki/Enabling_Root_User_For_GNOME_Display_Manager:

"Fedora uses a Password Authentication Module (PAM) called pam_succeed_if.so. This module is designed to issue an authentication success or failure based on characteristics of the account belonging to the authenticating user. One use is to select whether to load other modules based on this test. This module blocks root login for GDM, and can be toggled on or off as necessary."

So the sequence appears to be: root tries to log in > PAM blocks the root login > X fails > gdm retries the root auto-login > rejected > the cycle repeats.
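The failure mode described above depends on only two keys in the [daemon] section. A minimal Python sketch for detecting it, assuming the key names shown in the snippet above and the standard INI layout of gdm3's configuration file (/etc/gdm3/custom.conf on Ubuntu 18.04); `root_autologin_enabled` is a hypothetical helper, not part of Checkbox or gdm:

```python
import configparser


def root_autologin_enabled(conf_text: str) -> bool:
    """Return True if a gdm3 custom.conf enables automatic login for root.

    Assumes the [daemon] keys AutomaticLoginEnable and AutomaticLogin;
    commented-out lines and other sections are ignored.
    """
    # interpolation=None so a literal '%' in the file cannot raise an error;
    # strict=False tolerates duplicate keys or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("daemon"):
        return False
    daemon = parser["daemon"]
    # Key lookup is case-insensitive; getboolean() accepts true/false, yes/no, on/off, 1/0.
    enabled = daemon.getboolean("AutomaticLoginEnable", fallback=False)
    user = daemon.get("AutomaticLogin", fallback="").strip()
    return enabled and user == "root"


if __name__ == "__main__":
    bad = "[daemon]\nAutomaticLoginEnable = true\nAutomaticLogin = root\n"
    good = "[daemon]\nAutomaticLoginEnable = true\nAutomaticLogin = u\n"
    print(root_autologin_enabled(bad))   # True
    print(root_autologin_enabled(good))  # False
```

On a test machine it could be fed the file contents, for example `root_autologin_enabled(open("/etc/gdm3/custom.conf").read())`, to flag the bad state before the next reboot.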

Some more information:

1. Betty, Scott and I checked /etc/gdm3/custom.conf at the very beginning (just after the stress test started) and it had "AutomaticLogin = u". This may explain why the hang sometimes occurs at reboot 30 and sometimes at poweroff 30.

If you need any more information, please let us know. Thanks.

Ray Chen (ray.chen) wrote :

I'll upgrade plymouth to the latest version to see what happens.

Ray Chen (ray.chen) wrote :

Test steps:

Upgrade plymouth to 0.9.3-1ubuntu7.18.04.2

Run the poweroff and reboot stress tests

== Reboot stress started (also monitoring /etc/gdm3/custom.conf on every boot)
The reboot stress test was able to finish.
AutomaticLogin was always "u".
After the reboot test finished, custom.conf was restored.

Then the poweroff stress test started and asked for:

  Enter sudo password:

After I entered the user password "u", I found that AutomaticLogin = root had been written to /etc/gdm3/custom.conf.

So I finally found that the change is made when the poweroff stress test starts and asks for the user password, but I don't know why the "AutomaticLogin" user becomes root instead of the regular user.

Note: even after granting passwordless sudo with

  echo "$USER ALL=(ALL:ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/checkbox

the behavior is the same: the user was changed to root after the poweroff stress test started.

Please refer to this video, covering the end of the reboot stress test and the start of the poweroff test:
https://usercontent.irccloud-cdn.com/file/IU2lEYrI/IMG_2604.TRIM.MP4
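The per-boot monitoring in the test steps above could be scripted roughly as follows. This is a sketch, assuming Python 3 on Ubuntu 18.04 with gdm3; `automatic_login_user` and `record_boot` are hypothetical helpers, and how the script is triggered on each boot is left open. /proc/sys/kernel/random/boot_id is a standard Linux file whose value changes on every boot, which makes consecutive boots easy to tell apart in the log.

```python
import configparser
import datetime
from pathlib import Path

# Assumed locations: gdm3's config on Ubuntu 18.04 and the kernel's per-boot ID.
GDM_CONF = Path("/etc/gdm3/custom.conf")
BOOT_ID = Path("/proc/sys/kernel/random/boot_id")


def automatic_login_user(conf_path: Path) -> str:
    """Return the [daemon] AutomaticLogin value, or '' if unset or commented out."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(conf_path)  # a missing file is silently skipped
    # fallback also covers a missing [daemon] section.
    return parser.get("daemon", "AutomaticLogin", fallback="").strip()


def record_boot(conf_path: Path, log_path: Path, boot_id_path: Path = BOOT_ID) -> str:
    """Append one line (timestamp, boot id, AutomaticLogin value) to log_path."""
    boot_id = boot_id_path.read_text().strip() if boot_id_path.exists() else "unknown"
    user = automatic_login_user(conf_path)
    stamp = datetime.datetime.now().isoformat()
    with log_path.open("a") as log:
        log.write(f"{stamp} boot={boot_id} AutomaticLogin={user or '<unset>'}\n")
    return user


if __name__ == "__main__":
    # Print the current value; calling record_boot(GDM_CONF, <log path>) once per
    # boot would build a history showing exactly when the value flips to root.
    print(automatic_login_user(GDM_CONF) or "<unset>")
```

A log built this way would show the boot on which AutomaticLogin changes from "u" to "root", matching the manual observation above.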

Ray Chen (ray.chen) wrote :

I just tested the single poweroff and reboot automated jobs.
Poweroff ran first; after it finished, the reboot test started, and AutomaticLogin = root was again written to /etc/gdm3/custom.conf when the reboot test started.

Conclusion: AutomaticLogin = root is written when the second run starts.

Scott Hu (huntu207) wrote :

The symptom I saw was AutomaticLogin set to root in custom.conf on the first poweroff/reboot run.

Sylvain Pineau (sylvain-pineau) wrote :

Reproduced!

The key is indeed to run one of the reboot or poweroff tests before the stress tests.
I still need to confirm this theory, but the consequence is the same: the system tries to start a GNOME session for the root user (uid 0).

My theory is that on auto-resume, the terminal in which checkbox is restarted is owned by root.
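One way to check this theory on a test machine is to print which user owns the terminal a process is attached to. A minimal sketch using only the Python standard library; `owner_name` and `controlling_tty_owner` are hypothetical helpers and say nothing about how Checkbox itself chooses the auto-login user:

```python
import os
import pwd
from typing import Optional


def owner_name(path: str) -> str:
    """Return the login name of the user that owns `path` (e.g. a tty device)."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name


def controlling_tty_owner(fd: int = 0) -> Optional[str]:
    """Owner of the terminal attached to file descriptor `fd`, or None if it is not a tty."""
    if not os.isatty(fd):
        return None
    return owner_name(os.ttyname(fd))


if __name__ == "__main__":
    # Compare the process's own uid with the owner of its terminal; under the
    # theory above, an auto-resumed session would show 'root' as the owner.
    print(f"process uid={os.getuid()}, terminal owner={controlling_tty_owner()}")
```

If the resumed session reports `root` as the terminal owner while the desktop user is `u`, any logic that derived the login user from that owner would end up writing AutomaticLogin = root.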

Betty Lin (bettyl)
tags: added: ce-qa-concern
affects: plainbox-provider-checkbox → checkbox-ng
Changed in checkbox-ng:
milestone: 0.49.0 → none
milestone: none → 1.5.0
Changed in checkbox-ng:
status: In Progress → Fix Committed
Betty Lin (bettyl) wrote :

Verified the fix on I+N (201903-26914):

u@u-Latitude-5501:~$ checkbox-cli --version
checkbox-ng: 1.5.0rc3
checkbox-support: 0.42.0rc1
com.canonical.ce:oem: 1.0
certification-client: 0.38.0rc2
plainbox-provider-checkbox: 0.49.0rc3
plainbox-provider-resource-generic: 0.41.0rc2
plainbox-provider-sru: 1.14.0rc1
plainbox-provider-tpm2: 1.11.0rc2

Scott Hu (huntu207) wrote :

Verification passed with the UMA config (201907-27235):

checkbox-ng: 1.5.0rc3
checkbox-support: 0.42.0rc1
com.canonical.ce:oem: 1.0
certification-client: 0.38.0rc2
plainbox-provider-checkbox: 0.49.0rc3
plainbox-provider-resource-generic: 0.41.0rc2
plainbox-provider-sru: 1.14.0rc1
plainbox-provider-tpm2: 1.11.0rc2

Ray Chen (ray.chen) wrote :

Verification passed with the I+A config (201906-27141).
Test plans run:
Single power-off and Reboot automated
Stress power-off and Reboot automated (30 times)
C3 result: https://certification.canonical.com/hardware/201906-27141/submission/151484/

checkbox-ng: 1.5.0rc3
checkbox-support: 0.42.0rc1
com.canonical.ce:oem: 1.0
certification-client: 0.38.0rc2
plainbox-provider-checkbox: 0.49.0rc3
plainbox-provider-resource-generic: 0.41.0rc2
plainbox-provider-sru: 1.14.0rc1
plainbox-provider-tpm2: 1.11.0rc2

tags: added: cqa-verified
Changed in checkbox-ng:
status: Fix Committed → Fix Released