[Dell OptiPlex 3011] Sometimes system won't be able to run suspend stress test

Bug #1335026 reported by Po-Hsu Lin
This bug affects 1 person
Affects         Status      Importance   Assigned to   Milestone
HWE Next        Won't Fix   High         AceLan Kao
linux (Ubuntu)  Won't Fix   High         AceLan Kao

Bug Description

CID: 201302-12682 Dell OptiPlex 3011

Sometimes this system cannot complete multiple suspend cycles with
sudo fwts -P s3 --s3-multiple=5 --s3-device-check --s3-device-check-delay 45
(The first suspend cycle will probably succeed.)

The following errors can be found in the log file:
S3 cycle 1 of 5
pm-suspend returned 128 after 20 seconds.
FAILED [MEDIUM] ShortSuspend: Test 1, Unexpected: S3 slept for 20 seconds, less
than the expected 30 seconds.
FAILED [HIGH] PMActionPowerStateS3: Test 1, pm-action tried to put the machine
in the requested power state but failed.

When this happens, suspending from the system menu fails as well.

I did get it to suspend successfully 5 times in a row once.

Steps:
1. Install 14.04, boot to desktop
2. Try to suspend the system multiple times with the command above
3. Reboot and re-test it.
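
To triage the fwts log quickly, the FAILED entries can be pulled out with a small script. This is only a sketch: it assumes the result lines look exactly like the excerpt above ("FAILED [LEVEL] Label: text"), and it keeps only the first line of a wrapped message. The names FAILED_RE, failed_results and SAMPLE are made up for this example.

```python
import re

# Matches result lines such as:
#   FAILED [HIGH] PMActionPowerStateS3: Test 1, pm-action tried ...
# The layout is inferred from the log excerpt in this report.
FAILED_RE = re.compile(r"^FAILED \[(?P<level>[A-Z]+)\] (?P<label>\w+): (?P<text>.*)$")


def failed_results(log_text):
    """Return (level, label, first line of message) for every FAILED entry."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            results.append((match["level"], match["label"], match["text"]))
    return results


# Log excerpt copied from the bug description.
SAMPLE = """\
S3 cycle 1 of 5
pm-suspend returned 128 after 20 seconds.
FAILED [MEDIUM] ShortSuspend: Test 1, Unexpected: S3 slept for 20 seconds, less
than the expected 30 seconds.
FAILED [HIGH] PMActionPowerStateS3: Test 1, pm-action tried to put the machine
in the requested power state but failed.
"""

for level, label, _text in failed_results(SAMPLE):
    print(level, label)
```

On a real run, the contents of the fwts results log would be passed in instead of SAMPLE.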

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-24-generic 3.13.0-24.46 [modified: boot/vmlinuz-3.13.0-24-generic]
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1339 F.... pulseaudio
CurrentDesktop: Unity
Date: Tue Jun 24 02:25:16 2014
HibernationDevice: RESUME=UUID=1877e238-0ac4-41ae-8f13-91e1e3636b73
InstallationDate: Installed on 2014-06-19 (4 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: Dell Inc. OptiPlex 3011 AIO
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=d6534767-b55e-4ec0-b18e-cb1151963470 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-24-generic N/A
 linux-backports-modules-3.13.0-24-generic N/A
 linux-firmware 1.127
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/22/2013
dmi.bios.vendor: Dell Inc.
dmi.bios.version: X19
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 13
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrX19:bd02/22/2013:svnDellInc.:pnOptiPlex3011AIO:pvr01:rvnDellInc.:rn:rvr:cvnDellInc.:ct13:cvr:
dmi.product.name: OptiPlex 3011 AIO
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Daniel Manrique (roadmr)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Ara Pulido (ara)
Changed in linux (Ubuntu):
importance: Medium → High
Po-Hsu Lin (cypressyew) wrote :

Verified with BIOS versions X19, A01, A02 and A04 in turn; the issue is present on all of them.

When this happens, a call trace and a lot of USB-related error messages can be found in dmesg:
[ 91.593385] usb 3-4: device not accepting address 8, error -71
[ 91.889414] usb 3-4: new high-speed USB device number 10 using xhci_hcd

Also, when rebooting or shutting down the system after this, it hangs on the splash screen (the Ubuntu logo with dots).
The console shows that these USB error messages keep appearing.

Changed in linux (Ubuntu):
assignee: Anthony Wong (anthonywong) → Edward Lin (airken)
Edward Lin (airken) wrote :

1. Blocklisted the lpc_ich and gpio_ich modules to eliminate the ACPI error.
The bug still exists.

2. Updated the kernel to 3.16-rc3.
The bug still exists.

3. Moved the USB keyboard to another USB port.
The USB error log disappears and the system is able to reboot into X.

4. The fwts result indicates that the error is a firmware bug.

Based on 3 and 4, it may be a hardware or BIOS bug.

Anthony Wong (anthonywong) wrote :

Ivan, can you check if those firmware errors are related to the suspend failure?

Changed in linux (Ubuntu):
assignee: Edward Lin (airken) → Ivan Hu (ivan.hu)
Ivan Hu (ivan.hu) wrote :

the error,
 s3: HIGH Kernel message: [ 457.594493] ACPI Error: Field [DRQL] at 144 exceeds Buffer [NULL] size 128 (bits) (20131115/dsopcode-236)
 s3: HIGH Kernel message: [ 457.594498] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.SIO1.DSRS] (Node ffff880079715c58), AE_AML_BUFFER_LIMIT (20131115/psparse-536)
 s3: HIGH Kernel message: [ 457.594502] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.UAR1._SRS] (Node ffff880079715ed8), AE_AML_BUFFER_LIMIT (20131115/psparse-536)

is from buggy firmware; it has also occurred on other Dell machines. I don't have an acpidump from this machine, so I am passing on the analysis done on one of those machines.

On S3 resume, the kernel restores the device resources: it first calls _CRS of the UAR1 device to read the current resources, then sets them via _SRS.
UAR1._CRS calls SIO1.DCRS, which returns CRS1, a buffer of 128 bits; but _SRS calls SIO1.DSRS, which accesses the buffer up to bit 144:
                        CreateByteField (Arg0, 0x11, DRQL)
                        CreateByteField (Arg0, 0x14, DMAC)
So the error arises because the current-resource method tells the kernel to use 128 bits, while the set-resource method operates on 144 bits.

See the ACPI table:
_CRS -> DCRS always returns CRS1,
                    Method (DCRS, 2, NotSerialized)
                    {
                        If (LEqual (Arg0, Zero))
                        {
                            ENFG (0x0C)
                            Store (CR6A, Local0)
                            Store (Local0, IOLO)
                            Store (CR6B, Local0)
                            Store (Local0, IOHI)
                            Store (IOLO, IORL)
                            Store (IOHI, IORH)
                            Store (0x08, LNA1)
                            Store (GIRQ (Arg0), IRQL)
                            EXFG ()
                            Return (CRS1)
                        }

                        Return (CRS1)
                    }

but _SRS -> DSRS parses the resource beyond the end of the CRS1 buffer,
                    Method (DSRS, 2, NotSerialized)
                    {
                        CreateByteField (Arg0, 0x02, IOLO)
                        CreateByteField (Arg0, 0x03, IOHI)
                        CreateWordField (Arg0, 0x09, IRQL)
                        CreateByteField (Arg0, 0x11, DRQL)
                        CreateByteField (Arg0, 0x14, DMAC)
                        If (LEqual (Arg1, Zero))
                        {
                            ENFG (0x0C)
                            STIO (0x6A, IOLO, IOHI, Zero)
                            SIRQ (Arg1, IRQL)
                            EXFG ()
                            DCNT (Arg1, One)
                        }
                    }
The errors above seem to be caused by buggy firmware and need to be checked.
They might affect the UART devices, but the driver might take care of the UART functions.

These failures should not affect the S3 function.
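
The "at 144 exceeds Buffer size 128" figure can be reproduced with a little arithmetic on the field offsets in DSRS. The sketch below is only a worked check of the analysis above, not kernel code: byte_field_end_bit, BUFFER_BITS and FIELDS are names invented here, and the offsets are copied from the ASL listing.

```python
def byte_field_end_bit(byte_offset, width_bytes=1):
    """Bit position one past the end of a field starting at byte_offset."""
    return (byte_offset + width_bytes) * 8


# Size of the buffer returned by DCRS (CRS1), as reported in the ACPI error.
BUFFER_BITS = 128

# Fields DSRS creates on Arg0: (name, byte offset, width in bytes).
FIELDS = [
    ("IOLO", 0x02, 1),
    ("IOHI", 0x03, 1),
    ("IRQL", 0x09, 2),  # CreateWordField, 2 bytes wide
    ("DRQL", 0x11, 1),
    ("DMAC", 0x14, 1),
]

for name, offset, width in FIELDS:
    end = byte_field_end_bit(offset, width)
    status = "ok" if end <= BUFFER_BITS else "exceeds buffer"
    print(f"{name}: ends at bit {end} ({status})")
```

DRQL sits at byte 0x11 (17), so it ends at bit 144, which matches the kernel message; DMAC at byte 0x14 would overrun as well. Presumably only DRQL is reported because the method aborts at the first failing field.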

Ivan Hu (ivan.hu) wrote :

As for the device configuration differences found by the fwts S3 test:
 s3: Found 1 differences in device configuation during S3 cycle.

This means fwts compares the device configuration before sleep and after wake, and it found one device whose configuration differs. This is not the cause of the S3 failure, but it shows the device might have a problem across the suspend cycle.

Comparing the device configuration shows that
  Device: event5
  Device Name: Laptop_Integrated_Webcam_HD
  Phy: usb-0000:00:14.0-4/button
is gone after waking up.

All clues suggest a USB problem; a USB expert needs to look into it.
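
The kind of before/after comparison described above can be sketched as follows. This is not how fwts implements its device check; the snapshot layout is simply modelled on the three "Key: value" lines quoted in this comment, and the keyboard entry in the sample is made up for illustration.

```python
def parse_devices(snapshot):
    """Parse blank-line separated 'Key: value' blocks into a list of dicts."""
    devices = []
    current = {}
    for line in snapshot.splitlines():
        line = line.strip()
        if not line:
            if current:
                devices.append(current)
                current = {}
            continue
        # Split on the first colon only, so "Phy" values keep their colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        devices.append(current)
    return devices


def missing_after_resume(before, after):
    """Return devices from `before` whose (name, phy) pair is absent in `after`."""
    seen = {(d.get("Device Name"), d.get("Phy")) for d in parse_devices(after)}
    return [d for d in parse_devices(before)
            if (d.get("Device Name"), d.get("Phy")) not in seen]


# Hypothetical snapshots: the webcam entry is from this report, the keyboard is invented.
AFTER = """\
Device: event4
Device Name: Example_Keyboard
Phy: usb-0000:00:1a.0-1.1/input0
"""

BEFORE = AFTER + """
Device: event5
Device Name: Laptop_Integrated_Webcam_HD
Phy: usb-0000:00:14.0-4/button
"""

for device in missing_after_resume(BEFORE, AFTER):
    print("gone after resume:", device["Device Name"], device["Phy"])
```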

Changed in linux (Ubuntu):
assignee: Ivan Hu (ivan.hu) → Anthony Wong (anthonywong)
Changed in linux (Ubuntu):
assignee: Anthony Wong (anthonywong) → Gavin Guo (mimi0213kimo)
Changed in hwe-next:
assignee: nobody → Gavin Guo (mimi0213kimo)
importance: Undecided → High
status: New → Triaged
Changed in hwe-next:
assignee: Gavin Guo (mimi0213kimo) → Adam Lee (adam8157)
Changed in linux (Ubuntu):
assignee: Gavin Guo (mimi0213kimo) → Adam Lee (adam8157)
Adam Lee (adam8157)
Changed in hwe-next:
status: Triaged → Confirmed
Adam Lee (adam8157)
Changed in linux (Ubuntu):
assignee: Adam Lee (adam8157) → Anthony Wong (anthonywong)
Changed in hwe-next:
assignee: Adam Lee (adam8157) → Anthony Wong (anthonywong)
Anthony Wong (anthonywong) wrote :

@PHLin

I think there are 2 things we can do:
1. test a mainline kernel
2. Update the bios, BIOS A12 (http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XTFRC&fileId=3403816568&osCode=W764&productCode=optiplex-3010&categoryId=BI) has this fix:
- Fixed Microsoft Windows sleep potential issue.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in hwe-next:
status: Confirmed → Incomplete
Po-Hsu Lin (cypressyew) wrote :

With 3.19.0-rc2, the same ACPI error messages can still be found in the test report:
 s3: HIGH Kernel message: [ 259.859494] ACPI Error: Field [DRQL] at 144 exceeds Buffer [NULL] size 128 (bits) (20141107/dsopcode-236)
 s3: HIGH Kernel message: [ 259.859501] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.SIO1.DSRS] (Node ffff880100509c58), AE_AML_BUFFER_LIMIT (20141107/psparse-536)
 s3: HIGH Kernel message: [ 259.859506] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.UAR1._SRS] (Node ffff880100509ed8), AE_AML_BUFFER_LIMIT (20141107/psparse-536)

Also, USB-related error messages can still be found in the dmesg output:
[ 789.574361] usb 1-4: reset high-speed USB device number 37 using xhci_hcd
[ 789.574426] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 28.
[ 789.574428] usb 1-4: hub failed to enable device, error -22
[ 789.646845] mei_me 0000:00:16.0: hbm: properties response: wrong status = 1 CLIENT_NOT_FOUND
[ 789.646847] mei_me 0000:00:16.0: mei_irq_read_handler ret = -71.
[ 789.646861] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000255 60000106
[ 789.686227] usb 1-4: reset high-speed USB device number 37 using xhci_hcd
[ 789.686263] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 28.
[ 789.686264] usb 1-4: hub failed to enable device, error -22
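
For reference, the numeric codes in these USB messages are negated Linux errno values: 22 is EINVAL and 71 is EPROTO. A rough sketch for pulling them out of a dmesg dump follows; the regular expression only covers the "usb X-Y: ..., error -N" shape seen in this report, and the names LINUX_ERRNO, USB_ERROR_RE and usb_errors are made up for the example.

```python
import re

# Linux errno values (asm-generic errno-base.h and errno.h), hardcoded so
# the result does not depend on the OS the script runs on.
LINUX_ERRNO = {22: "EINVAL", 71: "EPROTO"}

USB_ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] usb (?P<port>[\d.-]+): (?P<msg>.*), error -(?P<errno>\d+)$"
)


def usb_errors(dmesg_text):
    """Yield (timestamp, port, message, errno name) for matching dmesg lines."""
    for line in dmesg_text.splitlines():
        match = USB_ERROR_RE.match(line.strip())
        if match:
            code = int(match["errno"])
            yield (float(match["ts"]), match["port"], match["msg"],
                   LINUX_ERRNO.get(code, f"errno {code}"))


# Lines copied from the dmesg excerpts in this report.
SAMPLE = """\
[ 91.593385] usb 3-4: device not accepting address 8, error -71
[ 789.574361] usb 1-4: reset high-speed USB device number 37 using xhci_hcd
[ 789.574428] usb 1-4: hub failed to enable device, error -22
"""

for ts, port, msg, name in usb_errors(SAMPLE):
    print(f"{ts:.6f} usb {port}: {msg} ({name})")
```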

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew) wrote :

BTW, this system finished the 30-cycle suspend test with the 3.19.0-rc2 kernel, and it did not get stuck on reboot this time.

Po-Hsu Lin (cypressyew) wrote :

With the BIOS upgraded to A06 [1] (the A12 BIOS mentioned here is for the OptiPlex 3010), nothing changed on 3.13.0-rc2.
It finished the 30-cycle suspend test again, with the same error messages in dmesg and the fwts report.

Also, the device that failed to be enabled, usb 1-4, which I guess is the webcam, is still working after suspend.

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
        |__ Port 1: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 2: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
        |__ Port 4: Dev 6, If 0, Class=Mass Storage, Driver=usb-storage, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
    |__ Port 4: Dev 126, If 0, Class=Video, Driver=uvcvideo, 480M
    |__ Port 4: Dev 126, If 1, Class=Video, Driver=uvcvideo, 480M

Test case:
1. Boot to desktop, open cheese, make sure the webcam is working correctly
2. Run sudo fwts -P s3 --s3-multiple=30 --s3-device-check --s3-device-check-delay 45
3. Check if the webcam is still working.
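
Part of step 3 could be automated by checking that the uvcvideo interfaces are still listed in the `lsusb -t` output after the suspend cycles. The sketch below only tells whether the device is still enumerated, not whether cheese can actually show a picture. It assumes the output format shown in the tree above; ENTRY_RE and interfaces_using are names made up here.

```python
import re

# Matches entries such as:
#   |__ Port 4: Dev 126, If 0, Class=Video, Driver=uvcvideo, 480M
# Root hub lines have no "If N" part and a "/4p" suffix on the driver.
ENTRY_RE = re.compile(
    r"Port (?P<port>\d+): Dev (?P<dev>\d+), (?:If (?P<intf>\d+), )?"
    r"Class=(?P<cls>[^,]+), Driver=(?P<driver>[^,/]*)"
)


def interfaces_using(tree_text, driver):
    """Return (port, dev, interface) for each `lsusb -t` entry bound to `driver`."""
    found = []
    for line in tree_text.splitlines():
        match = ENTRY_RE.search(line)
        if match and match["driver"] == driver and match["intf"] is not None:
            found.append((int(match["port"]), int(match["dev"]), int(match["intf"])))
    return found


# Bus 01 portion of the tree quoted in this comment.
SAMPLE = """\
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
    |__ Port 4: Dev 126, If 0, Class=Video, Driver=uvcvideo, 480M
    |__ Port 4: Dev 126, If 1, Class=Video, Driver=uvcvideo, 480M
"""

print(interfaces_using(SAMPLE, "uvcvideo"))
```

On the test machine the text would come from running `lsusb -t` before and after the fwts run; an empty list afterwards would mean the webcam has dropped off the bus.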

[1] http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=NFFGR&fileId=3401380743&osCode=W764&productCode=optiplex-3011-aio&languageCode=EN&categoryId=BI

Po-Hsu Lin (cypressyew) wrote :

Oops, the kernel mentioned in comment #14 should be 3.19.0-rc2

Also, I tried multiple manual suspends on this system with 3.19.0-rc2; on the 7th cycle, the webcam dropped.
/dev/video0 disappeared, and the webcam disappeared from the lsusb output.

Attached is the dmesg output from when this happened.

Changed in hwe-next:
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in hwe-next:
assignee: Anthony Wong (anthonywong) → AceLan Kao (acelankao)
Changed in linux (Ubuntu):
assignee: Anthony Wong (anthonywong) → AceLan Kao (acelankao)
AceLan Kao (acelankao)
Changed in hwe-next:
status: Confirmed → In Progress
AceLan Kao (acelankao) wrote :

After unbinding the USB camera, the system can pass S3 100 times.
I'll try tracing the camera driver to figure out the issue.
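
For anyone who wants to repeat the experiment, one possible way to unbind the camera is through the sysfs unbind file of the USB core driver. This is an assumption, since the comment above does not say which method was used; the device id "1-4" is taken from the "usb 1-4" messages earlier in this report, and unbind_usb_device is a made-up helper name.

```python
from pathlib import Path


def unbind_usb_device(device_id, sysfs_root="/sys"):
    """Ask the USB core to unbind `device_id` (for example "1-4").

    Roughly equivalent to: echo 1-4 > /sys/bus/usb/drivers/usb/unbind
    Needs root on a real system. sysfs_root is a parameter only so the
    function can be tried against a scratch directory.
    """
    unbind = Path(sysfs_root) / "bus" / "usb" / "drivers" / "usb" / "unbind"
    unbind.write_text(device_id)
    return unbind


# Example (as root, assuming the webcam is still on bus 1, port 4):
#   unbind_usb_device("1-4")
```

Writing the same id to the neighbouring bind file should attach the device again.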

AceLan Kao (acelankao)
Changed in hwe-next:
status: In Progress → Triaged
Changed in hwe-next:
status: Triaged → Won't Fix
AceLan Kao (acelankao)
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix