11ab:4363 sky2 driver (Marvell Ethernet devices) problem after resume

Bug #1007841 reported by mlx
This bug affects 3 people
Affects          Status      Importance  Assigned to  Milestone
Linux            New         Undecided   auto-netdev
linux (Ubuntu)   Incomplete  Low         Unassigned

Bug Description

After resuming from suspend, a Marvell 88E8055 (wired NIC) fails to detect carrier (the link is permanently down). Reloading the sky2 module does not help: the module fails to initialize the hardware and the eth0 interface is gone until reboot.

The suspend/resume for this NIC has worked without issues in 10.10 and some older releases I've had permanently installed. I haven't really tested this with other releases that have come and gone since - which is the best course of action here:
- testing older releases, or
- obtaining older kernel versions (from mainline ppa?) and testing those?

Upstream contacted: http://www.spinics.net/lists/netdev/msg203017.html
http://marc.info/?l=linux-pm&m=134229404527442&w=2

WORKAROUND:
Delete the NIC from the system and have kernel perform a rescan on the PCI bus:
echo 1 > /sys/class/net/eth0/device/remove ; sleep 1;\
echo 1 > /sys/class/pci_bus/0000\:00/rescan

Adjust PCI bus number as appropriate.
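
The workaround above could be wrapped in a small helper. This is only a sketch under assumptions: the function name `reprobe_nic` and the `SYSFS_ROOT` variable are made up for illustration (the variable exists so the logic can be dry-run against a fake directory tree), and it writes to the global `/sys/bus/pci/rescan` node instead of the per-bus node, so no bus number needs adjusting. On a real system it must run as root.

```shell
#!/bin/sh
# Sketch: drop a NIC's PCI device and let the kernel rediscover it.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

reprobe_nic() {
    iface="$1"
    dev="$SYSFS_ROOT/class/net/$iface/device"

    # Bail out if the interface has no removable device node behind it.
    if [ ! -e "$dev/remove" ]; then
        echo "no removable device node for $iface" >&2
        return 1
    fi

    # Same two steps as the manual workaround: remove, wait, rescan.
    echo 1 > "$dev/remove" || return 1
    sleep 1
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

After sourcing the file, `reprobe_nic eth0` would perform the remove and rescan in one step.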

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-generic 3.2.0-24.39
ProcVersionSignature: Ubuntu 3.2.0-24.39-generic 3.2.16
Uname: Linux 3.2.0-24-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC262 Analog [ALC262 Analog]
   Subdevices: 0/1
   Subdevice #0: subdevice #0
ApportVersion: 2.0.1-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: michalp 1932 F.... pulseaudio
 /dev/snd/pcmC0D0p: michalp 1932 F...m pulseaudio
CRDA:
 country SK:
  (2402 - 2482 @ 40), (N/A, 20)
  (5170 - 5250 @ 40), (N/A, 20)
  (5250 - 5330 @ 40), (N/A, 20), DFS
  (5490 - 5710 @ 40), (N/A, 27), DFS
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfc400000 irq 47'
   Mixer name : 'Realtek ALC262'
   Components : 'HDA:10ec0262,17340000,00100202'
   Controls : 33
   Simple ctrls : 19
Date: Sat Jun 2 17:08:19 2012
InstallationMedia: Kubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120423)
MachineType: FUJITSU SIEMENS ESPRIMO Mobile U9200
ProcEnviron:
 LANGUAGE=
 TERM=xterm
 LANG=sk_SK.UTF-8
 LC_MESSAGES=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-24-generic root=UUID=e8d55f9c-868e-4354-afc7-f15447baa72b ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-24-generic N/A
 linux-backports-modules-3.2.0-24-generic N/A
 linux-firmware 1.79
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/22/2008
dmi.bios.vendor: Phoenix
dmi.bios.version: 1.11 - 067 - 1566
dmi.board.name: S11D
dmi.board.vendor: FUJITSU SIEMENS
dmi.board.version: 1.0
dmi.chassis.asset.tag: FSC
dmi.chassis.type: 10
dmi.chassis.vendor: FUJITSU SIEMENS
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnPhoenix:bvr1.11-067-1566:bd04/22/2008:svnFUJITSUSIEMENS:pnESPRIMOMobileU9200:pvr1.0:rvnFUJITSUSIEMENS:rnS11D:rvr1.0:cvnFUJITSUSIEMENS:ct10:cvr1.0:
dmi.product.name: ESPRIMO Mobile U9200
dmi.product.version: 1.0
dmi.sys.vendor: FUJITSU SIEMENS

mlx (myxal-mxl) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
mlx (myxal-mxl) wrote :

Oh, I thought the dmesg would be added in full by ubuntu-bug. Here's dmesg grepped for sky2.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.5 kernel [0] (not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc1-quantal/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
mlx (myxal-mxl) wrote :

Tested with -rc2. Link detection is still getting lost, but the dmesg output seems different:

[ 159.669517] sky2 0000:04:00.0: eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 164.028333] sky2 0000:04:00.0: eth0: disabling interface
[ 167.396068] sky2 0000:04:00.0: Refused to change power state, currently in D3
[ 169.490630] sky2 0000:04:00.0: eth0: enabling interface
[ 439.928861] sky2 0000:04:00.0: eth0: disabling interface
[ 448.916156] sky2 0000:04:00.0: wake-up capability enabled by ACPI
[ 449.228068] sky2 0000:04:00.0: Refused to change power state, currently in D3
[ 449.256908] sky2 0000:04:00.0: wake-up capability disabled by ACPI
[ 452.083650] sky2 0000:04:00.0: eth0: enabling interface
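
To check other logs for the same symptom, the "Refused to change power state" lines shown above can be filtered out of a kernel log. The following is a rough sketch only; the function name `sky2_stuck_in_d3` is invented for illustration, and the pattern is based solely on the message format in the excerpt above.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print the sky2 lines where the
# device refused to leave the D3 power state. The exit status is grep's:
# 0 if at least one such line was found, non-zero otherwise.
sky2_stuck_in_d3() {
    grep -E 'sky2 [0-9a-f:.]+: Refused to change power state, currently in D3'
}
```

It would be used as `dmesg | sky2_stuck_in_d3` after a resume.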

tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
mlx (myxal-mxl) wrote :

Interesting observation: the problem only seems to occur when I wake up the system by opening the lid. When I suspend using the GUI and wake up by pressing the PWR button, the NIC works as it should. This behaviour is the same for both the Ubuntu and mainline kernels.

mlx (myxal-mxl) wrote :

Hello again. After doing my first bisection, I found the offending commit to be:

commit 7afe1845dd1e7c90828c942daed7e57ffa7c38d6
Author: Sameer Nanda <email address hidden>
Date: Mon Jul 25 17:13:29 2011 -0700

>init: skip calibration delay if previously done
>
>For each CPU, do the calibration delay only once. For subsequent calls,
>use the cached per-CPU value of loops_per_jiffy.
>
>This saves about 200ms of resume time on dual core Intel Atom N5xx based
>systems. This helps bring down the kernel resume time on such systems
>from about 500ms to about 300ms.

Now, not being familiar with Ubuntu's (or anyone's) development process, I'm a bit confused as to which kernel version this occurred in. Before I tried bisecting, I just compiled a few of the tagged revisions and concluded the faulty patch was committed sometime in October or November in kernel version 3.1.0-1.1, after 3.1.0-1.0. Then, when I performed the bisection, the kernel version according to the Makefile was 3.0.0, and the commits were dated around July.

Is the commit number applicable to mainline git repository (the bisection was done on a clone of ubuntu-precise repo), or do I need a translation to mainline git commit numbers?
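
A commit hash identifies the same commit in every repository that contains it, so one way to approach the question above is to ask git which tag first contains the commit. The sketch below assumes a local clone that carries the relevant release tags; the function name `first_tag_with` is hypothetical, while `git describe --contains` is a standard git command.

```shell
#!/bin/sh
# Sketch: print the name of a tag that contains the given commit.
# $1 = path to a git clone (assumed to have release tags), $2 = commit hash.
first_tag_with() {
    repo="$1"
    commit="$2"
    # Fails with a non-zero status if no tag contains the commit.
    git -C "$repo" describe --contains "$commit"
}
```

For example, `first_tag_with ~/src/linux 7afe1845dd1e7c90828c942daed7e57ffa7c38d6` would name a tag if the clone at that (assumed) path contains the commit, and fail otherwise.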

mlx (myxal-mxl) wrote :

Bug tracking moved to linux-netdev: http://www.spinics.net/lists/netdev/msg203017.html
Attaching 7z-compressed mmiotraces (good and bad).

penalvch (penalvch)
summary: - sky2 driver (Marvell Ethernet devices) problem after resume
+ 11ab:4363 sky2 driver (Marvell Ethernet devices) problem after resume
penalvch (penalvch) wrote :

Marking Triaged as mainline tested, kernel commit bisected, and upstream contacted.

tags: added: bisect-done kernel-bug-exists-upstream-v3.5-rc1-quantal regression-release
Changed in linux (Ubuntu):
status: Confirmed → Triaged
penalvch (penalvch)
description: updated
Chris Cheney (ccheney) wrote :

Is there any progress on this bug? I am seeing the same thing on my system with up to date 13.04 after having recently reinstalled it. It had been working for 7 years (!) before this problem finally cropped up. I'm guessing if it is timing related perhaps the slight reconfiguration is what exposed the issue for me as well.

Chris Cheney (ccheney) wrote :

I just realized something I changed on my system when I reconfigured it which turns out to have allowed the nic to work previously. I disabled an extra sata controller on the system which I wasn't using anymore and then the nic would no longer work. I enabled it again and it works now, lol.

This definitely appears to be a timing issue!

My particular setup is a Gigabyte GA-965P-DQ6 motherboard which has a Marvell 88e8056 gigabit nic.

Chris Cheney (ccheney) wrote :

I should also note that this appears to affect more than just Linux. I had originally thought that Windows 7 somehow broke my nic because I temporarily loaded it on the box to update firmware on some of my other devices. The nic would not work after suspend on Windows 7 either even with the current Marvell drivers from their site.

So the problem is probably that the Marvell Yukon Ethernet controllers do not resume from suspend within the time allotted by the spec. This is sometimes masked by other devices, like the extra SATA controller in my case, and apparently was also masked previously by the calibration delay that mlx's git bisect pointed to.

Chris Cheney (ccheney)
description: updated
mlx (myxal-mxl) wrote :

Hello Chris.
Regarding bug progress - sorry, no. I'm not that experienced with kernel code and building modules.
With that said, I have installed Arch some 2 months ago (kernel 3.10, now it's at 3.11), and a month ago a daily build of Kubuntu 13.10 (kernel 3.11-rc-something, now updated to stable 3.11). While I've used each only for about 2 weeks, I don't recall this bug occurring even once - I certainly cannot induce it by suspending and resuming immediately, something that DOES happen with Kubuntu 12.10, which I also keep around.
I suggest you try a new kernel and see what it does.

mlx (myxal-mxl)
description: updated
description: updated
penalvch (penalvch) wrote :

mlx, as per the upstream discussion, they asked you to contact the linux-pm mailing list. Did you get a chance to e-mail them?

mlx (myxal-mxl) wrote :
mlx (myxal-mxl) wrote :

Actually, that's the same link that has already been posted in the summary for a year.

penalvch (penalvch)
tags: added: latest-bios-1.11
penalvch (penalvch) wrote :

mlx, this bug was reported a while ago and there hasn't been any activity on it recently. Is this still an issue? If so, could you please test for it with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please add a comment to this report?

If it is reproducible, could you also please test the latest upstream kernel via http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc6-trusty/ and report the results?

Changed in linux (Ubuntu):
importance: Medium → Low
status: Triaged → Incomplete
mlx (myxal-mxl) wrote :

Tested with today's daily (kernel 3.12.0-7), no issues on 2 suspend/resume cycles.
