Sky2 ethernet failing randomly with Marvell 88E8056 gigabit
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Linux | Invalid | Medium | | |
linux (Ubuntu) | Invalid | Medium | Unassigned | |
linux-source-2.6.22 (Ubuntu) | Won't Fix | High | Unassigned | |
Bug Description
Binary package hint: linux-source-2.6.22
Since Ubuntu Feisty, the sky2 ethernet adapters have not been stable. Even after the bugfixes in kernel 2.6.22-11.32 (which is my current kernel), the failures continue to happen. Earlier, the problem appeared while doing big file transfers (>1 GB); as of 2.6.22-11.32 it looks more like random freezes (without traffic). Doing a restart of the network (/etc/init.
In the dmesg log I usually find the following information after a crash:
[ 8767.471702] eth1: hw csum failure.
[ 8767.471723] [<c02813ec>] __skb_checksum_
[ 8767.471739] [<c02813f8>] __skb_checksum_
[ 8767.471744] [<c02c2d30>] tcp_v4_
[ 8767.471751] [<c02762f0>] pci_conf1_
[ 8767.471766] [<c02776f9>] pci_read+0x29/0x30
[ 8767.471792] [<c02a51d2>] ip_local_
[ 8767.471814] [<c02a4dfa>] ip_rcv+0x2ea/0x5a0
[ 8767.471829] [<f8a674aa>] packet_
[ 8767.471837] [<c0280672>] __netdev_
[ 8767.471855] [<c0284677>] netif_receive_
[ 8767.471883] [<f89b503b>] sky2_poll+
[ 8767.471938] [<c0286978>] net_rx_
[ 8767.471958] [<c012d6b2>] __do_softirq+
[ 8767.471976] [<c012d795>] do_softirq+
[ 8767.471981] [<c012da7d>] irq_exit+0x6d/0x80
[ 8767.471984] [<c0106b20>] do_IRQ+0x40/0x70
[ 8767.472004] [<c0105223>] common_
[ 8767.472032] [<c01022b6>] mwait_idle_
[ 8767.472041] [<c01022d0>] mwait_idle+0x0/0x20
[ 8767.472048] [<c0102413>] cpu_idle+0x53/0xe0
[ 8767.472062] [<c03e3a85>] start_kernel+
[ 8767.472069] [<c03e31f0>] unknown_
[ 8767.472095] =======
Is there any hope this bug will be resolved, or should I just change to the proprietary sk98lin driver?

|
#65 |
CentOS has an older version of the driver; please update to the latest version from 2.6.22.6 or 2.6.23-rc4. There were several bugs that caused tx timeouts (hung chip), and a problem that led to PHY clock issues.

|
#66 |
The kernel version I encountered this on is 2.6.23-rc4, as marked in the bug report, which is why I chose 'CentOS 4.5 install + "vanilla" kernel 2.6.23-rc4' under "Software Environment".

|
#67 |
Please enable the sky2 debugfs kernel configuration option.
Mount debugfs somewhere (e.g. /debug).
Hang the system, then capture the sky2 state (cat /debug/sky2/eth0 >savefile).
It will show the status of the IRQ and the receive/transmit rings.

|
#68 |
Rebuilt 2.6.23-rc5 with SKY2_DEBUG. I've reproduced the issue where ifdown/ifup does not reset the interface properly.
# cat /debug/sky2/eth0
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=24...24 report=24 done=24
Rx ring hw get=169 put=169 last=1023
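To compare captures like the one above (for example a hung dump against a working one), the counters can be pulled out of the text. The following is only a minimal Python sketch: the function name `parse_sky2_debugfs` is hypothetical, and the line layout is assumed from this single dump, so other driver versions may print different fields.

```python
import re

def parse_sky2_debugfs(text):
    """Extract key=value counters from a sky2 debugfs dump.

    Assumes the line layout seen in the dump above (IRQ, Status ring,
    Tx ring, Rx ring); this is not a documented format.
    """
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("IRQ"):
            section = "irq"
        elif line.startswith("Tx ring"):
            section = "tx"
        elif line.startswith("Rx ring"):
            section = "rx"
        elif line.startswith("Status ring"):
            state["status_ring_empty"] = "(empty)" in line
            continue
        else:
            continue
        # Values are kept as strings, e.g. "24...24" or "c000001d".
        state[section] = dict(re.findall(r"(\w+)=(\S+)", line))
    return state

SAMPLE = """\
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=24...24 report=24 done=24
Rx ring hw get=169 put=169 last=1023
"""

if __name__ == "__main__":
    for name, fields in parse_sky2_debugfs(SAMPLE).items():
        print(name, fields)
```

Running the parser on two saved files and diffing the resulting dictionaries shows which counters moved between captures.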

DrCore (launchpad-drsdre) wrote : | #1 |
- Full dmesg log (38.1 KiB, text/plain)
Additional information
lspci:
00:00.0 Host bridge: Intel Corporation 82925X/XE Memory Controller Hub (rev 0e)
00:01.0 PCI bridge: Intel Corporation 82925X/XE PCI Express Root Port (rev 0e)
00:1b.0 Audio device: Intel Corporation 82801FB/
00:1c.0 PCI bridge: Intel Corporation 82801FB/
00:1c.1 PCI bridge: Intel Corporation 82801FB/
00:1c.2 PCI bridge: Intel Corporation 82801FB/
00:1d.0 USB Controller: Intel Corporation 82801FB/
00:1d.1 USB Controller: Intel Corporation 82801FB/
00:1d.2 USB Controller: Intel Corporation 82801FB/
00:1d.3 USB Controller: Intel Corporation 82801FB/
00:1d.7 USB Controller: Intel Corporation 82801FB/
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 04)
00:1f.1 IDE interface: Intel Corporation 82801FB/
00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 82801FB/
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)
01:04.0 Mass storage controller: <pci_lookup_name: buffer too small> (rev 13)
01:05.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
05:00.0 VGA compatible controller: nVidia Corporation Unknown device 0402 (rev a1)

DrCore (launchpad-drsdre) wrote : | #2 |
- lspci -vvnnn (13.1 KiB, text/plain)
More additional information:
uname -a:
Linux drebuntum 2.6.22-11-generic #1 SMP Fri Sep 7 05:07:05 GMT 2007 i686 GNU/Linux

Mahmoud Kassem (mmkassem) wrote : | #3 |
I have the same problem
Motherboard: Gigabyte 965P-DS3
Network: Marvell 88E8056
Also tested with the Gutsy Live CD:
$ uname -a
Linux ubuntu 2.6.22-10-generic #1 SMP Wed Aug 22 08:11:52 GMT 2007 i686 GNU/Linux
$ lspci
04:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
(It is detected correctly as 88E8056 in Feisty)
/var/log/messages is full of:
sky2 eth0: rx error, status 0x5ca0002 length 1482
and sometimes:
sky2 eth0: rx error, status 0x5ca0002 length 1514
After eth0 fails, these commands seem to make it work again with reboot:
sudo ifdown eth0; sudo ifup eth0;
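When a log is full of such messages, a tally by status and length makes reports easier to compare. This is a minimal Python sketch only; `count_rx_errors` is a hypothetical helper, and the regular expression assumes exactly the message format of the two lines quoted above.

```python
import re
from collections import Counter

# Assumed format: "sky2 eth0: rx error, status 0x5ca0002 length 1482"
RX_ERROR = re.compile(
    r"sky2 (\w+): rx error, status (0x[0-9a-fA-F]+) length (\d+)"
)

def count_rx_errors(lines):
    """Count rx error messages per (interface, status, length)."""
    counts = Counter()
    for line in lines:
        match = RX_ERROR.search(line)
        if match:
            iface, status, length = match.groups()
            counts[(iface, status, int(length))] += 1
    return counts

SAMPLE_LINES = [
    "sky2 eth0: rx error, status 0x5ca0002 length 1482",
    "sky2 eth0: rx error, status 0x5ca0002 length 1514",
    "sky2 eth0: rx error, status 0x5ca0002 length 1482",
    "some unrelated kernel message",
]

if __name__ == "__main__":
    for key, n in count_rx_errors(SAMPLE_LINES).most_common():
        print(n, key)
```

To use it on a real log, pass an open file object (for example the result of `open("/var/log/messages")`) instead of `SAMPLE_LINES`.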

Mahmoud Kassem (mmkassem) wrote : | #4 |
correction:
After eth0 fails, these commands seem to make it work again **without** reboot:
sudo ifdown eth0; sudo ifup eth0;

|
#69 |
I can confirm that we can reproduce this issue (or one nearly identical to it). We are using the current stable 2.6.22.6 kernel on a system with a Marvell 88E8055 (Panasonic Toughbook CF-74).
To reproduce it, we can open any kind of persistent socket connection (such as an Apache SSL connection using a browser) and then yank the cable. We wait a bit and pop the cable back in and the driver is dead. We can't ping in or out until we down the interface, remove and reinsert the sky2 driver and bring the interface back up.
I will be happy to provide any info or test any patches you provide.

xtknight (xt-knight) wrote : | #5 |
I'm having the same problems, without any error messages in /var/log/messages. Also on a Gigabyte GA-965P-DS3.
Mine is detected like this in Gutsy too: 04:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
I can't remember how it was reported in Feisty.
Setting to confirmed. This needs to be addressed and I know it's not just me. Should be at least medium priority, probably affects TONS of people with new motherboards.

xtknight (xt-knight) wrote : | #6 |
This problem needs to be addressed.
Changed in linux-source-2.6.22: | |
status: | New → Confirmed |

|
#70 |
I'm having trouble here too. Using an Ubuntu Gutsy kernel:
Linux andy-desktop 2.6.22-10-generic #1 SMP Wed Aug 22 07:42:05 GMT 2007 x86_64 GNU/Linux
I'm not getting tx timeouts AFAIK. I'm not getting any driver crash dumps either. I'm just having connection issues. I'm not transferring anything big. I will be browsing the web, then all of a sudden the interface will get in some type of corrupted state where nothing works. Sometimes ifdown/ifup will do it, sometimes it will not. Sometimes dhclient works, sometimes not. Unloading sky2 and reloading it *always* fixes the problem, indicating some type of issue with the "current state" of the driver. Maybe a variable not getting cleared/etc but I can only guess.
Sometimes ifdown/ifup will work and then it will only work for about a minute. Redoing ifdown/ifup will make sky2 work for another few hours (it's like refilling your gas tank, just on a smaller level ;)).
Sometimes I will get Destination Host Unreachable from pinging my router, sometimes ping says nothing at all.
I tried the "modprobe sky2 debug=16" option, but the log output does not look much different from when the adapter is working. Also, I haven't yet caught it at the moment it stopped working; I have only turned on my monitor, noticed that my net wasn't working, and then dumped a few logs. In any case, I don't think they're helpful, but if you need them I will gladly post them.
Most importantly, this is a regression from 2.6.20. I hope this can get fixed and if so I'll notify those at Ubuntu and get this into the kernel and hopefully an exception for it if necessary.
Ubuntu bug link: https:/
Changed in linux: | |
status: | Unknown → Confirmed |

Brian Murray (brian-murray) wrote : | #7 |
Kernel team bug work flow indicates that confirmed bugs should be assigned to the kernel team, so in addition to adding an importance, I am assigning the bug to the kernel team.
Changed in linux-source-2.6.22: | |
assignee: | nobody → ubuntu-kernel-team |
importance: | Undecided → High |

xtknight (xt-knight) wrote : | #8 |
- sudo lspci -vvnn (20.8 KiB, text/plain)
Thanks for taking a look at it, Brian.
Here is my "sudo lspci -vvnn".
I'm going to try disabling power states and disabling PCI MSI config tonight to see if it's interrupt- or power-related. I suggest everyone also post "cat /proc/interrupts":
CPU0 CPU1
0: 110357473 0 IO-APIC-edge timer
1: 151725 0 IO-APIC-edge i8042
6: 106 0 IO-APIC-edge floppy
7: 0 0 IO-APIC-edge parport0
8: 0 0 IO-APIC-edge rtc
9: 1 0 IO-APIC-fasteoi acpi
16: 11184206 0 IO-APIC-fasteoi uhci_hcd:usb3, ohci1394, nvidia
18: 19924361 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb7, EMU10K1
19: 2433345 0 IO-APIC-fasteoi uhci_hcd:usb6, ohci1394, libata, libata
20: 9065082 0 IO-APIC-fasteoi pata_pdc2027x
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
23: 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb5
507: 2723565 0 PCI-MSI-edge eth0
NMI: 0 0
LOC: 109387069 109387245
ERR: 0
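To check quickly whether an interface is on MSI or on a shared IO-APIC line, the table can be scanned for the device name. A minimal Python sketch follows; `find_irq` is a hypothetical helper, and it assumes the column layout shown above (IRQ label, per-CPU counts, interrupt type, device list), which newer kernels may extend with extra columns.

```python
def find_irq(text, device):
    """Return IRQ details for `device` from /proc/interrupts-style text.

    Assumes the layout shown above; returns None if the device
    does not appear on any line.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].endswith(":"):
            continue
        # Per-CPU counters are the leading numeric fields after the label.
        i = 1
        while i < len(parts) and parts[i].isdigit():
            i += 1
        if i >= len(parts):
            continue  # summary rows such as NMI/LOC have no type or device
        devices = [name.strip(",") for name in parts[i + 1:]]
        if device in devices:
            return {
                "irq": parts[0].rstrip(":"),
                "counts": [int(p) for p in parts[1:i]],
                "type": parts[i],
                "msi": "MSI" in parts[i],
                "shared_with": [d for d in devices if d != device],
            }
    return None

SAMPLE = """\
CPU0 CPU1
16: 11184206 0 IO-APIC-fasteoi uhci_hcd:usb3, ohci1394, nvidia
507: 2723565 0 PCI-MSI-edge eth0
NMI: 0 0
ERR: 0
"""

if __name__ == "__main__":
    print(find_irq(SAMPLE, "eth0"))
```

On a live system the text would come from reading `/proc/interrupts`; the sample here reuses two rows from the listing above.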

xtknight (xt-knight) wrote : | #9 |
At one point I got this while pinging my router (sometimes it's just destination host unreachable):
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
Unfortunately at this time I didn't have debug=16 on.

xtknight (xt-knight) wrote : | #10 |
I notice a marked improvement when using the 2.6.23-rc6 kernel (kernel.org vanilla, compiled using the Master Kernel Thread @ UbuntuForums).
My suggested solution is to use the sky2.c out of the vanilla kernel. The one in the Ubuntu kernel 2.6.22-11 reports "sky2 1.14". To see your version, run "sudo ethtool -i eth0".
With 2.6.23 you get sky2 1.17. I suspect that there are missing patches in the Ubuntu 2.6.22 kernel (some were backported while others were not, when all needed to be backported at once). I think it needs to be sync'd with the vanilla kernel version.
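For anyone collecting versions from several machines, the check can be scripted. The sketch below is a hedged Python example: `parse_ethtool_info` and `version_tuple` are hypothetical helper names, the "key: value" layout is what `ethtool -i` prints, and the sample text is illustrative rather than captured from a specific system.

```python
def parse_ethtool_info(text):
    """Turn "key: value" lines from `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so "bus-info: 0000:02:00.0" survives.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def version_tuple(version):
    """Convert "1.14" to (1, 14) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Illustrative sample in the ethtool -i output layout.
SAMPLE = """\
driver: sky2
version: 1.14
firmware-version: N/A
bus-info: 0000:02:00.0
"""

if __name__ == "__main__":
    info = parse_ethtool_info(SAMPLE)
    older = version_tuple(info["version"]) < version_tuple("1.17")
    print(info["driver"], info["version"], "older than 1.17:", older)
```

Comparing tuples rather than strings avoids ordering mistakes such as "1.9" sorting after "1.14".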
I am not certain that using 1.17 (from vanilla) is a solution but I think it is a good idea in general. The patches were committed to the mainstream kernel for good reason. Even if it doesn't solve my problems (which, so far, I think it has), it may solve others'. Sky2 is a fast-changing driver and all improvements need to be backported ASAP IMO, especially considering that sky2 hardly works for maybe 10% of those using it. And, sky2 supports a very popular network chipset on new computers (Marvell Yukon).
I have attached a diff between the sky2 driver in 2.6.23-rc6 and the one in the current Gutsy kernel (2.6.22-11, Ubuntu). I diff'd sky2.h and sky2.c and combined them into one diff file; I think this is all that is needed. I haven't tested compiling 2.6.22-11 with the new sky2 files yet (I have only compiled a 2.6.23-rc6 kernel), but I have a good feeling that sky2.c/sky2.h is where the problem lies. I hope to be able to test it myself very soon.
Please consider this solution and it would be nice if others could test this patch so we can get it into the final Gutsy.
You should be able to apply the patch like this. Keep Master Kernel Thread @ http://
Prerequisite: "sudo apt-get update && sudo apt-get upgrade". Reboot if there is a kernel update. Then follow these steps. If your kernel isn't exactly 2.6.22-11.32 then the patch should still work (unless there have been more modifications to sky2 since).
1. sudo apt-get install linux-source-2.6.22
2. cd /usr/src
3. tar xjvf linux-source-
4. Follow steps 1-3 at Master Kernel Thread (including getting in root). (Not step 4 because we have unpacked the Ubuntu linux source. Not step 5 because I have a modification.)
5. rm -rf linux && ln -s /usr/src/
6. patch -p2 < ~/Desktop/
7. Continue, starting from step 8 at Master Kernel Thread. You shouldn't need to go through any menu items because our current kernel is identical to the one we're compiling. For Step 10 it doesn't matter what revision you use, maybe something unique like "sky2" so it doesn't get confused with your current kernel.
I hope this helps somebody compile an Ubuntu kernel w/ a new, fixed sky2. Most importantly, please report back if it works for you. Thank you. If there are any problems I apologize, but never delete your old kernel for any reason.

xtknight (xt-knight) wrote : | #11 |

Mahmoud Kassem (mmkassem) wrote : | #12 |
I would like to add that this problem no longer occurs in the Fedora kernel 2.6.22.5-76.fc7 (Aug 30 2007), although its sky2 driver is version 1.14. So maybe it is not a sky2 driver problem, or maybe it is a modified 1.14. Can someone please check the difference?
The previous fedora kernel had the same problem and always generated the same rx error in the messages log.

|
#71 |
I fixed my problems by using 2.6.23-rc6.

xtknight (xt-knight) wrote : | #13 |
By this point, it's pretty much confirmed that all my problems are solved by using 2.6.23-rc6. I haven't had an issue ever since I started using the new kernel.

Geoff (cheetaweb) wrote : | #14 |
This still happens (i.e. sky2 dying ) on 2.6.23-

xtknight (xt-knight) wrote : | #15 |
Geoff: please describe your problem more in depth. Do you have anything in dmesg? When does it die?
Also, I don't trust customized kernels. I use the official from kernel.org. If yours is easy to reproduce could you please try 2.6.23-rc6 ("latest prepatch") there? Thanks.

|
#72 |
Interesting, the only thing that went in between rc5 and rc6 was the restore multicast list on resume, which while potentially applicable, doesn't sound like it addresses the whole of the problem. Does rc5 fix the problem for you as well?

Tor Håkon Haugen (torh) wrote : | #16 |
FYI: There is an ongoing discussion about the sky2 driver in the kernel mailing list, so I guess they are aware of the problem.

|
#73 |
Sorry for the misunderstanding.
I fixed my problems by upgrading from the Ubuntu Linux 2.6.22-11 kernel to the vanilla 2.6.23-rc6 kernel. I hadn't even tried any other 2.6.23 yet. I'm thinking the Ubuntu kernel has a problem due to mismatched or partially backported patches, at least in my case.

xtknight (xt-knight) wrote : | #17 |
It seems my problem is different from Geoff's because mine is fixed by 2.6.23 and his is not.
I don't know how to help fix this problem but it is High importance and Confirmed. Is there a planned milestone for it?
Worse, I'm not sure if my problem can be fixed without causing further regressions for others.
Could others confirm if their issues are fixed by using 2.6.23, please? I see at least three or four different sky2 bugs in the wild (upload issues, maybe interrupt issues, hw checksum+crash dumps, no errors at all but no net at all)

|
#74 |
Created attachment 13006
debugfs sky2

|
#75 |
Created attachment 13007
debugfs sky2 (when it did work)

|
#76 |
I am still having issues with 2.6.23-rc6 and rc8, but it took awhile for them to begin happening again. I attached two debugfs logs of sky2.

xtknight (xt-knight) wrote : | #18 |
I actually started having issues with 2.6.23-rc6 and -rc8 also. So, apparently not all of the problem is fixed.

Mahmoud Kassem (mmkassem) wrote : | #19 |
I just want to confirm that Fedora 7 kernel 2.6.22.5-76.fc7 does not have this problem; I've transferred 50 GiB+ of data using Samba without problems. Maybe if the Ubuntu kernel team looked at it, they could find a solution.

snoopy26 (kasper-biessenhofen) wrote : | #20 |
My Fedora kernel 2.6.22.9-91.fc7 still has this problem:
While doing a permanent ping to my gateway, after a while (1-3 hours)
ping: sendmsg: No buffer space available
shows up.
rmmod sky2; modprobe sky2 ;
helps for some minutes, but then 'No buffer space available' turns up again.
Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
Subsystem: Intel Corporation Unknown device 1001
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at 90100000 (32-bit, non-prefetchable) [size=4K]
ethtool -i eth0
driver: sky2
version: 1.14
firmware-version: N/A
bus-info: 0000:02:00.0

Mahmoud Kassem (mmkassem) wrote : | #21 |
snoopy26, your network controller is an Intel wireless network adapter; this bug report is for Marvell 88E8056 Ethernet. If the bug you mentioned exists on your *Ubuntu* installation, I suggest you open a new bug report for it.

snoopy26 (kasper-biessenhofen) wrote : | #22 |
Sorry, I extracted the wrong section from lspci; the correct one is:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 12)
Subsystem: Fujitsu Limited. Unknown device 139a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 2299
Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at 2000 [size=256]
[virtual] Expansion ROM at 90000000 [disabled] [size=128K]
I apologize and will open a bug report on bugzilla for FC7.

|
#77 |
I'm running SuSe 10.3 and with an updated kernel (2.6.23.
The interface is listed as "sky2 0000:02:00.0: v1.18 addr 0xd5020000 irq 17 Yukon-EC (0xb6) rev 1"
I only run 100 Mbit to a switch. I'm using it on a media server and, unfortunately, after a few hours of reasonably heavy use streaming media, the interface dies; then 3-4 hours later, the machine crashes.
If I get to the machine before it dies, I can restart the interface, but as others report, it lasts for a shorter time.
When restarting it, "ifstatus" reports it as up in the failed mode, doing an "ifdown" and "ifup" restarts it.
ifup reports: "device: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)"
I see nothing in dmesg when the interface dies

|
#78 |
There is a problem on Yukon-EC that causes the receive FIFO to hang.
There is workaround code in 2.6.23 that is supposed to detect and fix it.
The problem also only occurs if there is no flow control. The sky2
driver autonegotiates to enable flow control, but some hardware doesn't support
flow control or has it disabled.

|
#79 |
Thanks.
Unfortunately the log reports:
kernel: [ 982.916325] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
So I'm not sure it's limited to the case when flow control is off.
I noticed some threads earlier this year where you tried flow control off. Is that worth trying again with latest release, if so how?

|
#80 |
I can repeat the failure by trying to copy about 20G of files over a Samba connection from a Windows box. I can never get past 5G before it fails. So perhaps I can do some debugging for you?

Mahmoud Kassem (mmkassem) wrote : | #23 |
snoopy26, are you still having problems with Fedora 7 kernel 2.6.23.1-10.fc7 (it's using sky2 driver version 1.18)? I am not having any network issues.

Mahmoud Kassem (mmkassem) wrote : | #24 |
ubuntu kernel 2.6.22-14-server has the sky2 driver version 1.18 but still issues rx errors
Test:
2 PC with Marvell 88E8056 Ethernets
Source: Fedora 7 2.6.23.1-10.fc7 (Sky2 1.18)
Dist: Ubuntu Server 7.10 2.6.22-14-server (Sky2 1.18)
Data transferred: 145 GB using SCP
Summary:
The network device did not go down, but one rx error was added to the /var/log/messages on the ubuntu-server
No rx errors on the F7

xtknight (xt-knight) wrote : | #25 |
Well, with 2.6.23-rc8 my problems are reduced about 99%. Just to let you know. I haven't had any problems that I can definitely relate to the sky2 driver itself even though it has gone down once(?) in the past month.

snoopy26 (kasper-biessenhofen) wrote : Re: [Bug 138611] Re: Sky2 ethernet failing randomly with Marvell 88E8056 gigabit | #26 |
Running ...
Linux oder 2.6.23.1-21.fc7 #1 SMP Thu Nov 1 20:28:15 EDT 2007 x86_64 x86_64 i386-64 GNU/Linux
... for one week without any problems in sky2.
My problem seems to be fixed :-))
On Mon, Nov 05, 2007 at 02:58:30PM -0000, Mahmoud Kassem wrote:
> ubuntu kernel 2.6.22-14-server has the sky2 driver version 1.18 but
> still issues rx errors
>
> Test:
> 2 PC with Marvell 88E8056 Ethernets
> Source: Fedora 7 2.6.23.1-10.fc7 (Sky2 1.18)
> Dist: Ubuntu Server 7.10 2.6.22-14-server (Sky2 1.18)
> Data transferred: 145 GB using SCP
>
> Summary:
> The network device did not go down, but one rx error was added to the /var/log/messages on the ubuntu-server
> No rx errors on the F7

|
#81 |
Is this the same bug as the original report, or is the bug becoming a tar baby for all the possible "my sky2 has hung" reports?
The original report said the problem was reproducible after up/down; it was not one
of the "my box hangs under load" problems.
Changed in linux: | |
status: | Confirmed → Incomplete |

|
#82 |
Sorry, no, to avoid raising another bug on sky2 this was the nearest I could find.
Sky2 hangs under load, that's the problem. Very repeatable.
I've now compiled and switched to the Marvell driver sk98lin, and that gives me no problems...

Leann Ogasawara (leannogasawara) wrote : | #27 |
Hi Guys,
There hasn't been any recent activity in this report and we were wondering if this is still an issue in the latest Hardy Alpha release. The Hardy Heron Alpha series contains an updated version of the kernel, version 2.6.24. You can download and try the new Hardy Heron Alpha release from http://
Changed in linux: | |
status: | New → Incomplete |
Changed in linux-source-2.6.22: | |
status: | Confirmed → Won't Fix |

treffer (rtreffer) wrote : | #28 |
Hi Leann,
I can confirm this behaviour. Especially scp/scp -r on my local network kills eth0 after ~500MB (actually [50; 1000]MB).
Happens with hardy based mythbuntu (should have the same kernel), zen-sources (see http://
So yes, this is still happening all the time here :( But just under heavy network load. I can run skype, rsync to the internet, apt-get or surf the web for days (literally days, the box was running for 2 days without network problems - even compiling gentoo in a chroot worked (wget, rsync, ...)).
I've tried to play around with ethtool but no real success, yet.
Hope this helps....

treffer (rtreffer) wrote : | #29 |
Update....
NOAPIC (as mentioned on gentoo-wiki.com) didn't help. Patching the kernel with the original marvell driver (available via http://
I'll add a PCI 100MBit card and recheck. I've added a new switch/device (wrt150n with dd-wrt), so I'll have to recheck my network's hardware.... Everything else was stable till now :( (transferring terabytes of data)
However, I can confirm that the card seems to work like a charm at lower rates/higher latencies... Weird....
Happy hunting, I'm going to add a 100MBit card :( (this won't keep me from testing - the gbit card is onboard - asus p5k-vm for the record)
Oh, my tests worked like this:
"scp -r <user>@
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/
connected via
<notebook/realtek chip> <-- 100MBit/Patch cable --> <cheap 100MBit switch> <-- 100MBit/Patch cable --> <wrt150n with ddwrt> <-- 100MBit/Cross --> <normal box/marvell 88E8056>
I'll recheck the network chain with a 100MBit card - but I wouldn't expect a network problem....
Hope this helps.

Russ Price (rjp-ubu) wrote : | #30 |
I can reliably get my new system (XFX nForce 630i motherboard) with an integrated Marvell PCI-E Gigabit Ethernet controller to lock up HARD. All I have to do is FTP anything from another gigabit-equipped machine on my LAN, and the system instantly locks up requiring a hard reset. NFS transfers will also lock the system up. At slower speeds (i.e. pulling updates over the Internet) lockups are random. Once it locks up, only a hard reset will work.
I'm running Hardy Beta, but I had this problem in Gutsy as well. I'm also having problems with corrupted packages in {us,uk,
I tried using "options sky2 disable_msi=1" in /etc/modprobe.
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
Subsystem: Marvell Technology Group Ltd. Unknown device 00ba
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 20
Region 0: Memory at efcfc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at df00 [size=256]
[virtual] Expansion ROM at efb00000 [disabled] [size=128K]

Russ Price (rjp-ubu) wrote : | #31 |
I dropped an RTL8169-based PCI card into the system and disabled the Marvell controller. This solved the lockup problem completely - and it also solved my problem with corrupted updates.
Now I'm debating whether or not to return the motherboard and get something with a Realtek, Nvidia, or Intel controller on-board, or keep the motherboard and get an Intel PCIe card so I don't have to waste a PCI slot. The on-board Marvell is simply unusable.

Mubashir Cheema (cheema) wrote : | #32 |
I can confirm that this bug exists in Ubuntu 8.04 beta 1 with kernel 2.6.24-12-generic. lspci reports that I have:
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 13)
I don't experience complete lockups; however, I experience extreme slowdowns over a local gigabit network when using just interactive ssh.
Changed in linux: | |
assignee: | nobody → ubuntu-kernel-team |
importance: | Undecided → Medium |
status: | Incomplete → Triaged |

Alan Podlesek (seqdcer) wrote : | #33 |
I'm also using Ubuntu 8.04 beta with kernel 2.6.24-12-generic, but have the 88E8055 controller.
I can ping sites without a problem, but firefox just times out.

Alan Podlesek (seqdcer) wrote : | #34 |
Some more info...
lspci says:
Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 12)
ethtool -i eth0
driver: sky2
version: 1.20
firmware-version: N/A
bus-info: 0000:02:00.0
cat /var/log/messages
kernel: [ 164.639614] sky2 eth0: rx length error: status 0x5ea0100 length 1342
kernel: [ 164.741511] sky2 eth0: rx length error: status 0x5ea0100 length 1342
kernel: [ 164.840244] sky2 eth0: rx length error: status 0x5ea0100 length 1342
kernel: [ 165.261115] sky2 eth0: rx length error: status 0x5ea0100 length 1342

|
#83 |
Tried to find the bug source, but couldn't ;-( I used ubuntu 2.6.24 sources, placed the 2.6.22 (ubuntu) sky2.[ch] (ver. 1.18) files into the tree and applied the
[NET]: Make NAPI polling independent of struct net_device objects.
+
[NET]: Nuke SET_MODULE_OWNER macro.
patches (from git). Then I build the module, did a rmmod/modprobe, but nothing changed - the sky2 still fails with "sky2 eth0: rx error ..." in the dmesg.
Thus I guess the error could be somewhere else (maybe the napi polling isn't working quite right?), or maybe... I guess I'm gonna try to really find the bug...

|
#84 |
I have consistently had the same issue reported above, where the kernel reports the following and the interface does not work. It seems to work fine the first time you bring up the interface, but if you do an ifdown/ifup you get the following message, but no connection.
"sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both"

Dossy Shiobara (dossy) wrote : | #35 |
+1. Seeing this problem on a new EVGA e-7100/630i board that also has the crappy Marvell NIC. Running on 8.04/Hardy release.
$ uname -a
Linux doc 2.6.24-16-generic #1 SMP Thu Apr 10 12:47:45 UTC 2008 x86_64 GNU/Linux
$ lspci | grep Marvell
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
$ ethtool -i eth0
driver: sky2
version: 1.20
firmware-version: N/A
bus-info: 0000:03:00.0

Barius (bariuspelagic) wrote : | #36 |
Seeing the same issues here too. I'm on a Shuttle SG31G2.
> lspci | grep Marvell
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
> ethtool -i eth0
driver: sky2
version: 1.20
firmware-version: N/A
bus-info: 0000:02:00.0

Daniel J Blueman (danielblueman) wrote : | #37 |
There is a known issue with the 88E8056 network controller firmware related to this; you need to contact your motherboard vendor and request the firmware update specifically. I had similar issues with my 88E8053 (EC) until moving to firmware v2.2.

Leann Ogasawara (leannogasawara) wrote : | #38 |
The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:
1) If you are comfortable installing packages on your own, the linux-image-
--or--
2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://
Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Brian Long (briandlong) wrote : | #39 |
I've installed the 2.6.27-2-generic kernel from the Intrepid debs and I'm still having sky2 issues. If I right-click on the nm-applet, uncheck "Enable Networking" and re-check it, my link comes back up.
I'm running the Abit P5K Pro motherboard with the following device:
02:00.0 0200: 11ab:4364 (rev 12)
Subsystem: 1043:81f8
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 219
Region 0: Memory at feafc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at d800 [size=256]
Expansion ROM at feac0000 [disabled] [size=128K]
Capabilities: <access denied>
I'm going to update my motherboard BIOS to see if that helps the situation. I'll also contact ASUS to see if there is a NIC firmware update I can apply as mentioned in another comment (no update is mentioned on their website for this motherboard).

flower (otherego) wrote : | #40 |
I confirm the sky2 rx length error on my x86-64 Intrepid Ibex alpha 6 after upgrading from 8.04.

misan (misan) wrote : | #41 |
Same here. I've tested Ibex Beta with similar results: random lockup of the network device whenever traffic levels are quite high (transfers over the full-duplex 100 Mbps LAN).
Disabling and re-enabling the network cures the problem, as does removing the sky2 module and modprobing it again.

Wilbur Harvey (wilbur-harvey-spirentcom) wrote : | #42 |
I have a Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
Running Ubuntu 8.10 x64, latest updates as of this morning.
My network will only link at 100Mbps to my 1Gig Switch. It used to connect at 1Gbps a week or so ago.

Daniel J Blueman (danielblueman) wrote : | #43 |
Wilbur, if you're using a "desktop" (small) ethernet switch, power cycle it. It took me too long to realise that if one port negotiates down to 100Mb, it'll not run at higher speed until you powercycle the switch (found on at least 3 switches from 2 vendors). This is probably more a design limitation, so you pay a premium for expected functionality.

Wilbur Harvey (wilbur-harvey-spirentcom) wrote : | #44 |
That did the trick.

Uzzi (andreaussi-yahoo) wrote : | #45 |
Marvell Technology Group Ltd. 88E8040 on 2.6.27-7-generic doesn't work!

alfredo (software-zas) wrote : | #46 |
I have a Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) running Ubuntu 8.10 x86.
Same problem as Wilbur, but Daniel's trick does not work for me.
Only 100Mbps to the 3Com switch...

alfredo (software-zas) wrote : | #47 |
Found this in my dmesg:
Bridge firewalling registered
[ 29.060313] br0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
[ 29.061997] device lan1 entered promiscuous mode
[ 29.062586] sky2 lan1: enabling interface
[ 32.008591] sky2 lan1: speed/duplex mismatch<6>sky2 lan1: Link is up at 1000 Mbps, full duplex, flow control both
[ 34.652174] br0: port 1(lan1) entering learning state
[ 34.999033] sky2 lan1: Link is down.
[ 35.652447] br0: port 1(lan1) entering disabled state
[ 36.807290] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 36.934054] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 36.934630] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Plase use
[ 36.934632] nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or
[ 36.934634] sysctl net.netfilter.
[ 40.489485] sky2 lan1: Link is up at 100 Mbps, full duplex, flow control both
[ 40.489826] br0: port 1(lan1) entering learning state
lan1 is the marvell port...

alfredo (software-zas) wrote : | #48 |
Curious: I changed the switch and got 1Gbit on the link...
But before changing the switch I had booted WinXP on the machine and got a 1Gbit link on both switches!
I don't know how to find the firmware level of the switch. It's a 3C1670500A five-port; the new one is an eight-port 3Com.

Launchpad Janitor (janitor) wrote : Kernel team bugs | #49 |
Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https:/

|
#85 |
Closing out old bugs
Changed in linux: | |
status: | Incomplete → Invalid |

angelinux (mail-angelodelia) wrote : | #50 |
Hi all,
I'm experiencing similar trouble with
Ubuntu 8.10 server, 64 bit edition - Kernel 2.6.27-11-server
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
It randomly crashes the entire kernel, with no message in any log, so a hardware reset is needed. Flashing, coloured characters appear on the system console, but that is of no help.
This appears to be related to the use of UDP-based protocols.

Daniel J Blueman (danielblueman) wrote : | #51 |
Angelinux: the PCI revision is actually the firmware version, and you have the known-buggy version 1.2. I found that moving to firmware 2.2 or newer resolved these problems.
Your motherboard vendor has to supply you the updated firmware. If you're unable to get it, let me know.
Changed in linux (Ubuntu): | |
status: | Triaged → Invalid |

slashdevdsp (slashdevdsp) wrote : | #53 |
Hi Daniel
From your comment above - https:/
Could you please send me the updated firmware for the 88E8056 Marvell gigabit ethernet controller? This is for an Asus P6T Deluxe V2 board with dual gigabit ethernet.
I am running the latest kernel from ppa 2.6.31rc1 with the sky2 driver and I am still getting
sky2 eth0: speed/duplex mismatch sky2 eth0 and the link keeps going up and down

slashdevdsp (slashdevdsp) wrote : | #54 |
I should also point out that this is on a 64bit 9.04 with 2.6.31rc1 amd64 sky2 driver

Daniel J Blueman (danielblueman) wrote : | #55 |
The firmware for a number of Marvell ethernet controllers is at:
http://
I accept no responsibility and you flash this entirely at your own risk. There's documentation there and you can use 'strings' on the firmware files to identify which one is suitable.

Vikram Dhillon (dhillon-v10) wrote : Re: [Bug 138611] Re: Sky2 ethernet failing randomly with Marvell 88E8056 gigabit @ubuntu-kernel-team | #56 |
Daniel J Blueman wrote:
> The firmware for a number of Marvell ethernet controllers is at:
> http://
>
> I accept no responsibility and you flash this entirely at your own risk.
> There's documentation there and you can use 'strings' on the firmware
> files to identify which one is suitable.
>
>
Hi
You see there aren't native drivers for your ethernet device, so
one thing you can do is to unplug your modem and then plug it back in
after 5 mins. This worked for me...

Tuomas (mrtuomas) wrote : | #57 |
Some issues with Ubuntu 10.04 (64bit) on a Fujitsu-Siemens Lifebook S7210. For some time after bootup the network just doesn't work, apparently until the connection speed is reduced to 100Mbps.
lspci:
Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 14)
ethtool -i eth0:
driver: sky2
version: 1.25
firmware-version: N/A
bus-info: 0000:04:00.0
uname -r:
2.6.32-22-generic
dmesg:
(lots of lines like this)
[ 965.596662] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 966.029133] sky2 eth0: Link is down.
[ 968.782951] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 969.387122] sky2 eth0: Link is down.
[ 971.949581] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 972.999410] sky2 eth0: Link is down.
[ 975.662757] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 976.008737] sky2 eth0: Link is down.
[ 988.121189] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
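For anyone who wants to quantify this kind of flapping instead of eyeballing dmesg, here is a minimal Python sketch (not from the original poster). It assumes the "sky2 <dev>: Link is up at N Mbps" / "Link is down." message format shown above; the function name `summarize_link_flaps` is made up for illustration.

```python
import re

# Matches lines such as
# "[ 965.596662] sky2 eth0: Link is up at 1000 Mbps, full duplex, ..."
# and "[ 966.029133] sky2 eth0: Link is down."
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sky2 (?P<dev>\S+): Link is "
    r"(?:(?P<up>up) at (?P<speed>\d+) Mbps|down)"
)


def summarize_link_flaps(dmesg_text):
    """Return (number of link-down events, [(timestamp, speed_mbps), ...] for link-up events)."""
    downs = 0
    ups = []
    for line in dmesg_text.splitlines():
        m = LINK_RE.search(line)
        if not m:
            continue  # not a sky2 link state line
        if m.group("up"):
            ups.append((float(m.group("ts")), int(m.group("speed"))))
        else:
            downs += 1
    return downs, ups


# Sample copied from the dmesg excerpt above.
SAMPLE = """\
[ 965.596662] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 966.029133] sky2 eth0: Link is down.
[ 968.782951] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 969.387122] sky2 eth0: Link is down.
[ 971.949581] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 972.999410] sky2 eth0: Link is down.
[ 975.662757] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 976.008737] sky2 eth0: Link is down.
[ 988.121189] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
"""

downs, ups = summarize_link_flaps(SAMPLE)
```

Feeding it the full output of `dmesg` shows how many times the link dropped and at which speed it finally settled.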

Andreas Meier (rumpumpel1) wrote : | #58 |
I'm using Ubuntu and have been getting this error message for about two years with every Linux kernel.
The problem appears with 32bit and with 64bit.
I have an Asus M2N-L mainboard which has two Marvell 88E8056 chips onboard.
The machine is used as router. eth0 is my internal interface and eth1 is the external interface
used by ppp0 to connect to the DSL-modem.
I get this error message only for eth1 !
Also I get a lot of dropped and frame errors for this interface:
eth1 ...
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:19876766 errors:614219 dropped:614219 overruns:0 frame:108483
Since Ubuntu 10.04 the situation has become worse: whereas previously only my logs got flooded,
now the interface stops working once a day.
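To put the counters above in proportion, here is a rough Python sketch that pulls the RX numbers out of legacy `ifconfig` output and computes the error ratio. It assumes the old "RX packets:N errors:N ..." line format quoted above; the helper name `rx_error_ratio` is only illustrative.

```python
import re

# Legacy net-tools ifconfig format, as quoted in the comment above.
RX_RE = re.compile(
    r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)"
)


def rx_error_ratio(ifconfig_text):
    """Parse the RX counter line and return the counters plus errors/packets."""
    m = RX_RE.search(ifconfig_text)
    if m is None:
        raise ValueError("no RX counter line found")
    packets, errors, dropped, overruns, frame = map(int, m.groups())
    ratio = errors / packets if packets else 0.0
    return {
        "packets": packets,
        "errors": errors,
        "dropped": dropped,
        "overruns": overruns,
        "frame": frame,
        "error_ratio": ratio,
    }


SAMPLE = "RX packets:19876766 errors:614219 dropped:614219 overruns:0 frame:108483"
stats = rx_error_ratio(SAMPLE)
```

For the line quoted above this works out to roughly 3% of received packets counted as errors.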
Changed in linux: | |
importance: | Unknown → Medium |

Giardo (andrigiardo) wrote : | #59 |
Same problem for me...
Here is my lspci -vvnn output:
02:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8071 PCI-E Gigabit Ethernet Controller [11ab:436b] (rev 16)
Subsystem: Acer Incorporated [ALI] Device [1025:014f]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 26
Region 0: Memory at febfc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at e800 [size=256]
Expansion ROM at febc0000 [disabled] [size=128K]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0100c Data: 4179
Capabilities: [c0] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #2, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <256ns, L1 unlimited
ClockPM+ Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [130] Device Serial Number 90-fb-a6-
Kernel driver in use: sky2
Kernel modules: sky2
Dmesg is full of:
Mar 22 19:39:56 PcStream kernel: [88166.328760] sky2 eth0: Link is down.
Mar 22 19:40:00 PcStream kernel: [88169.870833] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
Mar 22 19:40:01 PcStream kernel: [88171.027428] sky2 eth0: Link is down.
Mar 22 19:40:07 PcStream kernel: [88177.375698] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
I can't get it working properly...
I will install a new ethernet card soon; this situation is not acceptable. (I'm running a cron job every minute with "ifconfig eth0 up" -.- which reduces downtime to 4-5 seconds, but that's not a good solution.)
Please solve this problem, and contact me if you need any other information! (Sorry for my bad English :-D)

Forest (foresto) wrote : | #60 |
In case anyone wants to try a different driver, I just packaged the latest sk98lin from Marvell. It's working as a sky2 replacement with my Yukon-2 88E8056 chip.
When launchpad finishes the build, it should appear here:

MicheleSchiavo (micheleschi-gmail) wrote : | #61 |
A few kernels ago, this old bug came back.
Changing the MTU doesn't change anything...
[ 150.457097] sky2 0000:02:00.0: eth0: rx error, status 0x5ea0100 length 1502
[ 151.283689] sky2 0000:02:00.0: eth0: rx error, status 0x5ea0100 length 1502
[ 151.349995] sky2 0000:02:00.0: eth0: rx error, status 0x3020020 length 770
[ 151.360369] sky2 0000:02:00.0: eth0: rx error, status 0x2490002 length 585
[ 151.514566] sky2 0000:02:00.0: eth0: rx error, status 0x33c0020 length 828
[ 151.563110] sky2 0000:02:00.0: eth0: rx error, status 0x2bd0020 length 701
[ 151.667122] sky2 0000:02:00.0: eth0: rx error, status 0x59b0020 length 1435
[ 151.758506] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 151.763368] sky2 0000:02:00.0: eth0: rx error, status 0x1780020 length 380
[ 151.768621] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 159.334735] net_ratelimit: 5 callbacks suppressed
[ 159.334742] sky2 0000:02:00.0: eth0: rx error, status 0x5620020 length 1378
[ 159.430277] sky2 0000:02:00.0: eth0: rx error, status 0x9d0020 length 157
[ 165.603936] sky2 0000:02:00.0: eth0: rx error, status 0x3d60020 length 982
[ 165.824732] sky2 0000:02:00.0: eth0: rx error, status 0x5cc0002 length 1484
[ 165.846673] sky2 0000:02:00.0: eth0: rx error, status 0x5cc0002 length 1484
[ 165.851051] sky2 0000:02:00.0: eth0: rx error, status 0x3c10020 length 961
[ 165.859700] sky2 0000:02:00.0: eth0: rx error, status 0x3c00020 length 960
[ 167.804561] sky2 0000:02:00.0: eth0: rx error, status 0x40b0020 length 1035
[ 174.900664] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 181.612397] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 181.850806] sky2 0000:02:00.0: eth0: rx error, status 0x3980020 length 920
[ 182.031611] sky2 0000:02:00.0: eth0: rx error, status 0x1720020 length 370
[ 182.757526] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 182.760969] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 182.868165] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 183.319772] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 183.323903] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 183.660931] sky2 0000:02:00.0: eth0: rx error, status 0x2f50020 length 757
[ 209.649181] net_ratelimit: 3 callbacks suppressed
[ 209.649187] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
[ 209.652552] sky2 0000:02:00.0: eth0: rx error, status 0x2340020 length 568
[ 213.390135] sky2 0000:02:00.0: eth0: rx error, status 0x4740020 length 1144
[ 217.839201] sky2 0000:02:00.0: eth0: rx error, status 0x1dc0002 length 476
[ 225.104332] sky2 0000:02:00.0: eth0: rx error, status 0x28f0002 length 655
[ 225.462348] sky2 0000:02:00.0: eth0: rx error, status 0x5a0020 length 90
[ 225.584245] sky2 0000:02:00.0: eth0: rx error, status 0x930020 length 147
[ 225.672717] sky2 0000:02:00.0: eth0: rx error, status 0x14c0002 length 332
[ 226.436303] sky2 0000:02:00.0: eth0: rx error, status 0x3920020 length 914
[ 226.448875] sky2 0000:02:00.0: eth0: rx error, status 0x3ea0020 length 1002
[ 226.823315] sky2 0000...
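A small observation on the log above: in most of these lines the upper 16 bits of the status word equal the reported length (e.g. 0x302 = 770), while a few differ. The Python sketch below splits each status into its upper and lower halves and tallies the lower (flag) half. It is only an illustration based on the numbers in this log; the meaning of the individual flag bits is defined in the driver source (sky2.h) and is not decoded here, and the function names are made up.

```python
import re
from collections import Counter

RX_ERR_RE = re.compile(r"rx error, status (0x[0-9a-fA-F]+) length (\d+)")


def split_status(status):
    """Split a 32-bit status word into (upper 16 bits, lower 16 bits)."""
    return status >> 16, status & 0xFFFF


def tally_rx_errors(log_text):
    """Count occurrences of each lower-half flag value and list length mismatches."""
    flags = Counter()
    mismatches = []
    for m in RX_ERR_RE.finditer(log_text):
        status = int(m.group(1), 16)
        length = int(m.group(2))
        upper, lower = split_status(status)
        flags[lower] += 1
        if upper != length:
            # Upper half of the status disagrees with the logged length.
            mismatches.append((hex(status), upper, length))
    return flags, mismatches


# Four lines copied from the log above.
SAMPLE = """\
[ 150.457097] sky2 0000:02:00.0: eth0: rx error, status 0x5ea0100 length 1502
[ 151.349995] sky2 0000:02:00.0: eth0: rx error, status 0x3020020 length 770
[ 151.360369] sky2 0000:02:00.0: eth0: rx error, status 0x2490002 length 585
[ 151.758506] sky2 0000:02:00.0: eth0: rx error, status 0x5d60002 length 1494
"""

flags, mismatches = tally_rx_errors(SAMPLE)
```

Running this over a full log gives a quick picture of which flag values dominate and how often the two length fields disagree, which may be useful detail to attach to a report.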

MicheleSchiavo (micheleschi-gmail) wrote : | #62 |
uname -a
Linux uzzmaster 3.0.0-13-generic #21-Ubuntu SMP Mon Oct 17 20:18:51 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Remon Schopmeijer (r-schopmeijer) wrote : | #63 |
This is still a bug on the 4.2.0 kernel in 15.10 ...

|
#86 |
------8<-------
1 size_t fwrite(const void * __restrict ptr, size_t size,
2 size_t nmemb, register FILE * __restrict stream)
3 {
4 size_t retval;
5 __STDIO_
6
7 > __STDIO_
8
9 retval = fwrite_
10
11 __STDIO_
12
13 return retval;
14 }
------>8-------
Here, we are at line 7. Using the "next" command leads nowhere. However,
setting a breakpoint on line 9 and issuing "continue" works.
Looking at the assembly instructions reveals that we're dealing with the
critical section entry code [1] that should never be interrupted, in this
case by the debugger's implicit breakpoints:
------8<-------
...
1 add_s r0,r13,0x38
2 mov_s r3,1
3 llock r2,[r0] <-.
4 brne.nt r2,0,14 --. |
5 scond r3,[r0] | |
6 bne -10 --|--'
7 brne_s r2,0,84 <-'
...
------>8-------
Lines 3 through 5 (inclusive) are supposed to be executed atomically. Therefore,
GDB should never (implicitly) insert a breakpoint on lines 4 and 5, else the
program will try to acquire the lock again by jumping back to line 3 and
get stuck in an infinite loop.
The solution is to make GDB aware of these patterns so it inserts breakpoints
after the sequence -- line 6 in this example.
Most recent kernel where this bug did not occur: Unknown
Distribution: CentOS 4.5
Hardware Environment: Intel server board SE7320VP21
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller (rev 18)
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Capabilities: [e0] Express Legacy Endpoint IRQ 0
Subsystem: Intel Corporation: Unknown device 3466
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at deefc000 (64-bit, non-prefetchable) [size=16K]
I/O ports at b800 [size=256]
Expansion ROM at deec0000 [disabled] [size=128K]
Software Environment: CentOS 4.5 install + "vanilla" kernel 2.6.23-rc4
Problem Description:
Discovered while attempting to troubleshoot: https://bugzilla.redhat.com/show_bug.cgi?id=228733
I'm trying to understand the "tx timeout" messages, and how to reproduce them. In my test environment, I have 2 servers, each of which has a sky2 Marvell NIC connected to a switch as "eth0".
On server "B", I type "nc -l -p 3409 > /dev/null"
On server "A", I type "nc server-B 3409 < /dev/zero"
I see lots of traffic from A->B, as would be expected. If I shut down eth0 on server "B" using "ifdown eth0", wait a few seconds, and then re-enable eth0 on server "B" using "ifup eth0", I see the following in "dmesg" output on server B:
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
ip_tables: (C) 2000-2006 Netfilter Core Team
As expected... The problem is that server B can occasionally end up in a state where it is unable to ping or access the local subnet anymore. Both "mii-tool" and "ethtool eth0" show a link present.
If I perform "ifdown eth0; ifup eth0" on server B, it doesn't help anything.
If I unload the sky2 module, then things clear up and I'm back on the network again.
I'm curious about this testcase because the symptom seems to match the earlier "tx timeout" messages; the driver tried to re-enable itself after a timeout, but it's still not able to see any traffic.
Steps to reproduce:
See "Problem Description" above. While traffic is continuously being transmitted from server "A" to server "B", shut down the network interface on server "B", and then start the interface on server "B". Monitoring RX traffic on server "B" will indicate when it is no longer receiving the bytes sent from server "A".
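One possible way to do the RX monitoring on server "B" is to poll the interface byte counter and flag a run of samples with no growth. The Python sketch below is only an illustration, not part of the original report. It assumes the counter is read from /sys/class/net/<iface>/statistics/rx_bytes; the function names and the `min_stalled` threshold are hypothetical.

```python
import time


def read_rx_bytes(iface):
    """Read the cumulative RX byte counter for an interface from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
        return int(f.read().strip())


def find_stall(samples, min_stalled=3):
    """Return the index where the first flat run of at least `min_stalled`
    unchanged intervals starts, or None if the counter kept growing."""
    flat = 0
    for i in range(1, len(samples)):
        if samples[i] == samples[i - 1]:
            flat += 1
            if flat >= min_stalled:
                return i - flat  # first sample of the flat run
        else:
            flat = 0
    return None


def poll(iface, interval=1.0, count=30):
    """Collect `count` samples of the RX byte counter, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(read_rx_bytes(iface))
        time.sleep(interval)
    return samples


# Example with made-up counter values: growth stops after the third sample.
example = [100, 200, 300, 300, 300, 300, 400]
stall_start = find_stall(example)
```

On a real machine, `find_stall(poll("eth0"))` would be run after the "ifup eth0" step. Note that the ifdown/ifup gap itself produces a flat run, so sampling should start only once the link is reported up again.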