ADMtek ADM8511 "Pegasus II" USB Ethernet causes oops

Bug #1547838 reported by a1291762
This bug affects 1 person
Affects         Status    Importance  Assigned to  Milestone
linux (Ubuntu)  Triaged   Medium      Unassigned

Bug Description

This is a rather old USB ethernet adapter. It worked fine for me with older distributions (including Ubuntu 10.04). I noticed it causing problems after upgrading to 14.04 and stopped using it as a result (since I needed reliable USB ethernet for work). I was also not completely sure if it was this device causing the issue or interplay between this device and a number of other networking USB devices that were used at work.

I'd like to use this ethernet adapter at home but the machine is running Ubuntu 14.04 and the issue persists.

I have collected some information that might help. I'm happy to try more things to collect information if that will help.

I'm running Ubuntu 14.04 server 64-bit. I just rebooted so I'm running the latest updates.

I was told to run ubuntu-bug linux but I'm not sure how much useful stuff is included in that. I'm including the kernel log which has some bad stuff happening in it. Not sure if it got the system-freezing oops or just the errors leading up to that.

It feels like a bug in the driver that corrupts memory, because I see issues with network-related things (e.g. one ping crashed but the next did not), leading to a full-system freeze.

This machine has been happily running Ubuntu for a looong time (though it was only recently upgraded to Ubuntu 14.04, previously it had been running 10.04). I have not had stability issues with the machine but plugging in the USB Ethernet reliably makes it freeze within minutes. The symptoms also match what I saw on my machine at work.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.19.0-49-generic 3.19.0-49.55~14.04.1
ProcVersionSignature: Ubuntu 3.19.0-49.55~14.04.1-generic 3.19.8-ckt12
Uname: Linux 3.19.0-49-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
Date: Sat Feb 20 21:23:29 2016
InstallationDate: Installed on 2015-12-31 (50 days ago)
InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
SourcePackage: linux-lts-vivid
UpgradeStatus: No upgrade log present (probably fresh install)
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.19.0-49-generic.
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D1', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D2p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=0cffa533-89ec-44d8-b5f9-6efe69c1b955
InstallationDate: Installed on 2015-12-31 (67 days ago)
InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
MachineType: Hewlett-Packard HP EliteBook 8530w
Package: linux (not installed)
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-49-generic root=UUID=d4c67c71-679c-4b9d-8e20-848149e8aedf ro debug ignore_loglevel crashkernel=384M-:128M
ProcVersionSignature: Ubuntu 3.19.0-49.55~14.04.1-generic 3.19.8-ckt12
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-49-generic N/A
 linux-backports-modules-3.19.0-49-generic N/A
 linux-firmware 1.127.20
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.19.0-49-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

WifiSyslog:

_MarkForUpload: True
dmi.bios.date: 06/08/2010
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 68PDV Ver. F.12
dmi.board.name: 30E7
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 90.27
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr68PDVVer.F.12:bd06/08/2010:svnHewlett-Packard:pnHPEliteBook8530w:pvrF.12:rvnHewlett-Packard:rn30E7:rvrKBCVersion90.27:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP EliteBook 8530w
dmi.product.version: F.12
dmi.sys.vendor: Hewlett-Packard

Revision history for this message
a1291762 (a1291762) wrote :

I've attached a screenshot of a hard-freeze. Will upload whatever logs were recovered too.

Revision history for this message
a1291762 (a1291762) wrote :

Hmm... I tried the crash kernel but it didn't seem to work. So I used the netconsole to capture more data than could fit on the screen.
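For anyone reproducing this kind of capture, here is a minimal sketch of a netconsole setup. The parameter layout follows the kernel's netconsole documentation; the ports, IP addresses, interface name and MAC address are placeholders and not the values used in this report.

```shell
#!/bin/sh
# Build the "netconsole=" module parameter in the documented form:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# All argument values passed to this helper are hypothetical placeholders.
netconsole_param() {
    src_port=$1 src_ip=$2 dev=$3 tgt_port=$4 tgt_ip=$5 tgt_mac=$6
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# On the crashing machine (as root), log over the built-in NIC rather than
# the USB dongle under test:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, capture UDP port 6666 to a file, for example with
# a netcat variant that accepts these flags:
#   nc -u -l 6666 > netconsole.log
```

Loading the module only after the receiver is listening avoids losing the first messages.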

Revision history for this message
a1291762 (a1291762) wrote :

I think the first netconsole missed some data at the start (I got a belated firewall warning) so here's a second capture that seems to start earlier.

In case it wasn't clear, most of the time I've been booting the machine and then inserting the USB device (so that I could be sure all the logging was ready to go before the problem happened).

Revision history for this message
a1291762 (a1291762) wrote :

Here's another netconsole, with the dongle inserted during boot.

Revision history for this message
a1291762 (a1291762) wrote :

The first bit of complaining in the last log didn't all make it to the netconsole. Unlike previous instances, it stayed on the screen for a while, so I got a photo.

Revision history for this message
a1291762 (a1291762) wrote :

Note if I plug in the device but don't send traffic over it, the system manages to remain running for longer. I brought the interface up with a static IP and the system remained up for well over 5 minutes before it died. I was monitoring ifconfig and it had around 30 Rx packets and 300 Tx packets. I'm not sure what caused that, since it was static and no traffic should have been going out. With DHCP, pings, etc. going over the interface it kills the machine much quicker (all the previous crashes involved me sending data over the link, mostly just pings and whatever DHCP sends).

penalvch (penalvch)
affects: linux-lts-vivid (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Revision history for this message
a1291762 (a1291762) wrote :

apport-collect requires a web browser. This machine is a server. It has lynx or links or something but Ubuntu's website does not work with that browser. How should I continue?

Revision history for this message
a1291762 (a1291762) wrote :

I couldn't figure out if there was a "crash" thing or not. I suspect not (/var/crash is empty).
Attached is the result of running sudo apport-cli -f -plinux --save bug.apport

Revision history for this message
a1291762 (a1291762) wrote : AlsaDevices.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
a1291762 (a1291762) wrote : BootDmesg.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : Card0.Codecs.codec.1.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : CurrentDmesg.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : IwConfig.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : Lspci.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : Lsusb.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : PciMultimedia.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : ProcEnviron.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : ProcInterrupts.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : ProcModules.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote :

It seems this is a known issue...
https://bugs.launchpad.net/ubuntu/+source/apport/+bug/997020

I have set w3m as the text-mode browser (it was already installed) and run apport-collect. It wants to upload nearly 600K, quite a bit more than the other command gave.

Revision history for this message
a1291762 (a1291762) wrote : UdevDb.txt

apport information

Revision history for this message
a1291762 (a1291762) wrote : UdevLog.txt

apport information

penalvch (penalvch)
tags: added: bios-outdated-f.20
Revision history for this message
a1291762 (a1291762) wrote :

I had to jump through some hoops to update the BIOS (since the updater was a Windows binary and this machine does not have Windows on it). But I finally sorted something out.

dmidecode output:
68PDV Ver. F.20
12/08/2011
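The same two values can usually be read without root from the kernel's DMI sysfs export. This is a small sketch, assuming the standard /sys/class/dmi/id layout; the optional directory argument is a hypothetical addition that only exists so the helper can be tested.

```shell
#!/bin/sh
# Print "<BIOS version> (<BIOS date>)" from the kernel's DMI sysfs export.
# The directory argument exists only to make the helper testable; it
# defaults to the usual /sys/class/dmi/id location.
bios_info() {
    dir=${1:-/sys/class/dmi/id}
    printf '%s (%s)\n' "$(cat "$dir/bios_version")" "$(cat "$dir/bios_date")"
}
```

Running `bios_info` before and after a BIOS update is a quick way to confirm that the flash actually took effect.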

The adapter seems to be stable now. I brought it up, did some pings, even some file transfers. I have not seen even a hint of a problem yet.

The machine at work this adapter was causing problems with was also a HP so maybe it too needed a BIOS update.

I am surprised, but thankful that it's working now.

Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
a1291762 (a1291762) wrote :

It seems I was mistaken.

The bug seems to remain, though it takes longer to trigger now than it did before.

I just got this crash over a netconsole.

Changed in linux (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

a1291762, in order to allow additional upstream developers to examine the issue, at your earliest convenience, could you please test the latest upstream kernel available from http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D ? Please keep in mind the following:
1) The one to test is on the very top line of the page (not the daily folder).
2) The release names are irrelevant.
3) The folder time stamps aren't indicative of when the kernel actually was released upstream.
4) Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds .

If testing on your main install would be inconvenient, one may:
1) Install Ubuntu to a different partition and then test this there.
2) Backup, or clone the primary install.

If the latest kernel did not allow you to test for the issue (e.g. you couldn't boot into the OS), please comment on this in your report and continue with the next most recent kernel version until you can test for the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X, and Y are the first two numbers of the kernel version, and Z is the release candidate number if it exists.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note, a failure to install the kernel does not meet the criteria for kernel-bug-exists-upstream.

Also, you don't need to apport-collect further unless specifically requested to do so.

Once testing of the latest upstream kernel is complete, please mark this report Status Confirmed. Please let us know your results.

Thank you for your understanding.

tags: added: latest-bios-f.20
removed: bios-outdated-f.20
Changed in linux (Ubuntu):
importance: Low → Medium
status: Confirmed → Incomplete
Revision history for this message
a1291762 (a1291762) wrote :

It is still broken in the mainline kernel.

The top entry at the URL was this:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

The package versions were:
Selecting previously unselected package linux-headers-4.5.0-040500-generic.
Selecting previously unselected package linux-headers-4.5.0-040500.
Selecting previously unselected package linux-image-4.5.0-040500-generic.

I selected the version explicitly from the grub menu to ensure the correct kernel was booting.

I've attached netconsole4.log, which shows a crash plus a complete boot-to-crash netconsole log with the 4.5.0 kernel.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.5
Revision history for this message
penalvch (penalvch) wrote :

a1291762, the next step is to fully commit bisect from kernel 2.6.22 to 3.19 in order to identify the last good kernel commit, followed immediately by the first bad one. This will allow for a more expedited analysis of the root cause of your issue. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Please note, finding adjacent kernel versions is not fully commit bisecting.

After the offending commit (not kernel version) has been identified, then please mark this report Status Confirmed.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: needs-bisect regression-release
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
a1291762 (a1291762) wrote :

Am I supposed to be bisecting mainline kernel or the Ubuntu kernel?

That page seems a bit out of date, I'm guessing I should be cloning git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git ?

I am somewhat bandwidth challenged at home. Is there a recommended way to fetch this monstrosity over a low-bandwidth, low-reliability link?

Revision history for this message
a1291762 (a1291762) wrote :

In the absence of a reply to my question, I tried downloading various binary kernels to at least narrow down the scope of the bisecting I'll have to do. After trying older releases and having no failures, I unexpectedly found that 14.04 kernels only cover 3.11 to 3.13. But I'm running 14.04 and I'm on kernel 3.19?! It seems the 14.04.x updates include kernels from newer releases. So this bug is much more recent than I originally anticipated. How recent? Vivid recent.

Kernel 3.19.0-7 is good.
Kernel 3.19.0-51 is bad.

That cuts way down on the search space. I might try some more binary kernels in this range while I sort out how to do source builds (in case there are more commits than binaries - which is what I'm expecting).

I think this means I'll be able to download git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git to get the appropriate tags to bisect from. I found a hint to use fetch --depth=1, then fetch --depth=1000, etc. to incrementally fetch a large repo.
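The incremental fetch idea could be sketched roughly as follows. The helper name, the remote name `vivid` and the depth values are illustrative assumptions; only `git fetch --depth`, `--unshallow` and `--tags` are standard git options.

```shell
#!/bin/sh
# Fetch a large repository in growing slices so that a dropped connection
# only costs the current slice. GIT can be overridden (e.g. GIT=echo) to
# preview the commands without touching the network.
fetch_incrementally() {
    remote=$1
    shift
    for depth in "$@"; do
        ${GIT:-git} fetch --depth="$depth" "$remote" || return 1
    done
}

# Example (depths are arbitrary; run inside a "git init" directory):
#   git remote add vivid git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
#   fetch_incrementally vivid 1 1000 10000
#   git fetch --unshallow vivid   # finally, pull the remaining history
#   git fetch --tags vivid        # tags are needed as bisect endpoints
```

If a step fails partway, re-running the same depth repeats only that slice, since the earlier slices are already in the local object store.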

Revision history for this message
a1291762 (a1291762) wrote :

Sigh. I was forgetting about the linux-image-extra package. This proved to make a difference. Indeed, the post-Saucy versions that were succeeding were only doing so because the image-extra package was missing. Strangely, the USB dongle worked without that package but clearly there's an interaction with something else that needs a module from that package.

Armed with this new knowledge, I re-tested some of the builds I had tested previously (now with image-extra) and found this.

3.9.0-0 - good*
3.11.0-26 - bad

These are the first and last kernels of Saucy. The first one does not have an image-extra (the image package is larger). The last one does have an image-extra. I guess some time during Saucy, the packaging was changed?

My testing yesterday was booting with the USB dongle inserted (so that networking config could be applied during boot, similarly to how I intend to run the system). Today, I have been inserting the dongle after the system boots. I have attached netconsole5.log, which shows the kernel warning/error I got when I plugged the dongle in (on kernel 3.9.0-0). Note that despite this output, the system continued to run just fine.

Revision history for this message
a1291762 (a1291762) wrote :

I have narrowed it further to these builds.

3.9.0-7.15 - good
3.10.0-0.6 - bad

I've managed to fetch the saucy repo. git says there's around 14,000 commits between these two tags?!

I'm going to first verify that when I build both of these tags I get the same behaviour as I do with the binary builds. Then I'll start the git bisect properly.
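To put the commit count in perspective, a bisect halves the candidate range on every round, so roughly log2(n) builds are needed. The sketch below estimates that number and shows how a bisect between two tags could be started. The helper names are hypothetical, and the tag names in the usage note are assumed to follow an "Ubuntu-<version>" pattern.

```shell
#!/bin/sh
# Rough number of build-and-test rounds a bisect needs for n candidate
# commits: the integer ceiling of log2(n).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Start a bisect between a known-good and a known-bad revision.
# GIT can be overridden (e.g. GIT=echo) to preview the commands.
start_bisect() {
    good=$1 bad=$2
    ${GIT:-git} bisect start &&
    ${GIT:-git} bisect bad "$bad" &&
    ${GIT:-git} bisect good "$good"
}
```

For example, `git rev-list --count Ubuntu-3.9.0-7.15..Ubuntu-3.10.0-0.6` would report the number of commits in the range, and `bisect_steps 14000` prints 14, so about 14 kernel builds should be enough. git's own estimate may differ by a step.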

Unfortunately, as slow as my internet is, downloading kernels is quite a bit faster than building them appears to be :(

Revision history for this message
a1291762 (a1291762) wrote :

The good news is that I have identified the commit that breaks my system. It is 313a58e487ab3eb80e7e1f9baddc75968288aad9 (the first of 3 commits to the pegasus driver). I have attached the bisect log in case it's interesting.

Here's some notes on my process...

The bisecting was entirely in commits that lacked the debian directories so I used the patches from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-saucy/ to restore them. Most of the time, the kernel asked about configs that were new. I just accepted the defaults.

My build/run process went like this.

# remove old kernels
COLUMNS= dpkg -l | awk '/linux-image/{print $2}' | grep -v '3\.19' | grep -v lts | xargs sudo apt-get -y remove --purge
# backup old packages
mv ../*.deb ../old
# so the patch works
rm -rf debian*
# record the result of the previous test (use "bad" instead of "good" if it failed)
git bisect good
# so I can build
for file in ../3.9/*.patch; do patch -p1 < "$file"; done
# set the version to something useful (3.9.0-0+lrx)
vi debian.master/changelog
chmod a+x debian/rules debian/scripts/* debian/scripts/misc/*
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic
sudo dpkg -i ../linux-image-*
sudo reboot
# test by booting from the kernel, plugging in the USB dongle and pinging the gateway

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
a1291762 (a1291762) wrote :

Another data point for consideration... (I didn't think of this until late last night)

git checkout Ubuntu-3.10.0-0.6
git revert 313a58e487ab3eb80e7e1f9baddc75968288aad9
# resolve a header conflict - nothing major

After building this, I was able to use the network dongle just fine (i.e. the second and third pegasus-related commits were OK; one of them even removed the warning seen when inserting the device).

I had a look at the commit and it looks ok...

I'm guessing from the change in code that the "pool" was not necessary, and that the extra skb buffer objects went unused most of the time. However, I believe they hid a buffer overrun.

The skb buffer was previously allocated as PEGASUS_MTU + 2. The new buffer is allocated as PEGASUS_MTU (with optional padding for alignment purposes). From what I can see, this should make the usable buffer consistently smaller than the old structure allowed for (by 2 bytes).

The real problem though, is that when reading into the skb buffer, the driver uses PEGASUS_MTU + 8 as the buffer size, a pretty clear buffer overrun to me. I don't know how the kernel is allocating this memory but I guess the underlying allocation pattern of the "pool" code was able to hide the effect of the overrun and this change merely exposed it by changing the allocation pattern.
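The size mismatch can be worked through with plain arithmetic. In this sketch the value 1536 for PEGASUS_MTU is an assumption made for illustration (the driver header has the real definition), and the optional alignment padding is ignored. Only the differences between the sizes matter.

```shell
#!/bin/sh
# ASSUMPTION: 1536 is used here as the value of PEGASUS_MTU; check the
# driver header for the real definition. Only the differences matter.
PEGASUS_MTU=1536

# Bytes by which a read of "requested" bytes exceeds an "allocated" buffer
# (0 if it fits).
overrun_bytes() {
    allocated=$1 requested=$2
    if [ "$requested" -gt "$allocated" ]; then
        echo $((requested - allocated))
    else
        echo 0
    fi
}

old_alloc=$((PEGASUS_MTU + 2))   # buffer size before the commit
new_alloc=$PEGASUS_MTU           # buffer size after the commit
read_len=$((PEGASUS_MTU + 8))    # length handed to the read

echo "old: $(overrun_bytes "$old_alloc" "$read_len") bytes past the buffer"
echo "new: $(overrun_bytes "$new_alloc" "$read_len") bytes past the buffer"
```

Under these assumptions the read could run 6 bytes past the old buffer and 8 bytes past the new one, which fits the idea that the commit exposed an existing overrun rather than introducing it.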

Indeed, the attached patch (which passes the correct buffer size to the read function) also fixes things. I suppose this is the fix that should be committed.

I guess this needs to go upstream to the kernel guys though?

tags: added: patch
penalvch (penalvch)
tags: added: bisect-done
removed: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
a1291762 (a1291762) wrote :

I downloaded the Ubuntu 4.6.0 rc2 image (4.6.0-040600rc2-generic) and confirmed the issue remains (as with the last mainline build I tested, it does not show up as quickly, I guess something about the memory layout has changed).
I downloaded current linux HEAD source and verified the code still looks incorrect (it does).
I downloaded a pre-git Linux history repo (from the internet archive) and traced the bad code to commit ef1bba7ed87a4059ee725565689a1706e9f11534. The driver maintainer seems to be the same person, maybe they have some recollection that helps because the change itself does not enlighten me.

Revision history for this message
penalvch (penalvch) wrote :

a1291762, the issue you are reporting is an upstream one. Could you please report this problem following the instructions verbatim at https://wiki.ubuntu.com/Bugs/Upstream/kernel to the appropriate mailing list (TO Petko Manolov CC linux-usb)?

Please provide a direct URL to your post to the mailing list when it becomes available so that it may be tracked.

Thank you for your understanding.

tags: added: kernel-bug-exists-upstream-4.6-rc2
removed: kernel-bug-exists-upstream-4.5
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Revision history for this message
a1291762 (a1291762) wrote :

Sorry for the delay in posting this. It took forever to actually show up. Here's my post to the mailing list. http://marc.info/?l=linux-usb&m=145985232019173&w=2

I've also been in contact with the maintainer off-list (he got me to test another change).

Revision history for this message
a1291762 (a1291762) wrote :
penalvch (penalvch)
tags: added: cherry-pick