Ubuntu

[intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Reported by Chris Jones on 2008-09-01
This bug affects 14 people
Affects                  Status         Importance   Assigned to     Milestone
Linux                    Fix Released   Medium       -               -
linux (Fedora)           Fix Released   Unknown      -               -
linux (Gentoo Linux)     Fix Released   Medium       -               -
linux (Mandriva)         Fix Released   Critical     -               -
linux (Suse)             Fix Released   Unknown      -               -
linux (Ubuntu)           -              Critical     Tim Gardner     -
  Intrepid               -              Critical     Tim Gardner     -
linux-lpia (Ubuntu)      -              Critical     Amit Kucheria   -
  Intrepid               -              Critical     Amit Kucheria   -

Bug Description

In some circumstances it appears possible for the 2.6.27-rc kernels to corrupt the NVRAM used by some Intel network parts to store data such as MAC addresses.
This is limited to the new e1000e driver, and reports have only appeared from users of "82566 and 82567 based LAN parts (ich8 and ich9)" (to quote Intel). The reports seem to be isolated to laptops, but it is not clear if this is because desktop/server parts are not vulnerable, or if use cases simply increase the chances of laptop users being hit.

Once this corruption has occurred, recovery may be possible via a BIOS update, but may well require replacement of the hardware. Use of Intel's IABUTIL.EXE is strongly discouraged, as it will worsen the problem to the point where the network part will no longer appear on the PCI bus.

(this is a new description, the original one was based on too much guesswork. Below are the URLs originally referenced)
(The driver is blacklisted in Ubuntu for 2.6.27-rc in the latest releases, so if your network is not working, the hardware is not necessarily damaged; the driver may just be disabled to prevent any accidents until this bug is solved. Don't worry!)
http://www.blahonga.org/~art/rant.html (search for "em0")
http://<email address hidden>/msg00360.html
http://<email address hidden>/msg00398.html

Chris Jones (cmsj) on 2008-09-01
Changed in linux:
importance: Undecided → Critical
Chris Jones (cmsj) wrote :

I'm wondering whether we could patch out the sections of the driver which write to the NVRAM, in case Intel are not able to make suitable changes that prevent this corruption before 2.6.27 is released (e.g. by splitting the writing parts out into a separate module which is not loaded by default).

Ben Collins (ben-collins) wrote :

Removed the regression-2.6.27 tag from this. The 2.6.26 kernel and 2.6.27 kernel have the exact same e1000e driver (one which we downloaded from Intel's e1000 sf.net project).

Still a serious issue, but I don't want it to be classified as a regression.

Chris Jones (cmsj) wrote :

http://marc.info/?t=122038337000003&r=1&w=2 is another interesting thread about this, on linux-netdev.

Hi Chris,

Just an update here in case you missed the chatter in #kernel on Sept 03: Tim has already begun investigating this issue.

Changed in linux:
assignee: nobody → timg-tpi
status: New → Triaged
Yingying Zhao (yingying-zhao) wrote :

We just hit a similar issue while testing Intrepid Alpha 5. At first, the LAN worked fine on the x86 system. But after a system hang on the x86_64 system (caused by gfx) on the same machine, we found that the Ethernet card no longer worked. "lspci" no longer shows the correct Ethernet card info, and the x86 system, on which e1000e worked before, cannot recognize the card either.

Our investigation is underway now.

Changed in linux:
status: Unknown → Incomplete
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
status: Unknown → Confirmed
Jeffrey Baker (jwbaker) wrote :

This is just my humble opinion, but the Alpha CD downloads should be pulled from the archive. This kernel can partially ruin your hardware, and unsuspecting users shouldn't be able to merrily download it.

Steve Langasek (vorlon) wrote :

Jorge brought this bug to my attention just now; this really needs to be fixed one way or another for beta, even if that would mean blacklisting e1000e altogether until this is resolved. Even with as little as I use the wired ethernet on my laptop, I wouldn't enjoy having to RMA it to fix it after a kernel bug. :/

Changed in linux:
milestone: none → ubuntu-8.10-beta
Colin Watson (cjwatson) wrote :

Jeffrey, we can't afford to do that; we need to be able to test with the Alpha CDs on the wide variety of hardware not affected by this bug, or our development schedules for 8.10 will be seriously compromised. However, I'd be happy to add a warning to the cdimage web pages. Can anyone suggest some text?

Alacrityathome (alacrityathome) wrote :

Colin,

Seems that a warning may be insufficient. I would think most of the folks testing a pre-release may not know they have an e1000e driver or affected NIC.

Maybe blacklist e1000e asap and then re-instate e1000e after a fix is found.

Perhaps have the "warning" state something about the e1000e being temporarily withheld from the pre-release with certain Intel NICs affected.

John

Is Ubuntu willing to risk the liability of distributing software known to destroy hardware?

Scruffynerf (scruffynerf) wrote :

Unless Canonical wants liability for
a) Individual user's destroyed hardware
b) Crippling reputation damages, especially against the 'new to linux' groups
I'd echo the suggestion to pull the liveCD's until this is fixed.

When new Linux users discover permanently corrupted hardware after trying Ubuntu, and this gets out on the wider web, all of Ubuntu's promotion efforts will be destroyed as well.

Breaking known good hardware is a problem greater than keeping to a self-imposed delivery schedule.

eentonig (eentonig) wrote :

It's alpha software, people should be considered as being aware that using it might break stuff.

Furthermore, people should be smart enough to read about the known issues prior to installing it.

Yes, a warning and blacklisting the e1000e driver should be done, but revoking an alpha because of a (serious) bug just doesn't seem the answer to me, because it blocks you from finding other issues that might bite people when the official release gets out.

Chris Jones (cmsj) wrote :

Colin: FWIW, I think some kind of warning on cdimage and in the alpha release notes seems highly prudent (not because of the bogus liability claims here, but just because it's the right thing to do). I would suggest:

"Due to an unresolved bug in the Linux kernel currently used in Ubuntu 8.10, users with Intel network hardware supported by the e1000e driver should not download and run these images. Doing so may render your network hardware permanently inoperable.
Older Intel network hardware which uses the e1000 driver is not affected by this, however, use of the e1000 driver in older Ubuntu releases is not a reliable indication of which driver will be used by Ubuntu 8.10. Support for hardware which uses a PCI Express bus has been moved from e1000 to e1000e. If in doubt, do not run these images and subscribe to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555 to receive notifications when the bug is fixed."

Steve: I am not sure exactly where the responsibility for handling this for the Alphas falls (other than being quite sure it's not mine, and suspecting it's yours ;) but I think we should put warnings out fairly prominently, as SuSE has done. The obvious safe default would be to yank e1000e.ko and replace the above warning with something similar which explains why newer Intel network hardware won't work in the Alphas. It's a bit of a nuclear option since there is a lot of this hardware around and the bug mostly seems to be affecting laptops, but since they tend to be doing a lot more "interesting" kernel work (suspending, frequent loading of modules, etc) it could simply be that they are exposing it more easily and server hardware is just as capable of being affected.

For those wishing to discuss this bug, its implications, etc. there is a forum thread which seems more suitable for this, see: http://ubuntuforums.org/showthread.php?t=912666

Arnd (arnd-arndnet) wrote :

> It's alpha software, people should be considered as being aware that using it might break stuff.

That's absolutely ridiculous. I'm aware that an Ubuntu alpha or beta can break some stuff (like eating my filesystem or deleting my partitions etc). In fact it already did. However, BREAKING HARDWARE is a whole different thing.
To make this clear, we are talking about RMAing laptops and mainboards because of this bug. And we are also talking about reasonably popular hardware. If the position is that I should expect my hardware to die when I try Ubuntu, I will certainly never try any alpha or beta Ubuntu software again.

In my opinion the Alpha should be pulled NOW. Then you can discuss what steps have to be taken to address this problem (e.g. one easy solution: disable e1000e and republish). This won't cost you more than a few days.

Just my 2 cents

Christian Wolf (christianwolf) wrote :

Folks,

I suggest to remove the respective Intrepid AlphaX images from the mirrors ASAP.

Although testing is testing, and everybody knows that there is a risk (I remember a similar issue with Mandrake Linux and CD-Rom drives) and you, as a tester, take a known risk, we also have the responsibility to minimize impact of this issue.

I think only the latest Alpha6 has this flaw?

John Dong (jdong) wrote :

Shall we pull in e1000e-prevent-corruption-of-eeprom-nvm.patch? It seems from the discussion that it isn't a 100% fix (other methods of reaching mmio'ed EEPROM probably exist) but should at least eliminate this disaster scenario of just booting up the distribution causing the card to be hosed.

John Dong wrote:
> Shall we pull in e1000e-prevent-corruption-of-eeprom-nvm.patch? It
> seems from the discussion that it isn't a 100% fix (other methods of
> reaching mmio'ed EEPROM probably exist) but should at least eliminate
> this disaster scenario of just booting up the distribution causing
> the card to be hosed.

no, this patch is for e1000, and has nothing to do with this problem.
Right now, the only reports of this issue are with 82566 and 82567 based
LAN parts (ich8 and ich9).

the eeprom is not MMIO mapped, the registers for accessing it are. I'm
still not clear if a random write to a memory location could corrupt
things, we'll be looking at that today.

Chris Jones (cmsj) on 2008-09-23
description: updated

>http://www.ubuntu.com/testing/intrepid/alpha6

No warning

>http://cdimage.ubuntu.com/releases/intrepid/alpha-6/

No warning

>http://cdimage.ubuntu.com/releases/intrepid/alpha-6/intrepid-desktop-i386.iso

Download works.

How many people test these iso's? How many of them are using an intel motherboard? (10%-40% ?)
How many will ever test again if testing an ISO means you are frying your motherboard.

This isn't a blame game. But if top priority is not removing the alpha, it will very soon be...

We are talking about thousands, if not millions, of laptops and pc's that will be broken beyond repair, if I'm not mistaken...

And some people actually say things like:

>Jeffrey, we can't afford to do that; we need to be able to test with the Alpha CDs on the wide variety of hardware not affected by this bug,

Don't you get it. If you don't pull now, NOBODY WILL TEST THE NEXT VERSION.
There will be no NEXT VERSION because nobody DARES to install it.

>It's alpha software, people should be considered as being aware that using it might break stuff.

Yes, it may corrupt data. [but if it does that beyond its own partition; it should be a big issue as well]
But BREAKING hardware?

I'm quite sure that afterwards some quality control and reflection .. that there will be SOME policy to prevent these mistakes (NOT TAKING THE IMAGE DOWN) ..

But it will be too late.

PULL THE IMAGE: THEN DISCUSS!

abingham (abingham) wrote :

It's been almost a day since discussion on this issue resumed.

The Alpha 6 images are still up with no warning present.

I always assume that an Alpha or Beta release may break things to where I need to reinstall the OS. Battery life could be bad. Etc. But this is literally capable of *destroying* people's hardware. It's a whole different ball game. Even the LiveCDs are affected, and people testing them can reasonably assume there will be no hardware impact on their system even if it is an Alpha.

These images need to be pulled from availability *now*. Major mirror sites need to be notified.

If the release becomes '8.11' instead of '8.10' because of it, so what. We are talking about destroying motherboards here. Replacing a laptop motherboard can cost > $500.

If this is not dealt with, I will no longer be able to recommend Ubuntu to friends and family. The attitude of 'release on time at all costs' already caused many issues with 8.04, and now people are seriously suggesting continuing distribution of disc images that literally destroy hardware?

Jeffrey Baker (jwbaker) wrote :

There's no reason to be hysterical, but a re-spin of Alpha 6 CDs without the e1000e module may be called for. This is a separate bug, but the recommended workaround of adding "blacklist e1000e" in /etc/modprobe.d/blacklist doesn't work. Somehow, udev or some other thing manages to load it anyway. I had to unlink it.
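
For anyone hitting the same thing, here is a minimal sketch of a stricter modprobe configuration. It assumes the cause is that a "blacklist" line only makes modprobe ignore the module's hardware aliases, so an explicit load request can still succeed; the "install" override makes any such request run /bin/false instead. The file name is hypothetical, and if the module is also packed into the initramfs, the initramfs would probably need regenerating (for example with update-initramfs -u) before this takes effect. This has not been verified on an affected machine.

```text
# /etc/modprobe.d/blacklist-e1000e   (hypothetical file name)

# Ignore the module's built-in hardware aliases (stops automatic loading).
blacklist e1000e

# Assumption: also make explicit "modprobe e1000e" requests fail.
install e1000e /bin/false
```

Removing the file again restores normal loading, which is less drastic than unlinking the module.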

abingham (abingham) wrote :

Intel has ~80% CPU market share and >=70% of the chipset market for their own CPU.

So at least 56% of machines sold are Intel CPUs with Intel chipsets that are susceptible to this bug.

1 in 2 of Ubuntu testers could be vulnerable to this.

I strongly recommend that if you are going to test for this bug, or haven't seen it yet on your ich8/9 system, you RIGHT NOW do:

ethtool -e ethX > savemyeep.txt

Having a saved copy of your eeprom means we can help you write it back to your system.
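
A minimal sketch of how that backup could be scripted for every affected interface at once. It assumes ethtool is installed, that it is run as root, and that the e1000e driver is currently bound to the interface (it will find nothing if the module is blacklisted). The function names and the output file naming are hypothetical; only `ethtool -i` and `ethtool -e` are real ethtool options.

```shell
#!/bin/sh
# Print the driver name from `ethtool -i` output read on stdin.
driver_of() {
    sed -n 's/^driver: *//p'
}

# Dump the EEPROM of every interface bound to e1000e.
# $1 is the output directory (defaults to the current directory);
# the eeprom-<ifname>.txt naming is an assumption for illustration.
save_e1000e_eeproms() {
    outdir=${1:-.}
    for path in /sys/class/net/*; do
        [ -e "$path" ] || continue          # glob matched nothing
        ifname=${path##*/}
        drv=$(ethtool -i "$ifname" 2>/dev/null | driver_of)
        [ "$drv" = "e1000e" ] || continue   # skip other drivers
        ethtool -e "$ifname" > "$outdir/eeprom-$ifname.txt"
    done
}
```

Calling `save_e1000e_eeproms /root` would write one dump per matching interface; it is worth copying the files off the machine as well.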

For those who might be interested in testing the Alpha but have an at-risk
machine, is there an accepted workaround that removes the e1000e driver
without jeopardizing the hardware?

okay, lets just use the data we *have* now. What we know is that some
users have reported a corrupt NVM. Intel networking does not have a
current reproduction but is *fully engaged* on trying to solve this
problem. We have only had reports on 82566 and 82567 based machines, no
others. Trying to extrapolate this out to "1 of 2" users is just fear
mongering.

These kernels being released with this problem are still in alpha/beta,
which means our testing audience is smaller, but so is the potential
impact of any problem.

The process is working as far as I can see, we have a set of users that
is reporting the problem, which will help keep the kernels with the
issue from being promoted to full production status.

If you have some useful data to add to this bug, please comment, we're
listening. I think the discussion about pulling alpha cds or whatever
should go to some mailing list, and not be inside this bug.

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of
abingham
Sent: Tuesday, September 23, 2008 9:55 AM
To: Brandeburg, Jesse
Subject: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel
ICH8and ICH9 gigE chipsets at risk

Intel has ~80% CPU market share and >=70% of the chipset market for
their own CPU.

So at least 56% of machines sold are Intel CPUs with Intel chipsets that
are susceptible to this bug.

1 in 2 of Ubuntu testers could be vulnerable to this.

Tim Gardner (timg-tpi) wrote :

Uploaded module-init-tools_3.3-pre11-4ubuntu10 to temporarily blacklist e1000e.

Harry (harry2o) wrote :

Even if it means making myself look like an idiot: May I also suggest publishing (maybe along with the warnings) hints about how to find out whether your hardware is / will be / might be affected. And when does that eeprom writing actually happen - at boot time, when using the lan interface, or elsewhen?
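
One way to check the first part, sketched under assumptions: the reports so far name only 82566 and 82567 based parts, so filtering lspci output for those model numbers gives a rough indication. The function name is hypothetical, and the match relies on the part number appearing in the description from the local PCI ID database, so an empty result is a hint and not proof that a machine is safe.

```shell
#!/bin/sh
# Filter `lspci` output (read on stdin) down to Ethernet controllers
# whose description mentions an 82566 or 82567 part.
at_risk_nics() {
    grep -i 'ethernet controller' | grep -E '8256[67]'
}
```

Run it as `lspci | at_risk_nics`; any line it prints is a device of the kind mentioned in the reports.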

I had the duplicate bug 272630 and consider myself lucky. I had the Intel NIC but had used Alpha 5. I only had the dmesg error and not the hardware eeprom failure. I had Alpha 6 ready to test until I found this bug thread. For folks like me, it will be a good decision to blacklist e1000e pending a resolution. Most 1st time Alpha testers would not be as lucky or have the time to seek out a full bug thread.

>The process is working as far as I can see, we have a set of users that
is reporting the problem, which will help keep the kernels with the
issue from being promoted to full production status.

I'm sorry... will you buy these users a new laptop? If so, then the process is working.

Try advertising: there is a 50% chance that this will destroy your hardware.
How many testers will you have got left? ZERO.

THE IMPLIED RISK OF TESTING ALPHA SOFTWARE IS DATA CORRUPTION.

What's next? The machine blows up and kills people?
Would that be a reason to remove a machine-destroying CD image?

YES, it's not Ubuntu's fault... the alpha is shipped. The process is working, but the process is not done until somebody QUICKLY removes the image.

No tester signed up for this; and I for sure will not ever put an alpha disc into one of my machines. I'll even wait a couple of months after the release.

It's not that this happened. It's that afterwards official developers consider losing half the hardware of every volunteer who tests a sane and good move. Just part of the process.

This discussion SHOULD NOT BE MOVED TO A MAILING LIST until the CD IMAGE IS GONE.

I know this is not in line with the UBUNTU code of conduct. But neither is WILLFULLY DAMAGING PEOPLE'S PROPERTY.

You know of the BUG, REMOVE THE IMAGE.

Changed in linux:
status: Incomplete → In Progress
Changed in linux:
status: Confirmed → In Progress
M. Salivar (mfsalivar) wrote :

One alpha tester has already been lost, at least partially. When the problem occurred I sucked it up and said to myself, you know, it is alpha. Things like this shouldn't happen, but they do from time to time. Now that I see the indifference of Ubuntu devs to people losing their hardware, and even worse, to the extreme likelihood of more people losing theirs because of a stupidly strict adherence to release schedules, I'll never test an alpha outside of a virtual machine again (your loss, not mine). I'm not sure yet, but I may be through with Ubuntu, period.

You should pull all the current alphas, and quickly release an alpha 7 with the e1000e module removed or an older kernel. It's the only reasonable thing to do. Pulling the alphas and waiting for a fix will cause too many delays, but leaving up the current alphas is just plain immoral.

Thomas McKay (tom-mckay1) wrote :

I for one second the motion to remove the alpha images until they have been prepared so that there is no risk of hardware damage. I would be seriously concerned about the negative effects of ignoring this issue.

There is a liability on Canonical for the distribution of software which causes permanent and irreparable damage. The problem has been identified, and not shielding their customers from this issue is nothing less than WILLFUL NEGLECT.

REMOVE THE IMAGES NOW, and re-issue them when the driver has been blacklisted.

-Zeus- (matthew-momjian) wrote :

Thomas, why do you feel that the present warnings are not enough? I for one feel that they are certainly sufficient warning to people running those chipsets to not download them. Also, this isn't a democracy; we don't vote on whether to pull images or not. That's up to the core developers/Canonical.

Thomas McKay (tom-mckay1) wrote :

Why do I not feel the warnings are enough?

For one, because users who download via BitTorrent will never see those warnings and, as somebody above noted, not all testers know for certain what hardware they are running. Simply saying "if your computer breaks, tough luck, we warned you" will not garner any respect among Linux users. The alpha testers are doing Canonical a great service, and taking that for granted would be a shame.

I'm not saying it's a democracy, I am just warning of the consequences of ignoring this issue and bricking people's computers.

Ing0R (ing0r) wrote :

I think a warning (with Jesse Brandeburg's advice) should go to *every* tester who is running such a system.
I just read about it on a computer news site and I wish I had got this news from Canonical (maybe via Update Manager).

Ing0R

Thomas McKay (tom-mckay1) wrote :

Not to mention the thousands of people who downloaded these images before the warning was posted.

Daniel Kulesz (kuleszdl) wrote :

I was really shocked to see that it took more than one day between the Issue becoming apparent and the warning being placed on the website. This is a very serious issue and can cause severe damage on really expensive hardware (i.e. most recent Lenovo Thinkpads like the X200, X301, T400, T500 and so on).

Also, please be also aware, that some testers might simply change their old download URL from a download manager and increment from 5 to 6 - therefore I really suggest to at least move the images away to some different place and replace the ISO files with textfiles containing the same warning together with the real download location.

Is there any way to issue a warning to all the testers who are already using or began downloading the ISO? Does the installer query some URL through which the warning could be injected?

The only place where I can find official download links to the
BitTorrent is at http://cdimage.ubuntu.com/releases/intrepid/alpha-6/,
where the warning exists.

Chris Jones (cmsj) wrote :

Please listen to what Jesse said in comment #24. This is a bug report, not a discussion forum.

Calls for ISOs to be pulled, legal claims, accusations and use of capslock should be on the ubuntu-devel or ubuntu-devel-discuss mailing list. For one thing, by posting to those lists your opinion will be seen by a much wider audience.

The people subscribed to this bug have either been affected by this bug (such as myself, I filed this bug), or are trying to fix it.

Yelling at those people (which is what a number of you are doing) will solve nothing. Stop it please. The only relevant discussion here is that which is gathering information about the bug, or attempting to fix it.

I appreciate this is a contentious issue (since my laptop was affected by this), but I want to read about progress, I don't want to read lots of ranting. I also don't want this post to be perceived as negative, or whining, or whatever. I fully sympathise with people who are trying to protect their fellow users from harm, and in that respect I apologise for not shouting more about this bug when it was first uncovered. All I did was make sure as many of the people as possible who could fix it, knew about it.
If you would like to argue with me about this, please do not do it here, email me personally (see my Launchpad overview page for my addresses) or via <email address hidden>.

Michael W. (hotdog003-gmail) wrote :

If we don't pull the images (we should, but I won't comment since it's already being discussed), it might be a good idea to at least make the words "permanently inoperable" on the Alpha 6 testing page in big, bold letters so users have less of a chance to skim over that part.

Think about it: How many times do we read warning labels on the stuff we eat? My point exactly. Having a "WARNING" section on a testing page where people are already expecting things not to work perfectly might not be an accurate indicator of exactly how grave this problem really is.

I think we should do everything in our power to at least let users know what they're dealing with here. Somehow, we've managed to produce a stick of dynamite with a lit fuse. A lot of people are expecting testing images to be imperfect and may skip right over the warning section because they already know the typical "This is just alpha software, hopefully nothing major will happen" lecture that warning sections typically give them. Making "permanently inoperable" in bold letters will make it much more eye-catching than it is now.

Changed in linux:
status: Unknown → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.27-4.6

---------------
linux (2.6.27-4.6) intrepid; urgency=low

  [ Tim Gardner ]

  * Disable e1000e until the NVRAM corruption problem is found.
    - LP: #263555

  [ Upstream Kernel Changes ]

  * Revert "[Bluetooth] Eliminate checks for impossible conditions in IRQ
    handler"

 -- Ben Collins <email address hidden> Tue, 23 Sep 2008 09:53:57 -0400

Changed in linux:
status: Triaged → Fix Released
Jojo (kuzniarpawel) wrote :

according to http://groups.google.com/group/linux.kernel/browse_thread/thread/a5ef7deff8551186/d05c233ecb430178

this bug might be related to xorg and Intel graphics.

I used e1000e for 5 days with a lot of traffic on eth0 and nothing happened (luck?), but I have a T61p with an NV Quadro.

William Grant (wgrant) on 2008-09-24
Changed in linux:
status: Fix Released → In Progress
Changed in linux:
status: Confirmed → Fix Committed
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
status: In Progress → Incomplete
Shwan (shwan-ciyako) on 2008-09-27
description: updated
Changed in linux:
status: Fix Committed → Confirmed
Tim Gardner (timg-tpi) on 2008-09-30
Changed in linux:
status: In Progress → Fix Committed
Matt Zimmerman (mdz) on 2008-09-30
Changed in linux:
milestone: ubuntu-8.10-beta → none
Steve Langasek (vorlon) on 2008-09-30
Changed in linux:
milestone: none → ubuntu-8.10
Changed in linux:
status: Incomplete → In Progress
Changed in linux:
status: Fix Committed → Fix Released
Michael Losonsky (michl) on 2008-10-03
Changed in linux:
status: Fix Released → In Progress
Changed in linux:
status: Confirmed → Fix Released
Colin Watson (cjwatson) on 2008-10-03
Changed in linux:
status: In Progress → Fix Released
Changed in linux:
status: In Progress → Fix Released
118 comments hidden view all 198 comments

Sorry, I missed the previous comment. I have now done it manually, and this is what I get from dmesg:

[ 25.596061] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
[ 25.596065] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 25.596142] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 25.596159] e1000e 0000:00:19.0: setting latency timer to 64
[ 25.658013] e1000e 0000:00:19.0: PCI INT A disabled
[ 25.673108] e1000e: probe of 0000:00:19.0 failed with error -5

mrbean71 (m-marti) wrote :

I don't know whether I should open a new bug. After further investigation, I think e1000e still has some problem here.
Same behaviour with 27.3 and 27.5 (in the latter the driver is not blacklisted).
Sometimes the network works out of the box, and sometimes I need to:

sudo modprobe -r e1000e
sudo modprobe e1000e

I'm now writing from the 27.5 kernel, and I had to remove and re-add the driver twice to make it work.

Benjamin Prosnitz (aetherane) wrote :

For those who got this working on the 2.6.27-5.4 kernel. Does the newer
2.6.27-5.5 kernel that is in the repositories now also work for you?

On Mon, Oct 6, 2008 at 7:10 AM, mrbean71 <email address hidden> wrote:

> I don't know if open a new bug. I think e1000e have some problem here
> result of furter investigation.
> Same behaviour from 27.3 and 27.5 (in the last driver is not blacklisted).
> Sometime network work out of the box, sometime network work out of the box
> sometimes I need to:
>
> sudo modprobe -r e1000e
> sudo modprobe e1000e
>
> now I'm writing with 27.5 kernel and removed added driver two times to
> make it work.

I confirm that the new binaries are working for me.
uname -r
2.6.27-5-generic

Thanks a lot for the great job!

desertoak (danielc-brikks) wrote :

"The fix will be included in the subsequent daily images"

Is the fix available now in the daily builds? http://cdimage.ubuntu.com/daily-live/current/
If not, when will it be? Or is there another link?

mrbean71 (m-marti) wrote :

The driver seems to be OK; eth0 comes up automagically.
Now I think there are problems with knetwork manager: I wrote resolv.conf manually and I have to restart the network from a shell to make DNS work.
But that is probably another story.

Andrew Tamoney (tamoneya) wrote :

The fix was not in 20081004 but it should be in 20081005 and is definitely in 20081006. Therefore any of the ISO would have the correct kernel.

> The fix was not in 20081004 but it should be in 20081005 and is
> definitely in 20081006. Therefore any of the ISO would have the correct
> kernel.

No, 20081007 will be the first daily image that includes this module again.

--
Steve Langasek
Debian Developer, Ubuntu Developer
Give me a lever long enough and a Free OS to set it on, and I can move the world.
http://www.debian.org/
<email address hidden> <email address hidden>

Hi,

I confirm that it works for me. I just updated the kernel to 2.6.27-5. At
first it seemed to do nothing, but then I just did:

sudo ifconfig eth0 down
sudo ifconfig eth0 up

And that was all, my network started to be ok.

Bye,

Juan David Cuevas Guarnizo
Investigador - Grupo GASURE
Tel: +57 4 219 8548
Bloque 19 - Facultad de Ingeniería
Universidad de Antioquia
Medellín - Colombia

"La actividad social de la gente de la universidad debe ser total y
radicalmente ajena a toda actitud de conformismos con la injusticia social,
la desigualdad económica y la opresión intelectual". - Eduardo Umaña Luna.

On Mon, Oct 6, 2008 at 07:10, mrbean71 <email address hidden> wrote:

> I don't know if open a new bug. I think e1000e have some problem here
> result of furter investigation.
> Same behaviour from 27.3 and 27.5 (in the last driver is not blacklisted).
> Sometime network work out of the box, sometime network work out of the box
> sometimes I need to:
>
> sudo modprobe -r e1000e
> sudo modprobe e1000e
>
> now I'm writing with 27.5 kernel and removed added driver two times to
> make it work.

Andrew Tamoney (tamoneya) wrote :

the updated kernel is in the 20081006 manifest:
linux-generic 2.6.27.5.5
linux-headers-2.6.27-5 2.6.27-5.8
linux-headers-2.6.27-5-generic 2.6.27-5.8
linux-headers-generic 2.6.27.5.5
linux-image-2.6.27-5-generic 2.6.27-5.8
linux-image-generic 2.6.27.5.5
linux-libc-dev 2.6.27-5.8
linux-restricted-modules-2.6.27-5-generic 2.6.27-5.7
linux-restricted-modules-common 2.6.27-5.7
linux-restricted-modules-generic 2.6.27.5.5
 That worked fine for me.

Are there any fixes or patches for the EEPROM/NVM problem? After upgrading to 2.6.27-5.8, the Ethernet still won't work.

$ dmesg | grep e1000e
[ 5.604448] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
[ 5.604456] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 5.604557] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 5.604586] e1000e 0000:00:19.0: setting latency timer to 64
[ 5.683292] e1000e 0000:00:19.0: PCI INT A disabled
[ 5.683335] e1000e: probe of 0000:00:19.0 failed with error -5

the ethernet works well if booting to windows XP ..

same for me:
under Linux, the e1000e driver refuses to load:

[ 388.961230] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
[ 388.961249] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 388.961346] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 388.961374] e1000e 0000:00:19.0: setting latency timer to 64
[ 389.013108] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
[ 389.023427] e1000e 0000:00:19.0: PCI INT A disabled
[ 389.023567] e1000e: probe of 0000:00:19.0 failed with error -5

(Khairul, you missed the most important line because of the grep!)
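
A hedged sketch of a wider filter: the checksum message is tagged only with the PCI address, not with the driver name, so matching on either one catches it. The address 0000:00:19.0 comes from the logs in this thread, and the function name is made up for illustration.

```shell
# Sketch: keep lines that mention the driver OR the adapter's PCI address.
# 0000:00:19.0 is the address seen in this thread; adjust it for other systems.
filter_e1000e_log() {
    grep -E 'e1000e|0000:00:19\.0'
}

# Typical use on a live system:
#   dmesg | filter_e1000e_log
```

Piping `dmesg` through it should then show "The NVM Checksum Is Not Valid" next to the probe failure.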

But it works fine under Windows.

uname -a:
Linux michis-ibm 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:04 UTC 2008 i686 GNU/Linux

Changed in linux:
status: Confirmed → In Progress
Changed in linux:
status: In Progress → Fix Released
Simon Sigre (simon-sigre) wrote :

If it helps, I am also having this problem; I have included output of lshw as well.
//
Linux penfold 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:04 UTC 2008 i686 GNU/Linux

[ 2.319073] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
[ 2.319076] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 2.319116] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2.319126] e1000e 0000:00:19.0: setting latency timer to 64
[ 2.377161] e1000e 0000:00:19.0: PCI INT A disabled
[ 2.377199] e1000e: probe of 0000:00:19.0 failed with error -5

        *-network UNCLAIMED
             product: 82566DC-2 Gigabit Network Connection
           *-network
  *-network DISABLED
\\


Bounces,

I am filling in for John S while he is on vacation?

What system is this referring to?

Invoice #?

Linux? What version of Linux?

Please advise.

Thanks,

Dan Cashman
7002 S. Revere Parkway
Ste #90
Centennial, CO 80112
720.488.9800
800.381.1083
F- 720.488.9885
<email address hidden>
   www.microsel.com


Craig (candrews-integralblue) wrote :

I have been active in that launchpad bug, but I'm not sure who John S is.

This information is about Ubuntu Linux, 8.10 (which is in testing, and has
not been released). This bug has since been resolved.

~Craig


Changed in linux:
status: Confirmed → Fix Released

I fixed the network card in my X61t with the Vista drivers from intel.com ( http://downloadcenter.intel.com/Filter_Results.aspx?strOSs=All&strTypes=All&ProductID=2775&lang=eng&OSFullName=All%20Operating%20Systems ). I just installed them and played with the diagnostic tools in the driver.

Both PXE and normal networking under Linux are working again :-)
A BIOS upgrade alone did not help (actually, it was a downgrade^^)

Please write something about this solution in the error message about the wrong CRC; it would help many people.

Thanks,
Michael Fritscher

Simon Sigre (simon-sigre) wrote :

Michael, as I'm having the same issue, I might try the same. Quite disappointing that we have to rely on Windows to bail us out. I've even tried compiling the Intel drivers under Linux, with the same error. Perhaps a live Windows distro like BartPE? I don't want to have to put the V1sta disks in and blow away my Ubuntu install.

Try to get a Vista DVD from somebody. You can actually boot them in a sort of Live System, perhaps you can install the drivers in it (the installation does not need a restart).

zika (4zika4) wrote :

hello,

A week ago I upgraded my Hardy to Intrepid beta on 3 machines. The oldest one is still on Intrepid, but the two new ones had to be downgraded to Hardy since the network did not work.

Is it safe now to upgrade them to Intrepid beta, or should I wait for the official release?

Changed in linux:
status: In Progress → Fix Released
Yingying Zhao (yingying-zhao) wrote :

This is the patch that has passed Intel's testing; with it applied, the issue can no longer be reproduced. It looks to be a workaround, with a fix for the root cause of the problem coming in .28.

> Date: Wed, 15 Oct 2008 18:21:44 -0400 (EDT)
> From: Steven Rostedt <email address hidden>
> To: LKML <email address hidden>, <email address hidden>
> cc: Linus Torvalds <email address hidden>,
> Andrew Morton <email address hidden>,
> Arjan van de Ven <email address hidden>, <email address hidden>,
> <email address hidden>, Thomas Gleixner <email address hidden>,
> Ingo Molnar <email address hidden>
> Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible memory
> corruption on module unload
>
>
> While debugging the e1000e corruption bug with Intel, we discovered
> today that the dynamic ftrace code in mainline is the likely source of
> this bug.
>
> For the stable kernel we are providing the only viable fix patch: labeling
> CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
> We will follow up with a backport patch that contains the fixes. But since
> the fixes are not a one liner, the safest approach for now is to
> disable the code in question.
>
> The cause of the bug is due to the way the current code in mainline
> handles dynamic ftrace. When dynamic ftrace is turned on, it also
> turns on CONFIG_FTRACE which enables the -pg config in gcc that places
> a call to mcount at every function call. With just CONFIG_FTRACE this
> causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
> overhead by dynamically updating the mcount call sites into nops.
>
> The problem arises when we trace functions and modules are unloaded.
> The first time a function is called, it will call mcount and the mcount
> call will call ftrace_record_ip. This records the calling site and
> stores it in a preallocated hash table. Later on a daemon will
> wake up and call kstop_machine and convert any mcount callers into
> nops.
>
> The evolution of this code first tried to do this without the kstop_machine
> and used cmpxchg to update the callers as they were called. But I
> was informed that this is dangerous to do on SMP machines if another
> CPU is running that same code. The solution was to do this with
> kstop_machine.
>
> We still used cmpxchg to test if the code that we are modifying is
> indeed code that we expect to be before updating it - as a final
> line of defense.
>
> But on 32bit machines, ioremapped memory and modules share the same
> address space. When a module would load its code into memory and execute
> some code, that would register the function.
>
> On module unload, ftrace incorrectly did not zap these functions from
> its hash (this was the bug). The cmpxchg could have saved us in most
> cases (via luck) - but with ioremap-ed memory that was exactly the wrong
> thing to do - the results of cmpxchg on device memory are undefined.
> (and will likely result in a write)
>
> The pending .28 ftrace tree does not have this bug anymore, as a general push
> towards more robustness of code patching, this is done differently: we do not
> use cmpxchg and we do a ...


jagdfalke (mathias-javafalke) wrote :

Is this Problem resolved now? I think I read somewhere that there already is a patch that you guys from Ubuntu just need to integrate. Is that true? If yes why is it taking so long? (no offense, just curious)

Chris Jones (cmsj) wrote :

jagdfalke: Please see the top of the page, it's marked as "Fix Released"

I downloaded the LiveCD for AMD 64 and for x86 64 yesterday, and when
I tried the AMD 64 version as a live session I was not able to use the network
on the computer that I am using now to write this message .... :)) (now I am
writing in Hardy), so it is not yet released AFAIAC.


hefeweiz3n (philschmidt) wrote :

For your information: Launchpad is NOT a support forum; in future situations please refer to the forums. As for your problem: the driver is fixed in the current kernel release, but as the live CD still ships with the old kernel, networking is of course disabled with this card. I recommend waiting for the final release, or downloading the new kernel packages onto a USB stick or similar and installing them by hand. As for HOW to do that, please refer to the forums.

zika (4zika4) wrote :

I am very sorry that I have misused this place. I hope that You will
be able to forgive and forget. I am just an old guy ... ;)

I will wait for official release.

Thank You very much.
Once again sorry for the noise.


So in the interests of adding some closure to this bug. The issue turns out to
have never been the e1000e driver's fault. The fault lies with the
CONFIG_DYNAMIC_FTRACE option. So specifically when the FTRACE code was
enabled, it was doing a locked cmpxchg instruction on memory that had been
previously used as __INIT code from some other module.

a) some other module loads
b) that module's init code calls into ftrace which stores the EIP
c) that module discards its init code
d) e1000e loads
e) e1000e asks the kernel for memory to ioremap onto, and gets the memory
location of the code at b) and maps the flash/NVM control registers there.
f) ftraced runs and rewrites onto bytes 4-8 of the memory location from b/e
g) since the lock/cmpxchg instruction is undefined for memory mapped registers,
random junk is written to the b/e location
h) depending on the contents of the junk in g) the NVM is either byte corrupted
or block erased, which is detected the next time the e1000e driver is loaded.

A short-term workaround is in 2.6.27.1 (disable CONFIG_DYNAMIC_FTRACE), and the
longer-term fix is a rewrite of the cmpxchg code (which is already done and will
be in 2.6.28-rc1).

I strongly recommend that 2.6.27.1 be picked up in ubuntu immediately
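
To see whether a given kernel still has the option turned on, something like the following could be used. It is only a sketch: it assumes the distribution installs the build configuration as /boot/config-<release> (Ubuntu does), and the function name is hypothetical.

```shell
# Sketch: succeed if the given kernel config file enables dynamic ftrace.
dynamic_ftrace_enabled() {
    # $1: path to a kernel config file, e.g. /boot/config-$(uname -r)
    grep -q '^CONFIG_DYNAMIC_FTRACE=y' "$1"
}

# Typical use:
#   if dynamic_ftrace_enabled "/boot/config-$(uname -r)"; then
#       echo "dynamic ftrace is enabled in this kernel"
#   fi
```

A kernel built with the workaround should have the option unset, so the function returns non-zero.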

Changed in linux:
status: Fix Released → Confirmed
rostedt (rostedt) wrote :

> So in the interests of adding some closure to this bug. The issue turns out to
> have never been the e1000e driver's fault.

Just to clarify: there were two bugs here. Yes, the ftrace code should have been more careful in using cmpxchg, and tried harder not to write into code that might have been swapped out (note, 2.6.28 has this fixed).

But the e1000e driver absolutely did have a bug. The driver should never have left open the possibility that a random write into it could brick the board. I'm actually glad that ftrace was the culprit, because it allowed for a consistent reproducer. Just imagine if ftrace had not caused this. Any little bug in the kernel could have bricked your card. And guess what? You would be out of luck, because it would be extremely hard to ever reproduce it again.

I'm not denying that ftrace had a bug. I just want the record to state that ftrace was not the only one at fault here.

Amit Kucheria (amitk) on 2008-10-24
Changed in linux-lpia:
assignee: nobody → amitk
importance: Undecided → Critical
milestone: none → ubuntu-8.10
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lpia - 2.6.27-4.9

---------------
linux-lpia (2.6.27-4.9) intrepid; urgency=low

  [ Amit Kucheria ]

  * SAUCE: Start new release Ignore: yes
  * SAUCE: Add LPIA keyword in front of all our tags
  * SAUCE: Disable DYANMIC_FTRACE
    - LP: #263555
  * SAUCE: Disable ath5k from configs
    - LP: #288148
  * SAUCE: Fix rebase script some more
  * SAUCE: Change default TCP congestion algorithm to cubic
    - LP: #278801
  * SAUCE: Enable vesafb module

 -- Amit Kucheria <email address hidden> Thu, 23 Oct 2008 20:07:26 +0000

Changed in linux-lpia:
status: Fix Committed → Fix Released
Changed in linux:
status: Confirmed → Fix Released

unsubscribe

-John
John B. Sobernheim
Microsel Of Colorado
7002 South Revere Parkway, Suite 90
Centennial, CO 80112
http://www.microsel.com
Phn 720.488.9800 x213 or 800.381.1083
Fax 720.488.9885 Cel 720-317-7587
email:<email address hidden>

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Bug Watch Updater
Sent: Friday, October 24, 2008 8:51 AM
To: John B. Sobernheim
Subject: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8and ICH9 gigE chipsets at risk

** Changed in: linux (Gentoo Linux)
       Status: Confirmed => Fix Released


Basilisk (bluebal-1) on 2008-11-01
Changed in linux:
assignee: timg-tpi → nobody
assignee: nobody → bluebal-1
William Grant (wgrant) on 2008-11-01
Changed in linux:
assignee: bluebal-1 → timg-tpi
dave graham (david-graham) wrote :

While we should expect no further reports of flash corruption due to this bug, I would like to know of any systems which did fall foul of the bug and have not yet had their flash restored. Please let me know if you have a system that had proper (e1000e) LAN functionality prior to installing a 2.6.27-rc kernel, and lost it while running the rc kernel.

So as not to confuse this bug report, please contact me offline and I will try to help you restore your LAN.

david.graham_at_intel_dot_com

Pieter (diepes) wrote :

Is there a link as to how to recover ?

BeigeGenius (beigegenius) wrote :

Prior to upgrading to kernel 2.6.27-9, no kernel panics were experienced.
The kernel panics seem "random", but they seem to occur during high network activity when using the Ethernet device (Intel Corporation 82573L Gigabit Ethernet Controller), which uses the e1000e driver. No reports of hardware dying yet, and hopefully this bug will be limited to kernel panics!

dave graham (david-graham) wrote :

I am still contacted about once per month for instructions on how to recover ethernet functionality on systems that have had their 1Gb flash content corrupted, possibly by this defect.

If you believe that you are affected by this issue, you can safely perform steps 1 through 5 from the list below, then contact me with the result, from which I will prepare a fully repaired image and send it back to you. You can then continue with steps 6, 7 and 8.

1) Download a CD image of the recovery program (originally created by Karsten Keil formerly of SuSE) from http://e1000.sourceforge.net/e1000e_recover.iso. Please type the address in your browser window and choose "save to file"; you cannot search for this file.

2) Burn the iso to CD, & boot the CD. When prompted, select “Rescue System”
Linux will load, you’ll see an openSUSE splashscreen, and eventually a login prompt.

3) Log on as root. There's no password, so just hit return.

4) Read the current eeprom and save it to file. Be patient !

       e1000e_nvm -r -d eth0 -o ethtool.dmp

5) mount a USB disk to save the file, and send the file to me david_dot_graham_at_intel_dot_com

I will then fix up the image, and mail it back to you as ethtoola.dmp.
When you receive the updated file:

6) Write the new eeprom content back to your system NVM

            e1000e_nvm -d eth0 -P 108C8086 ethtoola.dmp

7) You will see some warnings, select YES when prompted.

8) You should then be able to remove the recovery CD, and successfully boot back to a working ethernet using Linux, Windows, OpenSolaris, or anything else.
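
For step 5, a minimal sketch of saving the dump to a USB disk and checking that the copy is intact. The device name /dev/sdb1 and the mount point /mnt are assumptions (check dmesg after plugging the disk in), and the helper name is made up for illustration.

```shell
# Sketch: copy the dump to a mounted USB disk and verify the copy byte for byte.
save_dump() {
    # $1: dump file, $2: directory where the USB disk is mounted
    [ -s "$1" ] || { echo "dump file $1 is missing or empty" >&2; return 1; }
    cp "$1" "$2/" && cmp -s "$1" "$2/$(basename "$1")"
}

# Typical use (device name is hypothetical):
#   mount /dev/sdb1 /mnt
#   save_dump ethtool.dmp /mnt && sync
#   umount /mnt
```

The empty-file check is there because an empty dump is of no use for repair.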

Burned the iso, followed your instructions, but:

            eth0 EEprom len 4096
            checksum ed0e wrong should be 830e

So I can't send you my ethtool.dmp
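
For context, a sketch of the kind of check behind this message. As far as I know, e1000-family drivers require the first 64 little-endian 16-bit words of the NVM (0x00 through 0x3F) to sum to 0xBABA. That rule, the raw binary image format and the function name are assumptions here; nothing in this thread says which format the recovery tool writes.

```shell
# Sketch: validate the checksum of a raw NVM image.
# Assumption: words 0x00-0x3F (little-endian, 16 bit) must sum to 0xBABA mod 0x10000.
nvm_checksum_ok() {
    # $1: raw binary image, at least 128 bytes long
    sum=$(od -An -v -tu1 -N128 "$1" | tr -s ' ' '\n' | awk '
        NF { b[n++] = $1 }                      # collect the bytes as decimals
        END {
            if (n < 128) { print -1; exit }     # image too short
            for (i = 0; i < 128; i += 2)
                s = (s + b[i] + 256 * b[i + 1]) % 65536
            print s
        }')
    [ "$sum" -eq 47802 ]                        # 47802 == 0xBABA
}

# Typical use, on a system where the driver still loads:
#   ethtool -e eth0 raw on > nvm.bin
#   nvm_checksum_ok nvm.bin && echo "checksum ok"
```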

My notebook is a Fujitsu Siemens Lifebook E8410

Thank you Dave!!!


This is unusual, as the

 e1000e_nvm -r -d eth0 -o ethtool.dmp

command normally dumps out the 1Gb portion of the system flash even if it _does_ have a bad checksum, and then I've been fixing the checksum & content. Are you sure that there isn't an ethtool.dmp file created in the local directory from which you ran e1000e_nvm ?

If there really is no ethtool.dmp, please send me

1) lspci -tv
2) lspci -xxx
3) dmesg (that includes the failure of the e1000e driver to load)

and I'll send you an instrumented driver that will dump out the 1Gb flash content, and we may be able to fix it from there.
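
The three outputs requested above can be gathered in one go. This is just a sketch: the output file names and the function name are made up, and `lspci -xxx` generally needs root to show the full configuration space.

```shell
# Sketch: save the requested diagnostics into one directory.
collect_diag() {
    # $1: output directory
    mkdir -p "$1" || return 1
    lspci -tv  > "$1/lspci-tv.txt"
    lspci -xxx > "$1/lspci-xxx.txt"
    dmesg      > "$1/dmesg.txt"
}

# Typical use (run as root, shortly after the driver fails to load):
#   collect_diag /tmp/e1000e-diag
```

Running it soon after boot keeps the probe failure in the kernel log buffer.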

At least I hope so. As I say, I've fixed a lot of these corruptions, but do not recall seeing this particular failure mode before.

Dave


lspci -tv:

+-19.0 Intel Corporation 82566DC Gigabit Network Connection

lspci -xxx:
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 03)
00: 86 80 4b 10 03 01 10 00 03 00 00 02 00 00 00 00
10: 00 00 40 fe 00 40 42 fe 21 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
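
For readers unfamiliar with the `lspci -xxx` output, here is a minimal sketch of how the first 64 bytes decode, assuming the standard little-endian type-0 PCI header layout. The function names and the dictionary keys are illustrative only and are not part of any tool mentioned in this thread.

```python
import struct

# The four hex rows from the lspci -xxx output above (device summary line omitted).
LSPCI_DUMP = """\
00: 86 80 4b 10 03 01 10 00 03 00 00 02 00 00 00 00
10: 00 00 40 fe 00 40 42 fe 21 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
"""


def parse_hex_dump(text):
    """Turn 'offset: b0 b1 ...' rows into a bytes object (hypothetical helper)."""
    data = bytearray()
    for line in text.splitlines():
        offset, sep, hex_part = line.partition(":")
        if not sep:
            continue  # not a dump row
        data.extend(int(tok, 16) for tok in hex_part.split())
    return bytes(data)


def decode_header(cfg):
    """Decode a few standard fields of a type-0 PCI configuration header."""
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    subsys_vendor, subsys_id = struct.unpack_from("<HH", cfg, 0x2C)
    irq_line, irq_pin = struct.unpack_from("<BB", cfg, 0x3C)
    return {
        "vendor": vendor,
        "device": device,
        "revision": revision,
        "base_class": base_class,
        "subclass": subclass,
        "subsys_vendor": subsys_vendor,
        "subsys_id": subsys_id,
        "irq_line": irq_line,
        "irq_pin": irq_pin,
    }


header = decode_header(parse_hex_dump(LSPCI_DUMP))
for name, value in header.items():
    print(f"{name:14s} 0x{value:04x}")
```

The vendor and device fields here come out as 0x8086 and 0x104b, and the revision as 0x03, which is consistent with the "82566DC ... (rev 03)" line that lspci printed.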

dmesg:
[ 1.652371] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
[ 1.652374] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 1.652440] e1000e 0000:00:19.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.652447] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[ 1.652456] e1000e 0000:00:19.0: setting latency timer to 64
[ 1.652628] alloc irq_desc for 28 on node -1
[ 1.652630] alloc kstat_irqs on node -1
[ 1.652647] e1000e 0000:00:19.0: irq 28 for MSI/MSI-X
[ 1.739443] ohci1394 0000:1c:03.4: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.757784] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
[ 1.787486] e1000e 0000:00:19.0: PCI INT A disabled
[ 1.787495] e1000e: probe of 0000:00:19.0 failed with error -5
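
The "NVM Checksum Is Not Valid" message above comes from a simple sum check. The following is a rough sketch of that check, assuming the convention used by the in-kernel e1000e driver: the 16-bit words 0x00 through 0x3F must add up to 0xBABA (mod 2^16), with word 0x3F acting as the checksum word. The helper names and the sample image are made up for illustration, and the sketch does not attempt to reproduce the "should be" value printed by the recovery tool earlier in this thread.

```python
NVM_SUM = 0xBABA           # expected 16-bit sum of words 0x00..0x3F
NVM_CHECKSUM_WORD = 0x3F   # index of the word that holds the checksum


def words_from_bytes(raw):
    """Pair up little-endian bytes into 16-bit words (hypothetical helper)."""
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw) - 1, 2)]


def nvm_sum(words):
    """Sum words 0x00..0x3F inclusive, truncated to 16 bits."""
    return sum(words[: NVM_CHECKSUM_WORD + 1]) & 0xFFFF


def checksum_is_valid(words):
    return nvm_sum(words) == NVM_SUM


def expected_checksum_word(words):
    """Value that word 0x3F would need for the sum to come out as 0xBABA."""
    partial = sum(words[:NVM_CHECKSUM_WORD]) & 0xFFFF
    return (NVM_SUM - partial) & 0xFFFF


# Synthetic 64-word image, NOT real NVM content from any adapter.
image = [0x1234] * 63 + [0x0000]
print("valid before fix:", checksum_is_valid(image))
image[NVM_CHECKSUM_WORD] = expected_checksum_word(image)
print("valid after fix: ", checksum_is_valid(image))
```

This only illustrates the arithmetic. Writing a recomputed checksum back to a real adapter is exactly the kind of operation this bug is about, so it should be left to the tools and instructions Intel provides.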

dave graham (david-graham) wrote :

Thanks,

I still don't understand why e1000e_recover didn't work for you, but I admit that
I have been using it as a tool and don't understand its inner workings.

Let's try another approach to get the invalid NVM content listed, this time
by the driver when it reads the data. I attach a patch, "e1000e-1.0.15.shownvm.patch",
which can be applied to our latest e1000e SourceForge release.
To install the driver and collect the result, please proceed as follows:

1) Copy this patch to a local directory
2) Download e1000e-1.0.15.tar.gz from http://sourceforge.net/projects/e1000/files/
3) Untar the tarball to a local directory,
         tar xvzf e1000e-1.0.15.tar.gz
4) cd e1000e-1.0.15/src
5) Apply the patch
        patch -p2 <../../e1000e-1.0.15.shownvm.patch
6) Remove the old driver, build & install the new one
        rmmod e1000e
        make
        insmod e1000e.ko
7) The system message log should have the NVM content that was read.

The driver should also load even in the presence of the errored NVM. Please let me know whether it does load
and work, and send me the dmesg log that includes the NVM dump. I will see if I can fix it up and return
it to you with instructions on how to apply the fixed-up version.

Thanks
Dave
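
When collecting the dmesg log requested above, it can help to pull out only the lines that concern the adapter. Below is a minimal sketch, assuming the log has been saved as text; the format of the NVM dump emitted by the patched driver is not shown in this thread, so the sketch filters only on the driver name and the PCI address, and the function names are hypothetical.

```python
import errno
import re

# Excerpt of the dmesg output posted earlier in this thread.
SAMPLE_LOG = """\
[ 1.652371] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
[ 1.652647] e1000e 0000:00:19.0: irq 28 for MSI/MSI-X
[ 1.739443] ohci1394 0000:1c:03.4: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.757784] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
[ 1.787486] e1000e 0000:00:19.0: PCI INT A disabled
[ 1.787495] e1000e: probe of 0000:00:19.0 failed with error -5
"""

PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def driver_lines(log_text, driver="e1000e", device="0000:00:19.0"):
    """Keep lines that mention the driver name or the device's PCI address."""
    return [ln for ln in log_text.splitlines() if driver in ln or device in ln]


def probe_failures(log_text):
    """Return (device, errno name) pairs for failed probes."""
    results = []
    for match in PROBE_FAIL.finditer(log_text):
        code = abs(int(match.group(2)))
        results.append((match.group(1), errno.errorcode.get(code, "unknown")))
    return results


for line in driver_lines(SAMPLE_LOG):
    print(line)
print(probe_failures(SAMPLE_LOG))
```

Matching on the PCI address as well as the driver name matters here because the checksum message itself is logged without the "e1000e" prefix.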


Changed in linux (Ubuntu):
status: Fix Released → Confirmed
Steve Langasek (vorlon) wrote :

Please don't change bug statuses without explanation.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux:
importance: Unknown → Medium
Changed in linux (Gentoo Linux):
importance: Unknown → Medium
Changed in linux (Mandriva):
importance: Unknown → Critical
Troex Nevelin (troex) wrote :

I have a ThinkPad X60 with an 82573L, and after upgrading to 11.04 beta with the latest kernel it has almost completely stopped working.
I tested the e1000e 1.2.20-k2 driver (in the stock kernel) and the 1.3.10a driver with no luck, and booting with the option "pcie_aspm=force" doesn't help. I tried e1000e_recover.iso, but it does not boot on my 32-bit processor. The last strange thing is that I cannot read the EEPROM:

@tpx60:~# ifconfig eth0 up
@tpx60:~# ethtool -e eth0
Cannot get driver information: No such device
@tpx60:~# ifconfig eth0 down
@tpx60:~# ethtool -e eth0
Offset Values
------ ------
0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
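
A dump that reads back as all 0xff, as above, can be flagged mechanically. Here is a small sketch that parses `ethtool -e` style output into bytes and checks for that pattern; the function names are hypothetical, and the sketch says nothing about whether the part is actually erased or simply not readable in its current state.

```python
# Two rows in the same layout as the ethtool -e output above.
SAMPLE_DUMP = """\
Offset Values
------ ------
0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
"""


def parse_ethtool_dump(text):
    """Collect the byte values from rows that start with a 0x offset."""
    data = bytearray()
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("0x"):
            continue  # skip the header and separator rows
        data.extend(int(tok, 16) for tok in parts[1:])
    return bytes(data)


def looks_blank(raw):
    """True when every byte read back is 0xff."""
    return len(raw) > 0 and all(b == 0xFF for b in raw)


raw = parse_ethtool_dump(SAMPLE_DUMP)
print(len(raw), "bytes read, all 0xff:", looks_blank(raw))
```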

@tpx60:~# dmesg | grep e1000e
[ 1.231354] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.20-k2
[ 1.231358] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[ 1.231392] e1000e 0000:02:00.0: Disabling ASPM L1
[ 1.231410] e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.231435] e1000e 0000:02:00.0: setting latency timer to 64
[ 1.231639] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 1.232563] e1000e 0000:02:00.0: Disabling ASPM L0s
[ 1.392249] e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:16:d3:3a:47:ae
[ 1.392253] e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 1.392332] e1000e 0000:02:00.0: eth0: MAC: 2, PHY: 2, PBA No: 005302-003
[ 27.120351] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 27.176320] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 28.762602] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[ 28.762610] e1000e 0000:02:00.0: eth0: 10/100 speed: disabling TSO
[ 32.855747] e1000e 0000:02:00.0: PCI INT A disabled
[ 32.855761] e1000e 0000:02:00.0: PME# enabled
[ 89.756091] e1000e 0000:02:00.0: BAR 0: set to [mem 0xee000000-0xee01ffff] (PCI address [0xee000000-0xee01ffff])
[ 89.756109] e1000e 0000:02:00.0: BAR 2: set to [io 0x2000-0x201f] (PCI address [0x2000-0x201f])
[ 89.756154] e1000e 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 89.756208] e1000e 0000:02:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[ 89.756278] e1000e 0000:02:00.0: PME# disabled
[ 89.756372] e1000e 0000:02:00.0: Disabling ASPM L1
[ 89.756474] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 89.758786] e1000e 0000:02:00.0: eth0: MAC Wakeup cause - Link Status Change
[ 89.828165] e1000e 0000:02:00.0: PME# enabled
[ 89.976110] e1000e 0000:02:00.0: BAR 0: set to [mem 0xee000000-0xee01ffff] (PCI address [0xee000000-0xee01ffff])
[ 89.976128] e1000e 0000:02:00.0: BAR 2: set to [io 0x2000-0x201f] (PCI address [0x2000-0x201f])
[ 89.976184] e1000e 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 89.976234] e1000e 0000:02:00.0: restoring config space at offs...


This bug is not a catch-all for all e1000e issues. The original issue this bug was filed against is fixed and is highly unlikely to recur. If you're having e1000e issues, please file a new bug.
