wired ethernet not available on resume from sleep

Bug #111282 reported by Albert Cardona
Affects                  Status        Importance  Assigned to  Milestone
acpi (Ubuntu)            Fix Released  Undecided   Unassigned
network-config (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: acpi

1 - Boot up the laptop (Thinkpad T60p with Feisty)
2 - Plug and unplug the ethernet cable. NetworkManager does a good job of acquiring a connection when the cable is in, and of reporting it unplugged when it is out.
3 - Put the laptop to sleep and unplug the cable.
4 - Resume the laptop and (optionally) use the wireless, which works just fine.
5 - Put the laptop to sleep.
6 - Wake up the laptop with the ethernet cable plugged in: NetworkManager does NOT list "wired network" as an option.

Killing and restarting NetworkManager, running sudo /etc/init.d/networking stop/start, running ifdown -a, and combinations of these did not help.

sudo ifup eth0 did not help either: it reports that an ethernet pid is present but that the resource is not available (it claims that eth0 is using ath0, which is my wireless, for some reason beyond me).

The only solution so far is to reboot the laptop, which is very undesirable.

Albert Cardona (cardona) wrote :

This bug is still present as of today, with 7.04 dist-upgraded to the latest.

Is there any partial cure? For example, is there any way to reload the proper network kernel modules that may fix the problem?

On my system:
- I have ipv6 blacklisted, but commenting it out and rebooting ends up with the same problems.
- I use fglrx binary driver from ATI
- I have bluetooth disabled in the BIOS

The rest is the standard ubuntu install for Thinkpad T60p:

$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller AHCI (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Unknown device 71d4
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
15:00.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller

nullack (nullack) wrote :

I can confirm this bug exists on Hardy Heron AMD64. In my view, this is a critical issue preventing me from having a quality experience on Ubuntu. I have spent a lot of time puzzling over this bug and testing out various things. I have now exhausted every avenue I know how to pursue myself, and before creating a new bug I searched and found this one. I can confirm:

1. Coming back after a period away from the machine results in no internet - the "sleep" bug. Importantly, I'm on a workstation, and in power management I have set the machine to never sleep and only my CRT display to sleep after 15 minutes. Nonetheless, even though my machine never sleeps, it somehow loses internet connectivity.
2. The problem is not affected by DHCP or statically allocated IPs - it's universal
3. sudo ifconfig eth0 looks right when it is queried as per below:

eth0 Link encap:Ethernet HWaddr 00:1a:92:3f:45:48
          inet addr:192.168.10.2 Bcast:192.168.10.255 Mask:255.255.255.0
          inet6 addr: fe80::21a:92ff:fe3f:4548/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:2634 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2451 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3056584 (2.9 MB) TX bytes:264316 (258.1 KB)
          Interrupt:23 Base address:0xe800

4. sudo ifconfig eth0 down then up does not fix the problem
5. Only a reboot fixes the problem
6. When the bug condition is in place, I get the following when I sudo route:

Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.10.0 * 255.255.255.0 U 0 0 0 eth0

7. When the internet is working, before the machine has gone to "sleep" (even though it doesn't really sleep), sudo route gives:

Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.10.0 * 255.255.255.0 U 0 0 0 eth0
link-local * 255.255.0.0 U 1000 0 0 eth0
default www.routerlogin 0.0.0.0 UG 100 0 0 eth0

So it seems to "lose" my NAT gateway router at 192.168.10.1.

8. When the bug condition is in place, I can ping my workstation and the gateway router. However, when I try to ping an external address such as microsoft.com, it says "connect: Network is unreachable".

9. When the bug condition is in place, if I do an nslookup query on, say, microsoft.com, I get a proper DNS answer back.
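
To make the comparison in points 6 and 7 repeatable, here is a minimal sketch that lists which routes are present in a saved "working" routing table but absent from a "broken" one. The function name missing_routes and the file names in the usage note are made up for illustration; it only assumes a POSIX shell and grep.

```shell
#!/bin/sh
# missing_routes GOOD BAD
# GOOD and BAD are files holding saved `route` output.
# Prints every line of GOOD that does not appear verbatim in BAD,
# i.e. the routes that were lost.
missing_routes() {
    good="$1"
    bad="$2"
    # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file
    grep -F -x -v -f "$bad" "$good"
}
```

Hypothetical usage: run `route > route.good` right after boot, `route > route.bad` once the connection is gone, then `missing_routes route.good route.bad`. With the two tables pasted above, this would print the link-local and default lines.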

EASY BUG REPLICATION STEPS:

1. Have a freshly booted Hardy machine using ethernet
2. Confirm internet works
3. Open terminal
4. sudo ifconfig eth0 down
5. sudo ifconfig eth0 up
6. Observe bug behaviour

I have confirmed the bug report. Can I please appeal for some help on this - let's fix it so no one else has to suffer it :)

Changed in acpi:
status: New → Confirmed
Erik Andrén (erik-andren) wrote :

Instead of rebooting, have you tried just reloading the kernel module?
I.e
sudo rmmod e1000
sudo modprobe e1000

nullack (nullack) wrote :

Thanks for the suggestion Erik :)

Ok, so I figured out I've got a Via Rhine II VT6102 version 7c ethernet NIC. Apparently that uses the via-rhine module. So I did my fast way of testing this instead of having to wait for it to "sleep":

ppp@ppp:~$ sudo ifconfig eth0 down
ppp@ppp:~$ sudo ifconfig eth0 up
ppp@ppp:~$ sudo rmmod via-rhine
ppp@ppp:~$ sudo modprobe via-rhine
ppp@ppp:~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
ppp@ppp:~$

As you can see, the route table is even worse after this method: it is now empty. The problem still exists. It seems only a reboot fixes it, and this is a recurring, difficult problem that's preventing me from feeling that Ubuntu love :(

Albert Cardona (cardona) wrote :

I thought this bug thread was dead. I posted a solution some time ago:

http://www.mcdb.ucla.edu/research/hartenstein/acardona/ubuntu_tips.html#Fixing%20the%20disappearing%20ethernet%20network%20interface%20for%20Thinkpads

.. which says:

sudo modprobe -r e1000
sudo modprobe e1000
sudo ifdown eth0
sudo ifup eth0

Also, if one is not using the gnome desktop, it helps to:
$ sudo killall NetworkManager
$ sudo killall NetworkManagerDispatcher

because otherwise the network keeps getting disconnected every few minutes, and particularly the wireless network.
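
The four commands above can be wrapped in a small helper so the same sequence works for other drivers (e1000 here, via-rhine further down the thread). This is only a sketch: the function name reload_nic and the RUN variable are invented for illustration, and setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# reload_nic MODULE IFACE
# Unloads and reloads the NIC driver, then bounces the interface.
# RUN is a hypothetical knob: it defaults to sudo; RUN=echo gives a dry run.
reload_nic() {
    module="$1"
    iface="$2"
    run="${RUN:-sudo}"
    # Only reload the module if it unloaded cleanly.
    $run modprobe -r "$module" && $run modprobe "$module"
    # ifdown may complain if the interface is already down; carry on anyway.
    $run ifdown "$iface"
    $run ifup "$iface"
}
```

For example, `RUN=echo reload_nic e1000 eth0` shows what would be run, and `reload_nic e1000 eth0` runs it through sudo.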

nullack (nullack) wrote :

Albert, this problem is not isolated to your hardware. I don't have an Intel Pro 1000 NIC, I've got a Via RHINE II NIC (via-rhine). Anyway, I tried the workaround Erik suggested, which you've also suggested, and it does not fix the problem. So far the only solution is a reboot, which is totally impractical. I have processing jobs that take a while to run and I can't just reboot all the time to get internet back.

I don't want to go back to using Vista... I'm trying to stick with Ubuntu. However, Vista actually did work, unlike the experience I'm having here.

Someone help me out here please.

nullack (nullack) wrote :

Ok, finally I have managed to find an obscure workaround for this frustrating bug. Please note, this is not a fix, and it requires manual command-line intervention, which does not fit the "it just works" Ubuntu ideal. There is a root cause problem here that needs a code-level fix. Ubuntu urgently needs to fix this sort of stuff because, plainly, it doesn't "just work", and there are countless people on the Ubuntu forums actually begging for someone to help them and having to go through obscure things to work around the problems.

Root cause problem: eth0 loses its gateway from the routing table after a period of time

Workaround: manually add the gateway address on the command line each time it is lost

Example:

ppp@ppp:~$ sudo ifconfig eth0 down
ppp@ppp:~$ sudo ifconfig eth0 192.168.10.2 netmask 255.255.255.0 up
ppp@ppp:~$ sudo ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1a:92:3f:45:48
          inet addr:192.168.10.2 Bcast:192.168.10.255 Mask:255.255.255.0
          inet6 addr: fe80::21a:92ff:fe3f:4548/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8298 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6985 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9124303 (8.7 MB) TX bytes:882153 (861.4 KB)
          Interrupt:23 Base address:0xe800

ppp@ppp:~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.10.0 * 255.255.255.0 U 0 0 0 eth0
ppp@ppp:~$ sudo route add default gw 192.168.10.1
ppp@ppp:~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.10.0 * 255.255.255.0 U 0 0 0 eth0
default www.routerlogin 0.0.0.0 UG 0 0 0 eth0
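
As a rough sketch of how this workaround could be scripted, the helper below reads a routing table on stdin and prints the `route add` command only when no default route is present. The name gateway_fix_command is hypothetical, and printing the command rather than running it is a deliberate choice so nothing changes without review.

```shell
#!/bin/sh
# gateway_fix_command GATEWAY
# Reads `route` or `route -n` output on stdin. If no default route
# (destination "default" or 0.0.0.0 with the G flag) is found, prints
# the command that would restore it. Prints nothing otherwise.
gateway_fix_command() {
    gw="$1"
    awk -v gw="$gw" '
        ($1 == "default" || $1 == "0.0.0.0") && $4 ~ /G/ { found = 1 }
        END { if (!found) print "route add default gw " gw }
    '
}
```

Hypothetical usage on the machine above: `route -n | gateway_fix_command 192.168.10.1`. The printed line can then be run by hand with sudo.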

Cappy-chan (cappy-chan) wrote :

This can be fixed by running:
gksudo gedit /etc/default/acpi-support

and changing the line

STOP_SERVICES=""

to

STOP_SERVICES="networking"

Then save the file.
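
For anyone who prefers not to edit the file by hand, the same change could be made non-interactively. This is a sketch only: set_stop_services is an invented name, and it assumes the file already contains a line starting with STOP_SERVICES=. A backup copy with a .bak suffix is kept.

```shell
#!/bin/sh
# set_stop_services FILE SERVICES
# Rewrites the STOP_SERVICES="..." line in FILE in place and
# leaves the previous version in FILE.bak.
set_stop_services() {
    file="$1"
    services="$2"
    sed -i.bak "s|^STOP_SERVICES=.*|STOP_SERVICES=\"$services\"|" "$file"
}
```

On the real system this would be invoked from a root shell as `set_stop_services /etc/default/acpi-support networking`.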

nullack (nullack) wrote :

Thank you, Cappy, for the advice. I have triple-checked your instructions and it's my displeasure to advise that this does not fix it. While doing this, I also observed that it took far longer for the default gateway to show up again in the route output (maybe it was adding itself automatically?). Worse, my workaround of manually adding the default gw no longer helps, because this time the gateway was never lost from the route table. So now I'm back to the unusable situation of having to reboot manually every time I lose the internet :( :(

I tried enabling hardy-proposed, but as with plain Hardy it did not fix it.

ppp@ppp:~$ sudo ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1a:92:3f:45:48
          inet addr:192.168.10.2 Bcast:192.168.10.255 Mask:255.255.255.0
          inet6 addr: fe80::21a:92ff:fe3f:4548/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:103876 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55224 errors:20 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:150008082 (143.0 MB) TX bytes:4536834 (4.3 MB)
          Interrupt:23 Base address:0xe800

ppp@ppp:~$ sudo route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.10.0 * 255.255.255.0 U 0 0 0 eth0
link-local * 255.255.0.0 U 1000 0 0 eth0
default 192.168.10.1 0.0.0.0 UG 100 0 0 eth0

ppp@ppp:~$ ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
From 192.168.10.2 icmp_seq=1 Destination Host Unreachable
From 192.168.10.2 icmp_seq=4 Destination Host Unreachable
From 192.168.10.2 icmp_seq=5 Destination Host Unreachable

--- 192.168.10.1 ping statistics ---
8 packets transmitted, 0 received, +3 errors, 100% packet loss, time 7003ms
, pipe 2

ppp@ppp:~$ sudo route add default gw 192.168.10.1
ppp@ppp:~$ ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
From 192.168.10.2 icmp_seq=1 Destination Host Unreachable

--- 192.168.10.1 ping statistics ---
3 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1999ms

Jun 21 11:23:18 ppp kernel: [29992.941218] NETDEV WATCHDOG: eth0: transmit timed out
Jun 21 11:23:18 ppp kernel: [29992.941367] eth0: Transmit timed out, status 0003, PHY status 786d, resetting...
Jun 21 11:23:18 ppp kernel: [29992.942018] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

nullack (nullack) wrote :

So I have been doing more research and I think I'm narrowing it down now. In summary, I think:

After a period of no network use, ACPI thinks IRQ 23 isn't needed
ACPI turns off IRQ 23
eth0 times out and won't come back without a reboot
ifdown/ifup won't fix it

Jun 21 03:04:07 ppp kernel: [ 57.447218] eth0: no IPv6 routers present
Jun 21 04:29:46 ppp kernel: [ 5193.747505] irq 23: nobody cared (try booting with the "irqpoll" option)
Jun 21 04:29:46 ppp kernel: [ 5193.747514] Pid: 0, comm: swapper Tainted: P 2.6.24-19-generic #1
Jun 21 04:29:46 ppp kernel: [ 5193.747516]
Jun 21 04:29:46 ppp kernel: [ 5193.747517] Call Trace:
Jun 21 04:29:46 ppp kernel: [ 5193.747519] <IRQ> [__report_bad_irq+0x1e/0x80] __report_bad_irq+0x1e/0x80
Jun 21 04:29:46 ppp kernel: [ 5193.747550] [note_interrupt+0x2ad/0x2e0] note_interrupt+0x2ad/0x2e0
Jun 21 04:29:46 ppp kernel: [ 5193.747562] [handle_fasteoi_irq+0xa1/0x110] handle_fasteoi_irq+0xa1/0x110
Jun 21 04:29:46 ppp kernel: [ 5193.747571] [do_IRQ+0x7b/0x100] do_IRQ+0x7b/0x100
Jun 21 04:29:46 ppp kernel: [ 5193.747577] [ret_from_intr+0x0/0x0a] ret_from_intr+0x0/0xa
Jun 21 04:29:46 ppp kernel: [ 5193.747583] [pci_conf1_read+0x0/0x100] pci_conf1_read+0x0/0x100
Jun 21 04:29:46 ppp kernel: [ 5193.747596] [__do_softirq+0x60/0xe0] __do_softirq+0x60/0xe0
Jun 21 04:29:46 ppp kernel: [ 5193.747609] [call_softirq+0x1c/0x30] call_softirq+0x1c/0x30
Jun 21 04:29:46 ppp kernel: [ 5193.747614] [do_softirq+0x35/0x90] do_softirq+0x35/0x90
Jun 21 04:29:46 ppp kernel: [ 5193.747618] [irq_exit+0x88/0x90] irq_exit+0x88/0x90
Jun 21 04:29:46 ppp kernel: [ 5193.747621] [do_IRQ+0x80/0x100] do_IRQ+0x80/0x100
Jun 21 04:29:46 ppp kernel: [ 5193.747624] [default_idle+0x0/0x40] default_idle+0x0/0x40
Jun 21 04:29:46 ppp kernel: [ 5193.747628] [default_idle+0x0/0x40] default_idle+0x0/0x40
Jun 21 04:29:46 ppp kernel: [ 5193.747630] [ret_from_intr+0x0/0x0a] ret_from_intr+0x0/0xa
Jun 21 04:29:46 ppp kernel: [ 5193.747633] <EOI> [lapic_next_event+0x0/0x10] lapic_next_event+0x0/0x10
Jun 21 04:29:46 ppp kernel: [ 5193.747648] [default_idle+0x29/0x40] default_idle+0x29/0x40
Jun 21 04:29:46 ppp kernel: [ 5193.747654] [cpu_idle+0x6f/0xc0] cpu_idle+0x6f/0xc0
Jun 21 04:29:46 ppp kernel: [ 5193.747662] [start_kernel+0x2c5/0x350] start_kernel+0x2c5/0x350
Jun 21 04:29:46 ppp kernel: [ 5193.747670] [x86_64_start_kernel+0x12e/0x140] _sinittext+0x12e/0x140
Jun 21 04:29:46 ppp kernel: [ 5193.747678]
Jun 21 04:29:46 ppp kernel: [ 5193.747679] handlers:
Jun 21 04:29:46 ppp kernel: [ 5193.747680] [usbcore:usb_hcd_irq+0x0/0x60] (usb_hcd_irq+0x0/0x60 [usbcore])
Jun 21 04:29:46 ppp kernel: [ 5193.747702] [via_rhine:rhine_interrupt+0x0/0x7f0] (rhine_interrupt+0x0/0x7f0 [via_rhine])
Jun 21 04:29:46 ppp kernel: [ 5193.747710] Disabling IRQ #23
Jun 21 04:34:46 ppp kernel: [ 5493.104588] NETDEV WATCHDOG: eth0: transmit timed out
Jun 21 04:34:46 ppp kernel: [ 5493.104738] eth0: Transmit timed out, status 0003, PHY status 786d, resetting...
Jun 21 04:34:46 ppp kernel: [ 5493.105384] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Jun 21 05:05:02 ppp kernel: [ 7308.203455] NETDEV WATCHDOG: eth0: transmit timed out
Jun 21 05:05:02 ppp kernel: [ 7308.20...
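
The "nobody cared" message is what the kernel prints when none of the registered handlers claims an interrupt, after which it disables that IRQ line. A small sketch for spotting this in a log follows; the function name disabled_irqs is made up, and it assumes the log wording matches the lines above.

```shell
#!/bin/sh
# disabled_irqs
# Reads kernel log text on stdin and prints each IRQ number that the
# kernel reported disabling, one per line, without duplicates.
disabled_irqs() {
    sed -n 's/.*Disabling IRQ #\([0-9][0-9]*\).*/\1/p' | sort -un
}
```

Hypothetical usage: `dmesg | disabled_irqs`. The resulting number can then be looked up in /proc/interrupts to see which devices share that line (here, USB and via_rhine both sit on IRQ 23).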


Changed in acpi:
status: New → Confirmed
nullack (nullack) wrote :

To report on some additional findings:

1. I compiled a vanilla 2.6.25.8 kernel and the problem was replicated
2. I upgraded to Intrepid pre-alpha, and I am very pleased to report that the problem no longer occurs. I don't know whether that is due to the new ACPI / ACPID or to kernel changes, though.

I intend to stick with Intrepid as it progresses.

However, can I please make the point that anyone else with this problem who wants to stick with Hardy, thinking it is "stable", will not have a quality Ubuntu experience. I don't know what the root cause is, but if anyone needs further details from me and can provide instructions on how to get them, I will be happy to spend time on this to fix Hardy for anyone else suffering from this bug.

nullack (nullack) wrote :

Unfortunately the problem has proven to replicate itself in Intrepid. I even tried a different router to see if that fixed it, but it did not. I am certain this is a software error.

However, booting with the "irqpoll" kernel option appears to prevent the bug from occurring.
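
To confirm that the option actually took effect on the running kernel, the boot command line can be checked. A minimal sketch, with has_irqpoll as an invented helper name:

```shell
#!/bin/sh
# has_irqpoll CMDLINE
# Succeeds if the given kernel command line contains the irqpoll
# option as a separate word.
has_irqpoll() {
    case " $1 " in
        *" irqpoll "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Hypothetical usage: `has_irqpoll "$(cat /proc/cmdline)" && echo "irqpoll is active"`.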

nullack (nullack) wrote :

Updates to Intrepid have fixed this problem! The irqpoll workaround is no longer needed on boot :)

Changed in network-config:
status: Confirmed → Fix Released
Changed in acpi:
status: Confirmed → Fix Released