Wireless drops outside world, but seems alive internally
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
wicd |
New
|
Undecided
|
Unassigned |
Bug Description
For the first couple of months after installation, wicd wrote nice details to /var/log/
I ask because twice now, while running unattended updating a cloud server with occasional sensor data, the system loses access to the server, and when I try to SSH or VNC in to investigate, I'm ignored. But sniffing the network shows it is successfully requesting and receiving NTP updates... I doubt this has anything to do with wicd, but I'm digging for any log info I can get!
In the old logs, it looks like wicd does its own DHCP, or at least logged it in detail. I do notice in my router logs that there seem to be gaps in the DHCP renew sequence, and that recovery from the network problem involves a new DHCPDISCOVER instead of just the REQUEST/ACK.
Is there some way to probe into wicd historical events from the command line or python?
More Information:
I've captured info on six of these network dropouts now. A few details vary (as noted below), but the pattern is pretty clear. The Wi-Fi is convenient, but not essential to my project, so I've plugged in the wired ethernet and moved on. Still, it would be nice to have wireless under control, and maybe someone else could be helped by my sharing this information.
This is a BeagleBone Black with Ubuntu 12.04 and wicd 1.7.2.3 (bzr-r761).
Wi-Fi is the MediaTek (Ralink) UWN100 Wi-Fi adapter with MT7601 chip and RT2870STA.
Wireless connection drops for no obvious reason, typically overnight.
It is being used at least every five minutes to update a remote cloud site.
Router logs show it has plenty of DHCP lease left.
Netstat sometimes shows a remnant of the proper IP addresses:
udp 0 0 10.1.1.121:123 0.0.0.0:* <-- missing from some earlier logs
udp6 0 0 fe80::20c:
But all other entries have changed to 0.0.0.0.
Most ps displays show all of the wicd python modules running, but one event is missing this one:
ubuntu 21845 1 0 May15 ? 00:05:05 /usr/bin/python -O /usr/share/
Ifconfig and iwconfig always seem normal, and report increasing byte counts even though nothing is getting through to the outside world.
"iwlist ra0 s" shows the other local Wi-Fi AP, but not the one this board was previously connected to.
"wicd-cli --wireless -l" and wicd-curses always show both Wi-Fi APs
"iwlist ra0 ap" shows only the "other" AP, and gives it a strangely bogus MAC address, that changes somewhat between the different failures:
---
$ iwlist ra0 ap
ra0 Peers/Access-Points in range:
48:
---
--> Similar strange MAC appears in next day's report as well, but nowhere else in any document relating to my networks
2A:
--> previous day:
50:
In wicd-curses, the 'R' refresh command produces the "Connected" message, but no connection.
The 'C' connect command restores the network connection properly.
Next level of detail...
stopped updating emoncms
5/17/2014 8:55:00 PM GMT-7
Router connections:
eth1 00:21:6A:74:C4:A4
[oui] [static] [wfilter] 10.1.1.108 J3400 -83 dBm 16 0 days, 22:32:16
br0 00:0C:43:00:7D:7F
[oui] [static] 10.1.1.121 ubuntu-armhf 0 days, 12:56:15
Router log:
May 16 23:24:09 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPREQUEST(br0) 10.1.1.121 00:0c:43:00:7d:7f
May 16 23:24:09 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPACK(br0) 10.1.1.121 00:0c:43:00:7d:7f ubuntu-armhf
May 17 08:50:15 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPREQUEST(br0) 10.1.1.121 00:0c:43:00:7d:7f
May 17 08:50:15 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPACK(br0) 10.1.1.121 00:0c:43:00:7d:7f ubuntu-armhf
May 17 11:30:06 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPDISCOVER(br0) 10.1.1.121 00:0c:43:00:7d:7f
May 17 11:30:06 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPOFFER(br0) 10.1.1.121 00:0c:43:00:7d:7f
May 17 11:30:06 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPREQUEST(br0) 10.1.1.121 00:0c:43:00:7d:7f
May 17 11:30:06 Linksys3929 daemon.info dnsmasq-dhcp[99]: DHCPACK(br0) 10.1.1.121 00:0c:43:00:7d:7f ubuntu-armhf
[nothing beyond there]
Last login: Sat May 17 11:31:03 PDT 2014 from j3400.crr on pts/0
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.8.13-bone30 armv7l)
* Documentation: https:/
ubuntu@
ubuntu ttyO0 2014-05-17 22:40
--> in earlier tests, (BBB Ubuntu 12_04 uSD 140126 CanopyOut, wicd-curses log.txt),
when "Link is up", ==>rt_ioctl_
--> Just realized that 140126 event was similar - the upstream Canopy node between me and the internet dropped offline, and the BBB Wi-Fi was unable to recover on its own when connectivity returned. In that case, "sudo ifconfig ra0 down" and then back up worked. In the current cases, "sudo ifconfig ra0 down" does not appear to take the interface down (according to ifconfig), but does fix the connection.
ubuntu@
May 17 22:41:38 ubuntu-armhf kernel: [40442.089562] ==>rt_ioctl_
May 17 22:41:38 ubuntu-armhf kernel: [40442.089618] ===>rt_
May 17 22:41:38 ubuntu-armhf kernel: [40442.089806] rt28xx_
May 17 22:41:38 ubuntu-armhf kernel: [40442.089843] <--- rt28xx_
May 17 22:41:43 ubuntu-armhf kernel: [40447.085611] ==>rt_ioctl_giwfreq 1
May 17 22:41:43 ubuntu-armhf kernel: [40447.085797] ===>rt_
May 17 22:41:43 ubuntu-armhf kernel: [40447.085874] ==>rt_ioctl_
May 17 22:41:43 ubuntu-armhf kernel: [40447.085944] ===>rt_
May 17 22:41:43 ubuntu-armhf kernel: [40447.086138] rt28xx_
May 17 22:41:43 ubuntu-armhf kernel: [40447.086175] <--- rt28xx_
ubuntu@
--> Unless noted, results are the same in all events:
ubuntu@
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:1880 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2120 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2121 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:4304 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp6 0 0 :::80 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
udp 0 0 10.1.1.121:123 0.0.0.0:* <-- missing from some earlier logs
udp 0 0 127.0.0.1:123 0.0.0.0:*
udp 0 0 0.0.0.0:123 0.0.0.0:*
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp6 0 0 fe80::20c:
udp6 0 0 ::1:123 :::*
udp6 0 0 :::123 :::*
ubuntu@
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 1 0 127.0.0.1:50255 127.0.0.1:4304 CLOSE_WAIT -
tcp 1 0 127.0.0.1:50263 127.0.0.1:4304 CLOSE_WAIT -
ubuntu@
--> During first event, when it seemed Node-RED really had crashed:
tcp 358 0 10.1.1.121:1880 10.1.1.108:14505 ESTABLISHED -
--> Killed and restarted it (before fixing network), still not receiving:
tcp 440 0 10.1.1.121:1880 10.1.1.108:14502 CLOSE_WAIT -
ubuntu@
root 649 1 1 11:29 ? 00:13:03 /usr/bin/python -O /usr/share/
root 731 649 0 11:29 ? 00:04:49 /usr/bin/python -O /usr/share/
ubuntu 26159 25997 0 22:44 ttyO0 00:00:00 grep --color=auto .py
ubuntu@
--> Included in previous logs, missing here:
ubuntu 21845 1 0 May15 ? 00:05:05 /usr/bin/python -O /usr/share/
ubuntu@
ra0 Link encap:Ethernet HWaddr 00:0c:43:00:7d:7f
inet addr:10.1.1.121 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:411667 errors:0 dropped:0 overruns:0 frame:0
TX packets:5331 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:39706150 (39.7 MB) TX bytes:373956 (373.9 KB)
--> Even when network seems unresponsive from outside, the byte counts here increase!
ubuntu@
ra0 Ralink STA ESSID:"101Netlink2" Nickname:
Mode:Auto Frequency=2.412 GHz Access Point: 00:14:BF:05:4A:4D
Bit Rate=1 Mb/s <-- goes to 54 when working
RTS thr:off Fragment thr:off
Link Quality=90/100 Signal level:-78 dBm Noise level:-78 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
ubuntu@
ra0 Scan completed :
Cell 01 - Address: 00:14:BF:05:4A:50
ubuntu@
ubuntu@
ra0 Peers/Access-Points in range:
48:
--> ??? Similar strange MAC appears in next day's report as well, but nowhere else in any document relating to my networks ???
2A:
--> previous day:
50:
ubuntu@
# BSSID Channel ESSID
0 00:14:BF:05:4A:50 6 Talisman
1 00:14:BF:05:4A:4D 1 101Netlink2
ubuntu@
Essid: 101Netlink2
Bssid: 00:14:BF:05:4A:4D
Encryption: Off
Quality: 31
Mode: Managed
Channel: 1
Bit Rates: 54 Mb/s
ubuntu@
Wired Networks
0 W-Static-10.1.1.4 vvv
Wireless Networks
C STR ESSID ENCRYPT BSSID MODE CHNL
63% Talisman Unsecured 00:14:BF:05:4A:50 Managed 6
31% 101Netlink2 Unsecured 00:14:BF:05:4A:4D Managed 1
H:Help ->:ConfiK:
g t n h
Connected to 101Netlink2" Nickname:"MT7601STA at 61% (IP: 10.1.1.121)
--> 'R' refresh command produces same "Connected" message, but no connection.
--> 'C' connect command works.
First event top display:
===
top - 20:35:46 up 4 days, 4:39, 1 user, load average: 1.02, 1.07, 1.12
Tasks: 112 total, 2 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 96.7%us, 3.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 507444k total, 494500k used, 12944k free, 37768k buffers
Swap: 0k total, 0k used, 0k free, 159276k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15923 ubuntu 20 0 251m 183m 4552 R 93.7 37.0 1264:09 node
656 root 20 0 23036 6796 1564 S 1.6 1.3 109:21.63 wicd <-- seems a lot of time?
1413 root 20 0 0 0 0 S 0.7 0.0 6:57.74 w1_bus_master1
6403 ubuntu 20 0 2152 1020 760 R 0.7 0.2 0:00.17 top
317 syslog 20 0 29452 1556 784 S 0.3 0.3 35:14.34 rsyslogd
318 messageb 20 0 2600 1056 616 S 0.3 0.2 46:00.28 dbus-daemon
491 mysql 20 0 315m 29m 1388 S 0.3 5.9 25:55.39 mysqld
738 root 20 0 12664 6168 2216 S 0.3 1.2 42:19.33 wicd-monitor
===
ubuntu@
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
syslog 317 0.5 0.3 29612 1612 ? Sl May08 35:40 rsyslogd -c5
103 318 0.7 0.2 2600 1056 ? Ss May08 46:25 dbus-daemon --s
...
root 656 1.7 1.3 23168 6968 ? S May08 110:20 /usr/bin/python
root 738 0.6 1.2 12664 6188 ? S May08 42:41 /usr/bin/python
...
ubuntu 5937 0.0 0.1 2872 596 ? S May08 0:00 /usr/bin/dbus-l
ubuntu 5938 0.0 0.1 2748 844 ? Ss May08 0:00 //bin/dbus-daem
...
ubuntu 5954 0.0 2.2 25572 11208 ? S May08 2:18 /usr/bin/python
...
ubuntu 15923 27.5 36.9 257728 187500 pts/1 Tl May09 1350:40 node --max-old <-- looks like Node really was maxed out...
...
ubuntu@