Network access broken to some hosts after some usage...

Bug #1255516 reported by gadLinux
This bug affects 4 people
Affects: netbase (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

I'm experiencing something strange. I think it's related to DHCP, but I really don't know.

I configured my server to use DHCP so it receives its IP address and DNS updates on each boot. Among other things, I'm using PXE with MAAS for booting.

The problem is that it loses network access from time to time. The only way to recover the network is by running ifdown br0 && ifup br0.

Since this is a virtualization server, every time I do this the virtual machines also lose the network, so I have to restart each one.

I tried moving to another switch/router because I suspected the link was dropping from time to time, causing the DHCP client to rerun and break the configuration received at boot. It made no difference.

I changed the interface that br0 is bridged to from eth0 (the onboard ASUS motherboard NIC) to eth2 (an Intel PRO PCIe card). It made no difference.

So I'm lost as to what the problem is. I've just switched to a static IP configuration and it seems to work now, but I need to test it more, including under heavy load.

This is an older log, from when eth0 was in place:

Nov 26 05:29:11 red1 kernel: [29995.675232] br0: port 1(eth0) entered disabled state
Nov 26 05:29:11 red1 avahi-daemon[1508]: Interface br0.IPv6 no longer relevant for mDNS.
Nov 26 05:29:11 red1 avahi-daemon[1508]: Leaving mDNS multicast group on interface br0.IPv6 with address fe80::224:8cff:fe3b:c700.
Nov 26 05:29:11 red1 avahi-daemon[1508]: Withdrawing address record for fe80::224:8cff:fe3b:c700 on br0.
Nov 26 05:29:11 red1 charon: 02[KNL] fe80::224:8cff:fe3b:c700 disappeared from br0
Nov 26 05:29:11 red1 charon: 02[KNL] interface eth0 deactivated
Nov 26 05:29:11 red1 charon: 02[KNL] interface eth0 deleted
Nov 26 05:29:11 red1 kernel: [29995.723958] device eth0 left promiscuous mode
Nov 26 05:29:11 red1 kernel: [29995.723971] br0: port 1(eth0) entered disabled state
Nov 26 05:29:11 red1 kernel: [29995.730646] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Nov 26 05:29:11 red1 charon: 02[KNL] interface vnet1 deleted
Nov 26 05:29:11 red1 charon: 02[KNL] interface vnet0 deleted
Nov 26 05:29:11 red1 charon: 02[KNL] interface br0 deleted
Nov 26 05:29:11 red1 avahi-daemon[1508]: Withdrawing workstation service for br0.
Nov 26 05:29:11 red1 kernel: [29995.732532] device vnet1 left promiscuous mode
Nov 26 05:29:11 red1 kernel: [29995.732546] br0: port 3(vnet1) entered disabled state
Nov 26 05:29:11 red1 kernel: [29995.733670] device vnet0 left promiscuous mode
Nov 26 05:29:11 red1 kernel: [29995.733680] br0: port 2(vnet0) entered disabled state
Nov 26 05:29:11 red1 kernel: [29995.734800] device br0 left promiscuous mode
Nov 26 05:29:11 red1 NetworkManager[3392]: SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/br0, iface: br0)
Nov 26 05:29:12 red1 kernel: [29995.793763] device eth0 entered promiscuous mode
Nov 26 05:29:12 red1 charon: 02[KNL] interface eth0 activated
Nov 26 05:29:12 red1 kernel: [29995.804671] r8169 0000:05:00.0 eth0: link down
Nov 26 05:29:12 red1 kernel: [29995.804688] r8169 0000:05:00.0 eth0: link down
Nov 26 05:29:12 red1 kernel: [29995.805141] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

Any idea what it could be?
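To line up the bridge port state changes in a syslog excerpt like the one above, a small filter over the kernel lines is enough. The following is a minimal sketch, assuming the `brX: port N(dev) entered <state> state` message format shown in this log; the function name `port_transitions` is hypothetical and not part of any existing tool.

```python
import re
from typing import List, Tuple

# Matches kernel bridge messages such as
#   "kernel: [29995.675232] br0: port 1(eth0) entered disabled state"
# inside syslog lines, or the same message in plain dmesg output.
PORT_STATE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bridge>\S+): "
    r"port (?P<port>\d+)\((?P<dev>[^)]+)\) entered (?P<state>\w+) state"
)


def port_transitions(lines: List[str]) -> List[Tuple[float, str, str, str]]:
    """Return (uptime_seconds, bridge, device, state) for each bridge port state change."""
    events = []
    for line in lines:
        m = PORT_STATE.search(line)
        # Non-bridge lines (avahi-daemon, charon, NetworkManager, driver messages) are skipped.
        if m:
            events.append((float(m["ts"]), m["bridge"], m["dev"], m["state"]))
    return events
```

Run over the excerpt above, it lists eth0 being disabled first (29995.675232), followed by vnet1 and vnet0, which makes it easier to see whether the guest ports go down only as a consequence of the uplink.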

gadLinux (gad-aguilardelgado) wrote :

Hi,

DHCP has nothing to do with this. As I said, I switched to a static IP configuration, and I'm still losing connectivity over time.

The strange thing is that I don't lose connectivity to a host as long as I keep a ping to it running in an open terminal. If I stop the ping, the connection is lost after a while.

summary: - Network access broken when interface is configured with dhcp
+ Network access broken to some hosts after some usage...
gadLinux (gad-aguilardelgado) wrote :

This is the latest log of the failure:

[20173.368191] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered disabled state
[20173.372029] device tapd915d7fa-eb left promiscuous mode
[20173.372035] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered disabled state
[20174.564351] type=1400 audit(1385569885.340:100): apparmor="STATUS" operation="profile_remove" parent=9268 profile="unconfined" name="libvirt-d522bc4a-8a7a-4530-963a-d706361d9cf6" pid=9269 comm="apparmor_parser"
[20189.388200] type=1400 audit(1385569900.148:101): apparmor="STATUS" operation="profile_load" parent=9297 profile="unconfined" name="libvirt-d522bc4a-8a7a-4530-963a-d706361d9cf6" pid=9298 comm="apparmor_parser"
[20189.506371] device tapd915d7fa-eb entered promiscuous mode
[20189.536371] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered forwarding state
[20189.536403] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered forwarding state
[20191.932260] kvm [9491]: vcpu0 unhandled rdmsr: 0xc001100d
[20191.932277] kvm [9491]: vcpu0 unhandled rdmsr: 0xc0010112
[20192.086417] kvm [9491]: vcpu0 unhandled rdmsr: 0xc0010001
[22254.219798] init: neutron-metadata-agent main process (3190) killed by TERM signal
[22254.289118] init: neutron-plugin-openvswitch-agent main process (3198) killed by TERM signal
[22633.070206] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered disabled state
[22633.071270] device tapd915d7fa-eb left promiscuous mode
[22633.071274] qbrd915d7fa-eb: port 2(tapd915d7fa-eb) entered disabled state
[22634.156716] type=1400 audit(1385572341.864:102): apparmor="STATUS" operation="profile_remove" parent=16946 profile="unconfined" name="libvirt-d522bc4a-8a7a-4530-963a-d706361d9cf6" pid=16947 comm="apparmor_parser"
[22716.762951] br0: port 3(vnet1) entered disabled state
[22716.763058] br0: port 2(vnet0) entered disabled state
[22716.763092] br0: port 1(eth2) entered disabled state
[22716.874182] device eth2 left promiscuous mode
[22716.874195] br0: port 1(eth2) entered disabled state
[22716.879092] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[22716.881038] device vnet1 left promiscuous mode
[22716.881050] br0: port 3(vnet1) entered disabled state
[22716.881973] device vnet0 left promiscuous mode
[22716.881985] br0: port 2(vnet0) entered disabled state
[22716.883144] device br0 left promiscuous mode
[22716.942674] device eth2 entered promiscuous mode
[22717.024067] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[22717.024080] 8021q: adding VLAN 0 to HW filter on device eth2
[22717.036589] br0: port 1(eth2) entered forwarding state
[22717.036678] br0: port 1(eth2) entered forwarding state
[22717.772845] br0: port 1(eth2) entered disabled state
[22718.702856] e1000e: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[22718.703174] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[22718.703290] br0: port 1(eth2) entered forwarding state
[22718.703373] br0: port 1(eth2) entered forwarding state
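One way to quantify the outage in the log above is to measure the time between the uplink port (eth2) first entering the disabled state and the e1000e driver reporting the link up again. A minimal sketch, assuming dmesg-style lines with a bracketed uptime timestamp; `outage_seconds` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# dmesg line: "[22716.763092] br0: port 1(eth2) entered disabled state"
TIMESTAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def outage_seconds(lines: Iterable[str], nic: str = "eth2") -> Optional[float]:
    """Seconds from the first '(<nic>) entered disabled state' to the next '<nic> NIC Link is Up'.

    Returns None if either event is missing from the excerpt.
    """
    down_at = None
    for line in lines:
        m = TIMESTAMPED.match(line.strip())
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if down_at is None and f"({nic}) entered disabled state" in msg:
            down_at = ts  # first time the bridge disables the uplink port
        elif down_at is not None and f"{nic} NIC Link is Up" in msg:
            return ts - down_at  # driver reports carrier again
    return None
```

For the excerpt above this gives roughly 1.94 s (from 22716.763092 to 22718.702856).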

gadLinux (gad-aguilardelgado) wrote :

I'm really lost with this.

root@red1:~# ping controller
PING jb4ja.cloud.level2crm.com (172.16.0.115) 56(84) bytes of data.
64 bytes from jb4ja.cloud.level2crm.com (172.16.0.115): icmp_seq=1 ttl=64 time=0.624 ms
64 bytes from jb4ja.cloud.level2crm.com (172.16.0.115): icmp_seq=2 ttl=64 time=0.339 ms
^C
--- jb4ja.cloud.level2crm.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.339/0.481/0.624/0.144 ms

... and after a while ...

root@red1:~# ping controller
PING jb4ja.cloud.level2crm.com (172.16.0.115) 56(84) bytes of data.
From red1 (172.16.0.100) icmp_seq=1 Destination Host Unreachable
From red1 (172.16.0.100) icmp_seq=2 Destination Host Unreachable
From red1 (172.16.0.100) icmp_seq=3 Destination Host Unreachable
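The two transcripts differ in who answers: in the first, replies come from 172.16.0.115; in the second, the error is reported by red1 (172.16.0.100) itself. On Linux, a locally reported `Destination Host Unreachable` for an on-link address typically means that neighbor (ARP) resolution for the target failed. A minimal sketch that tallies both kinds of lines from captured ping output, assuming the iputils output format shown here; `classify_ping` is a hypothetical name.

```python
import re
from typing import Dict

# Reply line: "64 bytes from host (ip): icmp_seq=1 ttl=64 time=0.624 ms"
REPLY = re.compile(r"^\d+ bytes from .*icmp_seq=\d+")
# Error line: "From red1 (172.16.0.100) icmp_seq=1 Destination Host Unreachable"
UNREACHABLE = re.compile(r"^From .*icmp_seq=\d+ Destination Host Unreachable")


def classify_ping(output: str) -> Dict[str, int]:
    """Count echo replies and 'Destination Host Unreachable' errors in captured ping output."""
    counts = {"replies": 0, "unreachable": 0}
    for line in output.splitlines():
        line = line.strip()
        if REPLY.match(line):
            counts["replies"] += 1
        elif UNREACHABLE.match(line):
            counts["unreachable"] += 1
    return counts
```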

gadLinux (gad-aguilardelgado) wrote :

It seems the system has the same problem after I reboot the machine.

This is the current boot log.

gadLinux (gad-aguilardelgado) wrote :

This may be relevant: I was using a bridged interface for the Internet connection. Let me explain:

I created a bridge, br0, added eth2 to it, and then configured the IP address on br0:

#auto br0
##iface br0 inet dhcp
#iface br0 inet static
# address 172.16.0.100
# network 172.16.0.0
# netmask 255.255.0.0
# broadcast 172.16.255.255
# gateway 172.16.0.1
# bridge_ports eth2
# bridge_stp off
# bridge_fd 0
# bridge_maxwait 0

auto eth2
#iface br0 inet dhcp
iface eth2 inet static
        address 172.16.0.100
        network 172.16.0.0
        netmask 255.255.0.0
        broadcast 172.16.255.255
        gateway 172.16.0.1

I removed the bridge and went back to the initial configuration, and for now it seems to work, at least with a plain Ethernet interface. I'm still testing, but I wanted to point in one possible direction...
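With the br0 stanza commented out, the only stanza left active in this excerpt is eth2, with no `bridge_ports`. A minimal sketch that lists what ifupdown would actually read from such a file, assuming a simple file with no `source` includes, `mapping` blocks or line continuations; `active_config` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Keywords that start a new top-level entry in /etc/network/interfaces
# (allow-* lines are handled separately below).
TOP_LEVEL = ("iface", "auto", "mapping", "source", "source-directory")


def active_config(text: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (auto interfaces, {"<iface> <family> <method>": [option lines]}), ignoring comments."""
    auto: List[str] = []
    stanzas: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are ignored
        words = line.split()
        if words[0] == "iface":
            current = " ".join(words[1:4])  # e.g. "eth2 inet static"
            stanzas[current] = []
        elif words[0] in TOP_LEVEL or words[0].startswith("allow-"):
            current = None  # any other top-level keyword ends the current stanza
            if words[0] == "auto":
                auto.extend(words[1:])
        elif current is not None:
            stanzas[current].append(" ".join(words))  # option line, whitespace normalized
    return auto, stanzas
```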

Could this, together with the virtual machines I'm running (currently two), have something to do with the problem?

I should add that another host has exactly the same configuration, including this bridged setup, and it has no problems at all.

gadLinux (gad-aguilardelgado) wrote :

Hi,

I want to cross-reference this bug, since it seems related to this issue:

https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978

Possibly, at least.

gadLinux (gad-aguilardelgado) wrote :

I've switched to macvtap and am testing whether it works better. For now it looks like it's working, though I've found another possible bug that I will test before reporting.

Jeff Ahrenholz (jeffrey-m-ahrenholz) wrote :

Does this bug just have to do with Ubuntu networking? Are you using the CORE emulator at all?

Eriberto (eriberto) wrote :

Forwarding because it isn't a core-network problem.

Thanks.

affects: core-network (Ubuntu) → netbase (Ubuntu)
Paul Tagliamonte (paultag) wrote :

I'm setting this to Medium. Netbase folks, feel free to change it to reflect the actual situation, but this appears to cause some damage.

Changed in netbase (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
gadLinux (gad-aguilardelgado) wrote :

Hi,

I'm not sure whether this is Ubuntu-only. I suspect it's a kernel or driver problem, but I can't diagnose it.

What do you need?

I realized that I was using a bridge created with brctl and had then added that bridge to an OVS bridge. That made the system fail even faster, within a few seconds.

gadLinux (gad-aguilardelgado) wrote :

I'm having fewer problems with the macvtap driver.

It seems that brctl bridges have some kind of problem. I migrated everything from brctl to macvtap and now it's working without problems. I had to add a new interface to let the host and the virtual machines communicate, but that's all.

I'm pretty sure there is a problem in this driver. I have another machine running Ubuntu Precise that had far fewer problems. I wanted to upgrade it, but now I don't think that's a good idea, since it will be installed as a compute node in an OpenStack cloud.

gadLinux (gad-aguilardelgado) wrote :

It seems the macvtap driver works better. I've had no more problems since I started using it instead of the normal bridge tools.

Someone should take a look at this problem, since macvtap has its own problems, such as the host not being able to see the guests. I had to use the KVM subnet trick to set up a subnet for communication between the host and the VMs.

At least the network hasn't gone down since.
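For reference, libvirt describes a macvtap attachment as an `<interface type='direct'>` element whose `<source>` names the host NIC and the macvtap mode (`vepa`, `bridge`, `private` or `passthrough`). A minimal sketch that builds such a fragment, assuming eth2 as the host device (as in the bridged setup earlier in this report) and a virtio NIC model; `macvtap_interface` is hypothetical. With macvtap the host generally cannot reach its own guests through the same physical interface, which is consistent with the host/VM visibility problem described above.

```python
import xml.etree.ElementTree as ET

# macvtap modes accepted by libvirt for <interface type='direct'>.
MACVTAP_MODES = {"vepa", "bridge", "private", "passthrough"}


def macvtap_interface(host_dev: str, mode: str = "bridge", model: str = "virtio") -> str:
    """Return a libvirt <interface type='direct'> (macvtap) fragment as a string."""
    if mode not in MACVTAP_MODES:
        raise ValueError(f"unsupported macvtap mode: {mode}")
    iface = ET.Element("interface", type="direct")
    # The guest NIC is attached through a macvtap device stacked on host_dev.
    # Note: in this setup host <-> guest traffic does not loop back over host_dev,
    # hence the separate host/VM subnet or interface mentioned above.
    ET.SubElement(iface, "source", dev=host_dev, mode=mode)
    ET.SubElement(iface, "model", type=model)
    return ET.tostring(iface, encoding="unicode")
```

The resulting fragment can be placed in a domain's `<devices>` section, for example with `virsh edit`.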
