VLAN over balance bonding doesn't work

Bug #1631908 reported by Luca
This bug affects 3 people
Affects  Status   Importance  Assigned to  Milestone
MAAS     Expired  Undecided   Unassigned
curtin   Invalid  Undecided   Unassigned

Bug Description

When balance-rr, balance-alb, balance-tlb or 802.3ad bonding is configured, VLANs configured over the bond interface don't work.

I used 4 interfaces for the bond. The untagged bond0 always works correctly, but not all VLANs created over the bond do.
With balance-alb or balance-tlb, with no switch configuration, none of the VLANs work. Bringing the VLAN interface down and up solves the problem.
With balance-rr or 802.3ad, with the switch appropriately configured, only VLANs with MTU 1500 receive traffic (and the first 8 packets are lost); the other VLANs, with an MTU of 9000, don't receive anything. Bringing the interface down and up doesn't solve the problem.

Ping MTU 1500 interfaces:
$ ping 10.15.6.205
PING 10.15.6.205 (10.15.6.205) 56(84) bytes of data.
64 bytes from 10.15.6.205: icmp_seq=9 ttl=64 time=0.364 ms
64 bytes from 10.15.6.205: icmp_seq=10 ttl=64 time=0.197 ms
64 bytes from 10.15.6.205: icmp_seq=11 ttl=64 time=0.207 ms

$ ping 10.15.2.205
PING 10.15.2.205 (10.15.2.205) 56(84) bytes of data.
64 bytes from 10.15.2.205: icmp_seq=9 ttl=64 time=0.500 ms
64 bytes from 10.15.2.205: icmp_seq=10 ttl=64 time=0.284 ms
64 bytes from 10.15.2.205: icmp_seq=11 ttl=64 time=0.268 ms
64 bytes from 10.15.2.205: icmp_seq=12 ttl=64 time=0.262 ms

Ping MTU 9000 interface:
$ ping 10.15.4.4
PING 10.15.4.4 (10.15.4.4) 56(84) bytes of data.
^C
--- 10.15.4.4 ping statistics ---
436 packets transmitted, 0 received, 100% packet loss, time 435011ms

Node's config:
http://paste.ubuntu.com/23302581/

I attached the rsyslog and cloud-init logs.

Andres Rodriguez (andreserl) wrote :

Hi Luca,

So let me understand this correctly:
1. You configured bonds/VLANs correctly in MAAS.
2. You deployed the machine and this configuration is correctly reflected in /etc/network/interfaces.
3. In some situations, bringing the interface down and up works, and in other situations it doesn't.

Is this correct? If so, it seems that this is not an issue with MAAS or curtin, but rather with either:

1. Problems with ifupdown
2. Your configuration.

Changed in maas:
status: New → Incomplete
Luca (l-dellefemmine) wrote :

Hi Andres,
To understand this better, I ran more tests.

Bonding with LACP and balance-rr is working; the issue was an incorrect configuration on my side.

Bonding with balance-tlb and balance-alb is still not working; bringing the interface down and up makes it work.

Pinging interface bond0.1501 increases the RX dropped packet count on bond0:

$ ifconfig bond0
bond0 Link encap:Ethernet HWaddr 94:57:a5:5b:74:64
          inet addr:192.168.2.54 Bcast:192.168.2.255 Mask:255.255.255.0
          inet6 addr: fe80::9657:a5ff:fe5b:7464/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
          RX packets:440118 errors:0 dropped:12289 overruns:0 frame:0
          TX packets:113422 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:107691781 (107.6 MB) TX bytes:7505919 (7.5 MB)
$ ifconfig bond0.1501
bond0.1501 Link encap:Ethernet HWaddr 16:dc:48:62:9c:92
          inet addr:10.15.1.206 Bcast:10.15.1.255 Mask:255.255.255.0
          inet6 addr: fe80::14dc:48ff:fe62:9c92/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:377 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:92842 (92.8 KB) TX bytes:1180 (1.1 KB)

After bringing the interface down and up, the MAC address changed and RX works:

$ ifconfig bond0.1501
bond0.1501 Link encap:Ethernet HWaddr 94:57:a5:5b:74:64
          inet addr:10.15.1.206 Bcast:10.15.1.255 Mask:255.255.255.0
          inet6 addr: fe80::9657:a5ff:fe5b:7464/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3958 (3.9 KB) TX bytes:3574 (3.5 KB)

I attached the server's logs and the network configuration. Maybe something is missing in /etc/network/interfaces, maybe not. If this turns out not to be MAAS/curtin related, I will open another bug with the correct team.

Thanks

Ryan Harper (raharper) wrote :

Hi,

I would be interested in the following information:

1) maas <session> node get-curtin-config <system-id>
2) The output from 'ip a' after first boot, where some VLANs are failing
3) The output from 'ip a' after you've brought the interfaces down and up again and the VLANs now work.

Also, VLANs and MTUs have had issues. This may be related to the following bug, which has some changes to address mismatches between the underlying device MTU and VLANs:

https://bugs.launchpad.net/debian/+source/vlan/+bug/1224007

Luca (l-dellefemmine) wrote :

Hi,
I attached the get-curtin-config output.

- ip a after first boot:

$ ip a show bond0.1504
15: bond0.1504@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 16:dc:48:62:9c:92 brd ff:ff:ff:ff:ff:ff
    inet 10.15.4.5/24 brd 10.15.4.255 scope global bond0.1504
       valid_lft forever preferred_lft forever
    inet6 fe80::14dc:48ff:fe62:9c92/64 scope link
       valid_lft forever preferred_lft forever

- ip a after bringing the interfaces down and up:

$ ip a show bond0.1504
19: bond0.1504@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 94:57:a5:5b:74:64 brd ff:ff:ff:ff:ff:ff
    inet 10.15.4.5/24 brd 10.15.4.255 scope global bond0.1504
       valid_lft forever preferred_lft forever
    inet6 fe80::9657:a5ff:fe5b:7464/64 scope link
       valid_lft forever preferred_lft forever

The MAC address changed after bringing the interfaces down and up. This bug affects all VLANs on top of bond0. It does not depend on the MTU size.
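
The MAC mismatch described above (the VLAN interface showing a different address than bond0 until it is bounced) can be checked without reading `ip a` output by eye. The following is a minimal sketch, not part of the original report: the function names `compare_macs` and `iface_mac` are hypothetical, and it assumes a Linux host that exposes interface addresses under /sys/class/net/<iface>/address.

```shell
# Compare two MAC addresses case-insensitively.
# Prints "match" if they are equal, "mismatch" otherwise.
compare_macs() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "$a" = "$b" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Read the current MAC address of an interface from sysfs (Linux only).
iface_mac() {
    cat "/sys/class/net/$1/address"
}

# Example on the affected host (interface names taken from this report):
#   compare_macs "$(iface_mac bond0)" "$(iface_mac bond0.1504)"
```

With the addresses quoted in this report, 94:57:a5:5b:74:64 (bond0) against 16:dc:48:62:9c:92 (the VLAN after first boot) is a mismatch, while the VLAN address after the bounce matches bond0.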

Ryan Harper (raharper) wrote : Re: [Bug 1631908] Re: VLAN over balance bonding doesn't work

On Thu, Oct 13, 2016 at 8:49 AM, Luca <email address hidden> wrote:

> Hi,
> I attached the get-curtin-config output.
>

Thanks.

>
> - ip a after first boot:
>
> $ ip a show bond0.1504
> 15: bond0.1504@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
> noqueue state UP group default qlen 1000
> link/ether 16:dc:48:62:9c:92 brd ff:ff:ff:ff:ff:ff
>

Can you run 'ip a' without the show? I want all of the ip a output for
all interfaces.

> inet 10.15.4.5/24 brd 10.15.4.255 scope global bond0.1504
> valid_lft forever preferred_lft forever
> inet6 fe80::14dc:48ff:fe62:9c92/64 scope link
> valid_lft forever preferred_lft forever
>
> - ip a after bringing the interfaces down and up:
>
> $ ip a show bond0.1504
> 19: bond0.1504@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
> noqueue state UP group default qlen 1000
> link/ether 94:57:a5:5b:74:64 brd ff:ff:ff:ff:ff:ff
> inet 10.15.4.5/24 brd 10.15.4.255 scope global bond0.1504
> valid_lft forever preferred_lft forever
> inet6 fe80::9657:a5ff:fe5b:7464/64 scope link
> valid_lft forever preferred_lft forever
>

Can you run 'ip a' without the show? I want all of the ip a output for
all interfaces.

Luca (l-dellefemmine) wrote :

Hi,

- ip a after first boot:

http://paste.ubuntu.com/23318343/

- ip a after bringing all the VLAN interfaces down and up:

http://paste.ubuntu.com/23318350/

Ryan Harper (raharper) wrote :

On Thu, Oct 13, 2016 at 10:07 AM, Luca <email address hidden> wrote:

> Hi,
>
> - ip a after first boot:
>
> http://paste.ubuntu.com/23318343/
>
> - ip a after bringing all the VLAN interfaces down and up:
>
> http://paste.ubuntu.com/23318350/

bond0.1504 isn't present in this, strangely.

The other thing I'm noticing is that when things first come up
and networking doesn't work (ping), the MAC address of the bond
is NOT one of the slaves' MACs; after a bounce, the bond gets one
of the slave MACs and it works.

IIUC, balance-alb does use ARP discovery with the switch to determine
which slave to use; so there may be some interplay with the switch ARP
cache.


Ryan Harper (raharper) wrote :

Looking at your config, I see that updelay is set to 0; this is likely
too low.

https://www.kernel.org/doc/Documentation/networking/bonding.txt

The documentation says that updelay should be at least as large as your
switch's forwarding delay. The relevant text is included here:

balance-alb or 6:
...
        When a link is reconnected or a new slave joins the
        bond the receive traffic is redistributed among all
        active slaves in the bond by initiating ARP Replies
        with the selected MAC address to each of the
        clients. The updelay parameter (detailed below) must
        be set to a value equal or greater than the switch's
        forwarding delay so that the ARP Replies sent to the
        peers will not be blocked by the switch.

updelay

Specifies the time, in milliseconds, to wait before enabling a
slave after a link recovery has been detected. This option is
only valid for the miimon link monitor. The updelay value
should be a multiple of the miimon value; if not, it will be
rounded down to the nearest multiple. The default value is 0.
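
The rounding rule in the updelay text above can be illustrated with a few lines of shell arithmetic. This is only a sketch of the documented behaviour, not a query against the kernel: the function name `effective_updelay` is hypothetical, and returning 0 when miimon is disabled is an assumption based on the statement that updelay is only valid for the miimon link monitor.

```shell
# Round a requested updelay down to the nearest multiple of miimon.
# $1: requested updelay in milliseconds
# $2: miimon interval in milliseconds
effective_updelay() {
    updelay=$1
    miimon=$2
    if [ "$miimon" -le 0 ]; then
        # Assumption: with miimon disabled, updelay has no effect.
        echo 0
        return
    fi
    # Integer division discards the remainder, which gives "round down".
    echo $(( updelay / miimon * miimon ))
}

# Example: a requested updelay of 250 ms with miimon=100 ms
#   effective_updelay 250 100
```

With miimon at 100 ms, any updelay below 100 rounds down to 0, so the smallest useful non-zero setting is 100.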


Luca (l-dellefemmine) wrote :

Hi,
I set updelay to 100 ms in /etc/network/interfaces (miimon is 100 ms) and rebooted the server. Now the network is working with no issues.
Here is the ip a output after the reboot:

http://paste.ubuntu.com/23338185/

Maybe it would be a good idea to set updelay and downdelay to the same value as miimon by default.
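
To confirm which values the bonding driver actually applied after a change like this, the options can be read back from sysfs. The following is a minimal sketch, assuming the bonding sysfs files under /sys/class/net/<bond>/bonding/ are present; the function name `show_bond_timers` and its optional second argument are hypothetical and exist only so the function can be pointed at a scratch directory.

```shell
# Print selected bonding options for one bond, one "name=value" per line.
# $1: bond name (e.g. bond0)
# $2: sysfs network root; defaults to /sys/class/net
show_bond_timers() {
    bond=$1
    root=${2:-/sys/class/net}
    for opt in mode miimon updelay downdelay; do
        printf '%s=%s\n' "$opt" "$(cat "$root/$bond/bonding/$opt")"
    done
}

# Example on the deployed node:
#   show_bond_timers bond0
```

Reading the values back this way checks the running kernel state rather than the contents of /etc/network/interfaces.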

Ryan Harper (raharper) wrote :

Great news that it's working.

Changed in curtin:
status: New → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired