[R2.20] Bond interface flap can break haproxy/vrouter networking

Bug #1454420 reported by amit surana
This bug affects 1 person
Affects              Status                    Importance  Assigned to                     Milestone
Juniper Openstack    Status tracked in Trunk
  R2.20              Fix Committed             High        Ignatious Johnson Christopher
  Trunk              Fix Committed             High        Ignatious Johnson Christopher

Bug Description

R2.20 build 13, Ubuntu 14.04.

Flapping a bond interface can cause its MAC address to change. Other services that rely on this MAC address should then also be restarted so that they pick up the new MAC address. Details below.

Consider a case where the controller/compute node is connected to the IP fabric via a two-interface bond (LAG). The bond assumes the MAC address of one of its slave interfaces (usually the first one that comes up).

On the compute node, the vhost0 interface is also assigned the same MAC address as the bond interface. This MAC address is also used by keepalived/haproxy as the MAC address corresponding to the VIP.

Now, if the bond interface flaps (for instance, because the networking service was restarted) and the bond's MAC address changes to that of a different slave interface, the vhost0 interface still uses the old MAC, which breaks all connectivity to the compute node. As for keepalived/HAProxy, although the services are running, the VIP isn't owned by any of the nodes, so this functionality also breaks.

The above scenario was simulated on the solution testbed by using the fab script to add a static route to all nodes of a working cluster. After adding the static route, the fab add_static_route task restarts the networking service; this flaps the bond interface and changes its MAC address, which in turn leads to the issues noted above.

Restarting the vrouter and keepalived services resolves the issue.

bond0 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8a
em1 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8a
em2 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8a
vhost0 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8a

==restart networking==

bond0 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8b
em1 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8b
em2 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8b
vhost0 Link encap:Ethernet HWaddr 08:9e:01:d9:27:8a
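A quick way to spot the state shown above is to compare the MAC the kernel reports for the bond with the one on vhost0. This is a minimal sketch, assuming the standard Linux sysfs layout (/sys/class/net/<iface>/address) and the interface names bond0 and vhost0 from this report; the function names and the `sysfs_root` parameter are hypothetical, the latter added only so the check can be pointed at a test directory.

```python
from pathlib import Path


def read_mac(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the current MAC address of `iface` as reported by sysfs."""
    return (Path(sysfs_root) / iface / "address").read_text().strip().lower()


def vhost_mac_is_stale(bond: str = "bond0", vhost: str = "vhost0",
                       sysfs_root: str = "/sys/class/net") -> bool:
    """True when vhost0 no longer carries the bond's MAC (the broken state above)."""
    return read_mac(bond, sysfs_root) != read_mac(vhost, sysfs_root)
```

On a node in the post-restart state above (bond0 on ...:8b, vhost0 still on ...:8a), `vhost_mac_is_stale()` would return True; per the report, restarting the vrouter and keepalived services brings the two back in line.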

amit surana (asurana-t)
description: updated
information type: Proprietary → Public
Ashish Ranjan (aranjan-n) wrote :

Bhushan will try this on a setup.

tags: added: vrouter
tags: added: provisioning
removed: vrouter
chhandak (chhandak) wrote : Re: [Bug 1454420] [NEW] [R2.20] Bond interface flap can break haproxy/vrouter networking

Hi Hari,

I can recreate the problem. Once the setup has reached the problematic state,
I observe all the issues Amit has mentioned.
But to reach this state, I have to remove one slave interface from the bond and
restart network services.

With a normal interface flap and network restart, I was never able to reproduce
the problem (change of bond MAC). I tried running fab add_static_route, which is
the original trigger, multiple times.

Can we see the bonding configuration? Was it done through a fab task?

root@nodei9:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:25:90:e4:08:e5   <<<
          inet6 addr: fe80::225:90ff:fee4:8e5/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:66991876 errors:0 dropped:38 overruns:0 frame:0
          TX packets:40657426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:14582541218 (14.5 GB) TX bytes:15205158719 (15.2 GB)

root@nodei9:~# ifconfig vhost0
vhost0 Link encap:Ethernet HWaddr 00:25:90:e4:08:e4   <<< vhost0 and bond0 have different MACs
          inet addr:192.168.22.4 Bcast:192.168.22.255 Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fee4:8e4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:10414086 errors:0 dropped:912 overruns:0 frame:0
          TX packets:10211823 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13980201687 (13.9 GB) TX bytes:17861939531 (17.8 GB)

With the bond configured by fab:
------------------
root@nodei9:~# ifconfig p6p1
p6p1 Link encap:Ethernet HWaddr 00:25:90:e4:08:e4   <<<
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:53299835 errors:0 dropped:2425 overruns:0 frame:0
          TX packets:16579042 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13166191099 (13.1 GB) TX bytes:12684156223 (12.6 GB)

root@nodei9:~# ifconfig p6p2
p6p2 Link encap:Ethernet HWaddr 00:25:90:e4:08:e4   <<<
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:67287029 errors:0 dropped:2477 overruns:0 frame:0
          TX packets:40981312 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:14712314080 (14.7 GB) TX bytes:15627041363 (15.6 GB)

root@nodei9:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:25:90:e4:08:e4   <<< both slaves have the bond0 MAC
          inet6 addr: fe80::225:90ff:fee4:8e4/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:120613044 errors:0 dropped:4902 overruns:0 frame:0
          TX packets:57560635 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27881390442 (27.8 GB) TX bytes:28311340765 (28.3 GB)

root@nodei9:~# ifconfig p6p1 down
root@nodei9:~# ifconfig p6p1 bond0

root@nodei9:~# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:25:90:e4:08:e4   <<< even when we shut down the "master" interface (p6p1), the bond MAC does not change
          inet6 add...

amit surana (asurana-t) wrote :

The bond interface was created via fab scripts; however, this shouldn't matter.

The bond gets its MAC address from the slave that comes up first. That order itself could depend on the hardware (maybe interfaces are brought up based on their lspci address, etc.). I have a Quanta server.

Also, when configuring the bond, the user/fab/SM might add the interfaces to the bond in a different order from the one used when network services are restarted. In other words, you don't have to delete an interface from the bond to recreate the problem. Just check which interface comes up first when network services are restarted, then configure the bond so that the other interface is added first and its MAC becomes the bond MAC.
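The ordering effect described above can be modelled in a few lines. This is a simplified sketch, not the kernel's bonding code: it assumes the bond adopts the MAC of the first slave enslaved unless a MAC is configured on the bond explicitly, and that in 802.3ad mode every slave is then set to the bond's MAC (as the ifconfig output in this report shows). The function name and the MAC values used with it are hypothetical.

```python
from typing import Dict, List, Optional


def macs_after_bringup(enslave_order: List[str], perm_macs: Dict[str, str],
                       pinned_mac: Optional[str] = None,
                       bond_name: str = "bond0") -> Dict[str, str]:
    """Simplified model of the MACs seen after a bond is (re)built.

    enslave_order: slaves in the order they come up and get enslaved.
    perm_macs:     each slave's permanent (burned-in) MAC.
    pinned_mac:    a MAC explicitly configured on the bond, if any.
    """
    if not enslave_order:
        raise ValueError("this model needs at least one slave")
    # Without an explicit MAC, the bond inherits the first enslaved slave's MAC.
    bond_mac = pinned_mac if pinned_mac is not None else perm_macs[enslave_order[0]]
    # In 802.3ad mode every slave is then set to the bond's MAC.
    result = {slave: bond_mac for slave in enslave_order}
    result[bond_name] = bond_mac
    return result
```

With `perm = {"em1": "02:00:00:00:00:01", "em2": "02:00:00:00:00:02"}`, `macs_after_bringup(["em1", "em2"], perm)["bond0"]` is em1's MAC and swapping the order gives em2's, while passing `pinned_mac` yields the same bond MAC for both orders. vhost0, which copied the bond MAC when vrouter started, only stays correct in the pinned case.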

You can check 10.87.26.150 (root:contrail123).

Amit.
________________________________________
From: Chhandak Mukherjee
Sent: Thursday, May 28, 2015 3:26 AM
To: Bug 1454420; Hari Prasad Killi; Praveen K V; Amit Surana
Cc: Vedamurthy Ananth Joshi; Nagabhushana R
Subject: Re: [Bug 1454420] [NEW] [R2.20] Bond interface flap can break haproxy/vrouter networking

...

chhandak (chhandak) wrote :

Hi Amit,

We were able to recreate the problem in your setup.

It is a known problem in Ubuntu 14.04:
https://bugs.launchpad.net/ubuntu/+source/ifenslave/+bug/1288196

We followed the workaround suggested there and added hwaddress to the bond
config, which keeps the bond MAC consistent.

With this change, we restarted networking multiple times and never hit the
problem.

We should be able to push the same change through our provisioning scripts;
leaving that open for further discussion.

For now, you should not hit the problem with the workaround in place.

Thanks and Regards,
Chhandak

On 5/13/15, 3:04 AM, "amit surana" <email address hidden> wrote:

>...

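The hwaddress workaround amounts to one extra line in the bond stanza. A minimal sketch of how provisioning code could emit it, assuming ifupdown's `hwaddress ether <mac>` option in /etc/network/interfaces on Ubuntu 14.04 and reusing the bond options from the bond0 config posted in this thread; `render_bond_stanza` is a hypothetical helper, not the contrail-provisioning code.

```python
import re
from typing import Dict, Optional

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")

# Bond options taken from the bond0 stanza posted in this thread.
DEFAULT_BOND_OPTIONS = {
    "bond-slaves": "none",
    "bond-miimon": "100",
    "bond-mode": "802.3ad",
    "bond-xmit_hash_policy": "layer3+4",
}


def render_bond_stanza(pinned_mac: str, bond: str = "bond0",
                       options: Optional[Dict[str, str]] = None) -> str:
    """Render an /etc/network/interfaces stanza for `bond` with a fixed MAC."""
    mac = pinned_mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {pinned_mac!r}")
    lines = [f"auto {bond}",
             f"iface {bond} inet manual",
             # Pin the bond MAC so it no longer depends on slave bring-up order.
             f"    hwaddress ether {mac}"]
    for key, value in (options or DEFAULT_BOND_OPTIONS).items():
        lines.append(f"    {key} {value}")
    return "\n".join(lines) + "\n"
```

Pinning at the bond removes the dependency on bring-up order entirely, so services that copied the bond MAC (vhost0, keepalived) keep seeing the same address across network restarts.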
chhandak (chhandak) wrote :

Sorry for the multiple emails.

We can also add "pre-up sleep 5" to the secondary interface. This makes sure
that the other interface always comes up first.
This workaround also works fine in your setup and should be fairly simple to
implement in our provisioning code.

Thanks and Regards,
Chhandak

auto p1p2
iface p1p2 inet manual
    down ip addr flush dev p1p2
    bond-master bond0

auto p1p1
iface p1p1 inet manual
    down ip addr flush dev p1p1
    bond-master bond0
    pre-up sleep 5

auto bond0
iface bond0 inet manual
    pre-up ifconfig bond0 up
    post-down ifconfig bond0 down
    bond-slaves none
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit_hash_policy layer3+4

On 6/10/15, 12:15 AM, "Chhandak Mukherjee" <email address hidden> wrote:

>...

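A sketch of how provisioning code might emit the slave stanzas from the pre-up sleep workaround, delaying every member except the one whose MAC should become the bond MAC. The stanza lines mirror the p1p1/p1p2 config posted above; the helper name and the `delay_s` parameter are hypothetical.

```python
from typing import List


def render_slave_stanzas(members: List[str], bond: str = "bond0",
                         delay_s: int = 5) -> str:
    """Render ifupdown slave stanzas; every member except members[0] is delayed."""
    if not members:
        raise ValueError("need at least one bond member")
    out: List[str] = []
    for index, nic in enumerate(members):
        out += [f"auto {nic}",
                f"iface {nic} inet manual",
                f"    down ip addr flush dev {nic}",
                f"    bond-master {bond}"]
        if index > 0:
            # Delay the other members so members[0] is enslaved first and
            # the bond inherits its MAC.
            out.append(f"    pre-up sleep {delay_s}")
        out.append("")
    return "\n".join(out)
```

A fixed sleep only biases the race rather than removing it: if the undelayed interface does not come up, the other member ends up first. That is the concern behind the preference for pinning the bond MAC expressed further down in this thread.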
amit surana (asurana-t) wrote :

This isn't specific to 14.04.

----

I'm seeing the same behavior on my 12.04 and 14.04 systems.

I wanted to deploy a fairly large (50 servers) IPv6-only setup using SLAAC but I had to revert to static IPv6 configuration due to this issue with Ubuntu.

Configuring static mac addresses isn't something I want either, so for now I'm sticking with a static IPv6 configuration, but that's not something I want to keep.

Would like to see this resolved.

---

Nagabhushana R (bhushana) wrote :

I prefer to go with option 1, "Pin the bond MAC to one of the member interfaces' MAC", so that even if the first interface never comes back up, the bond will still use its MAC address and come up.

Thanks,
Ignatious

tags: added: blocker
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11820
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11821
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11820
Committed: http://github.org/Juniper/contrail-provisioning/commit/65f2a0f685325f7f3bd4cf980e3e7e6c93232dd7
Submitter: Zuul
Branch: R2.20

commit 65f2a0f685325f7f3bd4cf980e3e7e6c93232dd7
Author: Ignatious Johnson Christopher <email address hidden>
Date: Thu Jun 18 23:59:44 2015 -0700

Configuring static mac address for the bond interface with the
first member interface mac address
Closes-Bug: 1454420

Change-Id: I3102ce3edec6ef698ee3a1f2c5dba7f3dfe257d5
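The merged change pins the bond MAC to the first member interface's MAC; the change itself is not reproduced here. One catch when implementing this: once a NIC is enslaved in 802.3ad mode, its own sysfs `address` already shows the bond MAC, so the first member's permanent address has to come from elsewhere. A hedged sketch, assuming the bonding driver's /proc/net/bonding/<bond> status file with its per-slave `Slave Interface:` and `Permanent HW addr:` lines; the function names are hypothetical, and the text is passed in as a string to keep the sketch testable.

```python
from typing import Dict, List, Optional


def permanent_macs(proc_bonding_text: str) -> Dict[str, str]:
    """Map slave name -> permanent MAC, parsed from /proc/net/bonding/<bond>."""
    macs: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in proc_bonding_text.splitlines():
        # Split on the first colon only; MAC values contain colons themselves.
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Permanent HW addr" and current is not None:
            macs[current] = value.lower()
    return macs


def mac_to_pin(members: List[str], proc_bonding_text: str) -> str:
    """Permanent MAC of the first *configured* member, independent of bring-up order."""
    macs = permanent_macs(proc_bonding_text)
    first = members[0]
    if first not in macs:
        raise KeyError(f"{first} is not listed as a slave of this bond")
    return macs[first]
```

Keying on the configured member list rather than on the order slaves appear in /proc matters here, since that listing follows enslavement order, which is exactly what varies between restarts.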

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11821
Committed: http://github.org/Juniper/contrail-provisioning/commit/4b9059bbeaa3f787315992026c62d14f58eaebbe
Submitter: Zuul
Branch: master

commit 4b9059bbeaa3f787315992026c62d14f58eaebbe
Author: Ignatious Johnson Christopher <email address hidden>
Date: Fri Jun 19 00:04:29 2015 -0700

Configuring static mac address for the bond interface with the
first member interface mac address
Closes-Bug: 1454420

Change-Id: Ia680c45d0328ef705b76c53e49d62270bcac848b

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/11879
Submitter: prasad miriyala (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/11880
Submitter: prasad miriyala (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/11879
Committed: http://github.org/Juniper/contrail-server-manager/commit/5c038e98d511a3864ca813fd29bfb953164d9c26
Submitter: Zuul
Branch: master

commit 5c038e98d511a3864ca813fd29bfb953164d9c26
Author: Prasad Miriyala <email address hidden>
Date: Fri Jun 19 16:46:55 2015 -0700

Closes-Bug: #1454420, Bond interface flap can break haproxy/vrouter networking
- Server manager has copy of interface_setup.py with additional hooks for servers manager,
modifying to address the above bond interface flap issue

Change-Id: Iff64ed1cf6b9a93b8f05f5d8a2211c78d5e35727

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/11880
Committed: http://github.org/Juniper/contrail-server-manager/commit/71db7ec963e89869411ffd10faaf44fd4c333514
Submitter: Zuul
Branch: R2.20

commit 71db7ec963e89869411ffd10faaf44fd4c333514
Author: Prasad Miriyala <email address hidden>
Date: Fri Jun 19 17:00:30 2015 -0700

Closes-Bug: #1454420, Bond interface flap can break haproxy/vrouter networking
- Server manager has copy of interface_setup.py with additional hooks for servers manager,
modifying to address the above bond interface flap issue

Change-Id: I0374bb20d9c615fa09f7924ab224082ee5dacaab
