Bonding mode Balance-ALB stomps VM MACs

Bug #1098302 reported by liquidhorse
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid  Medium      Unassigned

Bug Description

This issue was fixed in the 3.8 kernel series. I have backported the relevant patch (as directed by the netdev kernel list) to the 3.0 kernel, and it worked successfully on several machines. I have built but NOT tested this patch against the 3.2 and 3.4 kernels. The patch appears to require minor modification for inclusion in the 3.7 kernel series. The issue it resolves is as follows:

This issue affects (at least) virtual machines running under KVM that use bridging to connect to a network, when that bridge communicates over a bond using mode 6 (balance-alb). To replicate, configure a bond of one or more adapters in mode 6 and bridge over that bond. Configure a virtual machine (in my case, under KVM) to place its vnet adapter under the same bridge. With the VM running, ping the virtual machine from a remote host and check the ARP cache on the remote host. The MAC reported for the virtual machine will not be its configured MAC, but the MAC of one of the bond's slaves. This causes intermittent and significant connectivity loss, both to and from the virtual machine.
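The ARP cache check in the last step can be scripted. The following is a minimal sketch, assuming the remote host is Linux and that its cache is read from /proc/net/arp; the function names, addresses and sample table are made up for illustration.

```python
def parse_proc_net_arp(text):
    """Map IP address -> lowercase MAC from /proc/net/arp-style text."""
    table = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, _device = fields[:6]
            table[ip] = mac.lower()
    return table


def mac_was_rewritten(arp_text, vm_ip, vm_mac):
    """True if the cache holds a MAC for vm_ip that differs from vm_mac."""
    seen = parse_proc_net_arp(arp_text).get(vm_ip)
    return seen is not None and seen != vm_mac.lower()


# Hypothetical cache contents: 192.0.2.10 is the VM, but the cached MAC
# belongs to a bond slave rather than to the VM's virtual NIC.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.10       0x1         0x2         00:1b:21:aa:bb:01     *        eth0
192.0.2.1        0x1         0x2         00:1b:21:cc:dd:ee     *        eth0
"""

if __name__ == "__main__":
    # On a real remote host, read the text from /proc/net/arp after the ping.
    print(mac_was_rewritten(SAMPLE, "192.0.2.10", "52:54:00:12:34:56"))
```

A True result for the VM's address corresponds to the symptom described above: the remote host has cached a MAC other than the one configured on the VM.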

The patch modifies the balance-alb bonding driver to leave non-local MACs unmodified. With the patch applied, repeating the above test should show the virtual machine's correct MAC in the ARP cache of the remote host.
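To illustrate the intended behaviour, here is a toy model of the rule "leave non-local MACs unmodified". It is only a sketch of the idea, not the kernel patch; the function, its parameters and the addresses are hypothetical.

```python
def arp_sender_mac(sender_mac, bond_macs, balanced_mac, patched=True):
    """Return the sender MAC an outgoing ARP would carry in this toy model.

    sender_mac:   MAC in the ARP packet as it reaches the bond
    bond_macs:    MACs owned by the bond and its slaves
    balanced_mac: slave MAC that load balancing would substitute
    patched:      model the behaviour with the fix (True) or without it
    """
    is_local = sender_mac.lower() in {m.lower() for m in bond_macs}
    if patched and not is_local:
        return sender_mac  # non-local (e.g. bridged VM) MAC is left alone
    return balanced_mac


if __name__ == "__main__":
    bond = ["00:1b:21:aa:bb:01", "00:1b:21:aa:bb:02"]
    vm = "52:54:00:12:34:56"
    print(arp_sender_mac(vm, bond, bond[0], patched=False))  # slave MAC leaks out
    print(arp_sender_mac(vm, bond, bond[0], patched=True))   # VM MAC preserved
```

In this model, only ARP traffic whose sender MAC belongs to the bond is eligible for rewriting; traffic bridged from a VM passes through with its own MAC.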

liquidhorse (liquidhorse) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1098302

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: patch
Joseph Salisbury (jsalisbury) wrote :

Can you provide some information on the status of the patch with regard to getting it merged into upstream stable? Has it been sent to the upstream stable mailing list?

People affected by this bug are probably wondering why the kernel team doesn't just apply the patch and fix it. The reason is that the kernel team is reluctant (not opposed) to apply any patch to a stable kernel that is not from upstream. Applying patches that don't come from upstream adds greatly to the support burden of the kernel, as later upstream patches may touch the same area as the non-upstream patch and fail to apply cleanly.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: kernel-da-key
liquidhorse (liquidhorse) wrote : Re: [Bug 1098302] Re: Bonding mode Balance-ALB stomps VM MACs

Hi! I have not sent it to the upstream stable mailing list. If that
would be the proper place, I'd be happy to submit it there instead.
This is my first patch submission and since the original patch came
from kernel.org (but for the 3.8+ kernel series only) I wasn't sure
where would be the correct place to submit a backport for 3.0 thru
3.4. From reading the submission docs, I thought this would be
correct.

Please advise. Thanks!!

On Thu, Jan 10, 2013 at 3:45 PM, Joseph Salisbury
<email address hidden> wrote:
> Can you provide some information on the status of the patch with regards
> to getting it merged in upstream stable? Has it been sent to the
> upstream stable mailing list?
>
> People affected by this bug are probably wondering why the kernel team
> doesn't just apply the patch and fix it. The reason is that the kernel
> team is reluctant (not opposed) to apply any patch to a stable kernel
> that is not from upstream. Applying patches that don't come from
> upstream add greatly to the support of the kernel as other upstream
> patches may touch the same area as the non-upstream patch and may
> prevent them from applying cleanly.

Joseph Salisbury (jsalisbury) wrote :

Do you happen to have the SHA1 for the commit in 3.8 that fixes the bug? If so, we can look and see if upstream stable was CC'd. If it was, then it will automatically be applied to stable. However, it sounds like you had to do some work to backport the patch, so it may be best for you to submit your changes upstream.

To submit your patch, send it with a detailed description/changelog and your sign-off (a final line of the form Signed-off-by: your name <email>) to the addresses listed by ./scripts/get_maintainer.pl drivers/SUBSYSTEM-DETAILS (get_maintainer.pl ships with the kernel sources), as well as to <email address hidden>. Once you have sent the patch upstream and it has been accepted, please drop a note here so that we can cherry-pick/include the patch into the Ubuntu kernel.

liquidhorse (liquidhorse) wrote :

Yes, it was 567b871e503316b0927e54a3d7c86d50b722d955. If it does
appear that upstream did not receive it, I'd be happy to submit it to
them and let you know if it's accepted.

Thanks!

On Fri, Jan 11, 2013 at 2:54 PM, Joseph Salisbury
<email address hidden> wrote:
> Do you happen to have the SHA1 for the commit in 3.8 that fixes the bug?
> If so, we can look and see if upstream stable was CC'd. If it was, then
> it will automatically be applied to stable. However, it sounds like you
> had to do some work to backport the patch, so it may be best for you to
> submit your changes upstream.
>
> To submit your patch, send your patch with the detailed
> description/changelog and your Signoff (ending with Signed-off-by: your
> name <email>), to the emails listed from ./scripts/get_maintainer.pl
> drivers/SUBSYSTEM-DETAILS (the get_maintainer.pl is from the kernel
> sources) as well as <email address hidden>. Once you have sent the patch
> upstream and it's accepted, please drop a note here so that we can
> cherry-pick/include the patch into Ubuntu kernel.

Joseph Salisbury (jsalisbury) wrote :

Thanks so much for submitting the commit to stable!

penalvch (penalvch) wrote :

liquidhorse, would utilizing the Raring enablement stack in Precise via https://wiki.ubuntu.com/Kernel/LTSEnablementStack work for you?

Changed in linux (Ubuntu):
status: Triaged → Incomplete
liquidhorse (liquidhorse) wrote :

Yes, that would probably do just fine. Raring is using the 3.8 kernel,
yes? I have been building 3.8 kernels for some of my 12.04 deploys, and
with few exceptions it has worked well, as it has the necessary fix. So,
if Raring is on 3.8, the enablement stack would work for me. Thanks!

-- Matthew

On Fri, Jan 3, 2014 at 1:13 PM, Christopher M. Penalver <
<email address hidden>> wrote:

> liquidhorse, would utilizing the Raring enablement stack in Precise via
> https://wiki.ubuntu.com/Kernel/LTSEnablementStack work for you?

penalvch (penalvch) wrote :

liquidhorse, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098302/comments/9 regarding this being fixed with an update. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Incomplete → Invalid