The second LACP bond wasn't set up properly

Bug #1469746 reported by Andrey Danin on 2015-06-29
This bug affects 3 people
Affects             Importance  Assigned to
Fuel for OpenStack  High        Sergey Vasilenko
6.1.x               High        Stanislav Makar
7.0.x               High        Sergii Golovatiuk
8.0.x               High        Sergey Vasilenko

Bug Description

Affects 6.1 GA.

Ubuntu, Neutron, linux bonds.

A node should have two LACP bonds set up, but the second one is of type round-robin instead of 802.3ad.

Steps to reproduce:
1) Create a new env.
2) Add a node with 5 NICs: 1 for PXE and 4 others forming two LACP bonds of 2 NICs each (the switch side must be set up accordingly). In my case I added 3 such nodes with the Ceph role assigned.
3) Assign Public and Management to bond0, and Storage and Private to bond1. Using an API call, mark the bonds as linux (not OVS) with mode=802.3ad, xmit_hash_policy=layer3+4, and rate=fast (the code I use to do this: http://paste.openstack.org/show/442567/).
4) Add other nodes required for deployment.
5) Deploy the env.

Expected result:
The deployment should pass.
All bonds on Ceph nodes should be configured as 802.3ad with rate=fast.

Actual result:
The result is intermittent. With 3 such Ceph nodes I have an 80% chance of a broken env.
The deployment fails while executing /etc/puppet/modules/osnailyfacter/modular/astute/ceph_ready_check.rb because of lack of connectivity through the bonds.
One or more nodes have the second bond (bond1) configured in round-robin mode.
root@node-20:~# grep 'Bonding Mode' /proc/net/bonding/bond*
/proc/net/bonding/bond0:Bonding Mode: IEEE 802.3ad Dynamic link aggregation
/proc/net/bonding/bond1:Bonding Mode: load balancing (round-robin)
Even if a bond is configured in 802.3ad mode, it always has lacp_rate=slow.
However, the bond configurations in /etc/network/interfaces.d are correct and almost identical; the only difference is "post-up sleep 45" for bond1
(see http://paste.openstack.org/show/442570/ for a real example). If a bond is restarted (ifdown -a;ifup -a), it gets configured correctly.
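
The grep check above can be automated. The following is a minimal Python sketch, not part of Fuel, that parses the text of /proc/net/bonding/<bond> and reports whether the mode or the LACP rate differs from what was requested. The function names and the default expected values are assumptions made for illustration; "LACP rate" is the field the Linux bonding driver prints for 802.3ad bonds.

```python
EXPECTED_MODE = "IEEE 802.3ad Dynamic link aggregation"


def parse_bond_status(text):
    """Collect the bond-level 'Key: value' fields from a /proc/net/bonding dump."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("Slave Interface:"):
            break  # per-slave sections follow; only bond-level fields are needed
        key, sep, value = line.partition(":")
        # Skip lines without a colon and indented (nested) lines.
        if sep and not line[0].isspace():
            fields.setdefault(key.strip(), value.strip())
    return fields


def bond_problems(text, want_mode=EXPECTED_MODE, want_rate="fast"):
    """Return a list of mismatches; the list is empty if the bond looks right."""
    fields = parse_bond_status(text)
    mode = fields.get("Bonding Mode")
    if mode != want_mode:
        return ["mode is %r, expected %r" % (mode, want_mode)]
    rate = fields.get("LACP rate")
    if rate != want_rate:
        return ["LACP rate is %r, expected %r" % (rate, want_rate)]
    return []


if __name__ == "__main__":
    import glob

    # On a node without bonds the glob is empty and nothing is printed.
    for path in sorted(glob.glob("/proc/net/bonding/*")):
        with open(path) as status:
            print(path, bond_problems(status.read()) or "OK")
```

Run on a deployed node, this would flag both symptoms described here: a bond in round-robin mode, and an 802.3ad bond whose LACP rate is slow.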

Workaround:
If a deployment fails, go to the Ceph nodes and restart the broken bonds (ifdown bond1;ifup eth2 eth3). If I run ifup bond1 it freezes forever, so I have to run ifup <nics-belonging-to-bond1> instead; see http://paste.openstack.org/show/442583/. Then check that the Ceph OSDs are OK and run the deployment again.
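
The restart sequence can be scripted. The Python sketch below builds the two commands from the workaround: ifdown on the bond, then ifup on its member NICs rather than on the bond itself, since ifup on the bond hangs. The function names and the dry-run default are illustrative assumptions, and the bond and NIC names must be adapted to each node.

```python
import subprocess


def restart_bond_commands(bond, slaves):
    """Build the workaround commands: ifdown the bond, then ifup its slave NICs."""
    if not slaves:
        raise ValueError("at least one slave NIC is required")
    return [["ifdown", bond], ["ifup"] + list(slaves)]


def restart_bond(bond, slaves, dry_run=True):
    """Print the commands, and run them (as root) only when dry_run is False."""
    for command in restart_bond_commands(bond, slaves):
        print(" ".join(command))
        if not dry_run:
            subprocess.check_call(command)


if __name__ == "__main__":
    # Dry run: only prints the two commands for the bond1 example in this report.
    restart_bond("bond1", ["eth2", "eth3"])
```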

So, I see three bugs here:
1) lacp_rate is always slow in the runtime config right after deployment.
2) Sometimes the second bond doesn't get the right mode.
3) It's not possible to bring bonds up with the ifup command.

Andrey Danin (gcon-monolake) wrote :
description: updated
Mike Scherbakov (mihgen) wrote :

I talked to Greg E. about this one. Is there a way to have a package update in proposed by the end of this week?

tags: added: customer-found
Sergey Vasilenko (xenolog) wrote :

I am waiting for a re-deploy of the same configuration with normal ('bondNNN') interface names to confirm this issue.

Changed in fuel:
status: New → Confirmed
Oleksiy Molchanov (omolchanov) wrote :

Marking as incomplete, as we need a reproduction.

Changed in fuel:
status: Confirmed → Incomplete
Andrey Danin (gcon-monolake) wrote :

The problem occurs from time to time with any bond names. I had 1 successful deployment out of 4; the other 3 failed because of the bug.

Also, if I reboot a node, all bonds are set up correctly.

Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
status: Confirmed → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/209084

Changed in fuel:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/209084
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f35587c716519334f8b2469f80eeaab35dd1b10b
Submitter: Jenkins
Branch: master

commit f35587c716519334f8b2469f80eeaab35dd1b10b
Author: Sergey Vasilenko <email address hidden>
Date: Tue Aug 4 17:08:46 2015 +0300

    Put slave interfaces to UP state while bond assembling

    Closes-bug: #1469746

    Change-Id: I11fd2177166454e4f3f85670f30bea638adc67f9

Changed in fuel:
status: In Progress → Fix Committed
Andrey Danin (gcon-monolake) wrote :

The proposed fixes don't work. Even with the l23network module from 7.0 it doesn't work either. The symptoms are the same: bond1 is in round-robin mode instead of 802.3ad.

Changed in fuel:
status: Fix Committed → Confirmed
description: updated
Vladimir Kuklin (vkuklin) wrote :

Andrey, could you please provide more diagnostic info?

Stanislav Makar (smakar) on 2015-09-03
Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Stanislav Makar (smakar)
status: Confirmed → In Progress
Andrey Danin (gcon-monolake) wrote :

Vladimir, I updated the bug description yesterday and included all the info I have at the moment.

Changed in fuel:
assignee: Stanislav Makar (smakar) → Sergey Vasilenko (xenolog)
assignee: Sergey Vasilenko (xenolog) → Stanislav Makar (smakar)
Changed in fuel:
assignee: Stanislav Makar (smakar) → Sergey Vasilenko (xenolog)

Reviewed: https://review.openstack.org/220201
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=e84fb82f37c2c2dca17c5a86384f5ed0830840b3
Submitter: Jenkins
Branch: master

commit e84fb82f37c2c2dca17c5a86384f5ed0830840b3
Author: Stanislav Makar <email address hidden>
Date: Thu Sep 3 15:28:32 2015 +0000

    Fix problem with bonds configuration

    * Make bond mode and xmit_hash_policy bond_properties processed firstly.
      Due to if it goes after lacp_rate
      than lacp_rate is not set correctly.
    * Change the way to write into file because old way does not rise an
      error if "Operation not permitted".
    * Move type tests into correct place hence enable them.
    * Add wrapper for get/set /sys/class/... properties
    * don't change bond property if no required

    Change-Id: Ib76b88a8ed272d3465b1871db9f4b7b267888d64
    Closes-bug: #1469746

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/220839

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/220840

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Sergey Vasilenko (xenolog)

Reviewed: https://review.openstack.org/220840
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=09696007c094653fcde86d002c42fdac6b0cfc5b
Submitter: Jenkins
Branch: master

commit 09696007c094653fcde86d002c42fdac6b0cfc5b
Author: Sergey Vasilenko <email address hidden>
Date: Sun Sep 6 13:29:11 2015 -0500

    Fix problem with bonds configuration

    * Make bond mode and xmit_hash_policy bond_properties processed firstly.
      Due to if it goes after lacp_rate
      than lacp_rate is not set correctly.
    * Change the way to write into file because old way does not rise an
      error if "Operation not permitted".
    * Move type tests into correct place hence enable them.
    * Add wrapper for get/set /sys/class/... properties
    * don't change bond property if no required
    * filter undefined properties

    Co-Authored: Stanislav Makar <email address hidden>

    This patch is a refactor of commit e84fb82f37c2c2dca17c5a86384f5ed0830840b3.

    Closes-bug: #1469746
    Сloses-Bug: #1492781

    Change-Id: I72b95851b5d4addc1b0e826cbdcf74b720d377ae
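
The ordering rule described in the commit message above can be illustrated with a short Python sketch. It is not the fuel-library code (which lives in a Puppet module); the function name and the sample property dict are assumptions. It places mode and xmit_hash_policy ahead of every other property, so that lacp_rate is applied only after the mode has been set, and it drops undefined (None) values.

```python
# Properties that must be applied before anything else, in this order.
FIRST = ("mode", "xmit_hash_policy")


def ordered_bond_properties(props):
    """Return (name, value) pairs with FIRST properties leading and None values dropped."""
    defined = {name: value for name, value in props.items() if value is not None}
    head = [(name, defined[name]) for name in FIRST if name in defined]
    tail = [(name, value) for name, value in defined.items() if name not in FIRST]
    return head + tail


# Sample input; miimon is left undefined purely to show the filtering.
requested = {
    "lacp_rate": "fast",
    "xmit_hash_policy": "layer3+4",
    "mode": "802.3ad",
    "miimon": None,
}

for name, value in ordered_bond_properties(requested):
    print(name, value)
```

With the sample dict, the pairs come out as mode, xmit_hash_policy, lacp_rate, regardless of the order in which they were requested.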

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/221105
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=6effd35b79308a1936bcac1ff0289020338ff726
Submitter: Jenkins
Branch: stable/7.0

commit 6effd35b79308a1936bcac1ff0289020338ff726
Author: Sergey Vasilenko <email address hidden>
Date: Sun Sep 6 13:29:11 2015 -0500

    Fix problem with bonds configuration

    * Make bond mode and xmit_hash_policy bond_properties processed firstly.
      Due to if it goes after lacp_rate
      than lacp_rate is not set correctly.
    * Change the way to write into file because old way does not rise an
      error if "Operation not permitted".
    * Move type tests into correct place hence enable them.
    * Add wrapper for get/set /sys/class/... properties
    * don't change bond property if no required
    * filter undefined properties

    Co-Authored: Stanislav Makar <email address hidden>

    This patch is a refactor of commit e84fb82f37c2c2dca17c5a86384f5ed0830840b3.

    Closes-bug: #1469746
    Сloses-Bug: #1492781

    Change-Id: I72b95851b5d4addc1b0e826cbdcf74b720d377ae

Stanislav Makar (smakar) on 2015-09-11
tags: added: on-verification
Stanislav Makar (smakar) wrote :

  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "288"
  build_id: "288"
  nailgun_sha: "93477f9b42c5a5e0506248659f40bebc9ac23943"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "a717657232721a7fafc67ff5e1c696c9dbeb0b95"
  fuel-library_sha: "121016a09b0e889994118aa3ea42fa67eabb8f25"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"

tags: removed: on-verification
Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-library

Change abandoned by Sergey Vasilenko (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/209086

Dmitry Pyzhov (dpyzhov) on 2015-11-30
Changed in fuel:
milestone: 7.0 → 8.0

Change abandoned by Vladimir Kuklin (<email address hidden>) on branch: master
Review: https://review.openstack.org/220839
