LACP interfaces are configured later than ifup exits

Bug #1441435 reported by Andrey Grebennikov
This bug affects 2 people
Affects             Status         Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Committed  High        Sergey Vasilenko
6.0.x               Invalid        High        MOS Maintenance
6.1.x               Fix Committed  High        Sergey Vasilenko

Bug Description

I have servers whose network interfaces are bonded with LACP, which is configured on the switches.

I installed Fuel 6.0 with RedHat and Neutron with OVS VLANs.

When the server boots up, LACP negotiation takes some time; during this time all other controllers are unreachable from it.
The node creates its own Corosync cluster and RabbitMQ cluster. Once the network is ready, it is necessary to stop the corosync service on the node and start it again.

We need to add a check to the init script so that the node verifies its network connectivity to external resources before starting corosync and the API services.
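
A minimal sketch of such a check, assuming a bash init script. The helper name wait_for_network, the 60-second timeout, and the peer address in the usage comment are illustrative assumptions and are not taken from Fuel:

```shell
#!/bin/bash
# wait_for_network TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds or until TIMEOUT
# seconds of waiting have accumulated. Returns 0 on success, 1 on timeout.
# Hypothetical helper: the name and arguments are not from fuel-library.
wait_for_network() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example use before starting cluster services (the address is a placeholder):
#   wait_for_network 60 1 ping -q -c 1 -w 1 192.0.2.10 || exit 1
```

The probe command is passed in as arguments, so the same loop could check a peer controller, a gateway, or anything else that signals the bond is passing traffic.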

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Bogdan Dobrelya (bogdando) wrote :

I believe the Corosync runlevels should ensure that it is started after networking has started.

Sergey Vasilenko (xenolog) wrote :

This happens because a bond with LACP can take a long time to assemble.
We should implement a post-up script to control this behavior.

Sergey Vasilenko (xenolog) wrote :

This doesn't affect 6.0.* because 6.0 has no non-experimental bonding.
Andrey found this bug on 6.0 because he was building a highly customized deployment.

Bogdan Dobrelya (bogdando) wrote :

This bug may be a duplicate of https://bugs.launchpad.net/fuel/+bug/1440723, please check.

tags: added: release-notes
summary: - Corosync starts improperly because of LACP
+ LACP interfaces are configured later than ifup exits
Bogdan Dobrelya (bogdando) wrote :

I set the 6.0.2 milestone to Confirmed in case we decide to make bonding non-experimental for the next maintenance release. If we don't, let's mark it Won't Fix.

Bogdan Dobrelya (bogdando) wrote : Re: There is no connectivity on management network after node reboot if placed on LACP bond

Reproduced the issue with Ubuntu HA VLAN + LACP 802.3ad layer2 slow: 3 controllers with sys tests (e1000 interfaces) (it was successful, OSTF tests passed). After node-1 was rebooted, it appeared to be isolated on the management network.

summary: - LACP interfaces are configured later than ifup exits
+ There is no connectivity on management network after node reboot if
+ placed on LACP bond
Bogdan Dobrelya (bogdando) wrote :

bonds configuration

Bogdan Dobrelya (bogdando) wrote :
summary: - There is no connectivity on management network after node reboot if
- placed on LACP bond
+ There is no any network connectivity after node rebooted, if its
+ management net was placed on LACP bond
tags: added: to-be-covered-by-tests
Bogdan Dobrelya (bogdando) wrote :

This connectivity issue does not look exactly like the one described in this bug: there was no "network ready" state, so I had better submit a separate bug.

summary: - There is no any network connectivity after node rebooted, if its
- management net was placed on LACP bond
+ LACP interfaces are configured later than ifup exits
Bogdan Dobrelya (bogdando) wrote :

Currently we cannot reproduce this issue because it is blocked by https://bugs.launchpad.net/fuel/+bug/1453139

Pavel Vaylov (pvaylov) wrote :

Hey guys,
Is it possible that we have "LACP rate: slow" out of the box and therefore it takes at least 30 seconds to negotiate LACP bonding?
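
The configured rate can be read from the bonding driver's status file. A minimal sketch, assuming the file contains a line of the form "LACP rate: slow", as /proc/net/bonding/<bond> does for 802.3ad bonds; the helper name lacp_rate_of is hypothetical. With lacp_rate=slow the link partner is asked to send LACPDUs every 30 seconds, with lacp_rate=fast every second.

```shell
#!/bin/bash
# lacp_rate_of FILE
# Prints the value of the first "LACP rate:" line found in FILE
# (for example "slow" or "fast"); prints nothing if there is no such line.
# Hypothetical helper for illustration only.
lacp_rate_of() {
    sed -n 's/^[[:space:]]*LACP rate:[[:space:]]*//p' "$1" | head -n 1
}

# Example use on a node with an 802.3ad bond named bond0:
#   lacp_rate_of /proc/net/bonding/bond0
```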

Sergey Vasilenko (xenolog) wrote :

Adding NETWORKDELAY=NN to /etc/sysconfig/network
is a possible solution for this.
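
A minimal sketch of applying that setting idempotently, assuming a RedHat-style /etc/sysconfig/network file of KEY=value lines. The helper name set_network_delay and the value 45 in the usage comment are assumptions for illustration, not something this bug's fix does:

```shell
#!/bin/bash
# set_network_delay FILE SECONDS
# Ensures FILE contains exactly one NETWORKDELAY=SECONDS line and keeps
# every other line unchanged. Hypothetical helper for illustration only.
set_network_delay() {
    local file="$1" seconds="$2" tmp
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Copy everything except an existing NETWORKDELAY line; grep exits
        # non-zero when nothing is left to print, which is not an error here.
        grep -v '^NETWORKDELAY=' "$file" > "$tmp" || true
    fi
    echo "NETWORKDELAY=${seconds}" >> "$tmp"
    # Write through the original path so its ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example use (run as root):
#   set_network_delay /etc/sysconfig/network 45
```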

tags: added: l23network
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/183669

Changed in fuel:
assignee: Peter Zhurba (pzhurba) → Sergey Vasilenko (xenolog)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/184176

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/183669
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=1fd24d32f0d7fb0a03ccdf0e3c91b868d4c4fa64
Submitter: Jenkins
Branch: master

commit 1fd24d32f0d7fb0a03ccdf0e3c91b868d4c4fa64
Author: Sergey Vasilenko <email address hidden>
Date: Fri May 15 12:38:30 2015 -0700

    add property 'delay_while_up' for port or bond

    In some cases (for example slow LACP bonds or optical links)
    system administrator need make delay between interface stay UP and
    continue of boot process. This option allow make delay after interface UP.

    Closes-bug: #1441435
    Change-Id: I5013edc915b78687e1eebe5c61fe5a9befd222f6

Changed in fuel:
status: In Progress → Fix Committed
Bogdan Dobrelya (bogdando) wrote :

Here are the test cases:
0) given bond config:
TYPE=Bond
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=slow xmit_hash_policy=layer2"
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br-aux
DEVICE=bond0

1) delay conf check: should be not less than 45 seconds for LACP bond conf
# grep -r sleep /etc/sysconfig/network-scripts/int*
/etc/sysconfig/network-scripts/interface-up-script-bond1:echo "post-up sleep up to 15. " ; sleep 15
/etc/sysconfig/network-scripts/interface-up-script-br-aux:echo "post-up sleep up to 45. " ; sleep 45

2) LACP bond ifup time check w/o reboot: should be not less than 45 seconds
#!/bin/bash
BND=bond0
ip=<some-ip-from-the-network-put-at-the-LACP-bond>
cat /proc/net/bonding/$BND
ifdown $BND
sleep 10
date
echo "Start $BND"
ifup $BND
time until ping -q -c 1 $ip -w 1 >/dev/null
do
    echo -n "+"
    sleep 1
done

3) LACP bond ifup time check w/ reboot: should be not less than 45 seconds

Sergey Vasilenko (xenolog) wrote :

This test case is not correct.

ifup for bonds has an unpleasant feature.

How a bond is brought up:
first the bond interface goes UP;
after that its slave eth interfaces go UP;
after that the ifup script attaches the interfaces to the bond;
after all interfaces are attached, ifup passes the bond properties to the bond;
after that, if an IP address is assigned to the bond, the IP address is configured.

If the bond has an IP address, the post-up script is executed.
If the bond is only a member of a bridge, no post-up script is executed.

Taking all this into account, I propose the following sequence:

if the bond has an IP address, its config file will contain sleep N (N=15 for non-LACP and 45 for LACP)
if the bond is a member of a bridge, the bond config file has no sleep, but the bridge config file contains sleep N (45 or 15).
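
The proposed rule can be sketched as two small shell functions. This is only an illustration of the decision logic described above, not the fuel-library implementation; the function names and the "yes"/"no" argument convention are assumptions:

```shell
#!/bin/bash
# delay_for_mode MODE
# Prints the post-up delay in seconds: 45 for LACP (802.3ad, which the
# bonding driver also accepts as numeric mode 4), 15 for any other mode.
delay_for_mode() {
    case "$1" in
        802.3ad|4) echo 45 ;;
        *)         echo 15 ;;
    esac
}

# delay_target BOND HAS_IP [BRIDGE]
# Prints the interface whose config should carry the sleep:
# the bond itself if it has an IP address, otherwise the bridge it belongs to.
# Returns 1 when neither applies, a case the comment above does not cover.
delay_target() {
    local bond="$1" has_ip="$2" bridge="${3:-}"
    if [ "$has_ip" = "yes" ]; then
        echo "$bond"
    elif [ -n "$bridge" ]; then
        echo "$bridge"
    else
        return 1
    fi
}

# Example: an LACP bond0 without an IP address that is a port of br-aux
#   echo "post-up sleep $(delay_for_mode 802.3ad) goes to $(delay_target bond0 no br-aux)"
```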

Bogdan Dobrelya (bogdando) wrote :

Thank you, @Sergey, we will exclude the test case #2 when

Peter Zhurba (pzhurba) wrote :

The CentOS env seems to be working on a hardware env.

Performed steps:

Check that the config files are present and what they contain

[root@node-1 ~]# ls -l /etc/sysconfig/network-scripts/interface-up-script-*
-rwxr-xr-x 1 root root 58 May 20 08:11 /etc/sysconfig/network-scripts/interface-up-script-bond1
-rwxr-xr-x 1 root root 58 May 20 08:11 /etc/sysconfig/network-scripts/interface-up-script-br-aux
[root@node-1 ~]# cat /etc/sysconfig/network-scripts/interface-up-script-*
#!/bin/sh
echo "post-up sleep up to 15. " ; sleep 15
true
#!/bin/sh
echo "post-up sleep up to 45. " ; sleep 45
true
[root@node-1 ~]#

Check the boot log output to see whether the post-up scripts were run

[root@node-1 ~]# cat /var/log/boot.log
  Welcome to CentOS
……………………………
……………………………
Bringing up interface bond1: post-up sleep up to 15.
                                                           [ OK ]
Bringing up interface eth0: [ OK ]
Bringing up interface eth1: [ OK ]
Bringing up interface p_br-floating-0: [ OK ]
Bringing up interface p_br-prv-0: [ OK ]
Bringing up interface bond0.387: [ OK ]
Bringing up interface bond1.388: [ OK ]
Bringing up interface eth1.386: [ OK ]
Bringing up interface br-aux: post-up sleep up to 45.
                                                           [ OK ]
……………………………
……………………………

Reboot the node and check how long ifup takes and how fast we get connectivity

[root@node-1 /]# cat /root/bin/b_chk
#!/bin/bash
BND=${1:-bond0}
ip=${2:-172.16.38.71}
echo $BND $ip
# cat /proc/net/bonding/$BND
ifdown $BND
sleep 10
date
echo "Start $BND"
time ifup $BND
echo "up is done "
time until ping -q -c 1 $ip -w 1 >/dev/null
do
    echo -n "+"
    sleep 1
done

[root@node-1 ~]# b_chk br-aux 172.16.38.71
br-aux 172.16.38.71
Wed May 20 11:39:43 UTC 2015
Start br-aux
post-up sleep up to 45.

real 0m45.154s
user 0m0.066s
sys 0m0.068s
up is done

real 0m0.002s
user 0m0.000s
sys 0m0.000s
[root@node-1 ~]# b_chk bond1 10.30.6.2
bond1 10.30.6.2
Wed May 20 11:40:54 UTC 2015
Start bond1
post-up sleep up to 15.

real 0m15.564s
user 0m0.103s
sys 0m0.307s
up is done

real 0m0.001s
user 0m0.000s
sys 0m0.000s
[root@node-1 ~]#

Peter Zhurba (pzhurba) wrote :

Ubuntu seems to be working

root@node-1:~# grep sleep /etc/network/interfaces.d/ifcfg-*
/etc/network/interfaces.d/ifcfg-bond1:post-up sleep 15
/etc/network/interfaces.d/ifcfg-br-aux:post-up sleep 45
root@node-1:~#

root@node-1:~# ./b_chk br-aux
br-aux 172.16.38.71

Thu May 21 16:41:48 UTC 2015
Start br-aux

Waiting for br-aux to get ready (MAXWAIT is 32 seconds).

real 0m45.386s
user 0m0.018s
sys 0m0.146s
up is done

real 0m0.002s
user 0m0.000s
sys 0m0.001s
root@node-1:~#

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/184176
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=114b470f4a31cc98b4724d0b0fc0a7b334b6f545
Submitter: Jenkins
Branch: master

commit 114b470f4a31cc98b4724d0b0fc0a7b334b6f545
Author: Sergey Vasilenko <email address hidden>
Date: Mon May 18 20:09:41 2015 -0700

    Specify default delay while boot for LACP bonds.

    Hardcode 45s delay for LACP bonds and 15s for non-LACP. Only at node boot time.
    System administrator can re-define this value by CLI

    This patchset is a workaround and should be reverted in 7.0
    after implementing this feature in UI

    Change-Id: I329e26a0b4da1b2be676dd7f8e6eb39e89eb11f4
    Related-bug: #1441435
    Related-bug: #1456436

Vitaly Sedelnik (vsedelnik) wrote :

Invalid for 6.0-updates per comment #3
