Bond failures with QLogic NICs

Bug #1516098 reported by Andrey Grebennikov
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Fuel Library (Deprecated)

Bug Description

Fuel 7.0, Ubuntu, Neutron+GRE, bond+LACP

I deployed the environment on Dell R610 servers with QLogic combined LAN+SAN adapters.
The adapters are QLE8152 and the switch is a Cisco Nexus 5596 with 2232 FEX.

We hit many issues during the installation where the bond did not come up correctly.

When the deployment finished, some nodes did not come up with a healthy bond interface after a reboot and showed problems with the slave interfaces.

A related Ubuntu bug was found:
https://bugs.launchpad.net/ubuntu/+source/ifenslave-2.6/+bug/1415302

The following workaround (bond options) was applied:
bond-downdelay 200
bond-updelay 200

After that, all the servers could be successfully rebooted several times. When those options were removed, the nodes started to experience the same problems again.
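
To confirm after a reboot that the delays are in effect and that every slave is up, the kernel's bonding status file can be inspected. The following is a minimal Python sketch, not something taken from this report: the `SAMPLE` text and the interface names `eth2`/`eth3` are illustrative assumptions, and the helper names `parse_bond_status` and `unhealthy_slaves` are hypothetical. On a real node the text would be read from /proc/net/bonding/bond0 (assuming the bond is called bond0).

```python
# Illustrative excerpt in the layout of /proc/net/bonding/<bond>.
# These values are made up for the example, not taken from the failed env.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth2
MII Status: up

Slave Interface: eth3
MII Status: down
"""


def parse_bond_status(text):
    """Parse bonding status text into a small summary dict."""
    bond = {"mii_status": None, "up_delay_ms": None,
            "down_delay_ms": None, "slaves": {}}
    current_slave = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or section title without a value
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current_slave = value
            bond["slaves"][current_slave] = None
        elif key == "MII Status":
            # Before the first slave section this is the bond's own status.
            if current_slave is None:
                bond["mii_status"] = value
            else:
                bond["slaves"][current_slave] = value
        elif key == "Up Delay (ms)":
            bond["up_delay_ms"] = int(value)
        elif key == "Down Delay (ms)":
            bond["down_delay_ms"] = int(value)
    return bond


def unhealthy_slaves(bond):
    """Return the names of slaves whose MII status is not 'up'."""
    return sorted(n for n, s in bond["slaves"].items() if s != "up")


if __name__ == "__main__":
    info = parse_bond_status(SAMPLE)
    print("up/down delay (ms):", info["up_delay_ms"], info["down_delay_ms"])
    print("unhealthy slaves:", unhealthy_slaves(info))
```

Running the same check on each node after every reboot, with and without the two options, would give a simple way to compare the failing and working configurations.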

tags: added: customer-found
description: updated
Artem Roma (aroma-x) wrote :

Could you please provide a diagnostic snapshot from the failed env?

Changed in fuel:
status: New → Incomplete
Bogdan Dobrelya (bogdando) wrote :

Note: the issue looks Triaged, but +1 for logs.

Changed in fuel:
milestone: none → 7.0-updates
tags: added: area-library l23network
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Alexey Shtokolov (ashtokolov) wrote :

Andrey Grebennikov, could you provide a diagnostic snapshot from the failed env?

Andrey Grebennikov (agrebennikov) wrote :

No, not anymore - this is a customer's environment...
We can try to request the diag snapshot from the customer directly, though.

Ivan Ponomarev (ivanzipfer) wrote :

Please reopen when the snapshot is added.

Changed in fuel:
status: Incomplete → Invalid
James Hayner (jhayner) wrote :

This bug report was opened for our environment. We are experiencing similar issues while deploying a second environment in another datacenter, with the same network topology, server hardware, firmware versions, etc. This time the deployment failed on the first controller. The last entries in the puppet-apply log show:

2016-01-25T22:18:11.642695+00:00 debug: (L3_ifconfig[br-storage](provider=lnx)) CREATE resource: L3_ifconfig[br-storage]
2016-01-25T22:18:11.643067+00:00 notice: (/Stage[main]/Main/L23network::L3::Ifconfig[br-storage]/L3_ifconfig[br-storage]/ensure) created
2016-01-25T22:18:11.643067+00:00 debug: (L3_ifconfig[br-storage](provider=lnx)) FLUSH properties: {:interface=>"br-storage", :ensure=>:present, :ipaddr=>["10.186.170.17/24"], :before=>[Sysfs_config_value[rps_cpus]{:name=>"rps_cpus"}, Sysfs_config_value[xps_cpus]{:name=>"xps_cpus"}], :gateway=>:absent, :gateway_metric=>:absent, :use_ovs=>:true, :loglevel=>:notice}
2016-01-25T22:18:11.643847+00:00 debug: Executing '/usr/bin/arping -D -c 32 -w 5 -I br-storage 10.186.170.17'
2016-01-25T22:18:17.756596+00:00 debug: Executing '/sbin/ip addr add 10.186.170.17/24 dev br-storage'
2016-01-25T22:18:17.868534+00:00 debug: Executing '/sbin/ip -o addr show dev br-storage to 10.186.170.17/32'
2016-01-25T22:18:17.979946+00:00 debug: Executing '/usr/bin/arping -U -c 32 -w 5 -I br-storage 10.186.170.17'
2016-01-25T22:18:24.093018+00:00 debug: Executing '/sbin/ip route del default dev br-storage'
2016-01-25T22:18:24.203333+00:00 debug: (/Stage[main]/Main/L23network::L3::Ifconfig[br-storage]/L3_ifconfig[br-storage]) The container L23network::L3::Ifconfig[br-storage] will propagate my refresh event
2016-01-25T22:18:24.203606+00:00 info: (/Stage[main]/Main/L23network::L3::Ifconfig[br-storage]/L3_ifconfig[br-storage]) Evaluated in 12.56 seconds
2016-01-25T22:18:24.204496+00:00 info: (L23network::L3::Ifconfig[br-storage]) Starting to evaluate the resource
2016-01-25T22:18:24.206711+00:00 debug: (L23network::L3::Ifconfig[br-storage]) The container Class[Main] will propagate my refresh event
2016-01-25T22:18:24.206932+00:00 info: (L23network::L3::Ifconfig[br-storage]) Evaluated in 0.00 seconds

The last entries in the node's /var/log/messages show:

<6>Jan 25 22:17:24 phx-s0701-d kernel: [ 699.734929] gre: GRE over IPv4 demultiplexor driver
<5>Jan 25 22:17:24 phx-s0701-d kernel: [ 699.736228] openvswitch: module verification failed: signature and/or required key missing - tainting kernel
<6>Jan 25 22:17:24 phx-s0701-d kernel: [ 699.742622] openvswitch: Open vSwitch switching datapath 2.3.1, built Jan 25 2016 22:16:58
<6>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.223522] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
<5>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.309860] Bridge firewalling registered
<6>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.367139] 8021q: 802.1Q VLAN Support v1.8
<6>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.487085] device eth0 entered promiscuous mode
<6>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.491292] br-fw-admin: port 1(eth0) entered forwarding state
<6>Jan 25 22:17:26 phx-s0701-d kernel: [ 701.491311] br-fw-admin: port 1(eth0) entered forwardin...
