Deployment task 'netconfig' incorrectly configures SR-IOV nics on second run

Bug #1558427 reported by Artem Panchenko on 2016-03-17
Affects: Fuel for OpenStack
Importance: High
Assigned to: Vladimir Eremin

Bug Description

Fuel version info (9.0 liberty): http://paste.openstack.org/show/490838/

Re-deployment fails on the 'sriov_iommu_check' task because 'netconfig' does not properly configure NICs that have SR-IOV enabled:

root@node-2:~# echo 63 > /sys/class/net/enp1s0f0/device/sriov_numvfs
root@node-2:~# echo 63 > /sys/class/net/enp1s0f1/device/sriov_numvfs
root@node-2:~# ruby /etc/puppet/modules/osnailyfacter/modular/netconfig/sriov_iommu_check.rb
OK: SR-IOV and IOMMU are properly configured for enp1s0f0 interface
OK: SR-IOV and IOMMU are properly configured for enp1s0f1 interface
root@node-2:~#
root@node-2:~# puppet apply -d /etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp &> /tmp/puppet.log
root@node-2:~# echo $?
0
root@node-2:~# cat /sys/class/net/enp1s0f0/device/sriov_numvfs /sys/class/net/enp1s0f1/device/sriov_numvfs
0
0
root@node-2:~# ifdown enp1s0f1
root@node-2:~# ifup enp1s0f1
root@node-2:~# cat /sys/class/net/enp1s0f1/device/sriov_numvfs
63

I added some debug logs to '/etc/puppet/modules/l23network/lib/puppet/provider/l2_port/sriov.rb' and got this:

root@node-2:~# grep -E 'Setting numvfs for|Value of numvfs for' /tmp/puppet.log
Debug: L2_port[enp1s0f0](provider=sriov): Value of numvfs for 'enp1s0f0' is different '63' != '0'
Debug: L2_port[enp1s0f0](provider=sriov): Setting numvfs for 'enp1s0f0' to '0'
Debug: L2_port[enp1s0f0](provider=sriov): Setting numvfs for 'enp1s0f0' to '0'
Debug: L2_port[enp1s0f1](provider=sriov): Value of numvfs for 'enp1s0f1' is different '63' != '0'
Debug: L2_port[enp1s0f1](provider=sriov): Setting numvfs for 'enp1s0f1' to '0'
Debug: L2_port[enp1s0f1](provider=sriov): Setting numvfs for 'enp1s0f1' to '0'

If I set 'sriov_numvfs' to 0 for all SR-IOV NICs and run netconfig.pp again, the VFs are configured properly, because the following code isn't executed:

https://github.com/openstack/fuel-library/blob/master/deployment/puppet/l23network/lib/puppet/provider/l2_port/sriov.rb#L23-L24
https://github.com/openstack/fuel-library/blob/master/deployment/puppet/l23network/lib/puppet/provider/l2_port/sriov.rb#L28-L30

Steps to reproduce:

1. Create cluster with VLAN segmentation
2. Add 1 controller node and 1 compute node with NICs which support SR-IOV
3. Enable SR-IOV on some of the compute node's NICs and set the number of VFs to the maximum (sriov_numvfs == sriov_totalvfs)
4. Deploy environment
5. SSH to the compute node and run 'puppet apply /etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp'

Expected result: sriov_numvfs value for SR-IOV enabled NICs isn't changed

Actual result: sriov_numvfs is set to 0 for all SR-IOV enabled NICs
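
The expected/actual comparison can be checked with a small script. This is only a sketch: the helper names `read_numvfs` and `changed_numvfs` are hypothetical and not part of fuel-library, and the sysfs path layout is the one used in the shell session in this report.

```ruby
# Hypothetical helpers for comparing sriov_numvfs before and after a
# 'puppet apply' run. Not part of fuel-library.

# Read the current VF count for each interface from sysfs.
# Interfaces without a sriov_numvfs file are reported as nil.
def read_numvfs(ifaces, sysfs_root = '/sys/class/net')
  ifaces.each_with_object({}) do |iface, acc|
    path = File.join(sysfs_root, iface, 'device', 'sriov_numvfs')
    acc[iface] = File.exist?(path) ? File.read(path).to_i : nil
  end
end

# Return the names of interfaces whose VF count differs between snapshots.
def changed_numvfs(before, after)
  before.select { |iface, value| after[iface] != value }.keys
end

# Possible usage on the compute node (interface names taken from the report):
#   before = read_numvfs(%w[enp1s0f0 enp1s0f1])
#   system('puppet', 'apply',
#          '/etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp')
#   after = read_numvfs(%w[enp1s0f0 enp1s0f1])
#   abort "sriov_numvfs changed on: #{changed_numvfs(before, after).join(', ')}" \
#     unless changed_numvfs(before, after).empty?
```

An empty list from `changed_numvfs` corresponds to the expected result; on an affected build both interfaces would be listed.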

Diagnostic snapshot: https://drive.google.com/file/d/0BzaZINLQ8-xkUEo2Ym5YOWJwdDQ/view?usp=sharing

Artem Panchenko (apanchenko-8) wrote :

It also looks like this issue affects the configure_default_route.pp task:

2016-03-16 18:58:26 +0000 Scope(Class[main]) (notice): MODULAR: netconfig.pp
2016-03-16 18:58:27 +0000 /Stage[main]/Main/L23network::L2::Port[enp1s0f0]/L23_stored_config[enp1s0f0]/sriov_numvfs (notice): sriov_numvfs changed '63' to '63'
2016-03-16 18:58:29 +0000 /Stage[main]/Main/L23network::L2::Port[enp1s0f1]/L23_stored_config[enp1s0f1]/sriov_numvfs (notice): sriov_numvfs changed '63' to '63'
2016-03-16 18:59:08 +0000 Scope(Class[main]) (notice): MODULAR: connectivity_tests.pp
2016-03-16 18:59:15 +0000 Scope(Class[main]) (notice): MODULAR: sriov_iommu_check.pp
2016-03-16 18:59:21 +0000 Scope(Class[main]) (notice): MODULAR: firewall.pp
2016-03-16 18:59:32 +0000 Scope(Class[main]) (notice): MODULAR: hosts.pp
2016-03-16 19:11:13 +0000 Scope(Class[main]) (notice): MODULAR: compute.pp
2016-03-16 19:13:58 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/common-config.pp
2016-03-16 19:14:34 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/plugins/ml2.pp
2016-03-16 19:14:44 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/agents/l3.pp
2016-03-16 19:14:50 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/agents/sriov.pp
2016-03-16 19:15:06 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/agents/metadata.pp
2016-03-16 19:15:13 +0000 Scope(Class[main]) (notice): MODULAR: openstack-network/compute-nova.pp
2016-03-16 19:15:22 +0000 Scope(Class[main]) (notice): MODULAR: enable_compute.pp
2016-03-16 19:21:28 +0000 Scope(Class[main]) (notice): MODULAR: dns-client.pp
2016-03-16 19:21:36 +0000 Scope(Class[main]) (notice): MODULAR: cgroups.pp
2016-03-16 19:21:49 +0000 Scope(Class[main]) (notice): MODULAR: configure_default_route.pp
2016-03-16 19:21:51 +0000 /Stage[main]/Main/L23network::L2::Port[enp1s0f0]/L23_stored_config[enp1s0f0]/sriov_numvfs (notice): sriov_numvfs changed '63' to '63'
2016-03-16 19:21:54 +0000 /Stage[main]/Main/L23network::L2::Port[enp1s0f1]/L23_stored_config[enp1s0f1]/sriov_numvfs (notice): sriov_numvfs changed '63' to '63'
2016-03-16 19:22:00 +0000 Scope(Class[main]) (notice): MODULAR: hosts.pp
2016-03-16 19:22:02 +0000 Scope(Class[main]) (notice): MODULAR: ntp-client.pp

So currently, after a successful deployment, 'sriov_numvfs' is set to 0 on all compute nodes with SR-IOV enabled NICs.

Changed in fuel:
importance: Undecided → High
tags: added: l23network
Changed in fuel:
status: New → Confirmed
description: updated
Vladimir Eremin (yottatsa) wrote :

I'm not sure this is a duplicate of https://bugs.launchpad.net/fuel/+bug/1557322

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Eremin (yottatsa)
Vladimir Eremin (yottatsa) wrote :

This fails because vendor_specific is empty after a re-apply (no changes):

Debug: L2_port[enp1s0f0](provider=sriov): FLUSH properties: L2_port[enp1s0f0] {:vendor_specific=>{}}

At https://github.com/openstack/fuel-library/blob/master/deployment/puppet/l23network/lib/puppet/provider/l2_port/sriov.rb#L75 the empty value is then converted to 0.
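
The failure mode described here can be illustrated in isolation. The following is a minimal sketch, not the provider code or the committed fix: `requested_numvfs` and `numvfs_writes` are hypothetical names, and the reset-to-zero step assumes the usual kernel behaviour that a non-zero sriov_numvfs cannot be changed directly to a different non-zero value.

```ruby
# Hypothetical illustration of the nil-to-0 conversion; not fuel-library code.

# Calling to_i on a missing hash value silently yields 0, which is how an
# empty vendor_specific hash can end up as "set numvfs to 0".
empty_vendor_specific = {}
naive = empty_vendor_specific['sriov_numvfs'].to_i # nil.to_i is 0

# Return the requested VF count, or nil when vendor_specific carries no
# 'sriov_numvfs' key, so that "not specified" stays distinct from "0".
def requested_numvfs(vendor_specific)
  value = (vendor_specific || {})['sriov_numvfs']
  value.nil? ? nil : Integer(value)
end

# Return the sequence of values that would be written to sriov_numvfs.
# Nothing is written when no value was requested or it already matches.
def numvfs_writes(current, requested)
  return [] if requested.nil? || requested == current
  return [requested] if current.zero? || requested.zero?
  [0, requested] # reset to 0 first, then set the new non-zero count
end
```

With this guard, `numvfs_writes(63, requested_numvfs({}))` produces no writes, whereas passing the naive value `0` would reset the interface to zero VFs, matching the "Setting numvfs ... to '0'" debug lines in the report.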

Fix proposed to branch: master
Review: https://review.openstack.org/293961

Changed in fuel:
status: Confirmed → In Progress
tags: added: area-library

Reviewed: https://review.openstack.org/293961
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7f9bebc073c01c8cefcd9483024905c33b535c0b
Submitter: Jenkins
Branch: master

commit 7f9bebc073c01c8cefcd9483024905c33b535c0b
Author: Vladimir Eremin <email address hidden>
Date: Thu Mar 17 14:06:57 2016 +0300

    Fix SR-IOV re-apply

    * vendor_specific update fixed
    * check new sriov_numvfs for nil fixed
    * prefetch fixed
    * also, conventional "FLUSH properties" debug added

    Change-Id: I8ebbbff278e448ee20fad7826bf5b1f7c11ea610
    Closes-Bug: #1558427

Changed in fuel:
status: In Progress → Fix Committed
Artem Panchenko (apanchenko-8) wrote :

verified

cat /etc/fuel_build_id:
 303
cat /etc/fuel_build_number:
 303
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6344.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8680.noarch
 fuel-mirror-9.0.0-1.mos133.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-openstack-metadata-9.0.0-1.mos8680.noarch
 fuel-notify-9.0.0-1.mos8337.noarch
 fuel-ostf-9.0.0-1.mos933.noarch
 python-fuelclient-9.0.0-1.mos313.noarch
 fuel-9.0.0-1.mos6344.noarch
 fuel-utils-9.0.0-1.mos8337.noarch
 fuel-nailgun-9.0.0-1.mos8680.noarch
 rubygem-astute-9.0.0-1.mos742.noarch
 fuel-library9.0-9.0.0-1.mos8337.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-agent-9.0.0-1.mos276.noarch
 fuel-ui-9.0.0-1.mos2676.noarch
 fuel-setup-9.0.0-1.mos6344.noarch
 nailgun-mcagents-9.0.0-1.mos742.noarch
 fuel-misc-9.0.0-1.mos8337.noarch
 python-packetary-9.0.0-1.mos133.noarch
 fuelmenu-9.0.0-1.mos269.noarch
 fuel-bootstrap-cli-9.0.0-1.mos276.noarch
 fuel-migrate-9.0.0-1.mos8337.noarch

Changed in fuel:
status: Fix Committed → Fix Released