Deploy is unsuccessful when SR-IOV is configured only on part of nodes

Bug #1561018 reported by Mikhail Chernik on 2016-03-23
This bug affects 1 person
Affects: Fuel for OpenStack
Importance: Critical
Assigned to: Sergey Kolekonov

Bug Description

Currently, if the SR-IOV feature is turned on for at least one node, the supported_pci_vendor_devs list is populated in quantum_settings of astute.yaml. This triggers SR-IOV configuration on all nodes, including those without SR-IOV enabled. As a result, deployment fails with the message "Deployment has failed. Critical nodes failed: Node[1]. Stopping the deployment process!"

Environment: ISO 97, hardware lab

Steps to reproduce:
* Create new cluster
* Add at least 1 controller and 2 computes to the cluster
* Enable SR-IOV on one compute
* Run deployment

Expected results:
* Cluster is deployed, SR-IOV is enabled on one compute and disabled on the other

Actual result:
* Deployment failed with the following error in the Puppet log:

2016-03-23 13:42:53 ERR /usr/lib/ruby/vendor_ruby/puppet/parser/functions.rb:164:in `block (2 levels) in newfunction'
2016-03-23 13:42:53 ERR /etc/puppet/modules/osnailyfacter/lib/puppet/parser/functions/nic_whitelist_to_mappings.rb:13:in `block in <top (required)>'
2016-03-23 13:42:53 ERR undefined method `map' for "":String at /etc/puppet/modules/osnailyfacter/modular/openstack-network/agents/sriov.pp:15 on node node-2.domain.tld
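
The last line of the trace suggests that the function received an empty string where it expected a collection. The following Ruby sketch reproduces that class of error and shows one possible defensive guard. The helper names and the shape of the whitelist entries (hashes with 'physical_network' and 'devname' keys) are assumptions made for illustration; this is not the fuel-library code.

```ruby
# Hypothetical helper for illustration; this is not the fuel-library function.
# Assumed input: an array of hashes with 'physical_network' and 'devname' keys.
def whitelist_to_mappings(whitelist)
  # A node without SR-IOV may supply "" or nil rather than an array.
  return [] unless whitelist.is_a?(Array)

  whitelist.map { |entry| "#{entry['physical_network']}:#{entry['devname']}" }
end

# Returns the error message raised when map is called on a value that
# does not respond to it, or nil when the call succeeds.
def map_failure_message(value)
  value.map { |x| x }
  nil
rescue NoMethodError => e
  e.message
end
```

Calling map_failure_message("") returns a NoMethodError message that mentions map, while whitelist_to_mappings("") returns an empty array instead of raising.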

Dmitry Klenov (dklenov) on 2016-03-23
tags: added: area-library
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Mikhail Chernik (mchernik) wrote :

The environment has been passed to a developer for investigation.

Fix proposed to branch: master
Review: https://review.openstack.org/296541

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: Sergey Kolekonov (skolekonov) → Vladimir Eremin (yottatsa)
Changed in fuel:
assignee: Vladimir Eremin (yottatsa) → Sergey Kolekonov (skolekonov)
Changed in fuel:
assignee: Sergey Kolekonov (skolekonov) → Vladimir Eremin (yottatsa)
Changed in fuel:
assignee: Vladimir Eremin (yottatsa) → Sergey Kolekonov (skolekonov)
Dmitry Klenov (dklenov) wrote :

SR-IOV breaks the whole cluster. Raising to Critical.

Changed in fuel:
importance: High → Critical

Reviewed: https://review.openstack.org/296541
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ed36c837e042d715827c607c831528caeef2c97b
Submitter: Jenkins
Branch: master

commit ed36c837e042d715827c607c831528caeef2c97b
Author: Sergey Kolekonov <email address hidden>
Date: Wed Mar 23 18:25:35 2016 +0300

    Fix the way SRIOV is checked on compute nodes

    supported_pci_vendor_devs can't be used as SRIOV indicator as it's a cluster
    wide variable and it's possible that some compute nodes uses SRIOV and others
    don't. nic_whitelist_to_mappings function is a better option as it relies on
    a network scheme of a current node

    Change-Id: Idd4f9b5e1bf142713b4d849e4406778ad411b3ac
    Closes-bug: #1561018
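
The distinction drawn in the commit message, between a cluster-wide variable and data derived from the current node, can be sketched in Ruby as follows. The data layout, the 'sriov_interfaces' key, the sample device ID, and both method names are hypothetical and only illustrate the idea; they do not reflect the real astute.yaml schema or the committed change.

```ruby
# Illustrative data only; the layout and key names are assumptions.
CLUSTER_SETTINGS = { 'supported_pci_vendor_devs' => ['8086:10ed'] }.freeze

NODES = {
  'node-1' => { 'sriov_interfaces' => ['eth1'] },
  'node-2' => { 'sriov_interfaces' => [] }
}.freeze

# Cluster-wide check: true for every node as soon as any node enables SR-IOV.
def sriov_enabled_cluster_wide?(settings)
  !settings.fetch('supported_pci_vendor_devs', []).empty?
end

# Per-node check: depends only on the current node's own data.
def sriov_enabled_on_node?(node)
  !node.fetch('sriov_interfaces', []).empty?
end
```

With this sample data the cluster-wide check is true regardless of which node evaluates it, whereas the per-node check is true for node-1 and false for node-2.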

Changed in fuel:
status: In Progress → Fix Committed

Verified on ISO #155 (04.04.16) during first acceptance testing.

I created a cluster and added 1 controller, 1 compute node with SR-IOV, and 2 compute nodes without SR-IOV. Deployment finished successfully.

Changed in fuel:
status: Fix Committed → Fix Released