Compute re-deployment with network bonding and DPDK fails: Puppet (err): Can't add bond 'bond0'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Aleksey Kasatkin |
Mitaka | Fix Released | High | Aleksey Kasatkin |
Bug Description
Fuel version info (9.0 build #303): http://
When I try to re-deploy the environment after a failure (due to bug #1571763), the deployment fails because the compute node with network bonds goes down:
"Deployment has failed. All nodes are finished. Failed tasks: Task[firewall/9] Stopping the deployment process!"
2016-05-11 09:11:48 WARNING [27942] Puppet agent 9 didn't respond within the allotted time
...
2016-05-11 09:24:31 WARNING [27942] Validation of node:
{"uid"=>nil,
"status"=>"error",
"error_
"error_msg"=>
"All nodes are finished. Failed tasks: Task[firewall/9] Stopping the deployment process!"}
for report failed: Node uid is not provided
Steps to reproduce:
0. Enable 'experimental' feature group for nailgun
1. Create cluster with VLAN and KVM
2. Add 1 controller and 1 compute node
3. Enable HugePages (256MB) for DPDK on compute node
4. Configure active-backup bond on compute using 2 NICs
5. Assign 'private' network to the bond
6. Enable DPDK for the bond
7. Verify networks
8. Deploy changes
9. Deployment should fail on non-primary controllers after netconfig on computes is done
10. Try to re-deploy cluster w/o reset (deploy changes again)
11. Deployment fails with the same error on controllers, but the compute node also has errors in puppet.log http://
12. Reset environment
13. Run network verification
14. Deploy environment
Expected result:
deployment is done or fails with error on controller nodes
Actual:
deployment fails on compute node with DPDK for bond and the node becomes inaccessible via network
Diagnostic snapshot: https:/
Changed in fuel: | |
status: | New → Confirmed |
tags: | added: area-python |
Changed in fuel: | |
assignee: | nobody → Networking (l23-network) |
tags: | added: on-verification |
It looks like astute.yaml contains neither the DPDK interfaces nor the interfaces in the bond. This is why the deployment fails. Diag. snapshot/node-9/etc/astute.yaml:
interfaces:
  eno1:
    vendor_specific: {bus_info: '0000:03:00.0', driver: igb}
  enp10s0f0:
    vendor_specific: {bus_info: '0000:0a:00.0', driver: igb}
  enp10s0f1:
    vendor_specific: {bus_info: '0000:0a:00.1', driver: igb}
  ens3f1:
    vendor_specific: {bus_info: '0000:03:00.1', driver: igb}
...
- action: add-bond
  bond_properties: {mode: active-backup}
  bridge: br-prv
  interface_properties:
    vendor_specific: {disable_offloading: true}
  interfaces: []
  name: bond0
  provider: dpdkovs
The database dump also shows that no interfaces are assigned to the bond.
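For anyone who wants to check other nodes for the same symptom, here is a minimal Python sketch that flags add-bond actions with an empty interfaces list. It assumes the transformations from astute.yaml have already been loaded into a list of dicts (for example with a YAML parser); the function name find_empty_bonds and the sample data are hypothetical and only mirror the bond0 entry quoted above.

```python
def find_empty_bonds(transformations):
    """Return the names of add-bond actions that have no member interfaces.

    `transformations` is assumed to be a list of dicts shaped like the
    add-bond entry quoted from astute.yaml in this report.
    """
    empty = []
    for action in transformations:
        # Only bond definitions are of interest here.
        if action.get("action") != "add-bond":
            continue
        # A missing or empty 'interfaces' list means the bond has no slaves.
        if not action.get("interfaces"):
            empty.append(action.get("name"))
    return empty


# Hypothetical sample that mirrors the bond0 entry from this report.
sample = [
    {
        "action": "add-bond",
        "bridge": "br-prv",
        "interfaces": [],
        "name": "bond0",
        "provider": "dpdkovs",
    },
]

print(find_empty_bonds(sample))
```

A non-empty result on a node would suggest it was serialized without bond members, which matches the failure described in this report.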