Deployment fails if network bonding for admin network is configured and offloading settings are changed

Bug #1543242 reported by Artem Panchenko
This bug affects 1 person
Affects             Status     Importance  Assigned to               Milestone
Fuel for OpenStack  Invalid    High        Aleksey Kasatkin
8.0.x               Won't Fix  High        Fuel Python (Deprecated)
Mitaka              Invalid    High        Aleksey Kasatkin

Bug Description

System test 'offloading_bond_neutron_vlan' fails at the deployment step because the slave loses connectivity to the admin/PXE network after the 'netconfig.pp' task is applied:

2016-02-08 01:28:00 ERROR [831] Task '{"priority"=>700, "type"=>"puppet", "id"=>"netconfig", "parameters"=>{"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=>"/etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp", "timeout"=>3600, "cwd"=>"/"}, "uids"=>["2"]}' failed on node 2
2016-02-08 01:28:00 DEBUG [831] Task time summary: netconfig with status error on node 2 took 00:09:04
2016-02-08 01:28:00 INFO [831] Casting message to Nailgun:{"method"=>"deploy_resp", "args"=> {"task_uuid"=>"0915d344-b679-46fc-ab93-a24d2f6149d8", "nodes"=> [{"uid"=>"2", "status"=>"error", "error_type"=>"deploy", "role"=>"primary-controller", "task"=> {"priority"=>700, "type"=>"puppet", "id"=>"netconfig", "parameters"=> {"puppet_modules"=>"/etc/puppet/modules", "puppet_manifest"=> "/etc/puppet/modules/osnailyfacter/modular/netconfig/netconfig.pp", "timeout"=>3600, "cwd"=>"/"}, "uids"=>["2"]}}]}}
2016-02-08 01:28:00 INFO [831] Casting message to Nailgun:{"method"=>"deploy_resp", "args"=> {"task_uuid"=>"0915d344-b679-46fc-ab93-a24d2f6149d8", "status"=>"error", "error"=> "Method granular_deploy. Deployment failed on nodes 2. Inspect Astute logs for the details"}}

The last transformation in the network scheme is missing the 'bridge' parameter, so bond1 is created but not added to 'br-fw-admin':

  - action: add-bond
    interfaces:
    - enp0s3
    - enp0s4
    bond_properties:
      mode: active-backup
    name: bond1
    interface_properties:
      ethtool:
        offload:
          rx-all: true
          rx-vlan-offload: true
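
A check along the following lines could flag such a transformation before deployment. This is only a minimal sketch, not Fuel code: the helper name `bonds_without_bridge` is hypothetical, and the network scheme is assumed to be available as a plain list of dicts mirroring the YAML above. A bond without 'bridge' is not necessarily wrong in every scheme; the sketch merely treats it as suspicious.

```python
def bonds_without_bridge(transformations):
    """Return names of add-bond transformations that lack a 'bridge' key."""
    return [
        t.get("name")
        for t in transformations
        if t.get("action") == "add-bond" and not t.get("bridge")
    ]


# Hypothetical data mirroring the broken transformation from this report.
broken = {
    "action": "add-bond",
    "name": "bond1",
    "interfaces": ["enp0s3", "enp0s4"],
    "bond_properties": {"mode": "active-backup"},
}
# The same transformation with the bridge set, as expected for the admin bond.
healthy = dict(broken, bridge="br-fw-admin")

print(bonds_without_bridge([broken]))   # the broken bond is reported
print(bonds_without_bridge([healthy]))  # nothing is reported
```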

Steps to reproduce:
1. Create cluster with neutron VLAN
2. Add 1 node with controller role
3. Add 1 node with compute role and 1 node with cinder role
4. Configure offloading modes for bonded interfaces
5. Setup offloading types
6. Run network verification
7. Deploy the cluster

Expected result: deployment is successful
Actual result: deployment fails

I guess the problem is related to incorrect network-to-interface assignments. Here is the result of the `fuel node --node-id 2 --network --download` command:

http://paste.openstack.org/show/486320/

As you can see, the 'fuelweb_admin' network is assigned to 2 interfaces (bond1 and enp0s3).

But in tests we assign networks correctly, here is a corresponding part of Nailgun logs:

http://paste.openstack.org/show/486319/

Other tests that also configure a bond for the admin network passed on the same ISO and fuel-qa code, so I guess it could also be related to the offloading settings.
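
The double assignment described above (one network on both a bond and one of its slaves) could be detected with a small script. This is a hedged sketch: it assumes each interface entry carries 'name' and 'assigned_networks' keys, which may not match the real structure of the downloaded file, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def networks_on_multiple_interfaces(interfaces):
    """Map each network name to its interfaces, keeping only duplicates."""
    owners = defaultdict(list)
    for iface in interfaces:
        for net in iface.get("assigned_networks", []):
            owners[net["name"]].append(iface["name"])
    return {net: names for net, names in owners.items() if len(names) > 1}


# Hypothetical, trimmed-down interface list shaped like the reported state.
interfaces = [
    {"name": "enp0s3", "type": "ether",
     "assigned_networks": [{"name": "fuelweb_admin"}]},
    {"name": "enp0s4", "type": "ether", "assigned_networks": []},
    {"name": "bond1", "type": "bond",
     "slaves": [{"name": "enp0s3"}, {"name": "enp0s4"}],
     "assigned_networks": [{"name": "fuelweb_admin"}]},
]

print(networks_on_multiple_interfaces(interfaces))
```

An empty result would mean every network is assigned to exactly one interface.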

Artem Panchenko (apanchenko-8) wrote :
Artem Panchenko (apanchenko-8) wrote :
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: Undecided → High
Ilya Kutukov (ikutukov)
Changed in fuel:
status: New → Confirmed
tags: added: feature-bonding
Artem Panchenko (apanchenko-8) wrote :

JFYI, today this test passed on 8.0 ISO #541. I believe this bug is intermittent.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Sergey Vasilenko (xenolog)
Alexandr Kostrikov (akostrikov-mirantis) wrote :

Reproduced at https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.bonding_ha_one_controller/143/console

There is no bond1 in the admin bridge, as seen in the screenshot.

On other nodes the bond is generated with the bridge set:
2016-02-17 04:05:27 +0000 Scope(Class[main]) (debug): generate_network_config(): Transformation 'bond1' will be produced as
---
  name: bond1
  bridge: br-fw-admin
  mtu:
  interfaces:
    - enp0s3
    - enp0s4
  delay_while_up:
  bond_properties:
    mode: active-backup
  interface_properties:
    ethtool:
      offload:
        rx-all: true
        rx-vlan-offload: true
  vendor_specific:
  provider: lnx
  action: add-bond
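
Whether bond1 actually ended up in the bridge can also be checked on the node itself. The sketch below is an illustration rather than part of Fuel: it assumes the 'lnx' provider shown in the log means a plain Linux bridge, whose ports the kernel exposes under /sys/class/net/<bridge>/brif. The function name `bridge_ports` is hypothetical, and the demo uses a fake directory tree so it runs anywhere.

```python
import os
import tempfile


def bridge_ports(bridge, sysfs_root="/sys/class/net"):
    """List ports attached to a Linux bridge, or [] if it has no brif dir."""
    brif = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif):
        return []
    return sorted(os.listdir(brif))


# Demo against a fake sysfs tree instead of a real node.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "br-fw-admin", "brif", "bond1"))
    print("bond1" in bridge_ports("br-fw-admin", sysfs_root=root))
```

On an affected node, `bridge_ports("br-fw-admin")` would be expected not to contain bond1.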

Sergey Vasilenko (xenolog) wrote :

This bug is related to network topology.
I described the problem in the https://docs.google.com/document/d/1eMAYn9xc4wvhOUJYmZMZiog9XRilG6z7QXpcM0PMsNE/ document.

Situations like this may also happen on bare metal.

Changed in fuel:
status: Confirmed → Won't Fix
Sergey Vasilenko (xenolog) wrote :

Sorry, the previous comment was meant for another bug. Please disregard it.

I can't reproduce this bug, so I am marking it as Invalid.

Changed in fuel:
status: Won't Fix → Invalid
Artem Panchenko (apanchenko-8) wrote :

@Sergey,

please check comment #4. This issue is reproduced on CI from time to time.

Since this bug is caused by incorrect Nailgun behavior, I am assigning it to fuel-python.

Changed in fuel:
status: Invalid → Confirmed
assignee: Sergey Vasilenko (xenolog) → Fuel Python Team (fuel-python)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dmitry Guryanov (dguryanov)
Dmitry Pyzhov (dpyzhov)
tags: added: team-network
Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Aleksey Kasatkin (alekseyk-ru)
Changed in fuel:
milestone: 9.0 → 10.0
Artem Panchenko (apanchenko-8) wrote :

Currently this bug is not reproduced on the tests swarm; the "offloading_bond_neutron_vlan" test has been green for more than a week. I believe this issue was fixed by a set of changes in Nailgun related to other bugs/features in 9.0. Moving to Incomplete.

Changed in fuel:
status: Confirmed → Incomplete
Aleksey Kasatkin (alekseyk-ru) wrote :

Cannot reproduce this using unit tests either.

Aleksey Kasatkin (alekseyk-ru) wrote :

No reports or reproductions for about 3 weeks. Moving to Invalid.

Changed in fuel:
status: Incomplete → Invalid