Bug #1510072: Network interfaces are down after cluster restart on nodes that aren't connected to public network

Andrey Sledzinskiy (asledzinskiy) wrote on 2015-10-26:

#1

fail_error_deploy_bonding_neutron_tun-fuel-snapshot-2015-10-25_23-48-32.tar.xz (74.4 MiB, application/octet-stream)

Nastya Urlapova (aurlapova) on 2015-10-27

tags:

added: swarm-blocker

Dmitry Pyzhov (dpyzhov) on 2015-10-27

tags:

added: area-library

Sergey Vasilenko (xenolog) wrote on 2015-10-28:

#2

This behavior depends on the physical network topology.

Are you sure that both networks (the bridges on the host system) that carry traffic for the two bonded interfaces are merged into one bridge before the traffic reaches the master node?

Sergey Vasilenko (xenolog) wrote on 2015-10-28:

#3

Please provide the full physical network topology.

Changed in fuel:
status:	New → Incomplete

Dmitry Klenov (dklenov) wrote on 2015-11-05:

#4

@Andrey, this issue is still considered a swarm blocker - so please provide all the details needed by Sergey.

Artem Panchenko (apanchenko-8) wrote on 2015-11-10:

#5

@Sergey, it doesn't matter what network architecture you have: after a reboot, non-controller nodes (which don't have access to the public network) lose their network configuration due to broken config files. This bug is a regression caused by https://review.openstack.org/#/c/232479/5 : configure_default_route.pp saves the configs for the admin and management bridges in OVS format, because there are now 2 default providers for l23_stored_config:

2015-11-10 03:18:34 +0000 Scope(Class[main]) (notice): MODULAR: configure_default_route.pp
2015-11-10 03:18:35 +0000 Puppet (warning): Found multiple default providers for l23_stored_config: ovs_ubuntu, lnx_ubuntu; using ovs_ubuntu
2015-11-10 03:18:36 +0000 Puppet (debug): Prefetching ovs_ubuntu resources for l23_stored_config
2015-11-10 03:18:36 +0000 Puppet::Type::L23_stored_config::ProviderOvs_ubuntu (debug): format_file('/etc/network/interfaces.d/ifcfg-br-fw-admin')::properties: {:ovs_type=>"OVSIntPort", :bridge=>:absent, :ipaddr=>"10.109.10.6/24", :bond_slaves=>[:absent]}
2015-11-10 03:18:36 +0000 Puppet::Type::L23_stored_config::ProviderOvs_ubuntu (debug): format_file('/etc/network/interfaces.d/ifcfg-br-fw-admin')::content: ["auto br-fw-admin", "allow-absent br-fw-admin", "iface br-fw-admin inet static", "address 10.109.10.6/24", "ovs_type OVSIntPort"]
2015-11-10 03:18:37 +0000 Puppet::Type::L23_stored_config::ProviderOvs_ubuntu (debug): format_file('/etc/network/interfaces.d/ifcfg-br-mgmt')::properties: {:ovs_type=>"OVSIntPort", :bridge=>:absent, :ipaddr=>"192.168.0.6/24", :gateway=>"192.168.0.1", :bond_slaves=>[:absent]}
2015-11-10 03:18:37 +0000 Puppet::Type::L23_stored_config::ProviderOvs_ubuntu (debug): format_file('/etc/network/interfaces.d/ifcfg-br-mgmt')::content: ["auto br-mgmt", "allow-absent br-mgmt", "iface br-mgmt inet static", "address 192.168.0.6/24", "gateway 192.168.0.1", "ovs_type OVSIntPort"]

Looks like we need to set provider type explicitly while configuring default route.
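For illustration only, the selection behaviour that the warning in the log describes can be modelled in a few lines of Python. This is a simplified sketch, not Puppet's code: `pick_provider` and its arguments are hypothetical names, and the only behaviour it assumes is what the log shows (several defaults, the first listed one is used) plus the idea that an explicitly set provider takes precedence.

```python
import warnings


def pick_provider(default_providers, explicit=None):
    """Return the provider to use for a resource.

    Simplified model (an assumption, not Puppet's implementation):
    an explicit choice always wins; otherwise the first default is
    used, with a warning when more than one default is available.
    """
    if explicit is not None:
        return explicit
    if not default_providers:
        raise ValueError("no default provider available")
    if len(default_providers) > 1:
        warnings.warn(
            "Found multiple default providers: {}; using {}".format(
                ", ".join(default_providers), default_providers[0]
            )
        )
    return default_providers[0]
```

With `["ovs_ubuntu", "lnx_ubuntu"]` as defaults the model returns `ovs_ubuntu` and warns, as in the log; passing `explicit="lnx_ubuntu"` skips the ambiguity entirely.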

BTW, commenting out this line also solves the problem:

https://github.com/openstack/fuel-library/blob/master/deployment/puppet/l23network/lib/puppet/provider/l23_stored_config/ovs_ubuntu.rb#L9
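A quick way to spot the affected files might be to look for `ovs_*` options in stanzas that are expected to be plain Linux bridges. The sketch below assumes the stanza lines are already available as a list of strings, like the `content` arrays in the log above; `interfaces_with_ovs_options` is a hypothetical helper name, and deciding which bridges should be free of OVS options is left to the reader.

```python
def interfaces_with_ovs_options(lines):
    """Return the names of interfaces whose stanza contains an ovs_* option."""
    flagged = []
    current = None  # interface named by the most recent "iface" line
    for raw in lines:
        words = raw.split()
        if not words:
            continue
        keyword = words[0]
        if keyword == "iface" and len(words) > 1:
            current = words[1]
        elif keyword in ("auto", "mapping", "source") or keyword.startswith("allow-"):
            # These keywords end the previous iface stanza.
            current = None
        elif keyword.startswith("ovs_") and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged


# Stanza content copied from the log for ifcfg-br-mgmt.
content = [
    "auto br-mgmt",
    "allow-absent br-mgmt",
    "iface br-mgmt inet static",
    "address 192.168.0.6/24",
    "gateway 192.168.0.1",
    "ovs_type OVSIntPort",
]
print(interfaces_with_ovs_options(content))  # prints ['br-mgmt']
```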

Changed in fuel:
status:	Incomplete → Triaged
summary:	- Network is unreachable for nodes that are routed through the master node
	- after cluster restart
	+ Network interfaces are down after cluster restart on nodes that aren't
	+ connected to public network

Sergey Vasilenko (xenolog) on 2015-11-10

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Stanislav Makar (smakar)

OpenStack Infra (hudson-openstack) wrote on 2015-11-11: Fix proposed to fuel-library (master)

#6

Fix proposed to branch: master
Review: https://review.openstack.org/244017

Changed in fuel:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-11-11: Fix merged to fuel-library (master)

#7

Reviewed: https://review.openstack.org/244017
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f93b87f54d673e7229970c190626086a6cbdb721
Submitter: Jenkins
Branch: master

commit f93b87f54d673e7229970c190626086a6cbdb721
Author: Stanislav Makar <email address hidden>
Date: Wed Nov 11 08:19:59 2015 +0000

Fix the problem with regression after reboot

*Add new if_type vport
*Test coverage

Change-Id: I65cbbad1c35a34dac86b7331a04468fc0d060d83
Closes-bug: #1510072

Changed in fuel:
status:	In Progress → Fix Committed

Artem Hrechanychenko (agrechanichenko) wrote on 2015-11-13:

#8

8.0.system_test.ubuntu.ceph_ha_one_controller

release_versions:
  2015.1.0-8.0:
    VERSION:
      api: '1.0'
      astute_sha: 959b06c5ef8143125efd1727d350c050a922eb12
      build_id: '152'
      build_number: '152'
      feature_groups:
      - mirantis
      fuel-agent_sha: 07560a9fc3ce5301ace04d2d3e5d68db6ee4f8d5
      fuel-createmirror_sha: a034dcb06520df58a7338816900a431a6b61d83f
      fuel-library_sha: 31f6ae4ced72927287b513e9c4e3a24d367e7736
      fuel-nailgun-agent_sha: 3e9d17211d65c80bf97c8d83979979f6c7feb687
      fuel-nailgun_sha: e72e94138d159308e85a16c382e90b54c7bc7c79
      fuel-ostf_sha: f169d495691ea3d40d3d6d0278265698d3f6ed14
      fuel-upgrade_sha: 1e894e26d4e1423a9b0d66abd6a79505f4175ff6
      fuelmain_sha: b5eb33ca7147dfda7a943a7f8f58c28e86d63992
      fuelmenu_sha: 8a32c53c1fa13b036000f589f96e876277dbd071
      network-checker_sha: a57e1d69acb5e765eb22cab0251c589cd76f51da
      openstack_version: 2015.1.0-8.0
      production: docker
      python-fuelclient_sha: e685d68c1c0d0fa0491a250f07d9c3a8d0f9608c
      release: '8.0'
      shotgun_sha: 25dd78a3118267e3616df0727ce746e7dead2d67

Scenario:
            1. Create cluster in Ha mode with 1 controller
            2. Add 1 node with controller role
            3. Add 1 node with compute and Ceph OSD roles
            4. Add 1 node with Ceph OSD role
            5. Deploy the cluster
            6. Check Ceph status
            7. Read current partitions
            8. Warm-reboot Ceph nodes
            9. Read partitions again
            10. Check Ceph health
            11. Cold-reboot Ceph nodes
            12. Read partitions again
            13. Check Ceph health

======================================================================
FAIL: Check that Ceph OSD partitions are remounted after reboot
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/helpers/decorators.py", line 80, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/tests/test_ceph.py", line 879, in check_ceph_partitions_after_reboot
    [self.fuel_web.environment.d_env.get_node(name=node)])
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/models/fuel_web_client.py", line 1593, in warm_restart_nodes
    self.warm_start_nodes(devops_nodes)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.ceph_ha_one_controller/fuelweb_test/models/fuel_web_client.py", line 1586, in warm_start_nodes
    'after warm start'.format(node.name))
AssertionError: Node slave-02 has not become online after warm start

Node available only from storage network

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.109.24.0     *               255.255.255.0   U     0      0        0 br-storage
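The table above has a single connected route and no default route, which matches the node being reachable only from the storage network. A minimal sketch of that check, assuming the text comes from `route` or `route -n` in the column layout shown here (`has_default_route` is a hypothetical helper name):

```python
def has_default_route(route_output):
    """Return True if the routing table text lists a default route."""
    for line in route_output.splitlines():
        fields = line.split()
        # Skip the title line, the column header, and blank lines.
        if len(fields) < 8 or fields[0] == "Destination":
            continue
        # "default" is printed by `route`, "0.0.0.0" by `route -n`.
        if fields[0] in ("default", "0.0.0.0"):
            return True
    return False


table = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.109.24.0     *               255.255.255.0   U     0      0        0 br-storage
"""
print(has_default_route(table))  # prints False
```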

Changed in fuel:
status:	Fix Committed → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2015-11-17: Fix proposed to fuel-library (master)

#9

Fix proposed to branch: master
Review: https://review.openstack.org/246296

Changed in fuel:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-11-24: Fix merged to fuel-library (master)

#10

Reviewed: https://review.openstack.org/246296
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=4c36911a8175fa8f7572513fdc61efde3ad24ad3
Submitter: Jenkins
Branch: master

commit 4c36911a8175fa8f7572513fdc61efde3ad24ad3
Author: Stanislav Makar <email address hidden>
Date: Tue Nov 17 10:10:42 2015 +0000

Refactor function configure_default_route

    Before the function configure_default_route was a little part of function
    generate_network_config which changed default gateway only, due to the provider
    for interfaces was not picked correctly.
    Now if default route is needed to change we just modify network_scheme
    and call generate_network_config with this network_scheme, if no - do
    nothing.
    Leave only provider lnx as default for l23_stored_config.

Change-Id: I33e88550af5d5cce2886254444ee5d450e578a1c
Closes-bug: #1510072
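The approach described in the commit message (modify network_scheme only when the default route has to change, otherwise do nothing) could be sketched roughly as follows. This is Python for illustration, not the fuel-library code: the function name is hypothetical, and the `network_scheme["endpoints"][name]["gateway"]` layout is an assumption.

```python
import copy


def scheme_with_default_route(network_scheme, endpoint, gateway):
    """Return an updated copy of network_scheme, or None if nothing changes.

    Assumed (hypothetical) layout:
    network_scheme["endpoints"][endpoint]["gateway"] holds the gateway.
    """
    endpoints = network_scheme.get("endpoints", {})
    if endpoint not in endpoints:
        raise KeyError(endpoint)
    if endpoints[endpoint].get("gateway") == gateway:
        # Default route is already what we want: do nothing.
        return None
    # Work on a copy so the caller's scheme is left untouched.
    updated = copy.deepcopy(network_scheme)
    updated["endpoints"][endpoint]["gateway"] = gateway
    return updated
```

A caller would regenerate the network configuration from the returned scheme only when the result is not `None`, so every interface goes through the same generation path instead of a separate one for the default route.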

Changed in fuel:
status:	In Progress → Fix Committed

Grigory Mikhailov (gmikhailov) on 2015-11-27

tags:

added: on-verification

Grigory Mikhailov (gmikhailov) wrote on 2015-12-14:

#11

Verified on ISO #247.
Environment created via dos.py.
Described bug is not observed.

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "247"
  build_id: "247"
  fuel-nailgun_sha: "86cebc1d92c7cc9ca25b00f5590954a7c4f880a0"
  python-fuelclient_sha: "91474bd8c526f4f536ab13368feb4a5c1b84d185"
  fuel-agent_sha: "660c6514caa8f5fcd482f1cc4008a6028243e009"
  fuel-nailgun-agent_sha: "a33a58d378c117c0f509b0e7badc6f0910364154"
  astute_sha: "b60624ee2c5f1d6d805619b6c27965a973508da1"
  fuel-library_sha: "032c707ec800f11044b32733dd4d395e06c209d0"
  fuel-ostf_sha: "65de07b5dce50349e7bc414f364505483c34e2b1"
  fuel-mirror_sha: "bfe7af26b7e6fdd46a16480481cc757f67958177"
  fuelmenu_sha: "fcb15df4fd1a790b17dd78cf675c11c279040941"
  shotgun_sha: "a0bd06508067935f2ae9be2523ed0d1717b995ce"
  network-checker_sha: "a3534f8885246afb15609c54f91d3b23d599a5b1"
  fuel-upgrade_sha: "1e894e26d4e1423a9b0d66abd6a79505f4175ff6"
  fuelmain_sha: "fda7c87dea9fb54c08bd3844d277b2e4778924e4"

Changed in fuel:
status:	Fix Committed → Fix Released
tags:	removed: on-verification

OpenStack Infra (hudson-openstack) wrote on 2016-01-13: Fix proposed to fuel-library (stable/7.0)

#12

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/267055

Anton Matveev (amatveev) wrote on 2016-01-14:

#13

sla1 for MOS 7.0

tags:

added: customer-found sla1

OpenStack Infra (hudson-openstack) wrote on 2016-01-19: Fix merged to fuel-library (stable/7.0)

#14

Reviewed: https://review.openstack.org/267055
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=3315017b92a0e92243932fbd24c4c1827e75ef36
Submitter: Jenkins
Branch: stable/7.0

commit 3315017b92a0e92243932fbd24c4c1827e75ef36
Author: Stanislav Makar <email address hidden>
Date: Wed Nov 11 08:19:59 2015 +0000

Fix the problem with regression after reboot

* Add new if_type vport
* Test coverage

Change-Id: I65cbbad1c35a34dac86b7331a04468fc0d060d83
Closes-bug: #1510072

Olena Logvinova (ologvinova) on 2016-02-01

tags:

added: 7.0-mu-2

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	High	Stanislav Makar	Fuel for OpenStack 8.0
	7.0.x	Fix Committed	High	slava valyavskiy	Fuel for OpenStack 7.0-mu-2
