Network configuration is broken after reboot if we are using ovs network provider

Bug #1495534 reported by slava valyavskiy on 2015-09-14
This bug affects 2 people
Affects             Status  Importance  Assigned to      Milestone
Fuel for OpenStack          High        Stanislav Makar
7.0.x                       High        Stanislav Makar

Bug Description

The following network configuration was built for the cluster environment after the netconfig task:
http://pastebin.com/5RfN7RSW

But after the reboot there is no connectivity over the management network. The root cause turned out to be incorrect configuration in the network scripts.

1. For example, we have the "p_acf4537e-1" port in the management bridge, but there is no information about it in the 'interfaces.d' folder.

    Bridge br-mgmt
        Port br-mgmt
            Interface br-mgmt
                type: internal
        Port "p_acf4537e-1"
            Interface "p_acf4537e-1"
                type: patch
                options: {peer="p_acf4537e-0"}

root@node-1:~# grep -r p_acf4537e-1 /etc/network/interfaces.d/
root@node-1:~#

2. We have port "p_acf4537e-0" in br-fw-admin bridge

    Bridge br-fw-admin
        Port "p_acf4537e-0"
            tag: 3500
            Interface "p_acf4537e-0"
                type: patch
                options: {peer="p_acf4537e-1"}

But its configuration in the init script does not seem to be correct (no tag, and it is defined as an internal port):
###
auto p_acf4537e-0
allow-br-fw-admin p_acf4537e-0
iface p_acf4537e-0 inet manual
mtu 65000
ovs_type OVSIntPort
ovs_bridge br-fw-admin
###
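The mismatch above can be checked mechanically. The following is a minimal sketch, not part of fuel-library: the helper names `parse_stanza` and `patch_port_problems` are made up for illustration, and the only "good" form assumed is an `ovs_extra` line carrying `type=patch`, as seen in the grep output of a later comment in this report. How the VLAN tag should be recorded is not shown in the report, so the tag is not checked here.

```python
def parse_stanza(text):
    """Collect the options of a single ifupdown stanza into a dict of lists."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/separator lines such as '###'.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        # Keep every value for a key, since a key may appear more than once.
        options.setdefault(key, []).append(value.strip())
    return options


def patch_port_problems(options):
    """Return the mismatches described in the report for a patch port stanza."""
    problems = []
    if "OVSIntPort" in options.get("ovs_type", []):
        problems.append("declared as an internal port (ovs_type OVSIntPort)")
    if not any("type=patch" in extra for extra in options.get("ovs_extra", [])):
        problems.append("no ovs_extra line setting type=patch")
    return problems


# The stanza quoted in the report for p_acf4537e-0.
stanza = """\
###
auto p_acf4537e-0
allow-br-fw-admin p_acf4537e-0
iface p_acf4537e-0 inet manual
mtu 65000
ovs_type OVSIntPort
ovs_bridge br-fw-admin
###
"""

for problem in patch_port_problems(parse_stanza(stanza)):
    print(problem)
```

Run against the quoted stanza, the sketch reports both defects; a stanza whose `ovs_extra` line sets `type=patch` and that has no `OVSIntPort` type would produce no output.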

The same applies to the br-prv / br-ex pair of OVS bridges: one of the patch ports is configured incorrectly, and there is no init script at all for the other.

As a result, I lose connectivity over the public and management networks after the reboot.
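The manual `grep` check from the description can be repeated for every patch port at once. This is a rough sketch under stated assumptions: it parses the text layout of `ovs-vsctl show` exactly as quoted above, the function names `patch_ports` and `unconfigured` are hypothetical, and "configured" only means that some file under the directory mentions the port name, which is no stronger than the `grep -r` used in the report.

```python
import re
from pathlib import Path


def patch_ports(show_output):
    """Map each patch interface to its peer, based on `ovs-vsctl show` text."""
    ports = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        match = re.match(r'Interface "?([^"]+)"?$', line)
        if match:
            current = match.group(1)
        elif line == "type: patch" and current:
            # Record the port even if no peer option follows.
            ports[current] = None
        elif line.startswith("options:") and current in ports:
            peer = re.search(r'peer="?([^",}\s]+)', line)
            if peer:
                ports[current] = peer.group(1)
    return ports


def unconfigured(ports, config_dir):
    """Return patch ports that no file under config_dir mentions."""
    text = "\n".join(
        path.read_text(errors="replace")
        for path in Path(config_dir).rglob("*")
        if path.is_file()
    )
    return sorted(name for name in ports if name not in text)


# Excerpt of the `ovs-vsctl show` output quoted in the report.
sample = '''
    Bridge br-mgmt
        Port br-mgmt
            Interface br-mgmt
                type: internal
        Port "p_acf4537e-1"
            Interface "p_acf4537e-1"
                type: patch
                options: {peer="p_acf4537e-0"}
'''

print(patch_ports(sample))
```

On a node, the input text could be obtained with `subprocess.run(["ovs-vsctl", "show"], capture_output=True, text=True, check=True).stdout`, and `unconfigured(ports, "/etc/network/interfaces.d")` would then list the ports that would not be recreated from configuration files.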

Changed in fuel:
importance: Undecided → Critical
slava valyavskiy (slava-val-al) wrote :

Network scheme part of the astute.yaml file from the failed node - http://pastebin.com/43cKQLrn

Stanislav Makar (smakar) on 2015-09-14
Changed in fuel:
assignee: nobody → Stanislav Makar (smakar)

Fix proposed to branch: master
Review: https://review.openstack.org/223198

Changed in fuel:
assignee: Stanislav Makar (smakar) → Sergey Vasilenko (xenolog)
status: New → In Progress
Vladimir Kuklin (vkuklin) wrote :

Folks,

this is a very isolated use case which is, strictly speaking, not viable, as Open vSwitch is not a working solution for anything but the Neutron integration/tunnel bridges.

Reviewed: https://review.openstack.org/223198
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=759e4b0da01dbf30f1f1becf4703dc04c2cc6771
Submitter: Jenkins
Branch: master

commit 759e4b0da01dbf30f1f1becf4703dc04c2cc6771
Author: Sergey Vasilenko <email address hidden>
Date: Mon Sep 14 10:55:52 2015 -0500

    Tests for OVS2LNX patchcords

    Change-Id: Idb241cab7ad2064f7c2bccf5813d3252d5c701fa
    Fuel-CI: disable
    Related-bug: #1495534

Reviewed: https://review.openstack.org/223278
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f2eef7717b15c6c0a3e76ef98ad4c7c4532d56f9
Submitter: Jenkins
Branch: stable/7.0

commit f2eef7717b15c6c0a3e76ef98ad4c7c4532d56f9
Author: Sergey Vasilenko <email address hidden>
Date: Mon Sep 14 10:55:52 2015 -0500

    Tests for OVS2LNX patchcords

    Change-Id: Idb241cab7ad2064f7c2bccf5813d3252d5c701fa
    Fuel-CI: disable
    Related-bug: #1495534

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Stanislav Makar (smakar)
Nastya Urlapova (aurlapova) wrote :

Please provide clear steps for a workaround; we will add them to the release notes.
According to our policy, we cannot merge a fix for a High bug after HCF.

tags: added: release-notes
Changed in fuel:
assignee: Stanislav Makar (smakar) → Sergey Vasilenko (xenolog)
Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Stanislav Makar (smakar)

Change abandoned by Sergey Vasilenko (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/223347
Reason: Another, more fresh, PR should be cherry-picked

Reviewed: https://review.openstack.org/223336
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c2b2f795ecec1af1c02e925c7e6ce1445ac16748
Submitter: Jenkins
Branch: master

commit c2b2f795ecec1af1c02e925c7e6ce1445ac16748
Author: Sergey Vasilenko <email address hidden>
Date: Mon Sep 14 18:04:28 2015 -0500

    Do not create config file for ovs2ovs patchcord

    Ubuntu with OVS < 2.4 does not support such configs at all.

    Change-Id: I9161e282f285dffc018a9d7d3fe91cb3760486ad
    Closes-bug: #1495534

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/223347
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c622dfebfdc79eb78668ca92da6ffc7413e04df2
Submitter: Jenkins
Branch: stable/7.0

commit c622dfebfdc79eb78668ca92da6ffc7413e04df2
Author: Sergey Vasilenko <email address hidden>
Date: Mon Sep 14 18:04:28 2015 -0500

    Do not create config file for ovs2ovs patchcord

    Ubuntu with OVS < 2.4 does not support such configs at all.

    Change-Id: I9161e282f285dffc018a9d7d3fe91cb3760486ad
    Closes-bug: #1495534

Mike Scherbakov (mihgen) wrote :

This bug was bumped to Critical, as it is a blocker for telco network topology. Fix was merged to stable/7.0 accordingly.

Dmitry Kalashnik (dkalashnik) wrote :

Reproduced on ISO #301

node-1 controller after first reboot, cmd: `crm status`
    https://paste.mirantis.net/show/1162/
node-1 controller after first reboot, cmd: `ovs-vsctl show`
    https://paste.mirantis.net/show/1161/
node-1 controller after second reboot, cmd: `crm status`
    https://paste.mirantis.net/show/1165/
node-1 controller after second reboot, cmd: `ovs-vsctl show`
    https://paste.mirantis.net/show/1166/

node-2 controller before reboot, cmd: `crm status`
    https://paste.mirantis.net/show/1163/
node-2 controller before reboot, cmd: `ovs-vsctl show`
    https://paste.mirantis.net/show/1164/

Check patch ports from paste#1164 in interfaces:
root@node-2:~# grep -r p_acf4537e-0 /etc/network/interfaces.d/
root@node-2:~# grep -r p_408821a5-0 /etc/network/interfaces.d/
root@node-2:~# grep -r p_acf4537e-1 /etc/network/interfaces.d/
root@node-2:~# grep -r p_408821a5-1 /etc/network/interfaces.d/
root@node-2:~#

node-2 controller after reboot, cmd: `crm status`
    https://paste.mirantis.net/show/1167/
node-2 controller after reboot, cmd: `ovs-vsctl show`
    https://paste.mirantis.net/show/1168/

Fix is in ISO:
[root@fueldc209hw ~]# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

Check the existence of the file added with the fix:
[root@fueldc209hw ~]# find /etc/puppet/ -name ovs2ovs__ovs_patch__spec.rb
/etc/puppet/2015.1.0-7.0/modules/l23network/spec/classes/ovs2ovs__ovs_patch__spec.rb

Vladimir Kuklin (vkuklin) wrote :

Folks,

OVS bonds and bridges for non-Neutron parts are not part of our Reference Architecture, and we do not recommend this way of installation, as OVS bonds are very unstable and may lead to major cluster misbehaviour.

My suggestion here is that this bug is not a blocker for the release. We can still ship the fix as a custom patch after we verify that everything works OK with the latest version of the patch. After that, it can go into the first Maintenance Update for the 7.0 release.

Nevertheless, as soon as the patch is ready, anyone is free to apply it using the fuel-library package spec, build their own fuel-library package and install it onto the Fuel master node. That way, anyone who needs the patch before the first Maintenance Update is published can get early access to it.

Stanislav Makar (smakar) wrote :

I have found the root cause:
during reboot this script removes all bridges if config files are present

https://github.com/openvswitch/ovs/blob/branch-2.3/debian/ifupdown.sh#L90

I am still looking for the best fix.

Stanislav Makar (smakar) on 2015-09-24
tags: added: feature
Andrew Maksimov (maximov) wrote :

Not a release blocker, will be delivered in first maintenance update.

tags: added: customer-found
tags: added: on-verification

Reviewed: https://review.openstack.org/227250
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4a2aaff731e3aa0f59b1c4fbadbc714b5bd1bac1
Submitter: Jenkins
Branch: master

commit 4a2aaff731e3aa0f59b1c4fbadbc714b5bd1bac1
Author: Stanislav Makar <email address hidden>
Date: Thu Sep 24 11:56:27 2015 +0000

    Fix ovs2ovs saving problem during reboot

    * Implement saving configs for ovs2ovs patches
    * Implement vlan_ids as well
    * Test coverage

    Change-Id: I288da3b859d8a25f2428cfa785eeedb72c472eaa
    Closes-bug: #1495534

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/227416
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=1a4fc674cf5c08286d46643145fffea94412d5c4
Submitter: Jenkins
Branch: stable/7.0

commit 1a4fc674cf5c08286d46643145fffea94412d5c4
Author: Stanislav Makar <email address hidden>
Date: Thu Sep 24 11:56:27 2015 +0000

    Fix ovs2ovs saving problem during reboot

    * Implement saving configs for ovs2ovs patches
    * Implement vlan_ids as well
    * Test coverage

    Change-Id: I288da3b859d8a25f2428cfa785eeedb72c472eaa
    Closes-bug: #1495534

Vitaly Sedelnik (vsedelnik) wrote :

This bug is a mandatory part of 7.0 MU1 - retargeted back to 7.0-mu-1 (see me or Dmitry Klenov if you have questions about why)

Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-library
Dmitry Kalashnik (dkalashnik) wrote :

Verified on 7.0 + fuel-library7.0.noarch 0:7.0.0-7239.1.gitba38e41

tags: removed: on-verification
tags: added: rca-done
tags: added: on-verification
Stanislav Makar (smakar) wrote :

how to verify

env:
1 controller + cinder
1 compute
neutron + vlan

1. fuel --env=N deployment --default
2. add provider: ovs to the bridges br-aux and br-ex, if they exist;
   open the files one by one with
   vim /root/deployment_1/*yaml
   and change

network_scheme:
  transformations:
    - action: add-br
      name: br-aux
      provider: ovs

3. fuel --env=N deployment --upload
4. deploy and run all tests
5. reboot controller and compute, then run all tests again
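The manual edit of the deployment files can also be scripted. The following is only a sketch: `set_ovs_provider` is a hypothetical helper that works on an already loaded deployment document (a plain dict), the sample transformations are illustrative rather than taken from a real environment, and loading and saving the YAML files with a YAML library is deliberately left out.

```python
def set_ovs_provider(deployment, bridges):
    """Set provider: ovs on the add-br transformations of the named bridges.

    Returns the names of the bridges that were found and changed.
    """
    changed = []
    scheme = deployment.get("network_scheme", {})
    for step in scheme.get("transformations", []):
        if step.get("action") == "add-br" and step.get("name") in bridges:
            step["provider"] = "ovs"
            changed.append(step["name"])
    return changed


# Illustrative data only; a real deployment file contains many more keys.
deployment = {
    "network_scheme": {
        "transformations": [
            {"action": "add-br", "name": "br-fw-admin"},
            {"action": "add-br", "name": "br-ex"},
            {"action": "add-port", "name": "eth0", "bridge": "br-fw-admin"},
        ]
    }
}

print(set_ovs_provider(deployment, {"br-aux", "br-ex"}))
```

Bridges that are not present in a given file (br-aux in this sample) are simply skipped, which matches the "if they exist" condition in the steps.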

Mikhail Samoylov (msamoylov) wrote :

Verification failed.
Steps to reproduce:
[root@nailgun ~]# fuel env create --name myenv --rel 2 --nst vlan
Environment 'myenv' with id=1 was created!
[root@nailgun ~]# fuel node set --node 4 --role compute --env 1
Nodes [4] with roles ['compute'] were added to environment 1
[root@nailgun ~]# fuel node set --node 1 --role controller,cinder --env 1
Nodes [1] with roles ['cinder', 'controller'] were added to environment 1
[root@nailgun ~]# fuel --env-id=1 deployment --default
Default deployment info was downloaded to /root/deployment_1
[root@nailgun ~]# vim /root/deployment_1/*.yaml
(add provider: ovs to the bridges br-aux and br-ex, if they exist; go through the files one by one and change:)
network_scheme:
  transformations:
    - action: add-br
      name: br-aux
      provider: ovs
[root@nailgun ~]# fuel --env-id=1 deployment --upload
deployment facts were uploaded.
[root@nailgun ~]# fuel deploy-changes --env-id 1
[root@nailgun ~]# ssh node-1
root@node-1:~# ovs-vsctl show
root@node-1:~# grep -r p_ff798dba-0 /etc/network/interfaces.d/
/etc/network/interfaces.d/ifcfg-p_ff798dba-1:ovs_extra -- set Interface p_ff798dba-1 type=patch options:peer=p_ff798dba-0
/etc/network/interfaces.d/ifcfg-p_ff798dba-0:auto p_ff798dba-0
/etc/network/interfaces.d/ifcfg-p_ff798dba-0:allow-br-ex p_ff798dba-0
/etc/network/interfaces.d/ifcfg-p_ff798dba-0:iface p_ff798dba-0 inet manual
/etc/network/interfaces.d/ifcfg-p_ff798dba-0:ovs_extra -- set Interface p_ff798dba-0 type=patch options:peer=p_ff798dba-1
root@node-1:~# shutdown -r now
root@node-1:~# crm status
Last updated: Thu Nov 19 13:16:59 2015
Last change: Thu Nov 19 12:46:38 2015
Stack: corosync
Current DC: node-1.test.domain.local (1) - partition with quorum
Version: 1.1.12-561c4cf
1 Nodes configured
18 Resources configured

Online: [ node-1.test.domain.local ]

 sysinfo_node-1.test.domain.local (ocf::pacemaker:SysInfo): Started node-1.test.domain.local
 Clone Set: clone_p_vrouter [p_vrouter]
     Started: [ node-1.test.domain.local ]
 vip__management (ocf::fuel:ns_IPaddr2): Started node-1.test.domain.local
 vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Started node-1.test.domain.local
 vip__vrouter (ocf::fuel:ns_IPaddr2): Started node-1.test.domain.local
 vip__public (ocf::fuel:ns_IPaddr2): Started node-1.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1.test.domain.local ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-1.test.domain.local ]
 Clone Set: clone_p_dns [p_dns]
     Started: [ node-1.test.domain.local ]
 Master/Slave Set: master_p_conntrackd [p_conntrackd]
     Masters: [ node-1.test.domain.local ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.test.domain.local ]
 Clone Set: clone_p_heat-engine [p_heat-engine]
     Started: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
     Started: [ node-1.test.domain.local ]
 Clone Set: cl...

tags: removed: on-verification
Dmitry Pyzhov (dpyzhov) on 2015-11-23
no longer affects: fuel/8.0.x
Changed in fuel:
status: Triaged → Fix Released
status: Fix Released → Won't Fix
status: Won't Fix → Triaged
Mikhail Samoylov (msamoylov) wrote :

Verification passed in fuel version:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "206"
  build_id: "206"
  fuel-nailgun_sha: "beec500b254fdadcce83767c01e8a80e40aee797"
  python-fuelclient_sha: "3e7738fd3fb18a2d5f53b1ecc9706dc53b65a511"
  fuel-agent_sha: "d96ed1d854166be6da5c7fafa299b0a3feda8c42"
  fuel-nailgun-agent_sha: "b56f832abc18aee9a8c603fd6cc2055c5f4287bc"
  astute_sha: "d2c1b401816c6f0341902272f37018b9cec3c775"
  fuel-library_sha: "ae564d690bfa1883f4c79182ac43ee6a5b21cd44"
  fuel-ostf_sha: "6bcb28b3196b34256f12c3f11cbe592e746d4dae"
  fuel-createmirror_sha: "9b335c8d551c87d788166947cb7ed519757881e8"
  fuelmenu_sha: "9627849843e84b7f01c44bd79898a8d62d96ce66"
  shotgun_sha: "a3d413d1ca411ddd5c26c850932b99c5e33ca17f"
  network-checker_sha: "2c62cd52655ea6456ff6294fd63f18d6ea54fe38"
  fuel-upgrade_sha: "1e894e26d4e1423a9b0d66abd6a79505f4175ff6"
  fuelmain_sha: "a262fc9460d92f410e4fc0c8db150592059b4b4d"

Steps:
1. Create env
2. Create node with role compute
3. Create node with role controller and cinder
4. Download current config (fuel --env-id=1 deployment --default)
5. Edit /root/deployment_1/primary-controller_1.yaml
network_scheme:
  transformations:
    - action: add-br
      name: br-ex
      provider: ovs
6. Edit /root/deployment_1/cinder_1.yaml
network_scheme:
  transformations:
    - action: add-br
      name: br-ex
      provider: ovs
7. Upload new scheme (fuel --env-id=1 deployment --upload)
8. Deploy env
9. Run OSTF test
10. Reboot controller
11. Reboot compute
12. Run OSTF tests
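Besides the OSTF run, the reboot steps could be complemented by comparing `ovs-vsctl show` snapshots taken before and after the reboot. This is a sketch only: `bridge_ports` and `lost_ports` are hypothetical helpers, the "before" text is the excerpt quoted in the bug description, and the "after" text is invented here to illustrate a lost patch port.

```python
import re


def bridge_ports(show_output):
    """Map each bridge to the set of its port names, from `ovs-vsctl show` text."""
    bridges = {}
    current = None
    for raw in show_output.splitlines():
        match = re.match(r'(Bridge|Port) "?([^"]+)"?$', raw.strip())
        if not match:
            continue
        kind, name = match.groups()
        if kind == "Bridge":
            current = bridges.setdefault(name, set())
        elif current is not None:
            current.add(name)
    return bridges


def lost_ports(before, after):
    """Return {bridge: sorted ports} present before the reboot but not after."""
    after_map = bridge_ports(after)
    lost = {}
    for bridge, ports in bridge_ports(before).items():
        missing = ports - after_map.get(bridge, set())
        if missing:
            lost[bridge] = sorted(missing)
    return lost


before = '''
    Bridge br-mgmt
        Port br-mgmt
            Interface br-mgmt
                type: internal
        Port "p_acf4537e-1"
            Interface "p_acf4537e-1"
                type: patch
                options: {peer="p_acf4537e-0"}
'''

# Hypothetical post-reboot state, made up for this example.
after = '''
    Bridge br-mgmt
        Port br-mgmt
            Interface br-mgmt
                type: internal
'''

print(lost_ports(before, after))
```

An empty result would mean that every port seen before the reboot is still attached to the same bridge afterwards; a bridge missing entirely from the second snapshot is reported with all of its ports.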

Changed in fuel:
status: Triaged → Fix Released
tags: added: 8.0 release-notes-done
removed: release-notes