Custom nodegroup tests failing sporadically

Bug #1652765 reported by Dmitry Belyaninov
This bug affects 1 person
Affects               Status          Importance  Assigned to
Fuel for OpenStack    Fix Committed   High        Vladimir Kuklin
  Mitaka              Fix Released    High        Alexey Shtokolov
  Newton              Fix Released    High        Alexey Shtokolov
  Ocata               Fix Committed   High        Vladimir Kuklin

Bug Description

        Scenario:
            1. Create new environment with VLAN segmentation for Neutron
            2. Set KVM as Hypervisor
            3. Add controller and compute nodes
            4. Configure HugePages for compute nodes
            5. Configure private network in DPDK mode
            6. Run network verification
            7. Deploy environment
            8. Run network verification
            9. Run OSTF
            10. Reboot compute
FAIL -> 11. Run OSTF
            12. Run instance on compute with DPDK and check its availability
                via floating IP

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/158/testReport/(root)/deploy_cluster_with_dpdk/deploy_cluster_with_dpdk/

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.thread_7/166/testReport/(root)/add_custom_nodegroup/add_custom_nodegroup/

<163>Dec 27 10:35:25 node-1 neutron-openvswitch-agent: 2016-12-27 10:35:25.249 5321 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: hard_timeout=0,idle_timeout=0,priority=0,table=24,cookie=11784139632982448825,actions=drop; Stdout: ; Stderr: ovs-ofctl: br-int is not a bridge or a socket

Full logs:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/158/artifact/logs/fail_error_deploy_cluster_with_dpdk-fuel-snapshot-2016-12-27_03-42-53.tar

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.thread_7/166/artifact/logs/fail_error_add_custom_nodegroup-fuel-snapshot-2016-12-27_05-47-19.tar

tags: added: swarm-fail
tags: added: area-neutron
Changed in fuel:
status: New → Confirmed
tags: added: feature-dpdk
Alexander Ignatov (aignatov) wrote :

Starting from 03:40, the logs on node-1 constantly show the agent being terminated:

2016-12-27T03:40:19.605173+00:00 err: 2016-12-27 03:40:19.584 3701 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: hard_timeout=0,idle_timeout=0,priority=0,table=24,cookie=13505880170845670788,actions=drop; Stdout: ; Stderr: ovs-ofctl: br-int is not a bridge or a socket
2016-12-27T03:40:19.605173+00:00 err: 2016-12-27 03:40:19.584 3701 ERROR neutron.agent.common.ovs_lib [-] Unable to execute ['ovs-ofctl', 'add-flows', 'br-int', '-']. Exception: Exit code: 1; Stdin: hard_timeout=0,idle_timeout=0,priority=0,table=24,cookie=13505880170845670788,actions=drop; Stdout: ; Stderr: ovs-ofctl: br-int is not a bridge or a socket
2016-12-27T03:40:19.605173+00:00 debug: 2016-12-27 03:40:19.590 3701 DEBUG neutron.agent.linux.utils [req-3da6051a-4253-4cae-a0bf-39fe8db2c969 - - - - -] Running command (rootwrap daemon): ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', 'list-br'] execute_rootwrap_daemon /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:100
2016-12-27T03:40:19.605173+00:00 debug: 2016-12-27 03:40:19.603 3701 DEBUG neutron.agent.linux.utils [req-3da6051a-4253-4cae-a0bf-39fe8db2c969 - - - - -] Exit code: 0 execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:142
2016-12-27T03:40:19.605722+00:00 info: 2016-12-27 03:40:19.604 3701 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3da6051a-4253-4cae-a0bf-39fe8db2c969 - - - - -] Mapping physical network physnet2 to bridge br-prv
2016-12-27T03:40:19.606037+00:00 err: 2016-12-27 03:40:19.604 3701 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3da6051a-4253-4cae-a0bf-39fe8db2c969 - - - - -] Bridge br-prv for physical network physnet2 does not exist. Agent terminated!
2016-12-27T03:40:19.607427+00:00 info: 2016-12-27 03:40:19.606 3701 INFO oslo_rootwrap.client [req-3da6051a-4253-4cae-a0bf-39fe8db2c969 - - - - -] Stopping rootwrap daemon process with pid=3709
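The termination above comes from a startup check: every physical network in the agent's bridge mappings (here physnet2 -> br-prv) must map to an OVS bridge that actually exists. A simplified sketch of that check, assuming the mappings and the bridge list are already available as Python values; `find_missing_bridges` and `parse_list_br` are hypothetical helpers, not neutron's own code:

```python
from typing import Dict, List


def parse_list_br(output: str) -> List[str]:
    """Parse plain `ovs-vsctl list-br` output (one bridge name per line)."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def find_missing_bridges(bridge_mappings: Dict[str, str],
                         existing_bridges: List[str]) -> Dict[str, str]:
    """Return the physnet -> bridge entries whose bridge does not exist.

    Simplified model of the check that makes the OVS agent log
    "Bridge ... for physical network ... does not exist. Agent terminated!".
    """
    present = set(existing_bridges)
    return {physnet: bridge
            for physnet, bridge in bridge_mappings.items()
            if bridge not in present}


if __name__ == "__main__":
    # Modelled on the rebooted compute node, where only br-int exists.
    mappings = {"physnet2": "br-prv"}
    bridges = parse_list_br("br-int\n")
    for physnet, bridge in find_missing_bridges(mappings, bridges).items():
        print(f"Bridge {bridge} for physical network {physnet} does not exist.")
```

Any non-empty result is fatal for the agent, which is why a bridge that disappears across a reboot takes the whole agent down rather than just one network.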

Alexander Ignatov (aignatov) wrote :

On the reverted snapshot, I found that the br-prv bridge definition is missing on the rebooted compute node:
root@node-2:~# ovs-vsctl show
e1d97732-f42f-46c5-9604-a5004e144cfb
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
    ovs_version: "2.6.1"
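The output shows the symptom directly: br-int still carries the patch port int-br-prv, whose peer phy-br-prv would live on br-prv, but br-prv itself is gone. A rough sketch for spotting such dangling patch peers in saved `ovs-vsctl show` text, assuming the indentation layout shown above. `dangling_patch_peers` is a hypothetical, ad hoc text parser, not an OVS API; querying OVSDB directly would be more robust:

```python
import re
from typing import Dict, List, Optional


def dangling_patch_peers(show_output: str) -> List[str]:
    """Return patch-port peers that are not a port on any listed bridge.

    Only "Bridge", "Port" and "options: {peer=...}" lines are inspected.
    """
    bridges: Dict[str, List[str]] = {}
    peers: List[str] = []
    current: Optional[str] = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            current = line.split()[1].strip('"')
            bridges[current] = []
        elif line.startswith("Port ") and current is not None:
            bridges[current].append(line.split()[1].strip('"'))
        else:
            match = re.search(r"peer=([^,}\s]+)", line)
            if match:
                peers.append(match.group(1).strip('"'))
    all_ports = {port for ports in bridges.values() for port in ports}
    return [peer for peer in peers if peer not in all_ports]


if __name__ == "__main__":
    SHOW = """e1d97732-f42f-46c5-9604-a5004e144cfb
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
    ovs_version: "2.6.1"
"""
    print(dangling_patch_peers(SHOW))  # ['phy-br-prv']
```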

Alexander Ignatov (aignatov) wrote :

I suggest reopening bug https://bugs.launchpad.net/fuel/+bug/1555162 for 9.2.

Changed in fuel:
assignee: MOS Neutron (mos-neutron) → Fuel Sustaining (fuel-sustaining-team)
Vladimir Kuklin (vkuklin) wrote :

I think this bug needs to be split into two separate bugs. The first one (related to DPDK) may need investigation to determine whether it is a duplicate of https://bugs.launchpad.net/mos/+bug/1652937. The second one should be investigated with regard to node group addition.

summary: - [9.1 swarm] Neutron failed to bind a port
+ Custom nodegroup tests failing sporadically
Vladimir Kuklin (vkuklin) wrote :

So the root cause of the second issue, the one with node groups, is that computes in some node groups have no connectivity to the management VIP, while computes in the other node group do.

Vladimir Kuklin (vkuklin) wrote :

So the root cause is that, for some reason, haproxy is not restarted with the new value of the 'other_networks' parameter. This parameter controls which networks are routed out of the haproxy namespace through the host, so with the stale value, packets do not reach the newly added compute nodes in the new node group.
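To make the failure mode concrete: traffic from the haproxy namespace to a compute node is routed out through the host only if the node's address falls inside one of the networks in 'other_networks'. A minimal sketch, assuming 'other_networks' can be treated as a list of CIDR strings; `is_routed` is a hypothetical helper and the addresses are invented for illustration, not taken from the test environment:

```python
import ipaddress
from typing import Iterable


def is_routed(node_ip: str, other_networks: Iterable[str]) -> bool:
    """True if node_ip lies inside one of the routed CIDRs."""
    addr = ipaddress.ip_address(node_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in other_networks)


if __name__ == "__main__":
    # Hypothetical values: the stale list predates the new node group.
    stale = ["10.0.1.0/24"]
    updated = stale + ["10.0.2.0/24"]
    new_compute = "10.0.2.5"
    print(is_routed(new_compute, stale))    # False
    print(is_routed(new_compute, updated))  # True
```

With the stale list, the new node group's network has no route, which matches the observation that only the newly added computes lose access to the management VIP.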

Vladimir Kuklin (vkuklin) wrote :

At the same time, Puppet reconfigures the haproxy resource, but for some reason haproxy is not restarted.

Vladimir Kuklin (vkuklin) wrote :

So the issue is that haproxy does not set IP routes on reload, and reload is what Pacemaker triggers.
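A toy model of this behaviour: the new configuration reaches haproxy through a reload, but only start installs the routes, so the routing table stays stale. The class and method names below are hypothetical, and the real resource agent in fuel-library is not written in Python; this only illustrates why routes must also be installed on the reload path:

```python
from typing import Iterable, Set


class ToyHaproxyAgent:
    """Hypothetical model: haproxy's other_networks plus the routes for it."""

    def __init__(self, install_routes_on_reload: bool) -> None:
        self.install_routes_on_reload = install_routes_on_reload
        self.other_networks: Set[str] = set()
        self.routes: Set[str] = set()

    def _install_routes(self) -> None:
        # Make the routed networks match the configured other_networks.
        self.routes = set(self.other_networks)

    def start(self, other_networks: Iterable[str]) -> None:
        self.other_networks = set(other_networks)
        self._install_routes()

    def reload(self, other_networks: Iterable[str]) -> None:
        # The new configuration is always picked up on reload...
        self.other_networks = set(other_networks)
        # ...but routes are refreshed only if reload also installs them.
        if self.install_routes_on_reload:
            self._install_routes()


if __name__ == "__main__":
    for flag in (False, True):
        agent = ToyHaproxyAgent(install_routes_on_reload=flag)
        agent.start(["10.0.1.0/24"])
        # A new node group adds a network; Pacemaker triggers a reload.
        agent.reload(["10.0.1.0/24", "10.0.2.0/24"])
        print(flag, sorted(agent.routes))
```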

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/416687

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/416687
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=a4ad4193684a133beb80aec977e0f910ce2fcab6
Submitter: Jenkins
Branch: master

commit a4ad4193684a133beb80aec977e0f910ce2fcab6
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 4 20:47:26 2017 +0300

    Fix multirack routes installation for vrouter and haproxy

    1. Both scripts do not flush ip route table for non-local
       routes, which makes them non-idempotent
    2. Haproxy did not add routes on reload

    Change-Id: I498870b45ac47e6d6d8808d18964f3c2777c930c
    Closes-bug: #1652765
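Point 1 of the commit message is about idempotency: a script that only adds routes leaves stale entries behind when the network list changes, and re-adding an existing route with `ip route add` fails with "RTNETLINK answers: File exists". A minimal sketch of the flush-then-install idea, modelling the routing table as a list of (destination, scope) pairs and treating scope "link" routes as the local ones to keep. The representation and function names are assumptions for illustration, not the fuel-library code:

```python
from typing import List, Tuple

# (destination CIDR, scope); "link" marks directly connected routes.
Route = Tuple[str, str]


def naive_add(table: List[Route], wanted: List[str]) -> List[Route]:
    """Only appends: stale routes survive and repeated runs duplicate entries."""
    return table + [(cidr, "global") for cidr in wanted]


def apply_routes(table: List[Route], wanted: List[str]) -> List[Route]:
    """Flush non-local routes, then install the wanted ones.

    Running this twice with the same input yields the same table.
    """
    kept = [route for route in table if route[1] == "link"]
    for cidr in wanted:
        route = (cidr, "global")
        if route not in kept:
            kept.append(route)
    return kept


if __name__ == "__main__":
    table = [("192.168.0.0/24", "link"),
             ("10.0.1.0/24", "global"),
             ("10.0.9.0/24", "global")]  # hypothetical stale route
    wanted = ["10.0.1.0/24", "10.0.2.0/24"]
    once = apply_routes(table, wanted)
    print(once == apply_routes(once, wanted))  # True
    print(once)
```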

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/417028

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/417029

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/417029
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=864c8e8696c20e2906803b707e5f6c5c67d7cf87
Submitter: Jenkins
Branch: stable/mitaka

commit 864c8e8696c20e2906803b707e5f6c5c67d7cf87
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 4 20:47:26 2017 +0300

    Fix multirack routes installation for vrouter and haproxy

    1. Both scripts do not flush ip route table for non-local
       routes, which makes them non-idempotent
    2. Haproxy did not add routes on reload

    Change-Id: I498870b45ac47e6d6d8808d18964f3c2777c930c
    Closes-bug: #1652765
    (cherry picked from commit a4ad4193684a133beb80aec977e0f910ce2fcab6)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/417028
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=73e7860944c48ce3eecbbfd8b1ebb4aae2c76ce6
Submitter: Jenkins
Branch: stable/newton

commit 73e7860944c48ce3eecbbfd8b1ebb4aae2c76ce6
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 4 20:47:26 2017 +0300

    Fix multirack routes installation for vrouter and haproxy

    1. Both scripts do not flush ip route table for non-local
       routes, which makes them non-idempotent
    2. Haproxy did not add routes on reload

    Change-Id: I498870b45ac47e6d6d8808d18964f3c2777c930c
    Closes-bug: #1652765
    (cherry picked from commit a4ad4193684a133beb80aec977e0f910ce2fcab6)

Alexander Kurenyshev (akurenyshev) wrote :

The last 5 runs of [1] are green. Moving to Fix Released for 9.2.

[1] https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 11.0.0.0rc1

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.
