Fuel for OpenStack

L3 agent did not register in several minutes after destruction of primary controller on Ubuntu HA Neutron cluster

Bug #1395799 reported by Andrey Sledzinskiy on 2014-11-24

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Invalid	Medium	Aleksandr Didenko	Fuel for OpenStack 6.0
5.1.x	Fix Committed	High	MOS Neutron	Fuel for OpenStack 5.1.1
6.0.x	Invalid	Medium	Aleksandr Didenko	Fuel for OpenStack 6.0

Bug Description

{

    "build_id": "2014-11-23_21-01-00",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "build_number": "34",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "7580f6341a726c2019f880ae23ff3f1c581fd850",
    "production": "docker",
    "fuelmain_sha": "0fe3db7475b9f8b287c5b59cba94c9a40a8d8101",
    "astute_sha": "dade74af41d4972fe05a1c16ae1db2a2e60c6715",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1.1",
    "release_versions": {
        "2014.1.3-5.1.1": {
            "VERSION": {
                "build_id": "2014-11-23_21-01-00",
                "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
                "build_number": "34",
                "api": "1.0",
                "nailgun_sha": "7580f6341a726c2019f880ae23ff3f1c581fd850",
                "production": "docker",
                "fuelmain_sha": "0fe3db7475b9f8b287c5b59cba94c9a40a8d8101",
                "astute_sha": "dade74af41d4972fe05a1c16ae1db2a2e60c6715",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "5.1.1",
                "fuellib_sha": "444339cae21c369c1d95e96c1059d4099077138e"
            }
        }
    },
    "fuellib_sha": "444339cae21c369c1d95e96c1059d4099077138e"

}

Steps:
1. Create the following cluster: Ubuntu, HA, Neutron GRE, 3 controllers, 2 compute nodes, 1 Cinder node
2. Deploy cluster
3. Destroy primary controller

Actual result: the router wasn't rescheduled to another L3 agent after the controller was destroyed. It is still hosted by the dead L3 agent - http://paste.openstack.org/show/137511/

Logs are attached
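
The check behind the actual result above (is any router still hosted by an L3 agent that is no longer alive?) can be sketched roughly as follows. This is only an illustration: the dict layout, the agent IDs and the router name are hypothetical, and on a real cluster the data would have to come from something like `neutron agent-list` and `neutron router-list-on-l3-agent`.

```python
def routers_on_dead_agents(agents, hosting):
    """Return {router_id: agent_id} for routers whose L3 agent is not alive.

    agents  -- list of dicts with "id", "agent_type" and "alive" keys
               (hypothetical layout, not the exact CLI output format)
    hosting -- dict mapping router id to the id of the L3 agent hosting it
    """
    dead = {
        a["id"]
        for a in agents
        if a["agent_type"] == "L3 agent" and not a["alive"]
    }
    return {router: agent for router, agent in hosting.items() if agent in dead}


# Hypothetical sample: the L3 agent on the destroyed controller is dead.
agents = [
    {"id": "agent-1", "agent_type": "L3 agent", "alive": False},
    {"id": "agent-2", "agent_type": "L3 agent", "alive": True},
    {"id": "agent-3", "agent_type": "Open vSwitch agent", "alive": False},
]
hosting = {"router-a": "agent-1"}

stale = routers_on_dead_agents(agents, hosting)
print(stale)
```

A non-empty result several minutes after the controller is destroyed corresponds to the failure described here.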

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-11-24:

fail_error_deploy_ha_neutron-2014_11_24__04_53_39.tar.gz (7.7 MiB, application/x-tar)

summary:

- L3 agent wasn't rescheduled after destroy primary controller on Ubuntu
- HA Neutron cluster
+ Router wasn't rescheduled to another l3 agent after destroy primary
+ controller on Ubuntu HA Neutron cluster

Vladimir Kuklin (vkuklin) wrote on 2014-11-24: Re: Router wasn't rescheduled to another l3 agent after destroy primary controller on Ubuntu HA Neutron cluster

Workaround: rerun the rescheduling script or restart the L3 agent on the node. After that, everything works as expected.
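
For reference, a manual reschedule of one router can be sketched with the standard neutron CLI commands `l3-agent-router-remove` and `l3-agent-router-add`. This is not the rescheduling script mentioned in this comment (that script is not shown in this report); the helper name and the IDs are placeholders, and the sketch only builds the commands without running them.

```python
def reschedule_commands(router_id, dead_agent_id, live_agent_id):
    """Build the neutron CLI calls that move a router between L3 agents."""
    return [
        # Detach the router from the agent that is no longer alive.
        ["neutron", "l3-agent-router-remove", dead_agent_id, router_id],
        # Attach it to an L3 agent on a surviving controller.
        ["neutron", "l3-agent-router-add", live_agent_id, router_id],
    ]


# Placeholder IDs; substitute real values from `neutron agent-list`
# and `neutron router-list`.
commands = reschedule_commands("ROUTER_ID", "DEAD_AGENT_ID", "LIVE_AGENT_ID")
for cmd in commands:
    print(" ".join(cmd))
```

Building the argument lists first makes it easy to review what would be executed before passing each list to `subprocess.check_call` on a controller with admin credentials loaded.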

Changed in fuel:
status:	New → Triaged
assignee:	Fuel Library Team (fuel-library) → MOS Neutron (mos-neutron)
summary:

- Router wasn't rescheduled to another l3 agent after destroy primary
- controller on Ubuntu HA Neutron cluster
+ L3 agent did not register in several minutes after destruction of
+ primary controller on Ubuntu HA Neutron cluster

Dmitry Mescheryakov (dmitrymex) on 2014-11-24

tags:	added: neutron

Mike Scherbakov (mihgen) wrote on 2014-11-24:

Folks, I believe we should target 6.0 too

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-11-25:

Issue wasn't reproduced on today's 5.1.1 CI run - http://jenkins-product.srt.mirantis.net:8080/view/5.1_swarm/job/5.1_fuelmain.system_test.ubuntu.thread_5/57/

Aleksandr Didenko (adidenko) wrote on 2014-11-25:

Was not able to reproduce on 6.0

    "api": "1.0",
    "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280",
    "auth_required": true,
    "build_id": "2014-11-24_22-41-00",
    "build_number": "4",
    "feature_groups": [
        "mirantis"
    ],
    "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b",
    "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d",
    "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703",
    "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622",
    "production": "docker",
    "release": "6.0",

After deployment I started a VM, assigned a floating IP and started a ping from the outside. Then I powered off the controller that was hosting the neutron router.

64 bytes from 10.108.1.129: icmp_seq=13 ttl=63 time=1.18 ms
64 bytes from 10.108.1.129: icmp_seq=14 ttl=63 time=1.17 ms
64 bytes from 10.108.1.129: icmp_seq=15 ttl=63 time=1.41 ms
64 bytes from 10.108.1.129: icmp_seq=16 ttl=63 time=0.783 ms
64 bytes from 10.108.1.129: icmp_seq=17 ttl=63 time=1.38 ms
From 10.108.1.129 icmp_seq=54 Destination Host Unreachable
From 10.108.1.129 icmp_seq=55 Destination Host Unreachable
From 10.108.1.129 icmp_seq=56 Destination Host Unreachable
From 10.108.1.129 icmp_seq=57 Destination Host Unreachable
...

But after the controller cluster finished reassembling itself, ping started to work again:

64 bytes from 10.108.1.129: icmp_seq=1 ttl=63 time=2.66 ms
64 bytes from 10.108.1.129: icmp_seq=2 ttl=63 time=0.636 ms
64 bytes from 10.108.1.129: icmp_seq=3 ttl=63 time=0.741 ms
64 bytes from 10.108.1.129: icmp_seq=4 ttl=63 time=0.765 ms

So I'm marking this bug as invalid for 6.0.
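
The failover check in this comment relies on reading the ping log by eye. A small sketch of how the start of the outage could be pulled out of such a log is below. The function name and the regular expressions are my own, written against the output format pasted above; the sample lines are copied from that paste.

```python
import re

# Patterns written against the ping output format shown in this comment.
REPLY = re.compile(r"^\d+ bytes from \S+: icmp_seq=(\d+)")
UNREACHABLE = re.compile(r"^From \S+ icmp_seq=(\d+) Destination Host Unreachable")


def outage_bounds(lines):
    """Return (last_ok_seq, first_unreachable_seq) for one ping run.

    last_ok_seq is the last reply seen before the first failure;
    either value is None if no such line was found.
    """
    last_ok = None
    first_fail = None
    for line in lines:
        reply = REPLY.match(line)
        if reply:
            # Only track replies that arrive before the outage starts.
            if first_fail is None:
                last_ok = int(reply.group(1))
            continue
        failure = UNREACHABLE.match(line)
        if failure and first_fail is None:
            first_fail = int(failure.group(1))
    return last_ok, first_fail


sample = [
    "64 bytes from 10.108.1.129: icmp_seq=16 ttl=63 time=0.783 ms",
    "64 bytes from 10.108.1.129: icmp_seq=17 ttl=63 time=1.38 ms",
    "From 10.108.1.129 icmp_seq=54 Destination Host Unreachable",
    "From 10.108.1.129 icmp_seq=55 Destination Host Unreachable",
]
print(outage_bounds(sample))  # (17, 54)
```

The second ping run above starts again from icmp_seq=1, so the total downtime cannot be derived from sequence numbers alone; timestamps (for example `ping -D`) would be needed for that.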

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/138072