L3 agent did not register in several minutes after destruction of primary controller on Ubuntu HA Neutron cluster

Bug #1395799 reported by Andrey Sledzinskiy
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Invalid        Medium      Aleksandr Didenko
5.1.x               Fix Committed  High        MOS Neutron
6.0.x               Invalid        Medium      Aleksandr Didenko

Bug Description

{

    "build_id": "2014-11-23_21-01-00",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "build_number": "34",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "7580f6341a726c2019f880ae23ff3f1c581fd850",
    "production": "docker",
    "fuelmain_sha": "0fe3db7475b9f8b287c5b59cba94c9a40a8d8101",
    "astute_sha": "dade74af41d4972fe05a1c16ae1db2a2e60c6715",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1.1",
    "release_versions": {
        "2014.1.3-5.1.1": {
            "VERSION": {
                "build_id": "2014-11-23_21-01-00",
                "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
                "build_number": "34",
                "api": "1.0",
                "nailgun_sha": "7580f6341a726c2019f880ae23ff3f1c581fd850",
                "production": "docker",
                "fuelmain_sha": "0fe3db7475b9f8b287c5b59cba94c9a40a8d8101",
                "astute_sha": "dade74af41d4972fe05a1c16ae1db2a2e60c6715",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "5.1.1",
                "fuellib_sha": "444339cae21c369c1d95e96c1059d4099077138e"
            }
        }
    },
    "fuellib_sha": "444339cae21c369c1d95e96c1059d4099077138e"

}

Steps:
1. Create the following cluster: Ubuntu, HA, Neutron GRE, 3 controllers, 2 compute nodes, 1 Cinder node
2. Deploy cluster
3. Destroy primary controller

Actual result: the router was not rescheduled to another L3 agent after the primary controller was destroyed. It is still hosted by the dead L3 agent: http://paste.openstack.org/show/137511/
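
The symptom above can be checked mechanically. The following is a minimal sketch, assuming the agent and router-hosting data have already been collected (for example from `neutron agent-list` and `neutron l3-agent-list-hosting-router`). The dictionary layout, the sample IDs, and the function name `routers_on_dead_agents` are hypothetical and chosen for illustration; they are not taken from Fuel or Neutron code.

```python
def routers_on_dead_agents(agents, hosting):
    """Return {router_id: agent_id} for routers whose hosting L3 agent is not alive.

    agents  -- list of dicts with (assumed) keys 'id', 'agent_type', 'alive'
    hosting -- dict mapping a router id to the id of the L3 agent hosting it
    """
    alive_l3 = {
        a["id"] for a in agents
        if a["agent_type"] == "L3 agent" and a["alive"]
    }
    return {r: a for r, a in hosting.items() if a not in alive_l3}


# Illustrative data only: the L3 agent on the destroyed controller is dead.
agents = [
    {"id": "l3-node-1", "agent_type": "L3 agent", "alive": False},
    {"id": "l3-node-2", "agent_type": "L3 agent", "alive": True},
    {"id": "dhcp-node-2", "agent_type": "DHCP agent", "alive": True},
]
hosting = {"router04": "l3-node-1"}

stuck = routers_on_dead_agents(agents, hosting)
print(stuck)
```

A non-empty result corresponds to the state described in this report: a router still bound to an agent that no longer reports as alive.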

Logs are attached

Tags: neutron
Andrey Sledzinskiy (asledzinskiy) wrote :
summary: - L3 agent wasn't rescheduled after destroy primary controller on Ubuntu
- HA Neutron cluster
+ Router wasn't rescheduled to another l3 agent after destroy primary
+ controller on Ubuntu HA Neutron cluster
Vladimir Kuklin (vkuklin) wrote : Re: Router wasn't rescheduled to another l3 agent after destroy primary controller on Ubuntu HA Neutron cluster

Workaround: rerun the rescheduling script or restart the L3 agent on the node. After that, everything works.
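
For reference, a manual reschedule amounts to unbinding the router from the dead agent and binding it to a live one. The sketch below only builds the corresponding `neutron` CLI calls (`l3-agent-router-remove` and `l3-agent-router-add`, each taking the agent first and the router second). It is not the Fuel rescheduling script; the function name and the placeholder IDs are hypothetical.

```python
def manual_reschedule_commands(router, dead_agent, live_agent):
    """Build the neutron CLI calls that move `router` from a dead L3 agent to a live one."""
    return [
        # Unbind the router from the agent that is no longer alive.
        ["neutron", "l3-agent-router-remove", dead_agent, router],
        # Bind the router to an agent that is alive.
        ["neutron", "l3-agent-router-add", live_agent, router],
    ]


# Placeholder IDs; substitute the real values from `neutron agent-list`.
commands = manual_reschedule_commands("ROUTER_ID", "DEAD_L3_AGENT_ID", "LIVE_L3_AGENT_ID")
for cmd in commands:
    print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.check_call` on a controller with admin credentials loaded, assuming the `neutron` client is installed there.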

Changed in fuel:
status: New → Triaged
assignee: Fuel Library Team (fuel-library) → MOS Neutron (mos-neutron)
summary: - Router wasn't rescheduled to another l3 agent after destroy primary
- controller on Ubuntu HA Neutron cluster
+ L3 agent did not register in several minutes after destruction of
+ primary controller on Ubuntu HA Neutron cluster
tags: added: neutron
Mike Scherbakov (mihgen) wrote :

Folks, I believe we should target 6.0 too

Aleksandr Didenko (adidenko) wrote :

Was not able to reproduce on 6.0

    "api": "1.0",
    "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280",
    "auth_required": true,
    "build_id": "2014-11-24_22-41-00",
    "build_number": "4",
    "feature_groups": [
        "mirantis"
    ],
    "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b",
    "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d",
    "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703",
    "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622",
    "production": "docker",
    "release": "6.0",

After deployment I started a VM, assigned it a floating IP, and started pinging it from outside. Then I powered off the controller that was hosting the Neutron router.

64 bytes from 10.108.1.129: icmp_seq=13 ttl=63 time=1.18 ms
64 bytes from 10.108.1.129: icmp_seq=14 ttl=63 time=1.17 ms
64 bytes from 10.108.1.129: icmp_seq=15 ttl=63 time=1.41 ms
64 bytes from 10.108.1.129: icmp_seq=16 ttl=63 time=0.783 ms
64 bytes from 10.108.1.129: icmp_seq=17 ttl=63 time=1.38 ms
From 10.108.1.129 icmp_seq=54 Destination Host Unreachable
From 10.108.1.129 icmp_seq=55 Destination Host Unreachable
From 10.108.1.129 icmp_seq=56 Destination Host Unreachable
From 10.108.1.129 icmp_seq=57 Destination Host Unreachable
...

But after the controller cluster finished reassembling itself, ping started to work again:

64 bytes from 10.108.1.129: icmp_seq=1 ttl=63 time=2.66 ms
64 bytes from 10.108.1.129: icmp_seq=2 ttl=63 time=0.636 ms
64 bytes from 10.108.1.129: icmp_seq=3 ttl=63 time=0.741 ms
64 bytes from 10.108.1.129: icmp_seq=4 ttl=63 time=0.765 ms

So I'm marking this bug as invalid for 6.0

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/138072

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/138072
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=60b5a480e8ffc860ecc1d0d12b475c7c91c43777
Submitter: Jenkins
Branch: stable/5.1

commit 60b5a480e8ffc860ecc1d0d12b475c7c91c43777
Author: Vladimir Kuklin <email address hidden>
Date: Mon Dec 1 17:10:04 2014 +0300

    Increase rescheduling retries to 10

    Increase rescheduling retries for agents
    in order to make it work even with very
    slow or busy environments

    Change-Id: I85c84b7c2395e1463b6b4c291ceab5041519d4b7
    Closes-bug: #1395799
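
The idea behind the fix, retrying the rescheduling more times so that a slow environment has time to bring up a live agent, can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the function names, the `attempt` callback, and the simulated environment are hypothetical. Only the retry count of 10 comes from the commit message; the previous value is not stated in this report.

```python
import time


def reschedule_with_retries(attempt, retries=10, delay=0.0, sleep=time.sleep):
    """Call `attempt()` until it returns True or `retries` attempts are used.

    Returns the number of attempts made on success, or None if all failed.
    """
    for n in range(1, retries + 1):
        if attempt():
            return n
        if n < retries:
            sleep(delay)  # give a slow or busy environment time to recover
    return None


# Simulated slow environment: rescheduling only succeeds on the 7th attempt.
calls = {"count": 0}


def fake_attempt():
    calls["count"] += 1
    return calls["count"] >= 7


result = reschedule_with_retries(fake_attempt, retries=10)
print(result)
```

In this simulation a limit below 7 would give up before the environment recovers, which is the kind of failure a higher retry count is meant to avoid.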
