There is no connectivity to management vip after cluster reinstallation

Bug #1492210 reported by Tatyanka
This bug affects 1 person
Affects             Status        Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Released  High        Vladimir Kuklin
7.0.x               Fix Released  High        Alexey Shtokolov
8.0.x               Fix Released  High        Vladimir Kuklin

Bug Description

https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.full_cluster_reinstallation/15/

Steps to reproduce:
Scenario:
    1. Create a cluster
       Add 3 nodes with controller roles
       Add a node with compute and cinder roles
       Add a node with mongo role
       Deploy the cluster
       Verify that the deployment is completed successfully
    2. Create an empty sample file on each node to check that it is not
       available after cluster reinstallation
    3. Reinstall all cluster nodes
    4. Verify that all nodes are reinstalled (not just rebooted),
       i.e. there is no sample file on a node
    5. Run network verification
    6. Run OSTF
    7. Verify that Ceilometer API service is up and running
    8. Verify that all cinder services are up and running on nodes

Actual Result:
OSTF tests fail with a "keystone unavailable" message for the management endpoint. In fact, 2 controllers have no connectivity to the management VIP, and mgmt_vip is reachable only from the controller where it runs.
As a result, no OpenStack services work from those 2 controllers:
 root@node-5:~# keystone user-list
/usr/lib/python2.7/dist-packages/keystoneclient/shell.py:65: DeprecationWarning: The keystone CLI is deprecated in favor of python-openstackclient. For a Python library, continue using python-keystoneclient.
  'python-keystoneclient.', DeprecationWarning)
Authorization Failed: Unable to establish connection to http://10.109.17.3:5000/v2.0/tokens

If we look at the MAC address cached for 10.109.17.3 on this node:
root@node-5:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-4.test.domain.loca ether 3e:57:71:b9:8a:24 C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
10.109.17.3 ether 56:a2:ec:9b:99:ba C br-mgmt
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
root@node-5:~# arping -I br-mgmt

We can see that it is actually incorrect, which is why we fail to connect to management_vip.

The correct MAC is shown below (see the command output from node-4):
root@node-4:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-5.test.domain.loca ether 64:80:fe:df:2b:0e C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.3 ether 26:3c:31:ad:fd:10 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
root@node-4:~# ip netns exec ip a
Cannot open network namespace "ip": No such file or directory
root@node-4:~# ip netns exec haproxy ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
20: hapr-ns: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:d3:12:47:bd:a3 brd ff:ff:ff:ff:ff:ff
    inet 240.0.0.2/30 scope global hapr-ns
       valid_lft forever preferred_lft forever
    inet6 fe80::30d3:12ff:fe47:bda3/64 scope link
       valid_lft forever preferred_lft forever
25: b_management: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 26:3c:31:ad:fd:10 brd ff:ff:ff:ff:ff:ff
    inet 10.109.17.3/24 scope global b_management
       valid_lft forever preferred_lft forever
    inet6 fe80::243c:31ff:fead:fd10/64 scope link
       valid_lft forever preferred_lft forever
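
The comparison above can be scripted. This is a minimal sketch, not part of the original report: the helper names arp_table_mac and iface_mac are hypothetical, and the parsing assumes the column layout of the arp and ip a output shown above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a cached VIP MAC with the real one.

# Print the MAC cached for IP $1. Reads `arp -i br-mgmt` style output on
# stdin (columns: Address HWtype HWaddress Flags Mask Iface).
arp_table_mac() {
    awk -v ip="$1" '$1 == ip { print $3; exit }'
}

# Print the MAC of interface $1. Reads `ip a` output on stdin.
iface_mac() {
    awk -v ifname="$1" '
        # Header line such as "25: b_management: <BROADCAST,...>"
        /^[0-9]+: / { cur = $2; sub(/:$/, "", cur); sub(/@.*$/, "", cur) }
        # First link/ether line that belongs to the wanted interface
        $1 == "link/ether" && cur == ifname { print $2; exit }
    '
}

# Usage (assumes root on both nodes):
#   on node-5:  arp -i br-mgmt | arp_table_mac 10.109.17.3
#   on node-4:  ip netns exec haproxy ip a | iface_mac b_management
# If the two values differ, the entry cached on node-5 is stale.
```

Exact matching on the address column avoids false hits on addresses that merely share a prefix with the VIP.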

After the following actions, connectivity was restored:

root@node-5:~# arp -d 10.109.17.3
root@node-5:~# ping 10.109.17.3
PING 10.109.17.3 (10.109.17.3) 56(84) bytes of data.
64 bytes from 10.109.17.3: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 10.109.17.3: icmp_seq=2 ttl=64 time=0.285 ms
64 bytes from 10.109.17.3: icmp_seq=3 ttl=64 time=0.184 ms

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

The issue is related to tests. We need to do two things:
extend the time during which we send arpings (increase count and wait time)
add "ip neigh flush all" to the restore task (along with time sync)

Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Fuel Library Team (fuel-library)
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The solution for arps is to restart vips on all the nodes after we ensure that the nodes booted up correctly.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel QA Team (fuel-qa)
tags: added: non-release system-tests
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Vova, I do not agree that this is a test issue. There was no snapshot/revert or any other issue after re-deploying and running OSTF, and if you look at the snapshot, connectivity was lost after the deployment finished successfully, so I am moving the issue back to library.

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Library Team (fuel-library)
tags: removed: non-release system-tests
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/220623

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/220623
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4536284876dec4c1424680a76a6b1915f938dc74
Submitter: Jenkins
Branch: master

commit 4536284876dec4c1424680a76a6b1915f938dc74
Author: Dmitry Ilyin <email address hidden>
Date: Fri Sep 4 21:50:27 2015 +0300

    Send both REQUEST and REPLY packets to the peers

    Change-Id: I6957026028372ecc5e98ac395cad2c1c375ce11d
    Related-Bug: 1492210
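
For illustration only, sending both packet types could look like the sketch below. It is not the code from this commit: garp_commands is a hypothetical helper, and it assumes the iputils arping, where -U sends unsolicited ARP REQUEST packets and -A sends ARP REPLY packets instead.

```shell
#!/bin/sh
# Hypothetical helper: print the two arping invocations that announce
# IP $2 on interface $1, $3 times each (default 3).
garp_commands() {
    iface=$1
    ip=$2
    count=${3:-3}
    # -U: unsolicited ARP REQUEST, updates caches of peers that accept requests
    printf 'arping -U -c %s -I %s %s\n' "$count" "$iface" "$ip"
    # -A: same announcement sent as ARP REPLY packets
    printf 'arping -A -c %s -I %s %s\n' "$count" "$iface" "$ip"
}

# Usage (assumes root on the node that owns the address; the VIP in this
# report lives inside the haproxy network namespace):
#   garp_commands b_management 10.109.17.3 5 | ip netns exec haproxy sh
```

Printing the commands rather than running them keeps the helper safe to try on a node before anything is actually sent.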

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Alexey Shtokolov (ashtokolov) wrote :

Fix was verified on ISO#281
"cluster reinstallation = deploy -> provisioning -> deploy"

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

{"build_id": "286", "build_number": "286", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "286", "build_number": "286", "api": "1.0", "fuel-library_sha": "ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589", "nailgun_sha": "5c33995a2e6d9b1b8cdddfa2630689da5084506f", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "1ce8ecd8beb640f2f62f73435f4e18d1469979ac", "astute_sha": "8283dc2932c24caab852ae9de15f94605cc350c6", "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589", "nailgun_sha": "5c33995a2e6d9b1b8cdddfa2630689da5084506f", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "1ce8ecd8beb640f2f62f73435f4e18d1469979ac", "astute_sha": "8283dc2932c24caab852ae9de15f94605cc350c6", "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"} fixed

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Alexey Shtokolov (ashtokolov) wrote :

The bug was reproduced on ISO 284.
Now it happens after reinstallation of the primary controller,
and 7 OSTF tests failed:
https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.ready_node_reinstallation/31/testReport/(root)/reinstall_single_regular_controller_node/reinstall_single_regular_controller_node/

After the reinstallation of node-1 the vip_management migrated to node-2
vip__management (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local

root@node-1:~# arp -an | grep 10.109.2.3
? (10.109.2.3) at 1e:2f:1d:c5:5c:bd [ether] on br-mgmt
root@node-2:~# arp -an|grep 10.109.2.3
? (10.109.2.3) at 1e:2f:1d:c5:5c:bd [ether] on br-mgmt

But node-5 still has the old MAC address of vip_management:
root@node-5:~# arp -an| grep 10.109.2.3
? (10.109.2.3) at 06:ca:b1:6f:5c:bb [ether] on br-mgmt
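
The per-node check above can be automated. A minimal sketch follows, with hypothetical helper names (arp_an_mac, vip_is_stale), assuming the arp -an line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helpers for spotting a stale VIP entry in `arp -an` output.

# Print the MAC cached for IP $1. Reads `arp -an` output on stdin, e.g.
#   ? (10.109.2.3) at 1e:2f:1d:c5:5c:bd [ether] on br-mgmt
arp_an_mac() {
    awk -v want="($1)" '$2 == want && $3 == "at" { print $4; exit }'
}

# Succeed if the entry cached for IP $1 differs from the owner MAC $2.
# A missing entry is not treated as stale, since it will be re-resolved.
vip_is_stale() {
    cached=$(arp_an_mac "$1")
    [ -n "$cached" ] && [ "$cached" != "$2" ]
}

# Usage on each node (owner MAC taken from the node that runs the VIP):
#   arp -an | vip_is_stale 10.109.2.3 1e:2f:1d:c5:5c:bd && echo "stale entry"
```

Unlike a plain grep for the address, the exact field comparison does not match other addresses that begin with the same digits.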

Changed in fuel:
status: Fix Released → Triaged
Andrey Maximov (maximov)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Fuel Library Team (fuel-library)
Andrey Maximov (maximov)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Michael Polenchuk (mpolenchuk)
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Guys, I cannot agree that this is the same case. Some tests passed here, and the failed test also connected successfully to the management VIP and keystone and could send requests to nova and other services (for example, we were able to request instance creation). If there were no connectivity to the management VIP, we would fail when trying to get a token from keystone. The instance actually failed because it went into ERROR state, and the compute log contains a lot of oslo messaging errors, so we should look at rabbit and oslo here.

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/221185

Changed in fuel:
status: In Progress → Triaged
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "288"
  build_id: "288"
  nailgun_sha: "93477f9b42c5a5e0506248659f40bebc9ac23943"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "a717657232721a7fafc67ff5e1c696c9dbeb0b95"
  fuel-library_sha: "121016a09b0e889994118aa3ea42fa67eabb8f25"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/221718

Changed in fuel:
assignee: Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/221719

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Michael Polenchuk (mpolenchuk)
Changed in fuel:
assignee: Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Michael Polenchuk (mpolenchuk)
Revision history for this message
Michael Polenchuk (mpolenchuk) wrote :

It's important that corosync shuts down after pacemaker, but we have the following:
# ls -l /etc/rc{0,1,6}.d/ | egrep '(corosync|pacemaker)'
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker

tags: added: release-notes
Revision history for this message
Michael Polenchuk (mpolenchuk) wrote :

Switching the stop order of corosync/pacemaker solved the issue.
In each of /etc/rc0.d, /etc/rc1.d and /etc/rc6.d, do the following:
  # rm K20pacemaker
  # ln -s ../init.d/pacemaker K00pacemaker
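
A quick way to verify the resulting order is sketched below. The helper names are hypothetical, and the check assumes that SysV kill links are processed in lexical order of their names, so K00pacemaker runs before K01corosync.

```shell
#!/bin/sh
# Hypothetical helpers for checking the stop order in one rc directory.

# Print the name of the kill link for service $2 in directory $1.
first_kill_link() {
    for link in "$1"/K[0-9][0-9]"$2"; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$link" ] || [ -L "$link" ] || continue
        basename "$link"
        return 0
    done
    return 1
}

# Succeed if pacemaker is stopped before corosync in directory $1.
# Returns 2 if either kill link is missing.
pacemaker_stops_first() {
    pm=$(first_kill_link "$1" pacemaker) || return 2
    cs=$(first_kill_link "$1" corosync) || return 2
    first=$(printf '%s\n%s\n' "$pm" "$cs" | LC_ALL=C sort | head -n 1)
    [ "$first" = "$pm" ]
}

# Usage:
#   for d in /etc/rc0.d /etc/rc1.d /etc/rc6.d; do
#       pacemaker_stops_first "$d" || echo "$d: corosync stops before pacemaker"
#   done
```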

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Michael Polenchuk (<email address hidden>) on branch: master
Review: https://review.openstack.org/221185
Reason: I guess the issue should be fixed by means of pacemaker package: /etc/rc{0,1,6}.d/K20pacemaker -> K00pacemaker

Revision history for this message
Alexey Shtokolov (ashtokolov) wrote :

We should cover both cases: standard reboot and unexpected shutdown.
For the latter case we need https://review.openstack.org/#/c/221719; it was tested on ISO#286.

/etc/rc{0,1,6}.d/K20pacemaker -> K00pacemaker should be implemented and carefully tested in the next release cycle.

Changed in fuel:
assignee: Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/221718
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8f280f6ac44707973a19cb779b77e474355cff59
Submitter: Jenkins
Branch: master

commit 8f280f6ac44707973a19cb779b77e474355cff59
Author: Vladimir Kuklin <email address hidden>
Date: Wed Sep 9 14:58:05 2015 +0300

    Send gratutious ARPs on ip monitor event also

    This commit sends gratuitious ARP requests in case
    ip_monitor returns zero exit code. This will allow
    for periodical updates of neighbours in case our IP
    is OK and neighbour did not receive arps of the node
    after ip was initially started.

    Change-Id: I6d6134eee08352c1b0f0ef85dc79bfd6ee804378
    Closes-bug: #1492210
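
The control flow described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code that was merged: run_monitor and its two callback arguments are hypothetical stand-ins for the resource agent's ip_monitor check and its ARP announcement step.

```shell
#!/bin/sh
# Hypothetical wrapper: run the health check $1 and, only when it exits
# with zero, run the announcement command $2. The check's exit code is
# always passed through to the caller.
run_monitor() {
    check=$1
    announce=$2
    rc=0
    "$check" || rc=$?
    if [ "$rc" -eq 0 ]; then
        # Assumption: a failed announcement must not fail the monitor itself
        "$announce" || true
    fi
    return "$rc"
}

# Usage inside an agent-like script, with both functions defined by the caller:
#   run_monitor ip_monitor send_gratuitous_arps
```

Because the monitor action runs periodically, neighbours that missed the initial announcement get another chance to refresh their caches on every successful check.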

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/221719
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8380734ab1877d67a5c85b4f8be5b48ac0d79a81
Submitter: Jenkins
Branch: stable/7.0

commit 8380734ab1877d67a5c85b4f8be5b48ac0d79a81
Author: Vladimir Kuklin <email address hidden>
Date: Wed Sep 9 14:58:05 2015 +0300

    Send gratutious ARPs on ip monitor event also

    This commit sends gratuitious ARP requests in case
    ip_monitor returns zero exit code. This will allow
    for periodical updates of neighbours in case our IP
    is OK and neighbour did not receive arps of the node
    after ip was initially started.

    Change-Id: I6d6134eee08352c1b0f0ef85dc79bfd6ee804378
    Closes-bug: #1492210

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/223120

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/223124

tags: added: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (master)

Reviewed: https://review.openstack.org/223124
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=9d5194973238848d1a1ccdab310dae99109b4a47
Submitter: Jenkins
Branch: master

commit 9d5194973238848d1a1ccdab310dae99109b4a47
Author: evkonstantinov <email address hidden>
Date: Mon Sep 14 17:29:15 2015 +0300

    Add virt role deployment issue to relnotes

    Change-Id: I1f62e7c0490b160dc158c294dc9ebad0f2ad0760
    Related-Bug:#1492210

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/223120
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=f4c00aa6869e3bf7f9bfbf3d5f269cf13a41f512
Submitter: Jenkins
Branch: master

commit f4c00aa6869e3bf7f9bfbf3d5f269cf13a41f512
Author: evkonstantinov <email address hidden>
Date: Mon Sep 14 17:19:28 2015 +0300

    Add management VIP issue to relnotes

    Change-Id: I5d1b4637f9d87694ed1a6c86aacda563d6e643e1
    Related-Bug:#1492210

Revision history for this message
Veronica Krayneva (vkrayneva) wrote :

Checked on 292 iso. Bug was fixed

tags: removed: on-verification
tags: added: on-verification
Revision history for this message
Veronica Krayneva (vkrayneva) wrote :

Checked on fuel-kilo-8.0-108-2015-10-02_10-23-00.iso. Bug was fixed.

tags: removed: on-verification
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0 → 8.0
tags: added: 8.0 release-notes-done
removed: release-notes