Fuel for OpenStack

There is no connectivity to management vip after cluster reinstallation

Bug #1492210 reported by Tatyanka on 2015-09-04

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	High	Vladimir Kuklin	Fuel for OpenStack 8.0
7.0.x	Fix Released	High	Alexey Shtokolov	Fuel for OpenStack 7.0
8.0.x	Fix Released	High	Vladimir Kuklin	Fuel for OpenStack 8.0

Bug Description

https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.full_cluster_reinstallation/15/

Steps to reproduce:
Scenario:
            1. Create a cluster
               Add 3 nodes with controller roles
               Add a node with compute and cinder roles
               Add a node with mongo role
               Deploy the cluster
               Verify that the deployment is completed successfully
            2. Create an empty sample file on each node to check that it is not
               available after cluster reinstallation
            3. Reinstall all cluster nodes
            4. Verify that all nodes are reinstalled (not just rebooted),
               i.e. there is no sample file on a node
            5. Run network verification
            6. Run OSTF
            7. Verify that Ceilometer API service is up and running
            8. Verify that all cinder services are up and running on nodes

Actual Result:
OSTF tests fail with a "keystone unavailable" message for the management endpoint. In fact, there is no connectivity to the management VIP from 2 controllers: mgmt_vip is reachable only from the controller where it runs.
As a result, no OpenStack services work from those 2 controllers:
root@node-5:~# keystone user-list
/usr/lib/python2.7/dist-packages/keystoneclient/shell.py:65: DeprecationWarning: The keystone CLI is deprecated in favor of python-openstackclient. For a Python library, continue using python-keystoneclient.
'python-keystoneclient.', DeprecationWarning)
Authorization Failed: Unable to establish connection to http://10.109.17.3:5000/v2.0/tokens

If we look at the MAC recorded for 10.109.17.3 on this node:
root@node-5:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-4.test.domain.loca ether 3e:57:71:b9:8a:24 C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
10.109.17.3 ether 56:a2:ec:9b:99:ba C br-mgmt
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
root@node-5:~# arping -I br-mgmt

We can see that it is actually incorrect, which is why we fail to connect to management_vip.

The correct MAC is shown below (see the command output from node-4):
root@node-4:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-5.test.domain.loca ether 64:80:fe:df:2b:0e C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.3 ether 26:3c:31:ad:fd:10 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
root@node-4:~# ip netns exec ip a
Cannot open network namespace "ip": No such file or directory
root@node-4:~# ip netns exec haproxy ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
20: hapr-ns: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:d3:12:47:bd:a3 brd ff:ff:ff:ff:ff:ff
    inet 240.0.0.2/30 scope global hapr-ns
       valid_lft forever preferred_lft forever
    inet6 fe80::30d3:12ff:fe47:bda3/64 scope link
       valid_lft forever preferred_lft forever
25: b_management: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 26:3c:31:ad:fd:10 brd ff:ff:ff:ff:ff:ff
    inet 10.109.17.3/24 scope global b_management
       valid_lft forever preferred_lft forever
    inet6 fe80::243c:31ff:fead:fd10/64 scope link
       valid_lft forever preferred_lft forever
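
The comparison above can be scripted. The following is a minimal sketch and not part of the original report: the function names arp_mac and arp_entry_matches are hypothetical, and it assumes the `arp -i br-mgmt` table layout shown above (address in column 1, hardware address in column 3).

```shell
# Print the hardware address cached for ADDRESS.
# Reads an `arp -i IFACE` table on stdin.
arp_mac() {  # usage: arp -i br-mgmt | arp_mac ADDRESS
    awk -v addr="$1" '$1 == addr { print $3; exit }'
}

# Succeed only when the cached entry for ADDRESS equals EXPECTED_MAC,
# e.g. the link/ether value of b_management inside the haproxy namespace.
arp_entry_matches() {  # usage: arp -i br-mgmt | arp_entry_matches ADDRESS EXPECTED_MAC
    seen_mac=$(arp_mac "$1")
    [ "$seen_mac" = "$2" ]
}
```

On node-5, `arp -i br-mgmt | arp_entry_matches 10.109.17.3 26:3c:31:ad:fd:10` would return non-zero for the table above, because the cached entry there is 56:a2:ec:9b:99:ba.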

After the following actions, connectivity was restored:

root@node-5:~# arp -d 10.109.17.3
root@node-5:~# ping 10.109.17.3
PING 10.109.17.3 (10.109.17.3) 56(84) bytes of data.
64 bytes from 10.109.17.3: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 10.109.17.3: icmp_seq=2 ttl=64 time=0.285 ms
64 bytes from 10.109.17.3: icmp_seq=3 ttl=64 time=0.184 ms

Tatyanka (tatyana-leontovich) wrote on 2015-09-04:

fail_error_full_cluster_reinstallation-fuel-snapshot-2015-09-04_01-33-03.tar.xz (57.1 MiB, application/octet-stream)

Nastya Urlapova (aurlapova) on 2015-09-04

Changed in fuel:
status:	New → Confirmed

Stanislaw Bogatkin (sbogatkin) on 2015-09-04

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)

Matthew Mosesohn (raytrac3r) wrote on 2015-09-04:

The issue is related to tests. What we need to do is two things:
extend the time we send arpings (increase count and wait time)
add ip neigh flush all to restore task (along with time sync)
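
A rough sketch of the first point, assuming the iputils version of arping is used (the thread does not show the exact command in the resource agent). The function name garp_commands and the default count of 5 are illustrative assumptions. With iputils arping, -U sends unsolicited ARP REQUEST packets and -A sends ARP REPLY packets. The function only prints the commands so they can be reviewed before being run.

```shell
# Print (do not run) the arping invocations that announce ADDRESS on IFACE.
# One line per mode: -U (ARP REQUEST) and -A (ARP REPLY).
garp_commands() {  # usage: garp_commands IFACE ADDRESS [COUNT]
    garp_count=${3:-5}   # assumed default, not taken from the bug report
    for garp_mode in -U -A; do
        printf 'arping %s -c %s -I %s %s\n' "$garp_mode" "$garp_count" "$1" "$2"
    done
}
```

For example, `garp_commands br-mgmt 10.109.17.3 10` prints two arping lines with a count of 10. If the address is configured inside a network namespace, the printed commands would have to be run in that namespace and with its interface name.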

Stanislaw Bogatkin (sbogatkin) on 2015-09-04

Changed in fuel:
assignee:	Stanislaw Bogatkin (sbogatkin) → Fuel Library Team (fuel-library)

Vladimir Kuklin (vkuklin) wrote on 2015-09-04:

The solution for arps is to restart vips on all the nodes after we ensure that the nodes booted up correctly.

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Fuel QA Team (fuel-qa)
tags:	added: non-release system-tests

Tatyanka (tatyana-leontovich) wrote on 2015-09-04:

Vova, I do not agree that this is a test issue: there was no snapshot/revert or any other issue after the re-deploy and the OSTF run, and if you look at the snapshot, connectivity was lost after the deployment finished successfully, so I am moving the issue back to library.

Changed in fuel:
assignee:	Fuel QA Team (fuel-qa) → Fuel Library Team (fuel-library)
tags:	removed: non-release system-tests

Dmitry Ilyin (idv1985) on 2015-09-04

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)

OpenStack Infra (hudson-openstack) wrote on 2015-09-04: Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/220623

OpenStack Infra (hudson-openstack) wrote on 2015-09-05: Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/220623
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4536284876dec4c1424680a76a6b1915f938dc74
Submitter: Jenkins
Branch: master

commit 4536284876dec4c1424680a76a6b1915f938dc74
Author: Dmitry Ilyin <email address hidden>
Date: Fri Sep 4 21:50:27 2015 +0300

Send both REQUEST and REPLY packets to the peers

Change-Id: I6957026028372ecc5e98ac395cad2c1c375ce11d
Related-Bug: 1492210

Alexey Shtokolov (ashtokolov) on 2015-09-06

Changed in fuel:
status:	Confirmed → In Progress

Alexey Shtokolov (ashtokolov) wrote on 2015-09-06:

Fix was verified on ISO#281
"cluster reinstallation = deploy -> provisioning -> deploy"

Changed in fuel:
status:	In Progress → Fix Committed

Tatyanka (tatyana-leontovich) wrote on 2015-09-07:

{"build_id": "286", "build_number": "286", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "286", "build_number": "286", "api": "1.0", "fuel-library_sha": "ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589", "nailgun_sha": "5c33995a2e6d9b1b8cdddfa2630689da5084506f", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "1ce8ecd8beb640f2f62f73435f4e18d1469979ac", "astute_sha": "8283dc2932c24caab852ae9de15f94605cc350c6", "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589", "nailgun_sha": "5c33995a2e6d9b1b8cdddfa2630689da5084506f", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d", "production": "docker", "python-fuelclient_sha": "1ce8ecd8beb640f2f62f73435f4e18d1469979ac", "astute_sha": "8283dc2932c24caab852ae9de15f94605cc350c6", "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045", "release": "7.0", "fuelmain_sha": "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"} fixed

Changed in fuel:
status:	Fix Committed → Fix Released

Alexey Shtokolov (ashtokolov) wrote on 2015-09-07:

The bug was reproduced on ISO #284.
Now it happens after reinstallation of the primary controller,
and 7 OSTF tests failed:
https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.ready_node_reinstallation/31/testReport/(root)/reinstall_single_regular_controller_node/reinstall_single_regular_controller_node/

After the reinstallation of node-1 the vip_management migrated to node-2
vip__management (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local

root@node-1:~# arp -an | grep 10.109.2.3
? (10.109.2.3) at 1e:2f:1d:c5:5c:bd [ether] on br-mgmt
root@node-2:~# arp -an|grep 10.109.2.3
? (10.109.2.3) at 1e:2f:1d:c5:5c:bd [ether] on br-mgmt

But node-5 still has the old MAC address of vip_management:
root@node-5:~# arp -an| grep 10.109.2.3
? (10.109.2.3) at 06:ca:b1:6f:5c:bb [ether] on br-mgmt

Changed in fuel:
status:	Fix Released → Triaged

Andrey Maximov (maximov) on 2015-09-07

Changed in fuel:
assignee:	Dmitry Ilyin (idv1985) → Fuel Library Team (fuel-library)

Andrey Maximov (maximov) on 2015-09-07

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Michael Polenchuk (mpolenchuk)

Tatyanka (tatyana-leontovich) wrote on 2015-09-07:

#10

Guys, I cannot agree that this is the same case. Some tests passed here, and the failed tests also connected successfully to the management VIP and keystone and could send requests to nova and other services (for example, we were able to request instance creation). If there were no connectivity to the management VIP, we would fail when trying to get a token from keystone. The instance actually failed because it went into the ERROR state, and the compute log contains a lot of oslo messaging errors, so we should look at rabbit and oslo here.

Michael Polenchuk (mpolenchuk) on 2015-09-08

Changed in fuel:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-09-08: Fix proposed to fuel-library (master)

#11

Fix proposed to branch: master
Review: https://review.openstack.org/221185

Nastya Urlapova (aurlapova) on 2015-09-08

Changed in fuel:
status:	In Progress → Triaged

Tatyanka (tatyana-leontovich) wrote on 2015-09-09:

#12

reproduced on https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.ha_destructive_ceph_neutron/52/

Tatyanka (tatyana-leontovich) wrote on 2015-09-09:

#13

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "288"
  build_id: "288"
  nailgun_sha: "93477f9b42c5a5e0506248659f40bebc9ac23943"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "a717657232721a7fafc67ff5e1c696c9dbeb0b95"
  fuel-library_sha: "121016a09b0e889994118aa3ea42fa67eabb8f25"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"

OpenStack Infra (hudson-openstack) wrote on 2015-09-09:

#14

Fix proposed to branch: master
Review: https://review.openstack.org/221718

Changed in fuel:
assignee:	Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-09-09: Fix proposed to fuel-library (stable/7.0)

#15

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/221719

OpenStack Infra (hudson-openstack) on 2015-09-09

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Michael Polenchuk (mpolenchuk)

OpenStack Infra (hudson-openstack) on 2015-09-09

Changed in fuel:
assignee:	Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)

OpenStack Infra (hudson-openstack) on 2015-09-10

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Michael Polenchuk (mpolenchuk)

Michael Polenchuk (mpolenchuk) wrote on 2015-09-10:

#16

It is important that corosync shuts down after pacemaker, but we have the following:
# ls -l /etc/rc{0,1,6}.d/ | egrep '(corosync|pacemaker)'
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker
lrwxrwxrwx 1 root root 18 Sep 9 07:14 K01corosync -> ../init.d/corosync
lrwxrwxrwx 1 root root 26 Sep 9 07:14 K01corosync-notifyd -> ../init.d/corosync-notifyd
lrwxrwxrwx 1 root root 19 Sep 9 07:14 K20pacemaker -> ../init.d/pacemaker
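
With SysV init, the K links in a runlevel directory are run in ascending name order, so in the listing above K01corosync is stopped before K20pacemaker. A check along the following lines could flag that order. This is only a sketch: stop_order_ok is a hypothetical helper, and it assumes input in the `ls -l` style shown above.

```shell
# Succeed only when the pacemaker K link has a lower sequence number than
# the corosync K link, i.e. pacemaker is stopped first.
# Reads `ls` or `ls -l` output on stdin; if several directories are listed,
# the last matching entries win.
stop_order_ok() {  # usage: ls -l /etc/rc6.d | stop_order_ok
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # substr() drops the leading K; "+ 0" keeps the numeric prefix.
                if ($i ~ /^K[0-9]+pacemaker$/) { p = substr($i, 2) + 0; seen_p = 1 }
                if ($i ~ /^K[0-9]+corosync$/)  { c = substr($i, 2) + 0; seen_c = 1 }
            }
        }
        END { exit !(seen_p && seen_c && p < c) }
    '
}
```

The pattern for corosync is anchored at the end of the name, so K01corosync-notifyd is ignored.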

Nastya Urlapova (aurlapova) on 2015-09-10

tags:

added: release-notes

Michael Polenchuk (mpolenchuk) wrote on 2015-09-10:

#17

Switching the stop order of corosync/pacemaker solved the issue.
In each of /etc/rc6.d, /etc/rc1.d and /etc/rc0.d, do the following:
# rm K20pacemaker
# ln -s ../init.d/pacemaker K00pacemaker
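
The two commands above could be applied to all three directories in one pass. The sketch below is an illustration only: reorder_pacemaker_stop is a hypothetical helper, and its root-directory argument is an assumption added so that the change can be tried on a scratch copy before touching /etc.

```shell
# Rename the pacemaker stop link from K20 to K00 in rc0.d, rc1.d and rc6.d
# under ROOT (a scratch copy for testing, or /etc on a real node).
reorder_pacemaker_stop() {  # usage: reorder_pacemaker_stop ROOT
    rc_root=$1
    for rc_level in 0 1 6; do
        rc_dir="$rc_root/rc$rc_level.d"
        # Skip runlevels where the original link is absent.
        [ -L "$rc_dir/K20pacemaker" ] || continue
        rm "$rc_dir/K20pacemaker"
        ln -s ../init.d/pacemaker "$rc_dir/K00pacemaker"
    done
}
```

Running it against a temporary directory tree first shows the resulting links without affecting the shutdown sequence of a live controller.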

OpenStack Infra (hudson-openstack) wrote on 2015-09-10: Change abandoned on fuel-library (master)

#18

Change abandoned by Michael Polenchuk (<email address hidden>) on branch: master
Review: https://review.openstack.org/221185
Reason: I guess the issue should be fixed by means of pacemaker package: /etc/rc{0,1,6}.d/K20pacemaker -> K00pacemaker

Alexey Shtokolov (ashtokolov) wrote on 2015-09-10:

#19

We should cover both cases: a standard reboot and an unexpected shutdown.
For the latter case we need https://review.openstack.org/#/c/221719, which was tested on ISO#286.

The change /etc/rc{0,1,6}.d/K20pacemaker -> K00pacemaker should be implemented and carefully tested in the next release cycle.

OpenStack Infra (hudson-openstack) on 2015-09-10

Changed in fuel:
assignee:	Michael Polenchuk (mpolenchuk) → Vladimir Kuklin (vkuklin)

OpenStack Infra (hudson-openstack) wrote on 2015-09-11: Fix merged to fuel-library (master)

#20

Reviewed: https://review.openstack.org/221718
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8f280f6ac44707973a19cb779b77e474355cff59
Submitter: Jenkins
Branch: master

commit 8f280f6ac44707973a19cb779b77e474355cff59
Author: Vladimir Kuklin <email address hidden>
Date: Wed Sep 9 14:58:05 2015 +0300

Send gratutious ARPs on ip monitor event also

    This commit sends gratuitious ARP requests in case
    ip_monitor returns zero exit code. This will allow
    for periodical updates of neighbours in case our IP
    is OK and neighbour did not receive arps of the node
    after ip was initially started.

Change-Id: I6d6134eee08352c1b0f0ef85dc79bfd6ee804378
Closes-bug: #1492210

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2015-09-11: Fix merged to fuel-library (stable/7.0)

#21

Reviewed: https://review.openstack.org/221719
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8380734ab1877d67a5c85b4f8be5b48ac0d79a81
Submitter: Jenkins
Branch: stable/7.0

commit 8380734ab1877d67a5c85b4f8be5b48ac0d79a81
Author: Vladimir Kuklin <email address hidden>
Date: Wed Sep 9 14:58:05 2015 +0300

Send gratutious ARPs on ip monitor event also

Change-Id: I6d6134eee08352c1b0f0ef85dc79bfd6ee804378
Closes-bug: #1492210

OpenStack Infra (hudson-openstack) wrote on 2015-09-14: Related fix proposed to fuel-docs (master)

#22

Related fix proposed to branch: master
Review: https://review.openstack.org/223120

OpenStack Infra (hudson-openstack) wrote on 2015-09-14:

#23

Related fix proposed to branch: master
Review: https://review.openstack.org/223124

Veronica Krayneva (vkrayneva) on 2015-09-15

tags:

added: on-verification

OpenStack Infra (hudson-openstack) wrote on 2015-09-15: Related fix merged to fuel-docs (master)

#24

Reviewed: https://review.openstack.org/223124
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=9d5194973238848d1a1ccdab310dae99109b4a47
Submitter: Jenkins
Branch: master

commit 9d5194973238848d1a1ccdab310dae99109b4a47
Author: evkonstantinov <email address hidden>
Date: Mon Sep 14 17:29:15 2015 +0300

Add virt role deployment issue to relnotes

Change-Id: I1f62e7c0490b160dc158c294dc9ebad0f2ad0760
Related-Bug:#1492210

OpenStack Infra (hudson-openstack) wrote on 2015-09-15:

#25

Reviewed: https://review.openstack.org/223120
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=f4c00aa6869e3bf7f9bfbf3d5f269cf13a41f512
Submitter: Jenkins
Branch: master

commit f4c00aa6869e3bf7f9bfbf3d5f269cf13a41f512
Author: evkonstantinov <email address hidden>
Date: Mon Sep 14 17:19:28 2015 +0300

Add management VIP issue to relnotes

Change-Id: I5d1b4637f9d87694ed1a6c86aacda563d6e643e1
Related-Bug:#1492210

Veronica Krayneva (vkrayneva) wrote on 2015-09-15:

#26

Checked on 292 iso. Bug was fixed

Veronica Krayneva (vkrayneva) on 2015-09-15

tags:

removed: on-verification

Veronica Krayneva (vkrayneva) on 2015-10-05

tags:

added: on-verification

Veronica Krayneva (vkrayneva) wrote on 2015-10-06:

#27

Checked on fuel-kilo-8.0-108-2015-10-02_10-23-00.iso. Bug was fixed.

tags:

removed: on-verification

Dmitry Pyzhov (dpyzhov) on 2015-10-22

tags:

added: area-library

Dmitry Pyzhov (dpyzhov) on 2015-11-30

Changed in fuel:
milestone:	7.0 → 8.0

Olga Gusarenko (ogusarenko) on 2016-02-26

tags:

added: 8.0 release-notes-done
removed: release-notes
