Management VIP flaps on CentOS IBP installation with 1 physical controller

Bug #1452715 reported by Amichay Polishuk
This bug affects 3 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Stanislav Makar

Bug Description

Fuel Version :

{"build_id": "2015-04-29_07-55-19", "build_number": "361", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-04-29_07-55-19", "build_number": "361", "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "ee112acfdd0f9017ef40be53e8e51bb5c429e97c"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "ee112acfdd0f9017ef40be53e8e51bb5c429e97c"}

Diagnostic Snapshot :

https://drive.google.com/file/d/0BzuAt0EZGLAMLTAwTVU2MHhzQm8/view?usp=sharing

VIP is unreachable. There are multiple errors in all OpenStack services logs, e.g.:

http://paste.openstack.org/show/223743/
http://paste.openstack.org/show/223744/
http://paste.openstack.org/show/223745/
http://paste.openstack.org/show/223753/

Changed in fuel:
status: New → Incomplete
status: Incomplete → Triaged
Changed in fuel:
importance: Undecided → Medium
assignee: nobody → MOS Nova (mos-nova)
Noam Angel (noama)
tags: added: nova
Changed in fuel:
milestone: none → 7.0
Noam Angel (noama)
summary: - Create an Instance Failed after No SRIOV (OS=Centos) Deployment
+ Deploy CentOS with mellanox ofed failed to boot VM's with PV
Aviram Bar-Haim (aviramb) wrote : Re: Deploy CentOS with mellanox ofed failed to boot VM's with PV

It is solved after stopping iptables on the controller node.
Moreover, the controller is not accessible via the virtual IP until iptables is stopped.
Attached is the iptables status.
There is a "reject-with icmp-host-prohibited" rule that I don't see on working environments.
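
A quick way to look for such a rule on a controller is to scan a saved `iptables -S` dump for REJECT targets. The following is a minimal sketch and not part of the original report: the function name and the sample rules are made up for illustration, and it only parses text that has already been captured.

```python
import re

# Matches rule specs as printed by `iptables -S`, e.g.
#   -A INPUT -j REJECT --reject-with icmp-host-prohibited
REJECT_RE = re.compile(r"-j\s+REJECT\s+--reject-with\s+icmp-host-prohibited")


def find_host_prohibited_rules(iptables_dump):
    """Return the rule lines that reject traffic with icmp-host-prohibited."""
    return [
        line.strip()
        for line in iptables_dump.splitlines()
        if REJECT_RE.search(line)
    ]


# Hypothetical sample dump; the real rules are in the attachment to this bug.
SAMPLE_DUMP = """\
-P INPUT ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
"""

if __name__ == "__main__":
    for rule in find_host_prohibited_rules(SAMPLE_DUMP):
        print(rule)
```

Comparing the result against a dump from a working environment would show whether the rule is specific to the failing controller.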

Aviram Bar-Haim (aviramb) wrote :

It was reproduced with TestVM too, on the same ISO as in the description.

summary: - Deploy CentOS with mellanox ofed failed to boot VM's with PV
+ VMs start is failing after CentOS IBP installation
tags: added: customer-found
Aviram Bar-Haim (aviramb) wrote : Re: VMs start is failing after CentOS IBP installation

This behavior is reproduced on ISO 395 too. I suspect that https://bugs.launchpad.net/fuel/+bug/1454364 is a related bug and that its fix is just a workaround for the iptables problem in CentOS IBP.

Evgeniya Shumakher (eshumakher) wrote :

MOS-Nova team, this is a blocker for MLNX. Please consider fixing this bug in 6.1.

Changed in fuel:
importance: Medium → High
Changed in fuel:
milestone: 7.0 → 6.1
tags: removed: nova
Changed in fuel:
status: Triaged → Confirmed
Roman Podoliaka (rpodolyaka) wrote :
Changed in fuel:
assignee: MOS Nova (mos-nova) → nobody
assignee: nobody → Fuel Library Team (fuel-library)
summary: - VMs start is failing after CentOS IBP installation
+ VIP is unreachable after CentOS IBP installation
description: updated
Vladimir Kuklin (vkuklin) wrote : Re: VIP is unreachable after CentOS IBP installation

Folks, which installation are you using? Are you using any additional plugins or workarounds? We do not see this behaviour with a vanilla installation.

So, please, provide more info:

1) Are you using any plugins or applying any customizations?
2) Which VIP (public or management) is unreachable, by which protocol (ICMP, TCP), and from which locations (compute nodes, master node, controller nodes)?

Changed in fuel:
status: Confirmed → Incomplete
assignee: Fuel Library Team (fuel-library) → Aviram Bar-Haim (aviramb)
Roman Podoliaka (rpodolyaka) wrote :

Vova, according to the logs it's the management VIP (192.168.0.2), and it's unreachable over both TCP and ICMP.

Mike Scherbakov (mihgen) wrote :

I've asked Sergey Vasilenko to analyze what happens here.

Aviram Bar-Haim (aviramb) wrote :

Vladimir, it is an installation with the Mellanox plugin, but we hit this without enabling any of our features or changing any OpenStack or deployment configuration.
We failed to ping the dashboard VIP after a successful deployment, from outside the cluster or from some of its nodes (including controllers), until we stopped iptables.

Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
assignee: Aviram Bar-Haim (aviramb) → Fuel Library Team (fuel-library)
Igor Zinovik (izinovik) wrote :

Amichay, according to astute.yaml on node-39, which is the primary-controller in your case,
you are using InfiniBand interfaces in your setup:
fuel-snapshot-2015-05-07_02-34-39/node-39.domain.tld/etc/astute.yaml
...
  interfaces:
    eth5:
      vendor_specific:
        driver: eth_ipoib
        bus_info: ib0
    eth4:
      vendor_specific:
        driver: mlx4_en
        bus_info: '0000:05:00.0'
    eth3:
      vendor_specific:
        driver: tg3
        bus_info: '0000:02:00.1'
    eth2:
      vendor_specific:
        driver: tg3
        bus_info: '0000:02:00.0'
    eth1:
      vendor_specific:
        driver: tg3
        bus_info: '0000:01:00.1'
    eth0:
      vendor_specific:
        driver: tg3
        bus_info: '0000:01:00.0'

Since you are using IB, the problem might be related to the driver.

Please verify the same setup on Ubuntu.

If an Ubuntu cluster works fine in the same setup with IB, it will show that
the problem is in the drivers; if not, it is a problem in Fuel.

I'm moving this bug to Incomplete until we get results from an Ubuntu cluster.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Partner Integration Team (fuel-partner)
summary: - VIP is unreachable after CentOS IBP installation
+ [Infiniband] VIP is unreachable after CentOS IBP installation
Igor Zinovik (izinovik) wrote : Re: [Infiniband] VIP is unreachable after CentOS IBP installation

In the diagnostic snapshot I found a flapping interface:

./node-39.domain.tld/var/log/daemon.log:<27>May 7 01:46:40 node-39 ns_IPaddr2(vip__management)[46800]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/daemon.log:<30>May 7 01:46:43 node-39 ntpd[38585]: Listen normally on 18 br-mgmt-hapr fe80::a480:5aff:fed6:e312 UDP 123
./node-39.domain.tld/var/log/daemon.log:<30>May 7 01:48:57 node-39 ns_IPaddr2(vip__management)[11178]: INFO: 25: br-mgmt-hapr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast sta
./node-39.domain.tld/var/log/daemon.log:<30>May 7 01:48:59 node-39 ntpd[38585]: Deleting interface #18 br-mgmt-hapr, fe80::a480:5aff:fed6:e312#123, interface stats: received=0, sent=0, dropp
./node-39.domain.tld/var/log/daemon.log:<27>May 7 01:49:05 node-39 ns_IPaddr2(vip__management)[11511]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/daemon.log:<30>May 7 01:49:08 node-39 ntpd[38585]: Listen normally on 27 br-mgmt-hapr fe80::78bd:1aff:fe77:9568 UDP 123
./node-39.domain.tld/var/log/syslog:<27>May 7 01:46:40 node-39 ns_IPaddr2(vip__management)[46800]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/syslog:<27>May 7 01:49:05 node-39 ns_IPaddr2(vip__management)[11511]: ERROR: Device "br-mgmt-hapr" does not exist.

This might explain why connections cannot be established.

The interface constantly goes up and down.
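
The same search can be scripted so that flaps are easy to count in a snapshot. The sketch below is an illustration only, assuming log lines in the format quoted above; the function name is hypothetical. It pulls out the `ns_IPaddr2` "does not exist" errors and de-duplicates them, since daemon.log and syslog carry the same messages.

```python
import re

# Matches e.g.:
#   <27>May 7 01:46:40 node-39 ns_IPaddr2(vip__management)[46800]:
#   ERROR: Device "br-mgmt-hapr" does not exist.
MISSING_DEV_RE = re.compile(
    r"<\d+>(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"ns_IPaddr2\((\S+?)\)\[\d+\]: ERROR: Device \"([^\"]+)\" does not exist"
)


def find_missing_device_events(log_text):
    """Return unique (timestamp, resource, device) tuples in first-seen order."""
    events = []
    seen = set()
    for line in log_text.splitlines():
        match = MISSING_DEV_RE.search(line)
        if match and match.groups() not in seen:
            seen.add(match.groups())
            events.append(match.groups())
    return events


# Lines copied from the grep output quoted in this comment.
SAMPLE_LOG = """\
./node-39.domain.tld/var/log/daemon.log:<27>May 7 01:46:40 node-39 ns_IPaddr2(vip__management)[46800]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/daemon.log:<27>May 7 01:49:05 node-39 ns_IPaddr2(vip__management)[11511]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/syslog:<27>May 7 01:46:40 node-39 ns_IPaddr2(vip__management)[46800]: ERROR: Device "br-mgmt-hapr" does not exist.
./node-39.domain.tld/var/log/syslog:<27>May 7 01:49:05 node-39 ns_IPaddr2(vip__management)[11511]: ERROR: Device "br-mgmt-hapr" does not exist.
"""

if __name__ == "__main__":
    for timestamp, resource, device in find_missing_device_events(SAMPLE_LOG):
        print(timestamp, resource, device)
```

On the four lines above this yields two distinct events, a little under two and a half minutes apart.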

Igor Zinovik (izinovik)
summary: - [Infiniband] VIP is unreachable after CentOS IBP installation
+ Management VIP flaps on CentOS IBP installation with 1 controller
summary: - Management VIP flaps on CentOS IBP installation with 1 controller
+ Management VIP flaps on CentOS IBP installation with 1 physical
+ controller
Changed in fuel:
assignee: Fuel Partner Integration Team (fuel-partner) → Amichay Polishuk (amichayp)
Stanislav Makar (smakar)
Changed in fuel:
assignee: Amichay Polishuk (amichayp) → Stanislav Makar (smakar)
Stanislav Makar (smakar) wrote :

The root cause is that the monitor action for vip__management fails because
 pinging the IP address of bridge br-ex from the haproxy namespace doesn't work.

Still debugging.
More details later.

Stanislav Makar (smakar)
Changed in fuel:
status: Incomplete → Confirmed
Stanislav Makar (smakar) wrote :

Disabled the ping check in the OCF script ns_IPaddr2.
vip__management is up.
Pinged the IP address of bridge br-ex from the haproxy namespace manually and found that there is packet loss.
Meanwhile, tcpdump running in parallel shows that traffic is flowing.

Will try to carry on debugging tomorrow
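
The manual ping from the haproxy namespace can be repeated and measured with a small helper. This is a rough sketch under stated assumptions: the target address 192.0.2.10 and the sample ping output are placeholders, the function names are hypothetical, and the command is only built here, not executed.

```python
import re

# ping prints a summary such as:
#   10 packets transmitted, 7 received, 30% packet loss, time 9012ms
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def ping_in_namespace_cmd(namespace, address, count=10):
    """Build the argv for pinging `address` from inside a network namespace."""
    return ["ip", "netns", "exec", namespace, "ping", "-c", str(count), address]


def parse_packet_loss(ping_output):
    """Return the packet loss percentage from ping's summary line, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


# Hypothetical output, not taken from the affected environment.
SAMPLE_PING_OUTPUT = """\
--- 192.0.2.10 ping statistics ---
10 packets transmitted, 7 received, 30% packet loss, time 9012ms
"""

if __name__ == "__main__":
    print(" ".join(ping_in_namespace_cmd("haproxy", "192.0.2.10")))
    print(parse_packet_loss(SAMPLE_PING_OUTPUT))
```

The argv list can be passed to `subprocess.run` on the controller (as root), and the captured stdout fed to `parse_packet_loss`.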

Vladimir Kuklin (vkuklin) wrote :

Stas, vip__management pinging the br-ex IP address looks very weird.

Vladimir Kuklin (vkuklin) wrote :

Folks, the original question was whether this behaviour is reproducible on Ubuntu or not. So I am moving this bug back to the Incomplete state.

Vladimir Kuklin (vkuklin) wrote :

Please tell us whether the Ubuntu case has the same issue or not.

Changed in fuel:
status: Confirmed → Incomplete
Stanislav Makar (smakar) wrote :

"Stas, vip__management is pinging br-ex IP address looks very weird" - my mistake :( , overworked.

I prepared a visual description of the problem, see attachments.

Stanislav Makar (smakar) wrote :
Igor Zinovik (izinovik)
Changed in fuel:
status: Incomplete → Confirmed
Stanislav Makar (smakar) wrote :

After careful analysis of the environment we found an IP address conflict. It was an infrastructure configuration error. :(

It would be good to implement IP conflict checking when the diagnostic snapshot is generated.
It would help us investigate this kind of problem faster on the customer side.

Here is one good approach we could use to do it:
 http://www.unixmen.com/find-ip-conflicts-linux/
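
One common way to spot a conflict is to send ARP probes for an address and check whether more than one MAC answers. The sketch below illustrates that idea only and does not reproduce the linked article or any existing Fuel code: it assumes output in the style printed by iputils `arping` ("Unicast reply from IP [MAC] time"), and the function name and sample MAC addresses are made up.

```python
import re
from collections import defaultdict

# Assumed reply format (iputils arping):
#   Unicast reply from 192.168.0.2 [00:25:90:AA:BB:01]  0.654ms
REPLY_RE = re.compile(
    r"reply from (\d+\.\d+\.\d+\.\d+) \[([0-9A-Fa-f:]{17})\]"
)


def find_ip_conflicts(arping_output):
    """Map each IP answered by more than one MAC to its sorted MAC list."""
    macs_by_ip = defaultdict(set)
    for line in arping_output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            ip, mac = match.groups()
            macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical capture: two different hosts answer for the management VIP.
SAMPLE_ARPING = """\
ARPING 192.168.0.2 from 192.168.0.5 eth0
Unicast reply from 192.168.0.2 [00:25:90:AA:BB:01]  0.654ms
Unicast reply from 192.168.0.2 [00:25:90:CC:DD:02]  0.702ms
Unicast reply from 192.168.0.2 [00:25:90:AA:BB:01]  0.661ms
Sent 2 probes (1 broadcast(s))
Received 3 response(s)
"""

if __name__ == "__main__":
    print(find_ip_conflicts(SAMPLE_ARPING))
```

A snapshot step could run such a probe for each VIP and node address and record any address that maps to more than one MAC.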

Changed in fuel:
status: Confirmed → Invalid
Aviram Bar-Haim (aviramb) wrote :

Igor, Stas - thanks for the effort!
The original bug was about a VIP access problem that was solved by stopping iptables, so maybe there were a few bugs here and we were left with the infra configuration bug in the end. We'll recheck all clusters in our lab ASAP to verify that this indeed solves all the VIP problems we faced.
Thanks again!
