ha_ceph_neutron_rabbit_master_destroy test failed after controller destroy with Can not ping instance by floating ip 10.109.1.129

Bug #1516631 reported by Andrey Sledzinskiy
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Fuel QA Team

Bug Description

The following test fails at different stages of pinging the instance by its floating IP - https://github.com/openstack/fuel-qa/blob/master/fuelweb_test/tests/tests_strength/test_failover_base.py#L673

Steps to reproduce:
1. Create the following cluster: Neutron VLAN, Ceph for volumes and images, 1 controller+ceph, 2 controllers, 1 compute, 1 compute+ceph
2. Deploy cluster
3. Run OSTF - everything is working
4. Create an instance and assign a floating IP to it
5. Find the controller that hosts the RabbitMQ master:
crm resource status master_p_rabbitmq-server
6. Shut down this controller
7. Wait 15 minutes
8. Try to ping the floating IP of the instance (see the example commands below)
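
For reference, steps 5 and 8 boil down to roughly the following commands (a sketch only; the floating IP is the one from this report and will differ per environment):

# find which controller currently hosts the RabbitMQ master (run on any controller)
crm resource status master_p_rabbitmq-server
# after shutting that controller down and waiting ~15 minutes,
# check whether the instance is still reachable via its floating IP
ping -c 5 10.109.1.129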

Actual result: the instance is not pingable because the neutron agents cannot connect to RabbitMQ.
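
A rough way to confirm the neutron/rabbit connectivity problem from a controller (a sketch; the standard neutron CLI and default log locations are assumed, paths may differ):

# agents that have lost their AMQP connection are reported as dead (xxx) here
neutron agent-list
# look for AMQP/RabbitMQ connection errors in the agent logs
grep -i 'AMQP server' /var/log/neutron/*.log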

Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: New → Confirmed
Andrey Sledzinskiy (asledzinskiy) wrote :

Reproduced locally:
after shutting down node-1, only one controller shows up in pacemaker:
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Slaves: [ node-5.test.domain.local ]

rabbit on node-5 isn't alive:
root@node-5:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-5' ...
Error: unable to connect to node 'rabbit@node-5': nodedown
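
For comparison, the pacemaker view and rabbit's own view can be cross-checked with roughly these commands (a sketch; node names are specific to this environment):

# pacemaker's view of the rabbitmq multi-state resource
crm resource status master_p_rabbitmq-server
# rabbit's own view, run on the node itself (node-5 here);
# 'nodedown' usually means the beam process is gone or unreachable
rabbitmqctl status
ps aux | grep [b]eam.smp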

description: updated
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → MOS Oslo (mos-oslo)
Andrey Sledzinskiy (asledzinskiy) wrote :

The env is available for investigation.

Dmitry Mescheryakov (dmitrymex) wrote :

Something strange happens in Pacemaker: according to 'pcs status', both node-2 and node-5 are online and in the cluster: http://paste.openstack.org/show/479577/

But 'pcs resource' does not show a RabbitMQ status for node-2; only node-5 is listed as a slave: http://paste.openstack.org/show/479576/

At the same time, lrmd.log for node-2 shows that the LRMD daemon calls the 'monitor' operation and it returns OCF_ERR_GENERIC, but Pacemaker just ignores that. Also, lrmd.log for node-5 shows that the OCF script constantly tries to join RabbitMQ on node-5 to the one on node-2, but fails since RabbitMQ on node-2 is stuck.

Still, the main problem here is that Pacemaker does not act on node-2, even though the OCF script returns OCF_ERR_GENERIC, which signals a problem.
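
For anyone retracing this, the relevant pieces can be pulled roughly as follows (a sketch; the lrmd.log location referenced above may differ between deployments):

# cluster membership vs. the per-resource view
pcs status
pcs resource
# monitor results reported by lrmd for the rabbitmq OCF agent on node-2
grep -i 'p_rabbitmq-server.*monitor' /var/log/lrmd.log
# OCF_ERR_GENERIC maps to exit code 1, so look for monitor operations returning a non-zero rc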

Alexey Lebedeff (alebedev-a) wrote :

I have looked only into the problems of RabbitMQ itself; there were two bugs there:
- One is already fixed upstream - https://github.com/rabbitmq/rabbitmq-common/pull/18
- Another is described at https://github.com/rabbitmq/rabbitmq-server/issues/349 ; I'll fix it in the near future.

tags: added: swarm-blocker
Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
removed: area-qa
Nastya Urlapova (aurlapova) wrote :

Moved to High because we have to fix it in the 8.0 release.

Changed in fuel:
importance: Medium → High
Sergey Shevorakov (sshevorakov) wrote :

The swarm-blocker tag is set because this bug fails 3 test cases (1% of the total).

Alexey Lebedeff (alebedev-a) wrote :

I believe that this issue should be fixed by https://review.fuel-infra.org/#/c/14586/
Does it still reproduce with this fix applied?

Dmitry Mescheryakov (dmitrymex) wrote :

QA team, per Alexey's comment above, could you please check whether the issue is still reproducible and, if so, provide us with a new repro?

Changed in fuel:
assignee: MOS Oslo (mos-oslo) → Fuel QA Team (fuel-qa)
Vladimir Khlyunev (vkhlyunev) wrote :

https://product-ci.infra.mirantis.net/view/8.0_swarm/job/8.0.system_test.ubuntu.ha_destructive_ceph_neutron/ has passed the last 2 times, so the fix looks fine, but deeper verification is needed.

Changed in fuel:
status: Confirmed → Fix Committed
tags: added: on-verification
ElenaRossokhina (esolomina) wrote :

Verified (iso#427)
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "427"
  build_id: "427"
  fuel-nailgun_sha: "9ebbaa0473effafa5adee40270da96acf9c7d58a"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "df16d41cd7a9445cf82ad9fd8f0d53824711fcd8"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "fae42170a54b98d8e8c8db99b0fbb312633c693c"
  fuel-ostf_sha: "214e794835acc7aa0c1c5de936e93696a90bb57a"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "85de57080a18fda18e5325f06eaf654b1b931592"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "e8e36cff332644576d7853c80b8a53d5b955420a"

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released