OpenStack cluster does not work after failover of the primary controller

Bug #1322259 reported by Tatyanka on 2014-05-22
This bug affects 3 people
Affects              Importance   Assigned to
Fuel for OpenStack   Critical     Vladimir Kuklin
4.1.x                Critical     Registry Administrators
5.0.x                Critical     Vladimir Kuklin

Bug Description

{"build_id": "2014-05-21_01-10-31", "mirantis": "yes", "build_number": "214", "ostf_sha": "353f918197ec53a00127fd28b9151f248a2a2d30", "nailgun_sha": "0b6e8eabaccad2aa29519561ce7cde9df9292964", "production": "docker", "api": "1.0", "fuelmain_sha": "910f262f85e94bef08e0e9b9d6230ad890bf139e", "astute_sha": "9a0d86918724c1153b5f70bdae008dea8572fd3e", "release": "5.0", "fuellib_sha": "3d92142a5643af82596f0450e39282550a45e5db"}

Steps to Reproduce:
1. Deploy an environment: 3 controllers + 2 computes, Nova network with VLAN
2. When the deployment finishes successfully, run OSTF to be sure that everything works
3. Run Rally benchmark tests (create/delete instance) and OSTF
4. While the tests are running, force off the primary controller (in my deployment it is node-1)
5. Wait until the VIPs and other HA services have recovered
6. Run OSTF

Expected result:
OpenStack cluster is operational. OSTF passes. The user can successfully create/delete an instance in Horizon.

Actual result:
OSTF fails; instances are not created/deleted.

Queues status:
http://paste.openstack.org/show/81161/

rabbit cluster status:
[root@node-2 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-3','rabbit@node-2']},
 {partitions,[]}]
...done.
[root@node-2 ~]#

crm:

[root@node-2 ~]# crm_mon -1
Last updated: Thu May 22 15:47:26 2014
Last change: Thu May 22 12:40:37 2014 via cibadmin on node-3.test.domain.local
Stack: classic openais (with plugin)
Current DC: node-2.test.domain.local - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
9 Resources configured

Online: [ node-2.test.domain.local node-3.test.domain.local ]
OFFLINE: [ node-1.test.domain.local ]

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-2.test.domain.local
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 openstack-heat-engine (ocf::mirantis:openstack-heat-engine): Started node-2.test.domain.local
[root@node-2 ~]#

On the compute node I do not see any RabbitMQ connections at all:
[root@node-4 ~]# lsof -p 21836 | grep IP
nova-comp 21836 nova 20u IPv4 89794 0t0 TCP node-4:43550->node-2:jms (ESTABLISHED)
nova-comp 21836 nova 21u IPv4 94273 0t0 TCP node-4:43597->node-2:jms (ESTABLISHED)
nova-comp 21836 nova 22u IPv4 94275 0t0 TCP node-4:43598->node-2:jms (ESTABLISHED)
nova-comp 21836 nova 23u IPv4 94287 0t0 TCP node-4:43599->node-2:jms (ESTABLISHED)
[root@node-4 ~]#

[root@node-4 ~]# lsof -p 21836 | grep 567
2014-05-22 15:48:54.561 21836 DEBUG nova.compute.manager [-] Didn't find any instances for network info cache update. _heal_instance_info_cache /usr/lib/python2.6/site-packages/nova/compute/manager.py:4895

Also, on the compute nodes there are a lot of errors:
2014-05-22 15:48:54.561 21836 DEBUG nova.openstack.common.loopingcall [-] Dynamic looping call sleeping for 60.00 seconds _inner /usr/lib/python2.6/site-packages/nova/openstack/common/loopingcall.py:132
2014-05-22 15:49:54.561 21836 ERROR nova.servicegroup.drivers.db [-] model server went away
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db service.service_ref, state_catalog)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 218, in service_update
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db return self._manager.service_update(context, service, values)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db service=service_p, values=values)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db wait_for_reply=True, timeout=timeout)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db timeout=timeout)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 409, in send
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 400, in _send
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db result = self._waiter.wait(msg_id, timeout)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db reply, ending = self._poll_connection(msg_id, timeout)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db % msg_id)
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID 23c5b1c2b4e2425b8f1bf555722477bf
2014-05-22 15:49:54.561 21836 TRACE nova.servicegroup.drivers.db
[root@node-4 ~]#

Dmitry Borodaenko (angdraug) wrote :

This commit seems to improve recovery significantly:
https://review.openstack.org/95007

Not sure if it will make the described test steps pass (OSTF might still fail if a controller is lost in the middle of a test run), but it does reduce post-failover recovery time considerably.

Changed in fuel:
status: New → Confirmed
Mike Scherbakov (mihgen) wrote :

Do we want to keep this bug as Critical for 5.0? Tatyana, after some time had passed, were you able to run VMs?

Tatyanka (tatyana-leontovich) wrote :

No, I am not able to create a VM; it gets stuck in the Building state.
It seems the problem is on the compute node, as it still replies with errors:
2014-05-23 08:53:42.056 21836 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID 2286043347a14011bb1212d48ccbc5aa
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task task(self, context)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5446, in update_available_resource
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task compute_nodes_in_db = self._get_compute_nodes_in_db(context)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5457, in _get_compute_nodes_in_db
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task context, self.host)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 186, in service_get_by_compute_host
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task result = self._manager.service_get_all_by(context, 'compute', host)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 280, in service_get_all_by
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task topic=topic, host=host, binary=binary)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task wait_for_reply=True, timeout=timeout)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task timeout=timeout)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 409, in send
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 400, in _send
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
2014-05-23 08:53:42.056 21836 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_driver...


Bogdan Dobrelya (bogdando) wrote :

Well, the RabbitMQ connections are in place ("jms" is port 5673); just recheck it with lsof -P -p instead.
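For reference, lsof maps port numbers to names via /etc/services, which is why 5673 shows up as "jms"; -P keeps ports numeric. A minimal sketch of the check (the PID placeholder is an assumption, take it from the actual nova-compute process):

# Hedged example: list nova-compute's AMQP connections with numeric ports
# (find the PID e.g. with `pgrep -f nova-compute`)
lsof -nP -p <nova-compute-pid> | grep 5673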

Bogdan Dobrelya (bogdando) wrote :

Tanya, please elaborate: was the result from comment #4 obtained with https://review.openstack.org/95007 applied?

Tatyanka (tatyana-leontovich) wrote :

As far as I can see, https://review.openstack.org/95007 was merged yesterday evening, so this environment is without the patch. I will try to reproduce it on the 5.0-19 ISO and come back with updates :)

Bogdan Dobrelya (bogdando) wrote :

OK, thank you, waiting for an update then (marking as Incomplete).

Changed in fuel:
status: Confirmed → Incomplete
Tatyanka (tatyana-leontovich) wrote :

{"build_id": "2014-05-23_03-53-39", "mirantis": "yes", "build_number": "19", "ostf_sha": "5c479f04c35127576d35526650ec83b104f9a33d", "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4", "production": "docker", "api": "1.0", "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811", "astute_sha": "9a0d86918724c1153b5f70bdae008dea8572fd3e", "release": "5.0", "fuellib_sha": "2ed4fbe1e04b85e83f1010ca23be7f5da34bd492"}
The same situation: instances are stuck in the Building and Deleting states. On the compute nodes there are errors about message timeouts.
Also, -P helps to see the current connections with RabbitMQ, thanks :)
[root@node-4 log]# lsof -P -p 22985 | grep IPv4
nova-comp 22985 nova 20u IPv4 74323 0t0 TCP node-4:50256->node-2:5673 (ESTABLISHED)
nova-comp 22985 nova 21u IPv4 74329 0t0 TCP node-4:50258->node-2:5673 (ESTABLISHED)
nova-comp 22985 nova 22u IPv4 74596 0t0 TCP node-4:50261->node-2:5673 (ESTABLISHED)
nova-comp 22985 nova 23u IPv4 74600 0t0 TCP node-4:50263->node-2:5673 (ESTABLISHED)

So the issue is reproducible on the build 19 ISO.

Changed in fuel:
status: Incomplete → Confirmed
Bogdan Dobrelya (bogdando) wrote :

Reproduced. The fix is to increase kombu_reconnect_delay from 1 to 5 seconds. It looks like 1 second is not enough for environments with poor performance. After updating the delay to 5 seconds, all issues are gone and instances are able to spawn.
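For illustration only, a minimal sketch of the workaround on a single node, assuming the openstack-utils package (which provides openstack-config) is installed; the actual fix was delivered via fuel-library (see the reviews below):

# Hedged sketch: bump the reconnect delay in nova.conf and restart the service
# (on this release kombu_reconnect_delay lives in [DEFAULT]; file paths and service
#  names are the CentOS defaults and may differ per deployment)
openstack-config --set /etc/nova/nova.conf DEFAULT kombu_reconnect_delay 5.0
service openstack-nova-compute restart
# Resulting nova.conf stanza:
#   [DEFAULT]
#   kombu_reconnect_delay = 5.0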

Changed in fuel:
status: Confirmed → Triaged
Vladimir Kuklin (vkuklin) wrote :

We need to port kombu_reconnect_delay to all subprojects which do not use oslo.messaging and set it to 5.0 explicitly.

Vladimir Kuklin (vkuklin) wrote :

Heat and ceilometer to come

Fix proposed to branch: master
Review: https://review.openstack.org/95205

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Dmitry Borodaenko (dborodaenko)

Reviewed: https://review.openstack.org/95205
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=89d9a27331578ee0c2f4f6cd63ec695a5516ac0b
Submitter: Jenkins
Branch: master

commit 89d9a27331578ee0c2f4f6cd63ec695a5516ac0b
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:16:46 2014 +0400

    Set kombu_reconnect_delay to 5.0

    Set delay to 5.0 to recover channel errors on highly loaded environments.

    Change-Id: Ibec002828b785282221fa6d2827163a2deb0e627
    Partial-Bug: 1322259

Reviewed: https://review.openstack.org/95209
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=b9985e42159187853edec82c406fdbc38dc5a6d0
Submitter: Jenkins
Branch: stable/5.0

commit b9985e42159187853edec82c406fdbc38dc5a6d0
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:16:46 2014 +0400

    Set kombu_reconnect_delay to 5.0

    Set delay to 5.0 to recover channel errors on highly loaded environments.

    Change-Id: Ibec002828b785282221fa6d2827163a2deb0e627
    Partial-Bug: 1322259

Mike Scherbakov (mihgen) wrote :

Raised this issue to Critical priority. A reminder that we expect only Critical, release-blocking issues to be fixed with a patch to stable/5.0 (after Hard Code Freeze).

Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Vladimir Kuklin (vkuklin)
Bogdan Dobrelya (bogdando) wrote :

Action items: validate kombu_reconnect_delay for Neutron, Heat and Ceilometer once ported to the MOS packages.
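One simple way to validate this once the packages are ported might be the sketch below (config paths are the usual defaults and an assumption about these nodes):

# Hedged check: confirm the option is present (and set to 5.0) in each service's config
for f in /etc/neutron/neutron.conf /etc/heat/heat.conf /etc/ceilometer/ceilometer.conf; do
    grep -H kombu_reconnect_delay "$f"
done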

Vladimir Kuklin (vkuklin) wrote :

Reviewed: https://review.openstack.org/95210
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=1fd0732e53b4805c18dbca693ab31914a6a73c47
Submitter: Jenkins
Branch: master

commit 1fd0732e53b4805c18dbca693ab31914a6a73c47
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:23:06 2014 +0400

    Set kombu reconnect delay to 5 seconds

    Change-Id: I0ad9bcfd1f35e5d557a147a2cb0d3b2f2d79c846
    Partial-Bug: #1322259

Reviewed: https://review.openstack.org/95477
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f2f2f4d0b0dff2078313507a7508de5ebdee984f
Submitter: Jenkins
Branch: stable/5.0

commit f2f2f4d0b0dff2078313507a7508de5ebdee984f
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:23:06 2014 +0400

    Set kombu reconnect delay to 5 seconds

    Change-Id: I0ad9bcfd1f35e5d557a147a2cb0d3b2f2d79c846
    Partial-Bug: #1322259

Egor Kotko (ykotko) wrote :

I have the same issue on:
{"build_id": "2014-05-25_23-01-31", "mirantis": "yes", "build_number": "22", "ostf_sha": "1f020d69acbf50be00c12c29564f65440971bafe", "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4", "production": "docker", "api": "1.0", "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811", "astute_sha": "a7eac46348dc77fc2723c6fcc3dbc66cc1a83152", "release": "5.0", "fuellib_sha": "b9985e42159187853edec82c406fdbc38dc5a6d0"}

Steps to reproduce:
1. Env configuration: CentOS, 3 controllers, 1 compute, Neutron VLAN
2. Determine the primary controller
3. Destroy the virtual machine with the primary controller

Bogdan Dobrelya (bogdando) wrote :

Please clarify which OpenStack component became broken. For now, we have patched kombu_reconnect_delay only for Nova. What exactly was the issue in the case given in #27? Were the nova-compute nodes marked as down? Or were there any other issues?
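(For reference, whether nova-compute is marked as down can be checked from a controller with the sketch below; the /root/openrc credentials file is an assumption about the Fuel-deployed controllers.)

# Hedged check: the State column shows whether nova-compute is reported up or down
source /root/openrc
nova service-list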

Vladimir Kuklin (vkuklin) wrote :

The build did not contain all the required fixes. Also, the description does not state which components did not work. Closing until there is a clearer description.

Dmitry Borodaenko (angdraug) wrote :

We need to add the kombu_reconnect_delay parameter to all OpenStack components connecting to RabbitMQ in 4.1/Havana:
oslo.messaging
cinder (uses oslo.messaging in Havana)
nova
neutron
glance
heat
ceilometer

cinder, nova, and neutron are most critical.
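As a rough sketch of what that amounts to on a 4.1/Havana node (assuming openstack-config is available and the services' RPC code has been patched to recognize the option, per the Gerrit backports referenced later in this thread; otherwise the option is simply ignored):

# Hedged sketch: apply the same override to the most critical services' configs
for f in /etc/cinder/cinder.conf /etc/nova/nova.conf /etc/neutron/neutron.conf; do
    openstack-config --set "$f" DEFAULT kombu_reconnect_delay 5.0
done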

Vladimir Kuklin (vkuklin) wrote :

We also need the Glance fix.

Dmitry Borodaenko (angdraug) wrote :

For glance openstack-ci/fuel-4.1.1/2013.2.3: https://gerrit.mirantis.com/16157

Fix proposed to branch: stable/4.1
Review: https://review.openstack.org/97402

Reviewed: https://review.openstack.org/97402
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0cc0eb42906e45fb056c5d16f394824c76af6b0b
Submitter: Jenkins
Branch: stable/4.1

commit 0cc0eb42906e45fb056c5d16f394824c76af6b0b
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:16:46 2014 +0400

    Set kombu_reconnect_delay to 5.0

    Set delay to 5.0 to recover channel errors on highly loaded environments.

    depends on:

    https://gerrit.mirantis.com/#/c/16134/
    https://gerrit.mirantis.com/#/c/16135/
    https://gerrit.mirantis.com/#/c/16157/

    but can be safely merged (option will be ignored)

    Change-Id: Ibec002828b785282221fa6d2827163a2deb0e627
    Partial-Bug: 1322259

Reviewed: https://review.openstack.org/97399
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f8fdf936dcf27bd127dc4b833a65957e1063da8b
Submitter: Jenkins
Branch: stable/4.1

commit f8fdf936dcf27bd127dc4b833a65957e1063da8b
Author: Vladimir Kuklin <email address hidden>
Date: Fri May 23 20:23:06 2014 +0400

    Set kombu reconnect delay to 5 seconds

    depends on https://gerrit.mirantis.com/#/c/16137/
    but can be safely merged (the option will be ignored)

    Change-Id: I0ad9bcfd1f35e5d557a147a2cb0d3b2f2d79c846
    Partial-Bug: #1322259
    (cherry picked from commit 1fd0732e53b4805c18dbca693ab31914a6a73c47)

tags: added: to-be-covered-by-tests
Dmitry Pyzhov (dpyzhov) on 2014-08-13
no longer affects: fuel/5.1.x
Changed in fuel:
milestone: 5.0 → 5.1
Tom Fifield (fifieldt) on 2015-06-11
Changed in fuel:
status: Fix Committed → Fix Released