[library] Deployment Failed with many errors

Bug #1335911 reported by Timur Nurlygayanov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Invalid | Critical | Fuel Library (Deprecated) |

Bug Description

Note: this looks like a problem with the Galera HA cluster.

Environment:
KVM, CentOS, HA, Neutron with VLANs, 2 Ceph nodes.
Fuel 5.0.1, test ISO #77:
{"build_id": "2014-06-30_03-01-14", "mirantis": "yes", "build_number": "77", "ostf_sha": "d0fe60e0eba61685008b86d101f459fc2d3bb654", "nailgun_sha": "dd7f32ab80c023a4afda70b521dd5391e5e464fd", "production": "docker", "api": "1.0", "fuelmain_sha": "d5cd1439a382b335ec9f7ac69d2ace78c9c68120", "astute_sha": "17a1dc816d6d56dda64c2db21b94581472cabefb", "release": "5.0.1", "fuellib_sha": "7f7a7b33a5711b6146e5a11811c68e91ca4761af"}

Steps To Reproduce:
1. Create an environment with 3 controllers (HA), 1 compute, and 2 Ceph OSD nodes (CentOS, Neutron with VLANs).
2. Deploy.

Observed Result:
Deployment has failed. Error occurred while running method 'deploy'. Inspect Orchestrator logs for the details.
(Please see the attached snapshot.)
In the logs we can see many Neutron errors, such as:
______________________________
neutron.openstack.common.rpc.amqp [req-48ba5d28-a646-4c65-9fb1-fb5c96970095 None] Exception during message handling
______________________________
neutron.openstack.common.rpc.common [req-c2d35080-6c13-4cdc-a023-067be0bfa255 None]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data
    **args)
  File "/usr/lib/python2.6/site-packages/neutron/common/rpc.py", line 45, in dispatch
    neutron_ctxt, version, method, namespace, **kwargs)
  File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
    result = getattr(proxyobj, method)(ctxt, **kwargs)
  File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 219, in report_state
    self.plugin.create_or_update_agent(context, agent_state)
  File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 180, in create_or_update_agent
    return self._create_or_update_agent(context, agent)
  File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 159, in _create_or_update_agent
    context, agent['agent_type'], agent['host'])
  File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
    Agent.host == host).one()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2184, in one
    ret = list(self)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
    close_with_result=True)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
    **kw)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 777, in connection
    close_with_result=close_with_result)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 781, in _connection_for_bind
    return self.transaction._connection_for_bind(engine)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 306, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 2489, in contextual_connect
    self.pool.connect(),
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 236, in connect
    return _ConnectionFairy(self).checkout()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 474, in checkout
    self)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/event.py", line 377, in __call__
    fn(*args, **kw)
  File "/usr/lib/python2.6/site-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 684, in _ping_listener
    cursor.execute(ping_sql)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1047, 'Unknown command')
______________________________
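
Neutron writes such tracebacks to its log as the repr of a Python list of strings, which is hard to read in raw form. The following is a minimal sketch for turning one such log line back into a readable traceback. It assumes the list literal starts at the first "['Traceback" and runs to the end of the line; the helper name decode_logged_traceback is hypothetical and is not part of Neutron.

```python
import ast


def decode_logged_traceback(log_line):
    """Return the readable traceback text embedded in a log line.

    Assumption: the traceback was logged as the repr of a list of
    strings that begins at the first "['Traceback" and ends the line.
    """
    start = log_line.find("['Traceback")
    if start == -1:
        raise ValueError("no list-style traceback found in log line")
    # literal_eval safely parses the list literal without executing code.
    frames = ast.literal_eval(log_line[start:].strip())
    return "".join(frames)
```

Printing the returned string shows one frame per line, ending with the exception itself, which makes it easier to see that the failure comes from the database ping rather than from Neutron's own logic.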

Timur Nurlygayanov (tnurlygayanov) wrote : Re: Deployment Failed with many Neutron errors: Unknown command error

^^^ Screenshot of Fuel Web

tags: added: neutron
summary: - Deployment Failed with many Neutron errors
+ Deployment Failed with many Neutron errors: Unknown command error
Timur Nurlygayanov (tnurlygayanov) wrote :

^^^ Snapshot

description: updated
description: updated
Aleksandr Shaposhnikov (alashai8) wrote :

I am looking at this environment right now. Could you please provide the login/password for the Fuel node?

Timur Nurlygayanov (tnurlygayanov) wrote :

Access provided via Skype.

description: updated
summary: - Deployment Failed with many Neutron errors: Unknown command error
+ Deployment Failed with many errors
Aleksandr Shaposhnikov (alashai8) wrote : Re: Deployment Failed with many errors

I looked at the environment, and it is clear that the problem is not in Neutron; at the very least, another problem is preventing the cluster from deploying correctly. 'crm status' shows that node-2 is in a non-working state. node-2 is also not accessible from the other nodes, or even from the Fuel node.

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
tags: removed: neutron
Vladimir Kuklin (vkuklin) wrote :

This bug clearly shows connectivity problems, especially between the Galera and AMQP cluster members. Please check network connectivity and try to reproduce the issue on another environment.
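
One way to run such a connectivity check is a plain TCP probe from each controller to the others. The sketch below is only an illustration: the port numbers are the usual defaults for MySQL, Galera replication, and AMQP and are an assumption about this deployment, and the function names are hypothetical.

```python
import socket

# Assumed default ports; the actual ports depend on the deployment.
DEFAULT_PORTS = {"mysql": 3306, "galera": 4567, "amqp": 5672}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_members(hosts, ports=DEFAULT_PORTS, timeout=2.0):
    """Map each (host, service) pair to whether its port accepted a connection."""
    return {
        (host, name): port_is_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }
```

For example, calling check_members(["node-1", "node-2", "node-3"]) from one controller (the host names here are placeholders) would show at a glance whether a member such as node-2 is unreachable on every port, which points to a host or network problem rather than a single failed service.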

Changed in fuel:
status: Confirmed → Incomplete
Timur Nurlygayanov (tnurlygayanov) wrote :

Vladimir, one controller shut down during the deployment. That looks like the reason for this failure, but why did the controller shut down?

Eugene Nikanorov (enikanorov) wrote :

A side note:
The Galera cluster consists of 3 nodes, 2 of which were online, and the failure of the 3rd one brought the whole thing down.
What kind of HA is that?
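
The expectation behind this question can be sketched with a simple majority rule. This is an assumption for illustration only: Galera's actual quorum calculation is weighted and also depends on how members leave the cluster, and the function name has_quorum is hypothetical.

```python
def has_quorum(online, total):
    """Return True if the online members form a strict majority of the cluster."""
    if total <= 0 or not 0 <= online <= total:
        raise ValueError("expected 0 <= online <= total and total > 0")
    # Strict majority: more than half of all members must be online.
    return 2 * online > total
```

Under this rule, has_quorum(2, 3) is True, so two surviving members out of three would be expected to keep serving requests, while has_quorum(1, 3) is False.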

Bogdan Dobrelya (bogdando) wrote :

Yes, there is a known issue with Galera cluster reassembly, which is being addressed by https://review.openstack.org/#/c/95764/

Vladimir Kuklin (vkuklin) wrote :

HA is what happens after the cluster is deployed. We do not consider the cluster to be stable if an intermittent failure occurs during cluster configuration.

Dmitry Borodaenko (angdraug) wrote :

Re-targeting to 5.1. If this issue is confirmed (so far, the comments above show no agreement on what the problem is or how to reproduce it), please propose it to the 5.0.x release series separately, so that the fix status can be tracked independently in both release series.

See: http://lists.openstack.org/pipermail/openstack-dev/2014-June/039032.html

Changed in fuel:
milestone: 5.0.1 → 5.1
Dmitry Ilyin (idv1985)
summary: - Deployment Failed with many errors
+ [library] Deployment Failed with many errors
Changed in fuel:
status: Incomplete → Invalid