When the taskmanager times out performing a task, it just resets the task to NONE even though the instance may still be busy working on it

Bug #1529138 reported by Amrith Kumar
This bug affects 2 people
Affects: OpenStack DBaaS (Trove)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Consider the case of restart (which is where I found this).

If you issue a 'trove restart <instance>', the task manager sends a restart call down to the guest and waits up to 60s for the restart to complete.

If the restart doesn't finish within 60s, the task manager just resets the task status to NONE and carries on. The instance then shows up as ACTIVE in the instance list, even though the guest may still be in the middle of the restart.

2015-12-24 12:29:02.986 DEBUG trove.guestagent.api [-] Sending the call to restart the database process on the Guest. from (pid=122858) restart /opt/stack/trove/trove/guestagent/api.py:268
2015-12-24 12:29:02.986 DEBUG trove.guestagent.api [-] Calling restart with timeout 60 from (pid=122858) _call /opt/stack/trove/trove/guestagent/api.py:59
2015-12-24 12:29:02.987 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 02ea8ae0a59949d09abc83b10a281397 exchange 'openstack' topic 'guestagent.8d823f32-65fc-4fea-9ab0-cff31ec5950f' from (pid=122858) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
2015-12-24 12:30:02.989 ERROR trove.guestagent.api [-] Error calling restart
2015-12-24 12:30:02.989 TRACE trove.guestagent.api Traceback (most recent call last):
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/opt/stack/trove/trove/guestagent/api.py", line 62, in _call
2015-12-24 12:30:02.989 TRACE trove.guestagent.api result = cctxt.call(self.context, method_name, **kwargs)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-12-24 12:30:02.989 TRACE trove.guestagent.api retry=self.retry)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-12-24 12:30:02.989 TRACE trove.guestagent.api timeout=timeout, retry=retry)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 464, in send
2015-12-24 12:30:02.989 TRACE trove.guestagent.api retry=retry)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 453, in _send
2015-12-24 12:30:02.989 TRACE trove.guestagent.api result = self._waiter.wait(msg_id, timeout)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 334, in wait
2015-12-24 12:30:02.989 TRACE trove.guestagent.api message = self.waiters.get(msg_id, timeout=timeout)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 237, in get
2015-12-24 12:30:02.989 TRACE trove.guestagent.api 'to message ID %s' % msg_id)
2015-12-24 12:30:02.989 TRACE trove.guestagent.api MessagingTimeout: Timed out waiting for a reply to message ID 02ea8ae0a59949d09abc83b10a281397
2015-12-24 12:30:02.989 TRACE trove.guestagent.api

****************************************

2015-12-24 12:30:02.990 ERROR trove.taskmanager.models [-] Failed to initiate datastore restart on instance 8d823f32-65fc-4fea-9ab0-cff31ec5950f.
2015-12-24 12:30:02.990 INFO trove.instance.models [-] Resetting task status to NONE on instance 8d823f32-65fc-4fea-9ab0-cff31ec5950f.

***************************************

2015-12-24 12:30:02.994 DEBUG trove.db.models [-] Saving DBInstance: {u'cluster_id': None, u'shard_id': None, u'deleted_at': None, u'id': u'8d823f32-65fc-4fea-9ab0-cff31ec5950f', u'datastore_version_id': u'f6c3cccf-9047-46e0-bc49-2a279bcbb139', 'errors': {}, u'hostname': None, u'server_status': None, u'task_description': 'No tasks for the instance.', u'volume_size': 3, u'type': None, u'updated': datetime.datetime(2015, 12, 24, 17, 30, 2, 994572), '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x7feb85aa6a10>, u'deleted': 0, u'configuration_id': None, u'volume_id': u'05f3f4c3-301a-4343-95fa-a7ceca90e111', u'slave_of_id': None, u'task_start_time': None, u'name': u'h1', u'task_id': 1, u'created': datetime.datetime(2015, 12, 24, 16, 53, 25), u'tenant_id': u'26af169a81134303b915f4a6ea7f3952', u'compute_instance_id': u'5558b270-e859-4cfa-9ea1-39e31179395f', u'flavor_id': u'2'} from (pid=122858) save /opt/stack/trove/trove/db/models.py:62
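
The sequence in the logs above can be illustrated with a small standalone sketch. This is not Trove code. FakeGuest, call_with_timeout, restart_instance, the "RESTARTING" label and the local MessagingTimeout class are all hypothetical stand-ins, and the timings are shortened from 60s so the example finishes quickly. It only shows the shape of the problem as described in this report: the caller stops waiting, resets the task to NONE, and the guest is still working.

```python
import threading
import time


class MessagingTimeout(Exception):
    """Local stand-in for the RPC timeout seen in the traceback (hypothetical)."""


class FakeGuest:
    """Hypothetical guest whose restart takes `restart_seconds` to complete."""

    def __init__(self, restart_seconds):
        self.restart_seconds = restart_seconds
        self.restart_done = threading.Event()

    def restart(self):
        time.sleep(self.restart_seconds)  # simulate a slow datastore restart
        self.restart_done.set()


def call_with_timeout(guest, timeout):
    """Kick off the restart and wait at most `timeout` seconds for it."""
    worker = threading.Thread(target=guest.restart, daemon=True)
    worker.start()
    if not guest.restart_done.wait(timeout):
        raise MessagingTimeout("Timed out waiting for a reply")


def restart_instance(instance, guest, timeout):
    """Mimic the reported flow: the task is reset to NONE even on timeout."""
    instance["task"] = "RESTARTING"  # hypothetical task label
    try:
        call_with_timeout(guest, timeout)
    except MessagingTimeout:
        # Reported behavior: the failure is logged and nothing else happens.
        pass
    finally:
        instance["task"] = "NONE"
    return instance


guest = FakeGuest(restart_seconds=0.5)
instance = restart_instance({"name": "h1", "task": "NONE"}, guest, timeout=0.1)
still_restarting = not guest.restart_done.is_set()
print(instance["task"], still_restarting)
```

In this sketch the task ends up as NONE while still_restarting is true, which mirrors the state recorded by the "Resetting task status to NONE" log line.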

Amrith Kumar (amrith)
Changed in trove:
assignee: nobody → Amrith (amrith)
Kayode Odeyemi (dreyemi) wrote:

This also occurs when a new instance is launched, which leaves the instance stuck in the BUILD state.

Amrith Kumar (amrith)
Changed in trove:
assignee: Amrith Kumar (amrith) → nobody