database failed to start

Bug #1714433 reported by tianhui
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack DBaaS (Trove)
Incomplete
Medium
tianhui

Bug Description

when I create a mongodb cluster,there is a error in the log, and then the instances state become error

2017-09-01 09:59:49.819 2134 DEBUG trove.conductor.manager [-] [Instance ed53138d-3d1d-4301-813e-43dd4968c4a7] Rec'd message is younger than last seen. Updating. _message_too_old /usr/lib/python2.7/site-packages/trove/conductor/manager.py:76
2017-09-01 10:00:18.375 788 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 4765f9b716b14973ab92efe771049a9e __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:302
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api [-] Error calling add_config_servers
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api Traceback (most recent call last):
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/trove/guestagent/api.py", line 63, in _call
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api result = cctxt.call(self.context, method_name, **kwargs)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 157, in call
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api retry=self.retry)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api timeout=timeout, retry=retry)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api retry=retry)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api raise result
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api RuntimeError: Database failed to start.
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api Traceback (most recent call last):
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api incoming.message))
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api return self._do_dispatch(endpoint, method, ctxt, args)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api result = func(ctxt, **new_args)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 117, in wrapper
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api return f(*args, **kwargs)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/manager.py", line 225, in add_config_servers
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api self.app.add_config_servers(config_servers)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py", line 263, in add_config_servers
                                                                                                                                                                                                         2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api self.start_db(True)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py", line 94, in start_db
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api enable_on_boot=True, update_db=update_db)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py", line 266, in start_db_service
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api self.wait_for_database_service_start(timeout, update_db=update_db)
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py", line 286, in wait_for_database_service_start
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api raise RuntimeError(_("Database failed to start."))
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api RuntimeError: Database failed to start.
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.376 788 ERROR trove.guestagent.api
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager [-] error adding config servers
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager Traceback (most recent call last):
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/lib/python2.7/site-packages/trove/common/strategies/cluster/experimental/mongodb/taskmanager.py", line 363, in _add_query_routers
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager guest.add_config_servers(config_server_ips)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/lib/python2.7/site-packages/trove/common/strategies/cluster/experimental/mongodb/guestagent.py", line 59, in add_config_servers
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager self.version_cap, config_servers=config_servers)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/lib/python2.7/site-packages/trove/guestagent/api.py", line 72, in _call
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager raise exception.GuestError(original_message=str(e))
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager GuestError: An error occurred communicating with the guest: Database failed to start.
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager Traceback (most recent call last):
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager incoming.message))
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager return self._do_dispatch(endpoint, method, ctxt, args)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager result = func(ctxt, **new_args)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 117, in wrapper
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager return f(*args, **kwargs)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/manager.py", line 225, in add_config_servers
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager self.app.add_config_servers(config_servers)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py", line 263, in add_config_servers
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager self.start_db(True)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py", line 94, in start_db
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager enable_on_boot=True, update_db=update_db)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py", line 266, in start_db_service
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager self.wait_for_database_service_start(timeout, update_db=update_db)
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py", line 286, in wait_for_database_service_start
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager raise RuntimeError(_("Database failed to start."))
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager RuntimeError: Database failed to start.
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager .
2017-09-01 10:00:18.378 788 ERROR trove.common.strategies.cluster.experimental.mongodb.taskmanager

Revision history for this message
tianhui (tianhui) wrote :
Download full text (7.4 KiB)

I log in the instance,there is also a error in the log:
2017-09-01 02:00:18.175 935 DEBUG trove.guestagent.datastore.experimental.mongodb.service [-] Disconnecting from MongoDB. __exit__ /usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py:823
2017-09-01 02:00:18.176 935 ERROR trove.guestagent.datastore.service [-] Timeout while waiting for database status to change.Expected state running, current state is shutdown
2017-09-01 02:00:18.177 935 INFO trove.guestagent.datastore.service [-] Service status did not change to running within the given timeout: 180s
2017-09-01 02:00:18.177 935 DEBUG trove.guestagent.datastore.service [-] Attempting to cleanup stalled services. _wait_for_database_service_status /usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py:343
2017-09-01 02:00:18.178 935 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ps xaco pid,cmd | awk '/mongo(d|db|s)/ {print $1}' execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:344
2017-09-01 02:00:18.244 935 DEBUG oslo_concurrency.processutils [-] CMD "ps xaco pid,cmd | awk '/mongo(d|db|s)/ {print $1}'" returned: 0 in 0.066s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:374
2017-09-01 02:00:18.246 935 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo kill execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:344
2017-09-01 02:00:18.274 935 DEBUG oslo_concurrency.processutils [-] CMD "sudo kill " returned: 1 in 0.028s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:374
2017-09-01 02:00:18.275 935 DEBUG oslo_concurrency.processutils [-] u'sudo kill ' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:422
2017-09-01 02:00:18.276 935 DEBUG trove.guestagent.datastore.service [-] Cleanup failed. _wait_for_database_service_status /usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py:347
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service Traceback (most recent call last):
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/service.py", line 345, in _wait_for_database_service_status
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service self.cleanup_stalled_db_services()
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service File "/usr/local/lib/python2.7/dist-packages/trove/guestagent/datastore/experimental/mongodb/service.py", line 462, in cleanup_stalled_db_services
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service utils.execute_with_timeout(system.MONGODB_KILL % pid, shell=True)
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service File "/usr/local/lib/python2.7/dist-packages/trove/common/utils.py", line 239, in execute_with_timeout
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service return execute(*args, **kwargs)
2017-09-01 02:00:18.276 935 ERROR trove.guestagent.datastore.service File "/usr...

Read more...

Revision history for this message
Amrith Kumar (amrith) wrote :

While these error log messages clearly indicate that something went wrong, they don't point to what may have caused the issue. Some additional debugging is going to be required to figure out what went wrong. It may be merely a timeout, or it may be that something failed (and is being seen on the controller as a timeout).

Changed in trove:
status: New → Incomplete
Revision history for this message
tianhui (tianhui) wrote :

When I modify the time for the timeout, the instance is in a build state for a long time and then it become error, the log shows that
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api [-] Error calling get_filesystem_stats
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api Traceback (most recent call last):
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/trove/guestagent/api.py", line 63, in _call
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api result = cctxt.call(self.context, method_name, **kwargs)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 157, in call
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api retry=self.retry)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api timeout=timeout, retry=retry)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api retry=retry)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api result = self._waiter.wait(msg_id, timeout)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api message = self.waiters.get(msg_id, timeout=timeout)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api 'to message ID %s' % msg_id)
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api MessagingTimeout: Timed out waiting for a reply to message ID d25c3687071d40fba5b40f5d3308b3ed
2017-09-01 09:32:59.562 2135 ERROR trove.guestagent.api
2017-09-01 09:32:59.563 2135 ERROR trove.instance.models [-] An error occurred communicating with the guest: Timed out waiting for a reply to message ID d25c3687071d40fba5b40f5d3308b3ed.

could you enlighten me on the specific steps needed for further debugging?

Revision history for this message
tianhui (tianhui) wrote :

I see, the Correct error log is that:
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'trove.common.utils.poll_and_check' failed
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall Traceback (most recent call last):
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 136, in _run_loop
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/trove/common/utils.py", line 200, in poll_and_check
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall raise exception.PollTimeOut
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall PollTimeOut: Polling request timed out.
2017-09-01 21:08:37.376 21576 ERROR oslo.service.loopingcall
2017-09-01 21:08:37.377 21576 ERROR trove.taskmanager.models [-] Failed to create instance ebc5502a-e2cc-43e5-af44-f8196ea49aea. Timeout waiting for instance to become active. No usage create-event was sent.
2017-09-01 21:08:37.404 21576 ERROR trove.taskmanager.models [-] Service status: ERROR
Service error description: guestagent error

Changed in trove:
assignee: nobody → tianhui (tianhui)
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.