DB instance goes into ERROR state when trove setup with devstack

Bug #1352916 reported by Rajalakshmi Ganesan
This bug affects 9 people
Affects: OpenStack DBaaS (Trove)
Status: Expired
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

A Trove instance reaches ERROR status instead of ACTIVE in an environment that is set up using devstack with trove enabled.

Steps to Reproduce:

1. On an Ubuntu 12.04 VM, clone devstack from https://github.com/openstack-dev/devstack.git

2. Create a localrc file with the following details:
#enable the trove service
MYSQL_PASSWORD=*****
RABBIT_PASSWORD=*****
SERVICE_TOKEN=*****
SERVICE_PASSWORD=*****
ADMIN_PASSWORD=*****
PUBLIC_INTERFACE=eth0
HOST_IP=127.0.0.1
ENABLED_SERVICES+=,trove,tr-api,tr-tmgr,tr-cond
# Trove also requires swift to be enabled for backup / restore
ENABLED_SERVICES+=,s-proxy,s-object,s-container,s-account
SWIFT_HASH=12go358snjw24501
# Devstack related network information needs to be set correctly
FLOATING_RANGE=192.168.8.224/27
FIXED_RANGE=10.1.0.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0

3. Run the stack.sh script.

4. Once stack.sh completes successfully, source the openrc file in the /devstack folder and execute the following commands.

5. trove create <instance_name> 1 --size 1

6. trove list

Expected result:
The DB instance should reach ACTIVE status.

Actual Status:
The DB instance reached ERROR status.

Please find the attached tr-tmgr log generated by rejoining stack.

Rajalakshmi Ganesan (rajalakshmi-ganesan) wrote :

Denis M. (dmakogon) wrote :

Can you post the log in txt format? Not all devs are working on Windows.

Rajalakshmi Ganesan (rajalakshmi-ganesan) wrote :

Trove manager Log:

e dns support = False from (pid=14403) _create_dns_entry /opt/stack/trove/trove/taskmanager/models.py:632
2014-08-05 13:53:18.347 DEBUG trove.taskmanager.models [req-bd43fc15-bb3e-4beb-b76a-82cc420c551a 4006c227e37e4a3da64f7c24e6691005 c49ff301e3964206b7de6a5021667b7d] <greenlet.greenlet object at 0x300cc30>: DNS not enabled for instance: 199b4580-d239-41c8-9988-930ebbf5abd2 from (pid=14403) _create_dns_entry /opt/stack/trove/trove/taskmanager/models.py:671
2014-08-05 13:53:18.348 DEBUG trove.taskmanager.models [req-bd43fc15-bb3e-4beb-b76a-82cc420c551a 4006c227e37e4a3da64f7c24e6691005 c49ff301e3964206b7de6a5021667b7d] Successfully created DNS entry for instance: 199b4580-d239-41c8-9988-930ebbf5abd2 from (pid=14403) create_instance /opt/stack/trove/trove/taskmanager/models.py:252
2014-08-05 13:53:45.221 ERROR trove.common.utils [req-bd43fc15-bb3e-4beb-b76a-82cc420c551a 4006c227e37e4a3da64f7c24e6691005 c49ff301e3964206b7de6a5021667b7d] In looping call.
2014-08-05 13:53:45.221 TRACE trove.common.utils Traceback (most recent call last):
2014-08-05 13:53:45.221 TRACE trove.common.utils File "/opt/stack/trove/trove/common/utils.py", line 213, in _inner
2014-08-05 13:53:45.221 TRACE trove.common.utils self.f(*self.args, **self.kw)
2014-08-05 13:53:45.221 TRACE trove.common.utils File "/opt/stack/trove/trove/common/utils.py", line 250, in poll_and_check
2014-08-05 13:53:45.221 TRACE trove.common.utils obj = retriever()
2014-08-05 13:53:45.221 TRACE trove.common.utils File "/opt/stack/trove/trove/taskmanager/models.py", line 325, in _service_is_active
2014-08-05 13:53:45.221 TRACE trove.common.utils raise TroveError(_("Server not active, status: %s") % nova_status)
2014-08-05 13:53:45.221 TRACE trove.common.utils TroveError: Server not active, status: ERROR
2014-08-05 13:53:45.221 TRACE trove.common.utils
2014-08-05 13:53:45.223 ERROR trove.taskmanager.models [req-bd43fc15-bb3e-4beb-b76a-82cc420c551a 4006c227e37e4a3da64f7c24e6691005 c49ff301e3964206b7de6a5021667b7d] Error during create-event call.
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models Traceback (most recent call last):
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models File "/opt/stack/trove/trove/taskmanager/models.py", line 261, in create_instance
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models time_out=usage_timeout)
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models File "/opt/stack/trove/trove/common/utils.py", line 256, in poll_until
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models return lc.wait()
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 120, in wait
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models return hubs.get_hub().switch()
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models return self.greenlet.switch()
2014-08-05 13:53:45.223 TRACE trove.taskmanager.models File "/opt/stack/trove/trove/common/utils.py", line 213, in _inner
2014-08-05 13:53:45.223 TRACE trov...


Doug Shelley (0-doug) wrote :

I believe the issue here is that devstack flavor 1 (m1.tiny) doesn't have enough disk to load the default guest image, which I believe needs 3 GB (m1.tiny has only 1 GB of disk). I would recommend creating a flavor to use with trove create. For example:

$ nova flavor-create mysql-minimum 6 512 5 1

That will create flavor 6 (512 MB of RAM, 5 GB of disk, 1 vCPU). Then you can run:

$ trove create <instance_name> 6 --size 1
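
To make the reasoning behind this suggestion concrete, here is a minimal Python sketch that checks whether a flavor's root disk can hold a guest image. The 3 GB figure is only the estimate given in this comment, not a verified requirement, and `Flavor` and `flavor_fits_image` are hypothetical names used for illustration; they are not part of the nova or trove clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Minimal stand-in for a nova flavor (hypothetical, for illustration only)."""
    flavor_id: int
    name: str
    ram_mb: int
    disk_gb: int
    vcpus: int


def flavor_fits_image(flavor: Flavor, image_min_disk_gb: int) -> bool:
    """Return True if the flavor's root disk is at least as large as the image needs."""
    return flavor.disk_gb >= image_min_disk_gb


# Disk sizes come from the thread: m1.tiny has 1 GB, the new flavor has 5 GB.
M1_TINY = Flavor(1, "m1.tiny", 512, 1, 1)
MYSQL_MINIMUM = Flavor(6, "mysql-minimum", 512, 5, 1)

# Assumption: the default guest image needs about 3 GB, as estimated above.
GUEST_IMAGE_MIN_DISK_GB = 3

if __name__ == "__main__":
    for flavor in (M1_TINY, MYSQL_MINIMUM):
        verdict = "fits" if flavor_fits_image(flavor, GUEST_IMAGE_MIN_DISK_GB) else "too small"
        print(f"{flavor.name} ({flavor.disk_gb} GB disk): {verdict}")
```

Under that assumption, m1.tiny is rejected and mysql-minimum is accepted, which is why flavor 6 is used in the command above.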

Rajalakshmi Ganesan (rajalakshmi-ganesan) wrote :

Hi Shelley,

I tried creating the flavor as you suggested and then booting an instance, but the instance still goes into ERROR status.

The tr-tmgr log follows:

2014-08-06 11:13:21.729 DEBUG trove.taskmanager.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Successfully created DNS entry for instance: d5b3e48f-95b4-4aff-ae66-8e2cf657c37a from (pid=19270) create_instance /opt/stack/trove/trove/taskmanager/models.py:252
2014-08-06 11:20:05.240 ERROR trove.common.utils [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] In looping call.
2014-08-06 11:20:05.240 TRACE trove.common.utils Traceback (most recent call last):
2014-08-06 11:20:05.240 TRACE trove.common.utils File "/opt/stack/trove/trove/common/utils.py", line 213, in _inner
2014-08-06 11:20:05.240 TRACE trove.common.utils self.f(*self.args, **self.kw)
2014-08-06 11:20:05.240 TRACE trove.common.utils File "/opt/stack/trove/trove/common/utils.py", line 254, in poll_and_check
2014-08-06 11:20:05.240 TRACE trove.common.utils raise exception.PollTimeOut
2014-08-06 11:20:05.240 TRACE trove.common.utils PollTimeOut: Polling request timed out.
2014-08-06 11:20:05.240 TRACE trove.common.utils
2014-08-06 11:20:05.250 ERROR trove.taskmanager.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Timeout for service changing to active. No usage create-event sent.
2014-08-06 11:20:05.266 ERROR trove.taskmanager.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Service status: ERROR
2014-08-06 11:20:05.268 ERROR trove.taskmanager.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Service error description: guestagent error
2014-08-06 11:20:05.278 DEBUG trove.db.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Saving DBInstance: {u'task_start_time': None, u'updated': datetime.datetime(2014, 8, 6, 11, 20, 5, 277751), '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x4f0a190>, u'name': u'ps_test', u'task_id': 84, u'created': datetime.datetime(2014, 8, 6, 11, 13, 11), u'deleted': 0, u'tenant_id': u'acffd6800fc14138bc16ea3bea1379b0', u'compute_instance_id': u'67cdb0d4-7a56-44f2-82c8-418447434f46', u'hostname': None, u'configuration_id': None, u'server_status': None, u'task_description': 'Build error: guestagent timeout.', u'volume_size': 1L, 'errors': {}, u'flavor_id': 6L, u'volume_id': u'a5323227-440c-4d07-8420-0d0237bc6d56', u'slave_of_id': None, u'deleted_at': None, u'id': u'd5b3e48f-95b4-4aff-ae66-8e2cf657c37a', u'datastore_version_id': u'94dcc352-9b07-456d-b01f-669fc398b5e5'} from (pid=19270) save /opt/stack/trove/trove/db/models.py:61
2014-08-06 11:20:05.328 ERROR trove.taskmanager.models [req-c62d2472-645b-4d4a-a7ff-263f9d8d9194 b15e4169540d4d4f997ceb5471c0c282 acffd6800fc14138bc16ea3bea1379b0] Trove instance status: ERROR
2014-08-06 11:20:05.330 ERROR trove.taskmanager.models [req-c62...


Rajalakshmi Ganesan (rajalakshmi-ganesan) wrote :

The Jenkins build of submission https://review.openstack.org/#/c/109935/ is failing because of this bug. Could someone please take this bug and resolve it?

Doug Shelley (0-doug) wrote :

My next suggestion is that your usage_timeout may be too low. The default is 400 seconds, and if your machine/infrastructure is running too slowly, trove create will exceed this value when trying to start an instance. To check whether this is the issue in your environment, locate your trove-taskmanager.conf (usually in /etc/trove) and add this at the bottom:
[mysql]
usage_timeout = 1000

Then restart the trove task manager and give it a try.
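
If you prefer to script the change rather than edit the file by hand, a rough Python sketch follows. It operates on the file's text and uses only the standard-library `configparser`; `set_usage_timeout` is a hypothetical helper, not a Trove tool. It assumes the file parses as plain INI, and note that `configparser` drops comments when it writes the result back.

```python
import configparser
import io


def set_usage_timeout(conf_text: str, seconds: int, section: str = "mysql") -> str:
    """Return conf_text with usage_timeout set in the given section.

    Hypothetical helper: comments in the original text are not preserved.
    """
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "usage_timeout", str(seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in for the contents of trove-taskmanager.conf (illustrative only).
    sample = "[DEFAULT]\nverbose = True\n"
    print(set_usage_timeout(sample, 1000))
```

The task manager still has to be restarted afterwards for the new value to take effect.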

Rajalakshmi Ganesan (rajalakshmi-ganesan) wrote :

Hi Shelley,

I tried your suggestion above.

I added the line "usage_timeout = 1000" to trove-taskmanager.conf and restarted the trove taskmanager, but the instance still goes into ERROR state.

Thanks,
Rajalakshmi Ganesan

Denis M. (dmakogon) wrote :

This is not a bug in Trove itself; it is more of a networking issue in the devstack deployment.

Changed in trove:
status: New → Incomplete
status: Incomplete → Opinion
status: Opinion → Invalid

Amrith Kumar (amrith) wrote :

Rajalakshmi,

Please update this bug with the latest status, and if you believe that this is no longer a bug, please mark it as such.

Thanks.

Changed in trove:
status: Invalid → New
status: New → Incomplete

Denis M. (dmakogon) wrote :

I've seen this behaviour more than once: "Timeout for service changing to active. No usage create-event sent.". It means that the guest wasn't able to start properly because rsync from the compute host failed, which in turn means that no SSH connection was established between the compute host and the VM. The person who entered this bug should check the 'authorized_hosts' file, its permissions, the SSH keys, etc.
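
For context on where that message comes from: the tracebacks earlier in this thread show the task manager waiting in `poll_until` and raising `PollTimeOut` once the deadline passes. The Python sketch below illustrates that general poll-with-deadline pattern. It is not Trove's implementation; the signature, the injectable `clock` and `sleep` parameters, and the default values are assumptions made so the example is self-contained.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PollTimeOut(Exception):
    """Raised when the condition is not met before the deadline."""


def poll_until(
    retriever: Callable[[], T],
    condition: Callable[[T], bool],
    sleep_time: float = 1.0,
    time_out: float = 400.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call retriever() repeatedly until condition(result) is true or time_out elapses."""
    deadline = clock() + time_out
    while True:
        obj = retriever()
        if condition(obj):
            return obj
        if clock() >= deadline:
            raise PollTimeOut("Polling request timed out.")
        sleep(sleep_time)


if __name__ == "__main__":
    # Simulated guest status that never becomes ACTIVE, with a short timeout.
    try:
        poll_until(lambda: "BUILD", lambda s: s == "ACTIVE", sleep_time=0.1, time_out=0.5)
    except PollTimeOut as exc:
        print(f"guest never became ACTIVE: {exc}")
```

If the guest agent never reports in, for example because it could not fetch its code from the host, the condition is never satisfied and the timeout is the only visible symptom on the task manager side.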

Changed in trove:
importance: Undecided → Wishlist

meenakshi m (meenakshi-m) wrote :

Hi Amrith,

I have tried all of the above suggestions on devstack with trove enabled, but the DB instance still reaches ERROR status instead of ACTIVE.

Amrith Kumar (amrith) wrote :

Including a size larger than size = 1?

meenakshi m (meenakshi-m) wrote :

Yes, I have created a DB instance with size = 3, but the instance is still in ERROR status.

Ali Nazemian (alinazemian) wrote :

Dear Meenakshi,
Did you manage to solve this issue? I think I have the same problem with both the Icehouse and Juno releases. Both were installed with the devstack script.

meenakshi m (meenakshi-m) wrote :

Hi Ali,

I haven't resolved the issue yet; I am still facing the same problem.

Launchpad Janitor (janitor) wrote :

[Expired for Trove because there has been no activity for 60 days.]

Changed in trove:
status: Incomplete → Expired

Nikhil Manchanda (slicknik) wrote :

I looked into this. The issue is that the correct trove key for the cached images hasn't been added to the ~/.ssh/authorized_keys file on the devstack host.

The particular public key which needs to be added for the cached image to be able to access the devstack host can be found at http://git.openstack.org/cgit/openstack/trove-integration/tree/scripts/files/keys/id_rsa.pub

Please let me know if you're still seeing this issue after updating the configuration.
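
One way to confirm that the key really is in place is to compare the key type and base64 blob rather than whole lines, since the trailing comment may differ between files. The Python sketch below does that on text you have already read from the two files. `parse_public_key` and `key_is_authorized` are hypothetical helpers, the key strings in the example are placeholders rather than the real trove key, and the parsing is deliberately simple (it does not handle quoted options that contain spaces).

```python
from typing import Optional, Tuple


def parse_public_key(line: str) -> Optional[Tuple[str, str]]:
    """Extract (key type, base64 blob) from a public-key or authorized_keys line."""
    fields = line.strip().split()
    # The key type may be preceded by options, so scan for the first type-like field.
    for i, field in enumerate(fields[:-1]):
        if field.startswith(("ssh-", "ecdsa-")):
            return field, fields[i + 1]
    return None


def key_is_authorized(authorized_keys_text: str, public_key_line: str) -> bool:
    """Return True if the public key appears in the authorized_keys text."""
    wanted = parse_public_key(public_key_line)
    if wanted is None:
        raise ValueError("not a recognisable public key line")
    for line in authorized_keys_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        if parse_public_key(line) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Placeholder data; in practice read ~/.ssh/authorized_keys and the id_rsa.pub above.
    authorized = "# devstack host keys\nssh-rsa AAAAexampleblob1 someone@laptop\n"
    trove_key = "ssh-rsa AAAAexampleblob2 trove-placeholder"
    print("key present" if key_is_authorized(authorized, trove_key) else "key missing")
```

A key that is present can still be ignored by sshd if the file or the ~/.ssh directory is writable by other users, so the permissions are worth checking as well.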

meenakshi m (meenakshi-m) wrote :

Hi Nikhil,

As suggested, I have updated the authorized_keys file on the devstack host. The instance is still in ERROR status.

Even the nova instance reached ERROR status with the "mysql" image.

Rohit Jaiswal (rohit-jaiswal-3) wrote :

I am also facing this issue with the latest trove bits from a redstack install.

Amrith Kumar (amrith) wrote :

Can you provide the guest agent log as well, please?

Rohit Jaiswal (rohit-jaiswal-3) wrote :

Here are the trove-agent logs:

2015-03-04 23:19:14.187 TRACE root Traceback (most recent call last):
2015-03-04 23:19:14.187 TRACE root File "/home/ubuntu/trove/contrib/trove-guestagent", line 34, in <module>
2015-03-04 23:19:14.187 TRACE root sys.exit(main())
2015-03-04 23:19:14.187 TRACE root File "/home/ubuntu/trove/trove/cmd/guest.py", line 60, in main
2015-03-04 23:19:14.187 TRACE root from trove import rpc
2015-03-04 23:19:14.187 TRACE root File "/home/ubuntu/trove/trove/rpc.py", line 36, in <module>
2015-03-04 23:19:14.187 TRACE root import trove.common.exception
2015-03-04 23:19:14.187 TRACE root File "/home/ubuntu/trove/trove/common/exception.py", line 20, in <module>
2015-03-04 23:19:14.187 TRACE root from oslo_concurrency import processutils
2015-03-04 23:19:14.187 TRACE root ImportError: No module named oslo_concurrency
2015-03-04 23:19:14.187 TRACE root
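
The traceback ends in an ImportError, which suggests the guest's Python environment does not have the `oslo_concurrency` package that the trove code expects. A quick way to check which modules an interpreter can see is sketched below. It uses `importlib.util.find_spec`, which only exists on Python 3; the error message format in the log looks like Python 2, where simply attempting the import would be the equivalent check. `module_available` and `missing_modules` are hypothetical helper names.

```python
import importlib.util
from typing import Iterable, List


def module_available(name: str) -> bool:
    """Return True if the named module can be located by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent, or for modules without a spec.
        return False


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the subset of names that cannot be located, preserving order."""
    return [name for name in names if not module_available(name)]


if __name__ == "__main__":
    # oslo_concurrency is the module named in the traceback above.
    print("missing:", missing_modules(["oslo_concurrency"]))
```

Running this with the same interpreter and environment that starts trove-guestagent shows whether the dependency is actually installed there, as opposed to only on the devstack host.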

ylargou (yslargou) wrote :

I'm facing the same issue on a new OpenStack installation (7 nodes, not devstack). How did you resolve it?

Ali Nazemian (alinazemian) wrote :

I still have not solved the problem. I don't think there is a complete working manual for deploying trove, at least not one that I am aware of, or maybe it does not work at all.
Regards.

--
A.Nazemian

Tomoki Sekiyama (tsekiyama) wrote :

I believe most people who use devstack for setup fail to launch a mysql instance because of a guest-to-host communication issue. A trove guest instance must be able to log in to the compute host with the following command:

  ssh ubuntu@10.0.0.1

This relies on some assumptions that are not always satisfied automatically:

- host should have an IP address "10.0.0.1"

- host devstack username must be "ubuntu"

- ssh public key must be added to "ubuntu" user
  ( http://git.openstack.org/cgit/openstack/trove-integration/tree/scripts/files/keys/id_rsa.pub )

Currently these values are hard-coded in the guest's /etc/init/trove-guest.conf, but they should be determined dynamically in the future.
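
The three assumptions listed above can be checked before launching an instance. The Python sketch below takes the facts about the host as plain inputs and reports which assumptions are not met; gathering those inputs (interface addresses, the user running devstack, whether the key is installed) is left to the reader. `unmet_assumptions` and the constants are hypothetical names, with values taken from this comment.

```python
import ipaddress
from typing import Iterable, List

# Values hard-coded in the guest image, according to the comment above.
EXPECTED_HOST_IP = "10.0.0.1"
EXPECTED_HOST_USER = "ubuntu"


def unmet_assumptions(host_ips: Iterable[str], host_user: str, key_installed: bool) -> List[str]:
    """Return a human-readable list of the guest's assumptions that the host violates."""
    problems = []
    expected = ipaddress.ip_address(EXPECTED_HOST_IP)
    if expected not in {ipaddress.ip_address(ip) for ip in host_ips}:
        problems.append(f"host has no interface with address {EXPECTED_HOST_IP}")
    if host_user != EXPECTED_HOST_USER:
        problems.append(f"devstack user is '{host_user}', expected '{EXPECTED_HOST_USER}'")
    if not key_installed:
        problems.append(f"trove public key is not authorized for '{EXPECTED_HOST_USER}'")
    return problems


if __name__ == "__main__":
    # Illustrative inputs only; substitute the values from your own host.
    for problem in unmet_assumptions(["127.0.0.1", "10.1.0.1"], "stack", False):
        print("unmet:", problem)
```

An empty list means all three assumptions hold; it does not by itself prove that the ssh login from the guest will succeed, since firewalls and routing are not covered.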

Changed in trove:
status: Expired → Incomplete

Ali Nazemian (alinazemian) wrote :

Dear Tomoki,
Unfortunately, this problem also exists for a normal installation of trove (following the huge OpenStack installation guide with Neutron networking). I have not yet tested nova legacy networking for this purpose, so I thought it might be related to an integration problem between Neutron and Trove. I tried Devstack, Packstack, Trove-integration and a normal installation. So far, the only installation that worked for me was the trove-integration script. That script uses nova legacy networking, so I thought all the problems might be caused by the integration of Trove and Neutron.

Launchpad Janitor (janitor) wrote :

[Expired for Trove because there has been no activity for 60 days.]

Changed in trove:
status: Incomplete → Expired

yuchuan zhang (yuchuan-zhang) wrote :

I ran into the same issue. How can I get the trove-agent logs when "trove create" fails? I cannot see the instance with "nova list".

FelixD (generalkalbasa) wrote :

I've encountered a similar issue on a regular OpenStack deployment with a manually built image. Here's a link to the description and logs:
https://ask.openstack.org/en/question/96013/trove-guest-instance-starts-normally-but-trove-taskmanager-still-reports-error-state/

Wang,Sen (sanmuny)
Changed in trove:
status: Expired → Confirmed
Amrith Kumar (amrith)
Changed in trove:
status: Confirmed → Incomplete

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack DBaaS (Trove) because there has been no activity for 60 days.]

Changed in trove:
status: Incomplete → Expired