We can now see that it could take a long time to get a message from the task manager to the guest agent.
Consider these messages from the host:
2015-10-30 03:05:44.066 31192 DEBUG trove.db.models [-] Saving DBInstance: {u'cluster_id': None, u'shard_id': None, u'deleted_at': None, u'id': u'0b7d8b3b-5cf9-4323-a3a9-fbf24665619e', u'datastore_version_id': u'8c29840f-edf5-433e-a80c-ab67c66cf9a8', 'errors': {}, u'hostname': None, u'server_status': None, u'task_description': u'The instance is building.', u'volume_size': 1, u'type': None, u'updated': datetime.datetime(2015, 10, 30, 3, 5, 44, 66689), '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x7fbb5da12b90>, u'deleted': 0, u'configuration_id': None, u'volume_id': u'4d257db1-a856-49cc-8dad-df719ef23b44', u'slave_of_id': None, u'task_start_time': None, u'name': u'TEST_4188c881-9fac-41a6-bc13-8bd0279c0ad1', u'task_id': 5, u'created': datetime.datetime(2015, 10, 30, 3, 5, 38), u'tenant_id': u'e77d43662fd04881bd125ba69bc8a075', u'compute_instance_id': u'162c57f9-cc4e-4881-8408-9e766efe2235', u'flavor_id': u'7'} save /opt/stack/new/trove/trove/db/models.py:62
2015-10-30 03:05:44.074 31192 DEBUG trove.taskmanager.models [-] End _create_server_volume_individually for id: 0b7d8b3b-5cf9-4323-a3a9-fbf24665619e _create_server_volume_individually /opt/stack/new/trove/trove/taskmanager/models.py:764
2015-10-30 03:05:44.091 31192 DEBUG trove.taskmanager.models [-] Entering guest_prepare _guest_prepare /opt/stack/new/trove/trove/taskmanager/models.py:877
2015-10-30 03:05:44.095 31192 DEBUG trove.guestagent.api [-] Sending the call to prepare the Guest. prepare /opt/stack/new/trove/trove/guestagent/api.py:232
2015-10-30 03:05:44.099 31192 DEBUG oslo_messaging._drivers.amqp [-] Pool creating new connection create /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py:103
2015-10-30 03:05:44.101 31192 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on localhost:5672
2015-10-30 03:05:44.114 31192 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 127.0.0.1:5672
2015-10-30 03:05:44.126 31192 DEBUG trove.guestagent.api [-] Casting prepare _cast /opt/stack/new/trove/trove/guestagent/api.py:76
2015-10-30 03:05:44.128 31192 DEBUG oslo_messaging._drivers.amqp [-] Pool creating new connection create /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py:103
2015-10-30 03:05:44.130 31192 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on localhost:5672
2015-10-30 03:05:44.142 31192 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 127.0.0.1:5672
2015-10-30 03:05:44.142 31192 DEBUG oslo_messaging._drivers.amqpdriver [-] CAST unique_id: 616ac9a7d92144e6a80a64596dd6790d exchange 'openstack' topic 'guestagent.0b7d8b3b-5cf9-4323-a3a9-fbf24665619e' _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
Here is the time when the message makes it to the guest
2015-10-30 03:12:16.688 31263 DEBUG trove.conductor.manager [-] Message from 0b7d8b3b-5cf9-4323-a3a9-fbf24665619e: trove.guestagent.datastore.mysql.manager_base received prepare(), invoking begin_install(). log_message /opt/stack/new/trove/trove/conductor/manager.py:141
that's almost 7 minutes.
Anecdotally, killing the task manager makes the message get to the guest sooner.
At least one part of this problem has been narrowed to an unfortunate side-effect of the oslo.messaging change which went into Juno. To understand this problem, a simple repro was created.
You can recreate the problem and experiment with the proposed fix by un-commenting a line in the test program.
To run the repro:
copy server.py, server.conf, client.py and client.conf to a linux machine (I used ubuntu) and execute the following.
python client.py --config-file client.conf
You will get an output like:
amrith@ amrith- work:~/ source/ rabbit$ python client.py --config-file client.conf _drivers. amqp [-] Pool creating new connection create /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqp.py: 103 _drivers. amqp [-] Pool creating new connection create /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqp.py: 103 _drivers. amqpdriver [-] CAST unique_id: 435d43890c394e5 ab41bdd3b336012 e7 exchange 'openstack' topic 'fbb9b530- 3299-44d3- b048-ee2411afce 0c' _send /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqpdriver. py:448 _drivers. amqpdriver [-] received message unique_id: 435d43890c394e5 ab41bdd3b336012 e7 __call__ /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqpdriver. py:195
2016-01-12 13:30:06.948 101415 DEBUG __main__ [-] client starting main client.py:27
2016-01-12 13:30:07.005 101415 DEBUG __main__ [-] Now calling get_rpc_server() main client.py:56
2016-01-12 13:30:07.010 101415 DEBUG __main__ [-] Now calling server.start() main client.py:59
2016-01-12 13:30:07.010 101415 DEBUG oslo_messaging.
2016-01-12 13:30:07.024 101415 DEBUG __main__ [-] Now calling rpcserver.stop() main client.py:67
2016-01-12 13:30:07.025 101415 DEBUG oslo_messaging.
2016-01-12 13:30:07.031 101415 DEBUG oslo_messaging.
2016-01-12 13:30:07.033 101415 DEBUG __main__ [-] Going to sleep for 240s main client.py:89
2016-01-12 13:30:07.033 101415 DEBUG oslo_messaging.
--
In another window, execute
python server.py --config-file server.conf
You get something like this:
amrith@ amrith- work:~/ source/ rabbit$ python server.py --config-file server.conf fbb9b530- 3299-44d3- b048-ee2411afce 0c, topic = fbb9b530- 3299-44d3- b048-ee2411afce 0c main server.py:64 _drivers. amqp [-] Pool creating new connection create /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqp.py: 103 _drivers. amqpdriver [-] received message unique_id: 435d43890c394e5 ab41bdd3b336012 e7 __call__ /usr/local/ lib/python2. 7/dist- packages/ oslo_messaging/ _drivers/ amqpdriver. py:195 12T13:30: 07.024895 at 2016-01-12 13:33:07.024389 ping server.py:41
2016-01-12 13:31:01.532 101448 DEBUG __main__ [-] server starting main server.py:54
2016-01-12 13:31:01.589 101448 DEBUG __main__ [-] Server listening for server = guestclient.
2016-01-12 13:31:01.594 101448 DEBUG __main__ [-] Calling server.start() main server.py:76
2016-01-12 13:31:01.594 101448 DEBUG oslo_messaging.
2016-01-12 13:31:01.606 101448 DEBUG __main__ [-] Going to sleep for 240s now main server.py:79
2016-01-12 13:33:07.023 101448 DEBUG oslo_messaging.
2016-01-12 13:33:07.024 101448 DEBUG __main__ [-] got a ping from 2016-01-
Observe that the message sent from the client at about 2016-01-12 13:30:0...