Charm holds in "waiting" state, "'shared-db' incomplete"

Bug #1820365 reported by Angel Vargas
This bug affects 2 people
Affects                        Status    Importance   Assigned to   Milestone
OpenStack API Layer            Expired   Undecided    Unassigned
OpenStack Octavia Charm        Expired   Undecided    Unassigned
mysql-shared charm interface   Expired   Undecided    Unassigned

Bug Description

Hi, I have just re-deployed the whole cloud for the third time trying to get Octavia up:

octavia/0* waiting idle 6 10.10.0.18 9876/tcp 'shared-db' incomplete

Our environment has four MAAS spaces:
- public: 10.50.0.0/24,
- admin: 10.100.0.0/24,
- internal: 10.101.0.0/24,
- cluster: 10.102.0.0/24

The percona-cluster charm is aliased as "mysql".

Octavia is deployed on its own bare-metal node, and the subordinate neutron-openvswitch relation maps the bridges accordingly:

bond0.99:physnet1
bond0.104:external
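
(For reference, a mapping like this is typically supplied through the neutron-openvswitch charm's bridge-mappings and data-port options. The sketch below is only illustrative; the bridge names are placeholders and not the exact values from our bundle:

juju config neutron-openvswitch \
    bridge-mappings="physnet1:br-data external:br-ex" \
    data-port="br-data:bond0.99 br-ex:bond0.104"
)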

The cloud is working: nova, cinder, ceph, keystone, glance, vault and barbican show not a single problem in the current deployment, except for this charm, which is not connecting to mysql. One thing I noticed is that in the config file on the Octavia host (/etc/octavia/octavia.conf) we cannot find a single line describing the connection to the database service, e.g.
----------------------------------------------------
[DEFAULT]
debug = True

[house_keeping]

[controller_worker]
amp_secgroup_list = 0228955f-b585-41a9-b9c3-3e9e6a01ed5f
amp_flavor_id = 0f8d7a93-ed51-44c0-8209-be0956a3b31f
amp_boot_network_list = a321b8f3-988a-4739-81f9-178d6aef09da
amp_image_tag = octavia-amphora
amp_active_retries = 180
# This certificate is installed on the ``Amphorae`` and used for validating
# the authenticity of the ``Octavia`` controller.
client_ca = /etc/octavia/certs/controller_ca.pem
network_driver = allowed_address_pairs_driver
compute_driver = compute_nova_driver
amphora_driver = amphora_haproxy_rest_driver
loadbalancer_topology = SINGLE
.
.
.
---------------------------------------

I expect a database connection line somewhere in this file, am I wrong? It seems the charm is not providing it. I was wondering whether the network spaces, or something else, is preventing the charm from writing the database configuration into the config file.
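
For reference, once the shared-db relation completes I would expect the rendered octavia.conf to contain a [database] section roughly like the sketch below; the credentials, host and database name are illustrative placeholders only, not values from this deployment. Without such a section Octavia falls back to its built-in sqlite default, which appears to match the sqlite3 errors in the health-manager log further down.
----------------------------------------------------
[database]
# illustrative placeholder values
connection = mysql+pymysql://octavia:<password>@<mysql-host-or-vip>/octavia
----------------------------------------------------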

The Octavia health-manager log shows:

{{{

2019-03-16 11:22:03.698 4605 DEBUG futurist.periodics [-] Submitting periodic callback 'octavia.cmd.health_manager.hm_health_check.<locals>.periodic_health_check' _process_scheduled /usr/lib/python3/dist-packages/futurist/periodics.py:639
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics [-] Failed to call periodic 'octavia.cmd.health_manager.hm_health_check.<locals>.periodic_health_check' (it runs every 3.00 seconds): oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: amphora_health [SQL: 'SELECT amphora_health.amphora_id AS amphora_health_amphora_id, amphora_health.last_update AS amphora_health_last_update, amphora_health.busy AS amphora_health_busy \nFROM amphora_health \nWHERE amphora_health.busy = 0 AND amphora_health.last_update < ?\n LIMIT ? OFFSET ?'] [parameters: ('2019-03-16 11:21:03.699506', 1, 0)] (Background on this error at: http://sqlalche.me/e/e3q8)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics Traceback (most recent call last):
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics context)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 508, in do_execute
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics cursor.execute(statement, parameters)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics sqlite3.OperationalError: no such table: amphora_health
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics The above exception was the direct cause of the following exception:
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics Traceback (most recent call last):
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/futurist/periodics.py", line 290, in run
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics work()
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/futurist/periodics.py", line 64, in __call__
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return self.callback(*self.args, **self.kwargs)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/futurist/periodics.py", line 178, in decorator
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return f(*args, **kwargs)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/octavia/cmd/health_manager.py", line 64, in periodic_health_check
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics hm.health_check()
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/octavia/controller/healthmanager/health_manager.py", line 113, in health_check
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics lock_session.rollback()
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics self.force_reraise()
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics six.reraise(self.type_, self.value, self.tb)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics raise value
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/octavia/controller/healthmanager/health_manager.py", line 90, in health_check
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics lock_session)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/octavia/db/repositories.py", line 1189, in get_stale_amphora
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics self.model_class.last_update < expired_time).first()
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2835, in first
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics ret = list(self[0:1])
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2627, in __getitem__
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return list(res)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return self._execute_and_instances(context)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics result = conn.execute(querycontext.statement, self._params)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 948, in execute
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return meth(self, multiparams, params)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics return connection._execute_clauseelement(self, multiparams, params)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics compiled_sql, distilled_params
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics context)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics util.raise_from_cause(newraise, exc_info)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics reraise(type(exception), exception, tb=exc_tb, cause=cause)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics raise value.with_traceback(tb)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics context)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 508, in do_execute
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics cursor.execute(statement, parameters)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: amphora_health [SQL: 'SELECT amphora_health.amphora_id AS amphora_health_amphora_id, amphora_health.last_update AS amphora_health_last_update, amphora_health.busy AS amphora_health_busy \nFROM amphora_health \nWHERE amphora_health.busy = 0 AND amphora_health.last_update < ?\n LIMIT ? OFFSET ?'] [parameters: ('2019-03-16 11:21:03.699506', 1, 0)] (Background on this error at: http://sqlalche.me/e/e3q8)
2019-03-16 11:22:03.702 4605 ERROR futurist.periodics

}}}

How can I help to debug this bug?

Thanks.

description: updated
tags: added: rocky
tags: added: bionic lbaasv2 openstack
description: updated
description: updated
description: updated
Revision history for this message
James Page (james-page) wrote :

Can you share your bundle including network space bindings for all charms?

If the shared-db endpoint on the octavia charm and the associated endpoint on the percona-cluster charm are not bound to the same network space, you might get this issue.
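
As a rough illustration (not taken from your bundle; the application names, charm URLs and space name here are assumptions), keeping both endpoints on the same space in a bundle looks something like this:

applications:
  mysql:
    charm: cs:percona-cluster
    bindings:
      shared-db: internal
  octavia:
    charm: cs:octavia
    bindings:
      shared-db: internal

The important part is only that the two shared-db bindings resolve to the same space.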

Revision history for this message
Angel Vargas (angelvargas) wrote :

Hi James, many thanks for the clear reply.

You are right; to get this issue solved I had to bind the shared-db endpoint to a space during the charm deployment.

e.g.

juju deploy --to 6 --config octavia.yaml octavia --bind="shared-db=admin internal=internal public=public cluster=cluster admin=admin"

In our case the charm is deployed to a bare-metal machine, like other charms such as nova-compute and neutron-gateway, so I expected it to work without the bind option; we did not use the bind option in our previous deployment attempts.

Regards,
Angel.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

@angelvargas does the deployment contain the Gnocchi charm? If so, did it require binding of the shared-db relation, or was it able to cope without it?

Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Angel Vargas (angelvargas) wrote :

Hi @fnordahl,

Sorry for the late reply.

Yes, it does include gnocchi, but gnocchi completed installation when bound with "internal=internal public=public admin=admin", which are currently our main spaces. Later on we had problems with the gnocchi charm, which could not connect to memcached, but we quickly solved that by changing the firewall rules on the memcached unit.

As with other charms like keystone, nova-compute, aodh, etc., we expected gnocchi and octavia to resolve the bindings from the public, internal and admin spaces. After (james-page)'s reply we went back to the deployment, tried to be as explicit as we could, and everything worked fine.

We could not use octavia after all, as we could not make it work after we deployed it. We found that, for some reason in our setup, the overlay network used by octavia was not able to reach the amphora VM after we tried to launch a load balancer. We tried many things to get it working, but nothing worked for us.

Changed in layer-openstack-api:
status: New → Incomplete
Changed in charm-interface-mysql-shared:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for mysql-shared charm interface because there has been no activity for 60 days.]

Changed in charm-interface-mysql-shared:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Octavia Charm because there has been no activity for 60 days.]

Changed in charm-octavia:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack API Layer because there has been no activity for 60 days.]

Changed in layer-openstack-api:
status: Incomplete → Expired
Revision history for this message
Tolga Kaprol (tolgakaprol) wrote :

Hello,

This bug should be reconsidered.

I suspect there is a problem with the application removal process: the Octavia charm cannot be removed without using force, which leaves some configuration behind, such as MySQL router users remaining on the MySQL backend.

Our workaround is to deploy Octavia on a brand new server with new IP addresses, so that neither the OVN nor the MySQL routers cause any problems on deployment.
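
For completeness, the forced removal and the manual cleanup implied above would look roughly like the following; the user names and host in the MySQL part are illustrative assumptions, not values confirmed in this report:

# force removal of the stuck application
juju remove-application --force octavia

# then inspect the MySQL backend for leftover users and drop any stale ones manually, e.g.
# mysql> SELECT user, host FROM mysql.user WHERE user LIKE 'octavia%';
# mysql> DROP USER 'octavia'@'<old-octavia-ip>';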
