charm remains in waiting state "'shared-db' incomplete"

Bug #1909385 reported by Godswill Ogbu
Affects                    Status  Importance  Assigned to  Milestone
OpenStack AODH Charm       New     Undecided   Unassigned
OpenStack Designate Charm  New     Undecided   Unassigned
OpenStack Placement Charm  New     Undecided   Unassigned

Bug Description

We deployed the aodh charm on infrastructure running mysql/percona-cluster as a three-node cluster (ha-cluster). After adding the relation between aodh and MySQL, the aodh charm goes into a waiting state instead of active. We tried deploying aodh in a demo environment running a single mysql node, and there the deployment goes active once all the relevant relations are added. We suspect this might be an issue with ha-cluster, but we haven't seen anything in the logs that points to the source of the problem.

UBUNTU VERSION: Bionic 18.04
OPENSTACK VERSION: OpenStack Rocky
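
To give the full picture, the deployment was wired up roughly as follows (a minimal sketch; the rabbitmq-server and keystone application names are assumed, and the endpoint names are the charm's standard interfaces):

    juju deploy aodh
    juju add-relation aodh:shared-db percona-cluster:shared-db
    juju add-relation aodh:amqp rabbitmq-server:amqp
    juju add-relation aodh:identity-service keystone:identity-service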

JUJU STATUS:
Unit Workload Agent Machine Public address Ports Message
aodh/4* waiting idle 93 10.10.2.2 8042/tcp 'shared-db' incomplete

JUJU LOG:

tracer: ++ queue handler hooks/relations/mysql-shared/requires.py:16:joined
2020-12-26 21:57:41 INFO juju-log shared-db:157: Invoking reactive handler: hooks/relations/mysql-shared/requires.py:16:joined
2020-12-26 21:57:42 DEBUG juju-log shared-db:157: tracer: set flag shared-db.connected
2020-12-26 21:57:42 DEBUG juju-log shared-db:157: tracer>
tracer: main dispatch loop, 4 handlers queued
tracer: ++ queue handler hooks/relations/tls-certificates/requires.py:109:broken:certificates
tracer: ++ queue handler reactive/aodh_handlers.py:45:setup_amqp_req
tracer: ++ queue handler reactive/aodh_handlers.py:55:setup_database
tracer: ++ queue handler reactive/aodh_handlers.py:64:setup_endpoint
2020-12-26 21:57:42 INFO juju-log shared-db:157: Invoking reactive handler: reactive/aodh_handlers.py:45:setup_amqp_req
2020-12-26 21:57:43 INFO juju-log shared-db:157: Invoking reactive handler: reactive/aodh_handlers.py:55:setup_database
2020-12-26 21:57:44 INFO juju-log shared-db:157: Invoking reactive handler: reactive/aodh_handlers.py:64:setup_endpoint
2020-12-26 21:57:46 WARNING juju-log shared-db:157: configure_ssl method is DEPRECATED, please use configure_tls instead.
2020-12-26 21:57:46 INFO juju-log shared-db:157: Invoking reactive handler: hooks/relations/tls-certificates/requires.py:109:broken:certificates
2020-12-26 21:57:46 DEBUG juju-log shared-db:157: Running _assess_status()
2020-12-26 21:57:52 INFO juju-log shared-db:157: Reactive main running for hook shared-db-relation-changed
2020-12-26 21:57:52 DEBUG juju-log shared-db:157: tracer>
tracer: starting handler dispatch, 41 flags set
tracer: set flag amqp.available
tracer: set flag amqp.connected
tracer: set flag aodh-installed
tracer: set flag charm.installed
tracer: set flag charms.openstack.do-default-certificates.available
tracer: set flag charms.openstack.do-default-cluster.available
tracer: set flag charms.openstack.do-default-upgrade-charm
tracer: set flag config.default.action-managed-upgrade
tracer: set flag config.default.debug
tracer: set flag config.default.dns-ha
tracer: set flag config.default.haproxy-client-timeout
tracer: set flag config.default.haproxy-connect-timeout
tracer: set flag config.default.haproxy-queue-timeout
tracer: set flag config.default.haproxy-server-timeout
tracer: set flag config.default.os-admin-hostname
tracer: set flag config.default.os-admin-network
tracer: set flag config.default.os-internal-hostname
tracer: set flag config.default.os-internal-network
tracer: set flag config.default.os-public-hostname
tracer: set flag config.default.os-public-network
tracer: set flag config.default.region
tracer: set flag config.default.ssl_ca
tracer: set flag config.default.ssl_cert
tracer: set flag config.default.ssl_key
tracer: set flag config.default.use-policyd-override
tracer: set flag config.default.use-syslog
tracer: set flag config.default.vip
tracer: set flag config.default.vip_cidr
tracer: set flag config.default.vip_iface
tracer: set flag config.default.worker-multiplier
tracer: set flag config.set.openstack-origin
tracer: set flag config.set.region
tracer: set flag config.set.use-internal-endpoints
tracer: set flag config.set.vip_cidr
tracer: set flag config.set.vip_iface
tracer: set flag haproxy.stat.password
tracer: set flag identity-service.available
tracer: set flag identity-service.available.auth
tracer: set flag identity-service.connected
tracer: set flag shared-db.connected
tracer: set flag ssl.enabled
2020-12-26 21:57:53 DEBUG juju-log shared-db:157: tracer>
tracer: hooks phase, 1 handlers queued
tracer: ++ queue handler hooks/relations/mysql-shared/requires.py:20:changed

Alex Kavanagh (ajkavanagh) wrote :

Please could you provide the output of "juju status aodh --relations" and the same for percona-cluster? It's not clear why aodh "thinks" the relation is incomplete from the log snippet. There may be further information higher up/further down in the log. Please, if possible, could you attach the log for aodh? (juju debug-log --replay -i unit-aodh-*)
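
Concretely, something along these lines (the unit number is taken from the status snippet above):

    juju status aodh --relations
    juju status percona-cluster --relations
    juju debug-log --replay -i unit-aodh-4 > aodh-debug.log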

Thanks.

Changed in charm-aodh:
status: New → Incomplete
Godswill Ogbu (dev-uche) wrote :

Hello Alex, thanks for your response. We have since resolved this issue: for some reason percona-cluster was rejecting the relation with the warning:

2021-01-01 14:03:10 DEBUG juju-log shared-db:158: Percona cluster not yet bootstrapped - deferring shared-db rel until bootstrapped

We had to bootstrap percona-cluster again for it to accept the relation. It should be noted that we only hit this issue when running percona-cluster as a cluster of three nodes.
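
For anyone who hits the same thing, the re-bootstrap can be driven through the charm's actions, roughly as follows (a sketch; it assumes the percona-cluster charm revision in use exposes the bootstrap-pxc and notify-bootstrapped actions, which "juju actions percona-cluster" will confirm):

    # run bootstrap-pxc on the unit holding the most recent data (unit numbers are placeholders)
    juju run-action --wait percona-cluster/0 bootstrap-pxc
    # then notify the remaining units that the cluster has been bootstrapped
    juju run-action --wait percona-cluster/1 notify-bootstrapped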

Alex Kavanagh (ajkavanagh) wrote :

Thanks for replying. I'm going to close this as it was (possibly) a problem in percona-cluster and not the aodh charm. Please feel free to open a bug in the percona-cluster charm if it fails to bootstrap during another deploy. Thanks.

Changed in charm-aodh:
status: Incomplete → Invalid
Bas de Bruijne (basdbruijne) wrote :

We're seeing something similar with SQA on Wallaby in this test run: https://solutions.qa.canonical.com/testruns/testRun/2125bbad-971b-4bd4-9521-5f28dd36d853

Only 1 out of 3 aodh units is stuck waiting.

In the aodh-evaluator logs we see:
------------------------------------------------------------------
 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
     self.dialect.do_execute(
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
     cursor.execute(statement, parameters)
 sqlite3.OperationalError: no such table: alarm

 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/usr/lib/python3/dist-packages/aodh/evaluator/__init__.py", line 248, in _evaluate_assigned_alarms
     alarms = self._assigned_alarms()
   File "/usr/lib/python3/dist-packages/aodh/evaluator/__init__.py", line 292, in _assigned_alarms
     selected = self.storage_conn.get_alarms(
   File "/usr/lib/python3/dist-packages/aodh/storage/impl_sqlalchemy.py", line 250, in get_alarms
     alarms = self._retrieve_alarms(query)
   File "/usr/lib/python3/dist-packages/aodh/storage/impl_sqlalchemy.py", line 205, in _retrieve_alarms
     return [self._row_to_alarm_model(x) for x in query.all()]
   File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3373, in all
     return list(self)
   File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
     return self._execute_and_instances(context)
   File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
     result = conn.execute(querycontext.statement, self._params)
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
     return meth(self, multiparams, params)
   File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
     return connection._execute_clauseelement(self, multiparams, params)
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
     ret = self._execute_context(
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
     self._handle_dbapi_exception(
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1508, in _handle_dbapi_exception
     util.raise_(newraise, with_traceback=exc_info[2], from_=e)
   File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
     raise exception
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
     self.dialect.do_execute(
   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
     cursor.execute(statement, parameters)
 oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: alarm
 [SQL: SELECT alarm.alarm_id AS alarm_alarm_id, alarm.enabled AS alarm_enabled, alarm.name AS alarm_name, alarm.type AS alarm_type, alarm.severity AS alarm_severity, alarm.description AS alarm_description, a...
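
The sqlite3 error above suggests the unit is still pointed at the package-default sqlite backend rather than MySQL, which would fit a shared-db relation that never completed. One way to confirm on the stuck unit (assuming the default config path and a placeholder unit number):

    juju ssh aodh/1 -- sudo grep -E '^connection' /etc/aodh/aodh.conf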


Changed in charm-aodh:
status: Invalid → New
Moises Emilio Benzan Mora (moisesbenzan) wrote :

We've also seen this same behavior on charm-placement.

Run: https://solutions.qa.canonical.com/testruns/c324e735-9aa1-4dde-b52a-1f8aa7fce4bc
Artifacts: https://oil-jenkins.canonical.com/artifacts/c324e735-9aa1-4dde-b52a-1f8aa7fce4bc/index.html

placement/1 waiting idle 1/lxd/10 10.246.167.127 8778/tcp 'shared-db' incomplete
  filebeat/56 active idle 10.246.167.127 Filebeat ready.
  hacluster-placement/1 active idle 10.246.167.127 Unit is ready and clustered
  landscape-client/53 maintenance idle 10.246.167.127 Need computer-title and juju-info to proceed
  logrotated/51 active idle 10.246.167.127 Unit is ready.
  nrpe/61 active idle 10.246.167.127 icmp,5666/tcp Ready
  placement-mysql-router/1 active idle 10.246.167.127 Unit is ready
  prometheus-grok-exporter/55 active idle 10.246.167.127 9144/tcp Unit is ready
  public-policy-routing/30 active idle 10.246.167.127 Unit is ready
  telegraf/55 active idle 10.246.167.127 9103/tcp Monitoring placement/1 (source version/commit 23.07)
  ubuntu-advantage/54 active idle 10.246.167.127 Attached (esm-apps,esm-infra)
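
To see what the stuck unit actually received on the shared-db relation (and therefore why the charm still flags it as incomplete), the relation data can be dumped with show-unit, e.g.:

    juju show-unit placement/1
    juju show-unit placement-mysql-router/1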
