hacluster may attempt to start services before the primary charm is ready and does not subsequently restart the service

Bug #1674683 reported by David Ames
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

There is a possible race affecting services that hacluster manages on behalf of primary charms, when those services require relation data from other charms such as mysql or rabbitmq.

The primary charms may need to wait until they have complete relation data before completing the ha relation.
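
A minimal sketch of that gating idea, in plain Python. The relation names, required keys, and function names below are illustrative assumptions, not the charm's actual code; the point is only that the ha relation would be completed after every required relation has supplied its data.

```python
# Hypothetical mapping of relation name -> keys that must be present
# before the primary charm hands its services over to hacluster.
REQUIRED_KEYS = {
    "shared-db": {"db_host", "password"},
    "amqp": {"hostname", "password"},
}


def missing_relation_data(relations):
    """Return {relation_name: sorted list of missing keys} for incomplete relations.

    `relations` maps a relation name to the dict of data received on it.
    A relation that is absent, or whose value is empty, counts as incomplete.
    """
    missing = {}
    for name, keys in REQUIRED_KEYS.items():
        data = relations.get(name) or {}
        absent = sorted(k for k in keys if not data.get(k))
        if absent:
            missing[name] = absent
    return missing


def ready_for_ha(relations):
    """True only when every required relation has supplied all of its keys."""
    return not missing_relation_data(relations)
```

In this sketch, a hook handling the ha relation would call `ready_for_ha(...)` and return early (deferring to a later hook run) while it is false, rather than publishing resources for hacluster to start.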

Specific example seen in the wild:

nova-cloud-controller and hacluster managing nova-consoleauth.

nova-consoleauth fails to start:
2017-03-20 22:12:07.931 162718 ERROR oslo_service.service ProgrammingError: (pymysql.err.ProgrammingError) (1146, u"Table 'nova.services' doesn't exist") [SQL: u'SELECT services.created_at AS services_created_at, services.updated_at AS services_updated_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.host AS services_host, services.`binary` AS services_binary, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.disabled_reason AS services_disabled_reason, services.last_seen_up AS services_last_seen_up, services.forced_down AS services_forced_down, services.version AS services_version \nFROM services \nWHERE services.deleted = %(deleted_1)s AND services.host = %(host_1)s AND services.`binary` = %(binary_1)s \n LIMIT %(param_1)s'] [parameters: {u'host_1': 'juju-58750c-0-lxd-2', u'param_1': 1, u'deleted_1': 0, u'binary_1': 'nova-consoleauth'}]
2017-03-20 22:12:07.931 162718 ERROR oslo_service.service
2017-03-20 22:19:42.682 2524 WARNING oslo_reports.guru_meditation_report [-] Guru mediation now registers SIGUSR1 an

Corosync then never attempts to restart it.

David Ames (thedac) wrote :
Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → High
milestone: none → 17.05
David Ames (thedac) wrote :

In this instance the problem is actually https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/166024

A race condition may be theoretically possible here, but the service should be restarted. I am going to mark this invalid until we see this in the wild, unrelated to the nova-consoleauth issue.

Changed in charm-hacluster:
status: Triaged → Invalid
importance: High → Undecided
milestone: 17.05 → none