Action resume failed: Couldn''t resume: Unit is not in sync

Bug #1626450 reported by Björn Tillenius
Affects                                   Status        Importance  Assigned to  Milestone
OpenStack Percona Cluster Charm           Fix Released  Medium      James Page
percona-cluster (Juju Charms Collection)  Invalid       Medium      Unassigned

Bug Description

I have a 3-unit percona cluster and after successfully pausing one of the units, it fails to resume:

  Action resume failed: Couldn''t resume: Unit is not in sync

This is using cs:trusty/percona-cluster-244.

A while after the action has failed, the unit does get synced, without any interaction on my side.

I guess either the action shouldn't fail, or it should wait until the unit is synced. I would prefer the latter.

Tags: landscape
Björn Tillenius (bjornt) wrote :
James Page (james-page) wrote :

The resync operation is asynchronous to the start of the daemon, but yes, I agree that the resume operation should wait a reasonable amount of time for the unit to go from blocked to active as a result of the resync operation.

Changed in percona-cluster (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Medium
milestone: none → 16.10
James Page (james-page) wrote :

The charm has a utility for checking cluster state:

  def cluster_in_sync():

We should plug this into the action.
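
As a rough illustration of that suggestion, the sketch below shows one way a resume action could wait for the sync check instead of failing immediately. It is not the charm's code: `wait_for_sync`, `resume`, the `start_service` callable, and the timeout and interval defaults are all hypothetical, and `in_sync` stands in for a check such as `cluster_in_sync()`.

```python
import time


def wait_for_sync(in_sync, timeout=300.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll in_sync() until it returns True or `timeout` seconds pass.

    All names and defaults here are assumptions for illustration;
    `in_sync` is any zero-argument callable returning a bool.
    """
    deadline = clock() + timeout
    while True:
        if in_sync():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def resume(in_sync, start_service, **wait_kwargs):
    """Start the service, then fail only if the unit never syncs in time."""
    start_service()
    if not wait_for_sync(in_sync, **wait_kwargs):
        raise RuntimeError("Couldn't resume: Unit is not in sync")
```

The `sleep` and `clock` parameters exist only so the wait can be exercised without real delays; a real action would simply use the defaults.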

James Page (james-page)
Changed in percona-cluster (Juju Charms Collection):
milestone: 16.10 → 17.01
James Page (james-page)
Changed in charm-percona-cluster:
importance: Undecided → Medium
status: New → Triaged
Changed in percona-cluster (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page)
Changed in charm-percona-cluster:
assignee: nobody → James Page (james-page)
milestone: none → 17.05
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (master)

Fix proposed to branch: master
Review: https://review.openstack.org/441136

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/441136
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=12af56c74693949d89d3e75526f156e34c2ee8f5
Submitter: Jenkins
Branch: master

commit 12af56c74693949d89d3e75526f156e34c2ee8f5
Author: James Page <email address hidden>
Date: Fri Mar 3 13:52:42 2017 +0000

    Recheck sync status when assessing status of unit

    After a resume operation, or after a boot following a prolonged
    period of downtime, it may take some time for the local unit
    to re-sync with its peers.

    Update the function that assesses local unit status to recheck
    sync status up to 10 times (with an increasing delay between
    recheck) before declaring that the unit is in a blocked state.

    Change-Id: Idaa960ade4c52c9ebba6c65a55bade6b22e90cdc
    Closes-Bug: 1626450
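
The recheck behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the merged change: the function name `assess_sync_status`, the linear delay schedule, and the `base_delay` value are assumptions; only the limit of 10 rechecks and the increasing delay come from the commit message.

```python
import time


def assess_sync_status(in_sync, attempts=10, base_delay=1.0, sleep=time.sleep):
    """Return "active" once in_sync() is True, else "blocked".

    Checks up to `attempts` times, sleeping a little longer after each
    failed check. The linear schedule (base_delay * attempt number) is
    an assumption; the commit only says the delay increases.
    """
    for attempt in range(1, attempts + 1):
        if in_sync():
            return "active"
        if attempt < attempts:
            # No sleep after the final check; the unit is reported blocked.
            sleep(base_delay * attempt)
    return "blocked"
```

With these assumed defaults, a unit that never syncs is reported as blocked after ten checks and nine waits, while a unit that syncs partway through is reported as active without further delay.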

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-percona-cluster:
milestone: 17.05 → 17.08
James Page (james-page)
Changed in charm-percona-cluster:
status: Fix Committed → Fix Released