Instances can be stuck in BACKUP status

Bug #1252897 reported by Vipul Sabhaya
This bug affects 2 people
Affects: OpenStack DBaaS (Trove)
Status: In Progress
Importance: Low
Assigned to: Yang Youseok

Bug Description

If a Backup is issued for an instance, and the backup hangs or the GuestAgent never processes the message, the Backup can remain in NEW state indefinitely. This prevents any other Backups from running on that instance, and leaves the Instance in BACKUP state from then on.

How we report instance state:

        ### Check if there is a backup running for this instance
        if Backup.running(self.id):
            return InstanceStatus.BACKUP

Option 1:
- Immediately set the Backup State to be BUILDING on the BackupAgent -- which means that _only_ the API ever sets a backup to NEW
- If the Backup state is NEW and the backup was created more than some configured time ago, the Guest was not able to consume the message, so a periodic poll should set it to FAILED
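
A minimal sketch of Option 1, not Trove code: `BackupRecord`, `on_create_backup_message`, `fail_if_unconsumed` and the `message_timeout` parameter are hypothetical names used only for illustration. It assumes the guest flips the state to BUILDING as its first action, so that a backup still in NEW after the timeout can only mean the message was never consumed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

NEW, BUILDING, FAILED = "NEW", "BUILDING", "FAILED"


@dataclass
class BackupRecord:
    """Hypothetical stand-in for a backup row (not the Trove model)."""
    id: str
    state: str
    created: datetime


def on_create_backup_message(backup: BackupRecord) -> None:
    """Guest side: acknowledge the message before doing any real work."""
    backup.state = BUILDING


def fail_if_unconsumed(backup: BackupRecord, now: datetime,
                       message_timeout: timedelta) -> bool:
    """Poll side: fail a backup that is still NEW after the timeout.

    Returns True if the backup was marked FAILED.
    """
    if backup.state == NEW and now - backup.created > message_timeout:
        backup.state = FAILED
        return True
    return False
```

Because only the API ever writes NEW, the poll never touches a backup the guest has already picked up.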

Option 2:
- Implement a periodic task that finds all Backups in NEW or BUILDING state that have exceeded the backup duration window and marks them as failed.
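
Option 2 could look roughly like the sweep below. This is a sketch under assumptions, not the proposed fix: backups are plain dicts, and `expire_stale_backups` and `max_duration` are hypothetical names. How the task is scheduled and how the records are loaded and saved are left out.

```python
from datetime import datetime, timedelta

# States that count as "in flight" for the purpose of this sweep.
IN_FLIGHT_STATES = ("NEW", "BUILDING")


def expire_stale_backups(backups, now, max_duration):
    """Mark every in-flight backup older than max_duration as FAILED.

    backups: iterable of dicts with "id", "state" and "created" keys.
    Returns the ids of the backups that were expired, in input order.
    """
    expired = []
    for backup in backups:
        too_old = now - backup["created"] > max_duration
        if backup["state"] in IN_FLIGHT_STATES and too_old:
            backup["state"] = "FAILED"
            expired.append(backup["id"])
    return expired
```

Unlike Option 1, this also catches a backup that hangs after the guest has started it, at the cost of having to pick a duration window that no legitimate backup exceeds.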

Vipul Sabhaya (vipuls)
Changed in trove:
importance: Undecided → Critical
Denis M. (dmakogon) wrote :

Questions about Opt #2:
- Who will set the duration window?
- Wouldn't two periodic tasks load VM memory?

I think it would be better not to add another periodic task; Opt. #1 looks easier.

Craig Vyvial (cp16net) wrote :

Well, what if the agent got the message after the configured period?
Would the state ever change?

Changed in trove:
assignee: nobody → Nikhil Manchanda (slicknik)
milestone: none → icehouse-2
Denis M. (dmakogon) wrote :

It could change state after a restart.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/65553

Changed in trove:
status: New → In Progress
Changed in trove:
milestone: icehouse-2 → next
Changed in trove:
importance: Critical → Wishlist
importance: Wishlist → Medium
Changed in trove:
importance: Medium → Low
Yang Youseok (ileixe) wrote :

Although this bug is quite old, it still seems to appear in recent versions.

I think it's reasonable not to treat the NEW state as a running state, because there is no way to revert a failed backup whose message the guest agent never received, for whatever reason. (In my case, upgrading from Liberty to Newton brought in an incompatible oslo.context, so the guest did not process the backup message and the backup stayed stuck in NEW state.)

On the caller side, the NEW state seems to be used only to block 'delete backup during running'.
Since the backup has not actually started at that point, I think that does not matter.
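
To illustrate the idea, here is a simplified sketch of how the instance status would change if NEW were dropped from the set of running states. The two state sets, `instance_status`, and the "ACTIVE" default are assumptions made for the example and do not reproduce Trove's actual definitions.

```python
# Assumption: a simplified view of which backup states count as running.
RUNNING_STATES_CURRENT = {"NEW", "BUILDING"}
RUNNING_STATES_PROPOSED = {"BUILDING"}  # NEW no longer blocks the instance


def instance_status(backup_states, running_states, default="ACTIVE"):
    """Return "BACKUP" if any backup of the instance is in a running state."""
    if any(state in running_states for state in backup_states):
        return "BACKUP"
    return default


# A backup whose message was never consumed stays in NEW forever.
stuck = ["NEW"]
print(instance_status(stuck, RUNNING_STATES_CURRENT))
print(instance_status(stuck, RUNNING_STATES_PROPOSED))
```

With the proposed set, a backup that never leaves NEW no longer pins the instance in BACKUP status, while a backup that is really in progress still does.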

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/556176

Changed in trove:
assignee: Nikhil Manchanda (slicknik) → Yang Youseok (ileixe)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on trove (master)

Change abandoned by Yang Youseok (<email address hidden>) on branch: master
Review: https://review.openstack.org/556176
Reason: Invalid commit.
