migration: REAP failure handling issues

Bug #1667162 reported by Menno Finlay-Smits
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Christian Muirhead

Bug Description

When the API call to Reap fails, both REAPFAILED and an error are returned. As a result, the worker restarts instead of ending up in the REAPFAILED phase.

Also, consider the possibility that the migrationmaster is killed because the model doc has been removed, before it gets to set the phase to REAPFAILED or DONE (insert a sleep to check).

Changed in juju:
importance: Undecided → High
Christian Muirhead (2-xtian) wrote :

I don't understand this bug: the code for `doREAP` returns `REAPFAILED, nil` on error (and always has, as far as I can tell).

I haven't tried provoking the worker being killed by the model removal yet.

Christian Muirhead (2-xtian) wrote :

D'oh, sorry: it didn't always, but it has since December 15th 2016, which is before this bug was created.

Menno Finlay-Smits (menno.smits) wrote :

Sorry for the confusion. I suspect this ticket was created from a list I was maintaining and I probably forgot that I'd already dealt with the first part.

The second part is still a possible issue, though. doREAP ends up removing the model doc, which means there's a race where `w.killed()` might return true before the migration phase gets set to DONE or REAPFAILED. I suspect the fix is to check w.killed() *after* the `SetPhase` call. Thoughts?
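The ordering question can be sketched in Go. Everything here is hypothetical: `worker`, `reap`, `setPhase` and the non-blocking `killed` check are simplified stand-ins, not the migrationmaster's real code. The kill caused by removing the model doc is delivered immediately, which is the worst case of the race, and `setPhase` is assumed to keep working after the model doc is gone. Under those assumptions it shows why checking `killed()` before recording the phase can lose the terminal phase.

```go
package main

import "fmt"

type Phase string

const DONE Phase = "DONE"

// worker is a hypothetical, stripped-down migrationmaster.
type worker struct {
	dying    chan struct{} // closed when the worker is told to stop
	recorded []Phase       // phases that were successfully recorded
}

func newWorker() *worker { return &worker{dying: make(chan struct{})} }

// killed reports, without blocking, whether the worker has been told to stop.
func (w *worker) killed() bool {
	select {
	case <-w.dying:
		return true
	default:
		return false
	}
}

// reap simulates doREAP removing the model doc. Assumption: the resulting
// kill arrives immediately (worst case of the race).
func (w *worker) reap() { close(w.dying) }

// setPhase is assumed to succeed even after the model doc is removed.
func (w *worker) setPhase(p Phase) { w.recorded = append(w.recorded, p) }

// stepCheckFirst is the racy ordering: the killed check wins and the
// terminal phase is never recorded.
func (w *worker) stepCheckFirst() (stop bool) {
	w.reap()
	if w.killed() {
		return true
	}
	w.setPhase(DONE)
	return false
}

// stepSetFirst is the proposed ordering: record the phase, then check killed.
func (w *worker) stepSetFirst() (stop bool) {
	w.reap()
	w.setPhase(DONE)
	return w.killed()
}

func main() {
	a := newWorker()
	a.stepCheckFirst()
	fmt.Println("check first:", a.recorded) // check first: []

	b := newWorker()
	b.stepSetFirst()
	fmt.Println("set first:", b.recorded) // set first: [DONE]
}
```

The sketch deliberately leaves out whether `SetPhase` itself can still succeed once the model is gone; that is a separate concern.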

Christian Muirhead (2-xtian) wrote :

I'm trying it now, but I don't think that'll work: the migrationmaster facade is model-specific, so calling SetPhase after the model has gone will probably fail.

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Christian Muirhead (2-xtian)
Christian Muirhead (2-xtian) wrote :

PR to change the API Reap call to update the migration phase here: https://github.com/juju/juju/pull/7647

Christian Muirhead (2-xtian) wrote :
Changed in juju:
status: In Progress → Fix Committed
Tim Penhey (thumper)
Changed in juju:
milestone: 2.3.0 → 2.3-beta2
status: Fix Committed → Fix Released