Restore from backup fails frequently; success is rare. There is no single cause of failure when restoring a single or HA state server, and AWS is no better than HP. The restore process does not log any meaningful data to examine. What is certain is that the process is too brittle: there are many race conditions, which restore reports only vaguely as:
A. (During set up of the new bootstrap) error: cannot open state: no reachable servers
B. (During set up of the new bootstrap) Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable) E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
C. (Starting Juju machine agent (jujud-machine-0)) error: cannot restore bootstrap machine: cannot get public address of bootstrap machine: cannot get machine 0: EOF
D. (After restore completes) juju status exits with an error. This continues for 10 minutes, then the test gives up.
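For anyone trying to reproduce case D, the wait can be sketched roughly as below. This is only an illustration of "retry juju status for 10 minutes, then give up", not the actual CI test code: the function names, the 5 second polling interval, and the use of the exit code as the success signal are all assumptions.

```python
import subprocess
import time


def wait_for_status(check, timeout=600.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns the number of attempts made on success and raises
    TimeoutError otherwise. `clock` and `sleep` are injectable so the
    loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(
                "status still failing after %d attempts" % attempts)
        sleep(interval)


def juju_status_ok():
    """Hypothetical check: True when `juju status` exits with code 0."""
    return subprocess.call(["juju", "status"]) == 0
```

Calling wait_for_status(juju_status_ok) after a restore would then either return once juju status succeeds or raise TimeoutError after roughly 10 minutes, which matches the behaviour described in D.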
I am marking this as a critical regression because the restore tests passed consistently for several weeks. In truth, Juju stable has never passed these tests.