LXD containers fail to download on a slow-ish internet connection

Bug #1885893 reported by Peter Jose De Sousa
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Hi,

When deploying bundles on MAAS 2.7.1 or 2.8, LXD containers begin downloading as normal, but as more containers start and begin their own installs, the network comes under increasing load and the later containers fail to download.

MAAS/Juju retries the download, as shown on line 21 of this log: https://pastebin.canonical.com/p/TTpNPRNS4V/

The retry then fails as well, and no further attempt is made. All remaining containers on that machine subsequently fail (as only one container is being downloaded for that machine).

This is the end result: https://pastebin.canonical.com/p/4MvjZjDVHh/

I am seeing this on internet connections of around 40 Mbit/s to 100 Mbit/s, across several different MAAS deployments (Orangeboxes).

Many thanks,

Peter

Pen Gale (pengale) wrote :

I think the request here is to add a --limit or --sequential or --slow flag to juju deploy, to reduce the number of machines it will try to spin up simultaneously.

Either that, or we'd want to be able to control the timeout or number of retries when allocating a machine. (Overall, though, I think it would be preferred to limit the number of retries we need in the first place, rather than soaking the connection and then trying to handle the consequences after the fact.)
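
To make the first option concrete, here is a minimal sketch of capping how many machines are provisioned at once. It is not Juju code: provision_all, provision, and limit are hypothetical names used only for illustration, and Python stands in for whatever the real implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor


def provision_all(machines, provision, limit=2):
    """Run provision(machine) for every machine, at most `limit` at a time.

    `provision` is a hypothetical callable that starts one machine or
    container and returns when its image download has finished.
    """
    # max_workers caps how many provisioning calls run simultaneously,
    # so the link is never shared by more than `limit` downloads.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map returns results in the same order as `machines`.
        return list(pool.map(provision, machines))
```

With limit=1 this behaves like the proposed --sequential; a larger limit trades deployment time against load on the connection.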

Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
Peter Jose De Sousa (pjds) wrote :

Hi Pete,

I don't think this is a feature request; it is more a degradation of existing behaviour. When I watch the containers download, the failed download is retried, and then that retry itself fails (the log says it is going to retry 10 times, but realistically it tries 2 times).

Cheers
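
For reference, the behaviour the log message implies (keep retrying until the advertised number of attempts is used up) can be sketched as follows. This is an illustration only, not how Juju or LXD implement it: download_with_retries, fetch, base_delay, and the 60-second cap are all assumptions.

```python
import time


def download_with_retries(fetch, max_attempts=10, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts have been used.

    `fetch` is a hypothetical callable that downloads one image and raises
    OSError on a network failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError as err:
            last_error = err
            if attempt < max_attempts:
                # Back off exponentially (capped at 60 s, an arbitrary
                # choice) so a saturated link has time to recover.
                sleep(min(base_delay * 2 ** (attempt - 1), 60.0))
    raise RuntimeError(f"download failed after {max_attempts} attempts") from last_error
```

The point of the sketch is that the loop only gives up after max_attempts real attempts, which is what "retry 10 times" would lead one to expect.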

Alberto Donato (ack) wrote :

Sorry, I don't have much context here. What is the relationship between MAAS and the LXD containers being deployed there?

Changed in maas:
status: New → Incomplete
no longer affects: maas