application units can't get resource from controller
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
When multiple units try to pull a resource (or resources) at the same time, everything seems to lock up: some units are able to get the resource, and some fail to pull it from the controller. I have worked around this in my spark charm, to some degree, by putting units that can't get the resource into a blocked state and having them retry naturally when it's their turn. This does end up working itself out, i.e. all of my units eventually get the resource, but it's definitely an extreme hack.
This can be reproduced by running the following command:
juju deploy cs:~omnivector/
The problem is demonstrated in this Juju Show episode: https:/
The charm code that implements this demented block-and-return spin-lock mechanism is here: https:/
Similarly for layer-hadoop-base: https:/
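The workaround from the description (leave a unit that cannot fetch the resource in a blocked state and let a later hook try again) can be sketched as follows. This is a minimal Python sketch, not the layer-spark code: `try_install_resource`, the `fetch` and `set_status` callables, and the resource name are hypothetical. In a reactive charm, `fetch` would presumably wrap charmhelpers' `hookenv.resource_get` (which returns a path, or `False` on failure) and `set_status` would wrap `hookenv.status_set`.

```python
from typing import Callable, Optional

# Hypothetical interfaces: fetch(name) returns a local path, or a falsy value
# when the controller could not serve the resource; set_status(state, message)
# records the unit's workload status.
FetchFn = Callable[[str], Optional[str]]
StatusFn = Callable[[str, str], None]


def try_install_resource(name: str, fetch: FetchFn, set_status: StatusFn) -> bool:
    """Make one fetch attempt per hook; on failure, block and wait for the next hook."""
    path = fetch(name)
    if not path:
        # No retry loop inside the hook: the unit stays blocked and a later
        # hook (for example update-status) calls this function again.
        set_status("blocked", f"waiting for resource {name!r}")
        return False
    set_status("active", f"resource {name!r} available at {path}")
    return True
```

Calling it once per hook means a unit that loses the race simply reports `blocked` until a later hook succeeds, which is why all units eventually converge, at the cost of extra hook runs.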
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
My guess is that when multiple units request the same resource, we aren't
handling the caching and queuing on the controller correctly. The controller
is meant to download the resource on demand (some charms have very large
resources, so we don't want to cache them unless they are needed). I suspect
that concurrent requests for the same resource are confusing the queuing
system, so that instead of one request starting the download and the rest
blocking until it has finished, the requests fail.
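The behavior the comment expects (download on first demand, one request performs the download, the rest block until it finishes) is commonly called request coalescing or "single-flight". Below is a minimal Python sketch under assumed names (`SingleFlightCache`, `_InFlight`, `download`); it is not Juju's controller code, which is written in Go, and it ignores eviction and on-disk storage.

```python
import threading
from typing import Callable, Dict, Optional


class _InFlight:
    """State shared by every caller waiting on one download."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class SingleFlightCache:
    """Fetch each resource on first request only; concurrent callers share that fetch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, _InFlight] = {}

    def get(self, key: str, download: Callable[[str], bytes]) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]  # already downloaded earlier
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                # First requester: it performs the download for everyone.
                call = _InFlight()
                self._inflight[key] = call

        if leader:
            try:
                call.result = download(key)
            except Exception as exc:
                call.error = exc
            with self._lock:
                del self._inflight[key]
                if call.error is None:
                    self._cache[key] = call.result
                # On failure nothing is cached, so the next request retries.
            call.done.set()  # wake every waiter
        else:
            call.done.wait()  # block until the leader finishes

        if call.error is not None:
            raise call.error
        return call.result
```

Ten units asking for the same key at once trigger a single `download` call; the others wait on the shared `Event` and receive the same bytes (or the same error). In Go, the `golang.org/x/sync/singleflight` package provides the same duplicate-call suppression.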
On Thu, Apr 25, 2019 at 6:15 AM james beedy <email address hidden> wrote: