juju-core

No way to recover from provider errors

Bug #1187372 reported by Marco Ceppi on 2013-06-04

This bug report is a duplicate of: Bug #1227450: juju does not retry provisioning against transient provider errors.

This bug affects 4 people

Affects      Status   Importance  Assigned to  Milestone
juju-core    Triaged  Low         Unassigned

Bug Description

Using juju-core 1.11.1.1, deploying to OpenStack (HP Cloud), got this error:

  "8":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

I was able to delete old security groups from units that no longer exist (and get my security group limit increased). However, I'm not able to remove the units/services attached to these machines, destroy the machines, or ask juju to retry provisioning them. I'm stuck with a dirty status:

machines:
  "0":
    agent-state: started
    agent-version: 1.11.1.1
    dns-name: 15.185.229.194
    instance-id: "1126293"
    series: precise
  "8":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

      caused by: request (https://az-3.region-a.geo-1.compute.hpcloudsvc.com/v1.1/10293909253024/os-security-groups)
      returned unexpected status: 400; error info: {"badRequest": {"message": "Quota
      exceeded, too many security groups.", "code": 400}})'
    instance-id: pending
    series: precise
services:
  mysql:
    charm: local:precise/mysql-306
    exposed: false
    life: dying
    units:
      mysql/0:
        agent-state: pending
        machine: "8"
  mysql-slave:
    charm: local:precise/mysql-306
    exposed: false
    life: dying
    units:
      mysql-slave/0:
        agent-state: pending
        machine: "9"

Marco Ceppi (marcoceppi) on 2013-06-04

summary:

- No way to recover from provider limits
+ No way to recover from provider errors

Marco Ceppi (marcoceppi) wrote on 2013-06-05:

Another example:

  "2":
    agent-state-info: '(error: Get https://s3.amazonaws.com/juju-dist/?marker=&delimiter=&prefix=tools%2Fjuju-:
      read tcp 72.21.214.159:443: connection reset by peer)'
    instance-id: pending
    series: precise
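
The two failures in this report differ in kind: the quota error is an HTTP 400 that will not clear until someone acts on the account, while the connection reset is likely transient. Below is a minimal sketch of how a provisioner might retry only the second kind. The names `ProviderError`, `is_transient`, and `provision_with_retry` are hypothetical and are not taken from juju-core, and the classification rule is an assumption made for illustration.

```python
import time


class ProviderError(Exception):
    """Hypothetical error raised by a cloud provider call."""

    def __init__(self, message, status=None):
        super().__init__(message)
        # HTTP status code, or None for network-level failures.
        self.status = status


def is_transient(err):
    # Assumption: network-level failures (no HTTP status) and 5xx responses
    # are worth retrying; 4xx responses such as "Quota exceeded" are not.
    return err.status is None or err.status >= 500


def provision_with_retry(start_instance, attempts=3, delay=1.0, sleep=time.sleep):
    """Call start_instance(), retrying transient failures with a doubling delay."""
    for attempt in range(1, attempts + 1):
        try:
            return start_instance()
        except ProviderError as err:
            # Give up at once on permanent errors, or when out of attempts.
            if not is_transient(err) or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

With this shape, a quota error would still surface immediately in `agent-state-info`, but a reset connection would get a few more tries before the machine is marked as failed.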

Nick Veitch (evilnick) on 2013-06-07

tags:

added: doc

William Reade (fwereade) wrote on 2013-06-15:

Those machines will become destroyable once the units on them are all removed; working as currently intended. Should this behaviour be changed to allow machine destruction to induce unit destruction (and, indeed, destruction of all containers and contained units, etc)?

That seems slightly drastic; but perhaps it's no worse than destroy-service destroying all relations automatically. However, it is somewhat complex to implement that behaviour cleanly, and it's unlikely to be a very high priority soon; will the workaround described above suffice for the time being?
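
To make the two behaviours concrete, here is a rough sketch of the rule as described in this comment, next to the proposed cascading variant. It is a toy model, not juju-core code: `Machine`, `can_destroy`, `destroy_machine`, and the `cascade` flag are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    id: str
    # Names of the units currently assigned to this machine.
    units: set = field(default_factory=set)


def can_destroy(machine):
    # Rule as described: a machine is destroyable only once it has no units.
    return not machine.units


def destroy_machine(machine, cascade=False):
    """Return the names of everything removed, or raise if units block removal.

    cascade=True models the *proposed* behaviour (machine destruction also
    removes its units); cascade=False models the behaviour described as
    currently intended.
    """
    if machine.units and not cascade:
        raise ValueError(f"machine {machine.id} has units: {sorted(machine.units)}")
    removed = sorted(machine.units)
    machine.units.clear()
    removed.append(f"machine-{machine.id}")
    return removed
```

For machine "8" in the status above, `destroy_machine(Machine("8", {"mysql/0"}))` would raise, while passing `cascade=True` would remove `mysql/0` and then the machine.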

Changed in juju-core:
status:	New → Triaged
importance:	Undecided → Low

Stuart Bishop (stub) wrote on 2013-06-16: Re: [Bug 1187372] Re: No way to recover from provider errors

On Sun, Jun 16, 2013 at 4:29 AM, William Reade
<email address hidden> wrote:
> Those machines will become destroyable once the units on them are all
> removed; working as currently intended. Should this behaviour be changed
> to allow machine destruction to induce unit destruction (and, indeed,
> destruction of all containers and contained units, etc)?
>
> That seems slightly drastic; but perhaps it's no worse than destroy-
> service destroying all relations automatically. However, it is somewhat
> complex to implement that behaviour cleanly, and it's unlikely to be a
> very high priority soon; will the workaround described above suffice for
> the time being?

The behavior I saw was that the units could not be destroyed, because
they could never be set up, because the machine they are waiting on has
failed. Catch-22. Maybe that is a separate bug, and destroy-service
and remove-unit should be able to remove units that have not yet been
set up?

--
Stuart Bishop <email address hidden>
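
The suggestion in this comment can be sketched as a single decision: a unit whose agent never started has nothing to tear down, so a removal request could delete it directly instead of leaving it `dying` forever. This is only an illustration of the idea; `remove_unit` is a hypothetical function, and the `"pending"` and `"started"` strings are borrowed from the `agent-state` values in the status output above.

```python
def remove_unit(agent_state):
    """Return the unit's life value after a removal request.

    Assumption: a unit whose agent is still "pending" was never set up, so
    there is no agent to wait for and the unit can be removed outright.
    """
    if agent_state == "pending":
        return "removed"
    # A started agent has to run its own teardown first, so the unit only
    # becomes "dying" and is removed later by that agent.
    return "dying"
```

Under that rule, `mysql/0` and `mysql-slave/0` would both go straight to removed, which would in turn leave their machines with no units.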

Curtis Hovey (sinzui) on 2013-10-25

tags:

added: docs
removed: doc
