No way to recover from provider errors

Bug #1187372 reported by Marco Ceppi
This bug affects 4 people
Affects    Status   Importance  Assigned to  Milestone
juju-core  Triaged  Low         Unassigned

Bug Description

Using juju-core 1.11.1.1, deploying to OpenStack (HP Cloud), got this error:

  "8":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

      caused by: request (https://az-3.region-a.geo-1.compute.hpcloudsvc.com/v1.1/10293909253024/os-security-groups)
      returned unexpected status: 400; error info: {"badRequest": {"message": "Quota
      exceeded, too many security groups.", "code": 400}})'
    instance-id: pending
    series: precise
  "9":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

      caused by: request (https://az-3.region-a.geo-1.compute.hpcloudsvc.com/v1.1/10293909253024/os-security-groups)
      returned unexpected status: 400; error info: {"badRequest": {"message": "Quota
      exceeded, too many security groups.", "code": 400}})'
    instance-id: pending
    series: precise

I was able to delete old security groups from units that no longer exist (and get my security group limit increased). However, I'm not able to remove the units or services attached to these machines, destroy the machines, or ask juju to retry provisioning them. I'm stuck with a dirty status:

machines:
  "0":
    agent-state: started
    agent-version: 1.11.1.1
    dns-name: 15.185.229.194
    instance-id: "1126293"
    series: precise
  "8":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

      caused by: request (https://az-3.region-a.geo-1.compute.hpcloudsvc.com/v1.1/10293909253024/os-security-groups)
      returned unexpected status: 400; error info: {"badRequest": {"message": "Quota
      exceeded, too many security groups.", "code": 400}})'
    instance-id: pending
    series: precise
  "9":
    agent-state-info: '(error: cannot set up groups: failed to create a security group
      with name: juju-ubuntu-discourse

      caused by: request (https://az-3.region-a.geo-1.compute.hpcloudsvc.com/v1.1/10293909253024/os-security-groups)
      returned unexpected status: 400; error info: {"badRequest": {"message": "Quota
      exceeded, too many security groups.", "code": 400}})'
    instance-id: pending
    series: precise
services:
  mysql:
    charm: local:precise/mysql-306
    exposed: false
    life: dying
    units:
      mysql/0:
        agent-state: pending
        machine: "8"
  mysql-slave:
    charm: local:precise/mysql-306
    exposed: false
    life: dying
    units:
      mysql-slave/0:
        agent-state: pending
        machine: "9"

Tags: docs
Marco Ceppi (marcoceppi)
summary: - No way to recover from provider limits
+ No way to recover from provider errors
Marco Ceppi (marcoceppi) wrote :

Another example

  "2":
    agent-state-info: '(error: Get https://s3.amazonaws.com/juju-dist/?marker=&delimiter=&prefix=tools%2Fjuju-:
      read tcp 72.21.214.159:443: connection reset by peer)'
    instance-id: pending
    series: precise

Nick Veitch (evilnick)
tags: added: doc
William Reade (fwereade) wrote :

Those machines will become destroyable once the units on them are all removed; working as currently intended. Should this behaviour be changed to allow machine destruction to induce unit destruction (and, indeed, destruction of all containers and contained units, etc)?

That seems slightly drastic; but perhaps it's no worse than destroy-service destroying all relations automatically. However, it is somewhat complex to implement that behaviour cleanly, and it's unlikely to be a very high priority soon; will the workaround described above suffice for the time being?
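
The rule described above (a machine becomes destroyable only once every unit on it has been removed) can be illustrated with a small sketch. This is not juju's implementation or API: the `STATUS` dictionary is a hand-written, abbreviated copy of the status output in the bug description, and the helper names `failed_machines`, `units_on`, and `removal_plan` are hypothetical. It only shows which units would have to go before each failed machine could be destroyed.

```python
# Abbreviated, hand-copied model of the status in this report (an assumption,
# not juju's internal data model). Error text is shortened for readability.
STATUS = {
    "machines": {
        "0": {"agent-state": "started", "instance-id": "1126293"},
        "8": {"agent-state-info": "(error: cannot set up groups)",
              "instance-id": "pending"},
        "9": {"agent-state-info": "(error: cannot set up groups)",
              "instance-id": "pending"},
    },
    "services": {
        "mysql": {
            "life": "dying",
            "units": {"mysql/0": {"agent-state": "pending", "machine": "8"}},
        },
        "mysql-slave": {
            "life": "dying",
            "units": {"mysql-slave/0": {"agent-state": "pending", "machine": "9"}},
        },
    },
}


def failed_machines(status):
    """Ids of machines that never got an instance and report an error."""
    return sorted(
        machine_id
        for machine_id, machine in status["machines"].items()
        if machine.get("instance-id") == "pending"
        and machine.get("agent-state-info", "").startswith("(error:")
    )


def units_on(status, machine_id):
    """Names of all units assigned to the given machine id."""
    return sorted(
        unit_name
        for service in status["services"].values()
        for unit_name, unit in service.get("units", {}).items()
        if unit.get("machine") == machine_id
    )


def removal_plan(status):
    """Pair each failed machine with the units that block its destruction."""
    return [(mid, units_on(status, mid)) for mid in failed_machines(status)]


if __name__ == "__main__":
    for machine_id, units in removal_plan(STATUS):
        blockers = ", ".join(units) or "nothing"
        print(f"machine {machine_id}: remove {blockers} first")
```

For the status in this report, the plan pairs machine 8 with mysql/0 and machine 9 with mysql-slave/0, which are exactly the units reported as stuck in the pending state.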

Changed in juju-core:
status: New → Triaged
importance: Undecided → Low
Stuart Bishop (stub) wrote : Re: [Bug 1187372] Re: No way to recover from provider errors

On Sun, Jun 16, 2013 at 4:29 AM, William Reade
<email address hidden> wrote:
> Those machines will become destroyable once the units on them are all
> removed; working as currently intended. Should this behaviour be changed
> to allow machine destruction to induce unit destruction (and, indeed,
> destruction of all containers and contained units, etc)?
>
> That seems slightly drastic; but perhaps it's no worse than
> destroy-service destroying all relations automatically. However, it is somewhat
> complex to implement that behaviour cleanly, and it's unlikely to be a
> very high priority soon; will the workaround described above suffice for
> the time being?

The behavior I saw was that the units could not be destroyed because
they could never be set up, since the machine they are waiting on has
failed: a Catch-22. Maybe that is a separate bug, and destroy-service
and remove-unit should be able to remove units that have not yet been
set up?

--
Stuart Bishop <email address hidden>

Curtis Hovey (sinzui)
tags: added: docs
removed: doc