ec2 says agent-state-info: 'cannot run instances: No default subnet for availability zone: ''us-east-1e''. (InvalidInput)'

Bug #1388860 reported by Jay R. Wren
This bug affects 2 people

Affects          Status         Importance   Assigned to
juju-core        Fix Released   High         Andrew Wilkins
juju-core 1.20   Fix Released   High         Nate Finch
juju-core 1.21   Fix Released   High         Andrew Wilkins

Bug Description

This is happening far too often for juju to not do something about it.

Almost 50% of my bundle deploys hit this error (bundles create more machines, so I am more likely to run into it).

It would be nice if there was a retry for this error.

    agent-state-info: 'cannot run instances: No default subnet for availability zone:
      ''us-east-1e''. (InvalidInput)'

Uros Jovanovic (uros-jovanovic) wrote :

Me too.

Curtis Hovey (sinzui) wrote :

Which juju version are you running?

tags: added: deploy ec2-provider network
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.21-alpha3
Curtis Hovey (sinzui) wrote :

I am making this bug critical as it might be an AWS behaviour that juju needs to support quickly. If this turns out to be an intermittent AWS cloud issue, we will lower this bug to High.

Jay R. Wren (evarlast) wrote :

I'm using 1.21-alpha2.

I think others on this bug are using different versions.

Uros Jovanovic (uros-jovanovic) wrote :

I'm running 1.20.11-trusty-amd64, using the us-east-1 region. Switching to eu-west-1 seems to work fine.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.21-alpha3 → 1.21-beta1
John A Meinel (jameinel) wrote :

This sounds like a case of someone using a default VPC that has a default subnet in most availability zones (a-d) but does *not* have one in us-east-1e.
Switching to eu-west-1 means you are using a different set of configured networks (either not in VPC or with a different set of subnets configured).

We are looking at configuring VPC on Amazon in this cycle, but I don't believe that bigger solution is reasonable for 1.21.

It's possible we could just skip zones that don't have a subnet configured?

This sort of failure sounds very specific to a given user's account and how their EC2 environment is configured; as such, it will be hard for us to reproduce and fix. If it fails reasonably reliably for you, we can try to work out some workarounds with you and see whether they fix things.

Jay R. Wren (evarlast) wrote : Re: [Bug 1388860] Re: ec2 says agent-state-info: 'cannot run instances: No default subnet for availability zone: ''us-east-1e''. (InvalidInput)'

I did not think I had any VPC, but you prompted me to check and indeed I
had one. I've since deleted it. Hopefully I'll never see this message again.

Still, the behavior from juju seemed to place new machines into the same AZ
every time a new machine was added. This meant no new machines were usable.
It would be nice to document whether juju chooses an AZ and, if so,
under what conditions.

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk) wrote :

I've reproduced the error in eu-west-1, where I have a default VPC, by deleting the default subnet from one of the AZs. I'm going to change Juju to (for now) skip over AZs where the region has a default VPC but the AZ has no default subnet.

@jam and anyone else wanting to test this sort of thing in the future, see "Availability" at http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html. Unless you have tested in all the regions, you've probably got a default VPC in one of them. Just be aware that deleting a default subnet is not something you can revert without contacting AWS support.

> Still, the behavior from juju seemed to place new machines into the same AZ
> every time a new machine was added. This meant no new machines were usable.

Not sure what you mean here. Why does it mean no new machines were usable?
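The skip described above can be sketched in Go (the language juju-core is written in) as a filter over the candidate zones. This is a minimal illustration under stated assumptions, not Juju's actual code: the zoneInfo type, its fields and the usableZones function are hypothetical names, and the sketch assumes the provider already knows whether the region has a default VPC and which zones have a default subnet.

```go
package main

// zoneInfo is a hypothetical summary of one availability zone; the field
// names are illustrative and are not Juju's own types.
type zoneInfo struct {
	Name             string
	Available        bool
	HasDefaultSubnet bool
}

// usableZones returns the names of zones an instance can be launched into
// without naming a subnet. When the region has a default VPC, a launch into
// an AZ with no default subnet fails with "No default subnet for
// availability zone", so those zones are skipped.
func usableZones(zones []zoneInfo, regionHasDefaultVPC bool) []string {
	var names []string
	for _, z := range zones {
		if !z.Available {
			continue
		}
		if regionHasDefaultVPC && !z.HasDefaultSubnet {
			continue // EC2 would reject this zone, so never offer it
		}
		names = append(names, z.Name)
	}
	return names
}
```

Filtering up front means the broken zone never enters the placement decision at all, rather than being tried and failing.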

Andrew Wilkins (axwalk) wrote :

>> Still, the behavior from juju seemed to place new machines into the same AZ
>> every time a new machine was added. This meant no new machines were usable.
>
> Not sure what you mean here. Why does it mean no new machines were usable?

Oh, I think I understand now. Juju will attempt to provision into the AZs in order of least-to-most populated (per service). Since the error returned wasn't classified as one that should cause Juju to retry in the next AZ, provisioning would simply fail, and the next attempt would do exactly the same thing. This will be fixed by skipping over such AZs.

Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
John A Meinel (jameinel) wrote :

1.20 also has the round-robin logic, so it will need this fix as well.

Ian Booth (wallyworld)
Changed in juju-core:
milestone: 1.21-beta1 → 1.22
Jay R. Wren (evarlast) wrote :

Andrew, awesome! I'm very excited that a good fix was found for this. Huge
thanks.

One comment about the round-robin logic: when a machine is created for the
first unit of a service, it would be nice if no AZ were selected.

From
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#using-regions-availability-zones-launching

"When you launch your initial instances, we recommend that you accept the
default Availability Zone, because this enables us to select the best
Availability Zone for you based on system health and available capacity."

Thanks again.

On Thu, Nov 6, 2014 at 6:30 AM, Ian Booth <email address hidden> wrote:

> ** Also affects: juju-core/1.21
> Importance: Undecided
> Status: New
>
> ** Changed in: juju-core/1.21
> Importance: Undecided => High
>
> ** Changed in: juju-core/1.21
> Status: New => Fix Committed
>
> ** Changed in: juju-core/1.21
> Assignee: (unassigned) => Andrew Wilkins (axwalk)
>
> ** Changed in: juju-core
> Milestone: 1.21-beta1 => 1.22
>
> ** Changed in: juju-core/1.21
> Milestone: None => 1.21-beta1

Andrew Wilkins (axwalk) wrote :

No worries, Jay. Thanks for the suggestion; I think that's reasonable. I'd go a little further and say that whenever we provision an instance and no AZ is considered a better candidate than any other (from Juju's POV), we should not specify one. I'll raise a bug for this.
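The proposal above could look roughly like the following Go sketch. If every usable zone holds the same number of the service's units, no zone is a better candidate, so the function returns an empty string, meaning the caller would omit the zone and let EC2 choose, as the AWS documentation Jay quoted recommends. chooseZone and the population map are hypothetical names for illustration, and ties among the least-populated zones are broken by name only to keep the result deterministic.

```go
package main

// chooseZone returns the zone to request for a new instance, or "" to let
// EC2 pick. population maps each usable zone to the number of the service's
// units already placed there.
func chooseZone(population map[string]int) string {
	first := true
	var best string
	var min, max int
	for zone, n := range population {
		if first {
			best, min, max, first = zone, n, n, false
			continue
		}
		// Prefer the least-populated zone; break ties by name.
		if n < min || (n == min && zone < best) {
			best, min = zone, n
		}
		if n > max {
			max = n
		}
	}
	if min == max {
		return "" // all zones equal (or none known): no preference, let EC2 decide
	}
	return best
}
```

For the first unit of a service every zone is at zero, so no AZ is requested, which is exactly the case Jay raised.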

Jay R. Wren (evarlast) wrote :

YES! YES!

It makes me very happy to hear about this change. Huge Thanks!

Curtis Hovey (sinzui)
no longer affects: juju-core/1.20
Andrew Wilkins (axwalk) wrote :

This does still affect 1.20. See lp:1398060.

Curtis Hovey (sinzui) wrote :
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released