Juju does not support EC2 with no default VPC

Bug #1321442 reported by Willem Roos
This bug affects 6 people
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Dimiter Naydenov

Bug Description

Some AWS accounts have no default VPC. It appears Juju does not support this. For example:

C:\>juju bootstrap
WARNING ignoring environments.yaml: using bootstrap config in file "<path removed>\\Juju\\environments\\amazon.jenv"
Launching instance
ERROR bootstrap failed: cannot start bootstrap instance: cannot set up groups: No default VPC for this user (VPCIdNotSpecified)
ERROR cannot start bootstrap instance: cannot set up groups: No default VPC for this user (VPCIdNotSpecified)

Willem Roos (wroos)
description: updated
Revision history for this message
Curtis Hovey (sinzui) wrote :

juju doesn't require VPC. Also amazon.jenv should have been destroyed when juju failed to bootstrap.

I think something else is wrong with the config. Can you provide a redacted version of your environments.yaml?

Changed in juju-core:
status: New → Incomplete
Revision history for this message
Willem Roos (wroos) wrote :

Here it is:

---- 8< ----
default: amazon
environments:
    openstack:
        type: openstack
    hpcloud:
        type: openstack
    manual:
        type: manual
        bootstrap-host: somehost.example.com
    maas:
        type: maas
        maas-server: 'http://192.168.1.1/MAAS/'
        maas-oauth: '<add your OAuth credentials from MAAS here>'
    local:
        type: local
    joyent:
      type: joyent
    amazon:
        type: ec2
        region: eu-west-1
        access-key: <access key here>
        secret-key: <secret key here>
    azure:
        type: azure
        location: West US
        management-subscription-id: <00000000-0000-0000-0000-000000000000>
        management-certificate-path: /home/me/azure.pem
        storage-account-name: abcdefghijkl
---- 8< ----

Juju won’t bootstrap because I have no default VPC in AWS.

Connecting to AWS is fine:

---- 8< ----
C:\> ec2-describe-regions
REGION eu-west-1 ec2.eu-west-1.amazonaws.com
REGION sa-east-1 ec2.sa-east-1.amazonaws.com
REGION us-east-1 ec2.us-east-1.amazonaws.com
REGION ap-northeast-1 ec2.ap-northeast-1.amazonaws.com
REGION us-west-2 ec2.us-west-2.amazonaws.com
REGION us-west-1 ec2.us-west-1.amazonaws.com
REGION ap-southeast-1 ec2.ap-southeast-1.amazonaws.com
REGION ap-southeast-2 ec2.ap-southeast-2.amazonaws.com
---- 8< ----

Attached also the output of

C:\> juju bootstrap -v --debug

Revision history for this message
John A Meinel (jameinel) wrote :

This is strange, as the default regions for our shared account (where we do a lot of deploying) don't have a default VPC either.

I know we're starting to do more networking work which will require default VPC, but it should only require it if you try to use the advanced networking.

I went ahead and deleted your attachment, because it contained your secret keys, but I'll post a sanitized copy.

Revision history for this message
John A Meinel (jameinel) wrote :

sanitized version of previous attachment.

Changed in juju-core:
importance: Undecided → High
assignee: nobody → Dimiter Naydenov (dimitern)
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

After some research, I think I've found the problem. See this thread:
https://forums.aws.amazon.com/thread.jspa?messageID=466803&tstart=0

It appears the user is in VPC-only mode for this EC2 region, and their default VPC got deleted. Hence, trying to create a security group without specifying a subnet id while bootstrapping fails, because it has to be set, but the default is missing.

This seems to be a special case; I'm not sure how often users are in the same situation.

We can fix it and allow the user to specify a VPC and subnet ids in environments.yaml (or perhaps as bootstrap arguments?), but changing the existing provider code is not trivial. I'd very much like to avoid this until container networking is done.

We can easily detect now if we'll run into this problem during bootstrap - when ec2.AccountAttributes("supported-platforms") lists just "VPC", and display some helpful message like "contact AWS support to restore your default VPC" :) It's not easy, I've been trying for a few days now.

The complexity comes from the need to make all existing EC2 calls VPC-aware, but in order not to make having a VPC a requirement, the calls will only use it when set.
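
The detection mentioned above could look roughly like the sketch below. It is only an illustration in Python, not Juju's provider code: it assumes the account attributes were already fetched (for example with boto3's describe_account_attributes) and are passed in as a dict in the shape the EC2 DescribeAccountAttributes call returns. The helper names attribute_values and is_vpc_only are made up for this example.

```python
def attribute_values(response, name):
    """Return the list of values reported for one account attribute."""
    for attr in response.get("AccountAttributes", []):
        if attr.get("AttributeName") == name:
            return [v["AttributeValue"] for v in attr.get("AttributeValues", [])]
    return []


def is_vpc_only(response):
    """True when "supported-platforms" lists just "VPC" for this region."""
    return attribute_values(response, "supported-platforms") == ["VPC"]


# Sample data in the shape of an EC2 DescribeAccountAttributes response.
sample = {
    "AccountAttributes": [
        {
            "AttributeName": "supported-platforms",
            "AttributeValues": [{"AttributeValue": "VPC"}],
        }
    ]
}

if is_vpc_only(sample):
    print("Account is VPC-only in this region; bootstrap needs a usable VPC.")
```

An account that still supports EC2-Classic reports both "EC2" and "VPC", so the check returns False there and the existing bootstrap path is unaffected.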

Curtis Hovey (sinzui)
Changed in juju-core:
status: Incomplete → Triaged
importance: High → Medium
Changed in juju-core:
assignee: Dimiter Naydenov (dimitern) → nobody
tags: added: ec2-provider network
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

We still do not support the case where there's no default VPC on the account. We *do* support the case when there's no VPC support at all though (e.g. when using a classic EC2 pre-VPC account, but that's becoming less of an issue as it's not possible to create a legacy EC2-Classic account anymore).

I propose to have the following changes implemented:
1. Add a "vpc-id" optional environments.yaml config setting, which is not set by default (the current behavior is preserved - i.e. assume the existence of a default VPC and discover its id during bootstrap).
2. When set though, Juju will validate the given id (e.g. vpc-a1b2c3d4) and ensure it passes some sanity checks, bailing out with an error early on failure. Checks should include, at minimum:
2.1. The given VPC has one subnet per availability zone in the chosen region (this is needed to make sure the automatic availability zone distribution logic for service units will still work).
2.2. Each of these subnets has the "automatic public IP" attribute set (this is needed to ensure we can expose services with units in those zones).
2.3. There are other things that can prevent juju from using the given VPC, like restrictive VPC routes, network ACLs or rules, but at this point we should assume the user knows what they're doing when specifying a VPC id.
3. Once the specified vpc-id is validated, ensure the EC2 provider uses the id when:
3.1. Running instances - in addition to the zone, the VPC id needs to be explicitly specified.
3.2. Creating/using security groups - pass the VPC id explicitly on creation, use group ids instead of names (as required by AWS VPC API).
4. Displaying the VPC id for each machine in juju status, to give the user some feedback.
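
Checks 2.1 and 2.2 above can be sketched as follows. This is a hedged Python illustration, not the proposed implementation: it assumes the subnets of the VPC were already listed and are given as dicts using the EC2 DescribeSubnets field names (AvailabilityZone, MapPublicIpOnLaunch), and it reads "one subnet per availability zone" as "at least one". The function names are hypothetical.

```python
def zones_without_usable_subnet(subnets, zones):
    """Return the zones that lack a subnet with automatic public IPs enabled."""
    covered = {
        s["AvailabilityZone"]
        for s in subnets
        if s.get("MapPublicIpOnLaunch")
    }
    return sorted(z for z in zones if z not in covered)


def validate_vpc_subnets(vpc_id, subnets, zones):
    """Bail out early (ValueError) if the VPC fails checks 2.1 or 2.2."""
    missing = zones_without_usable_subnet(subnets, zones)
    if missing:
        raise ValueError(
            'VPC "%s" has no usable subnet in zones: %s'
            % (vpc_id, ", ".join(missing))
        )


# Example: the second zone only has a subnet without automatic public IPs.
zones = ["us-east-1a", "us-east-1b"]
subnets = [
    {"SubnetId": "subnet-1", "AvailabilityZone": "us-east-1a", "MapPublicIpOnLaunch": True},
    {"SubnetId": "subnet-2", "AvailabilityZone": "us-east-1b", "MapPublicIpOnLaunch": False},
]
print(zones_without_usable_subnet(subnets, zones))
```

Returning the list of failing zones, rather than just True or False, keeps enough detail around to tell the user exactly what to fix.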

I think we should bump this in priority and plan to fix it in the 1.25 release line.

Changed in juju-core:
milestone: none → 1.25.0
assignee: nobody → Dimiter Naydenov (dimitern)
importance: Medium → High
Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Can we have the case where there is only one VPC, but it isn't default? In that case can we just auto-pick it?

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

No, unfortunately just any VPC won't do. The absence of a default VPC means either that the default VPC was deleted on purpose (for newer AWS accounts) or that it didn't exist in the first place (for older AWS accounts). In the former case, having a non-default VPC means the user configured it manually, so we need to verify Juju can use it (see the points above about subnet checks), and I think it's better for the user to explicitly tell Juju to use the VPC.

We can still try to discover all VPCs and verify each of them in order to display a nicer message (e.g. VPC "vpc-xxxxxxxx" does not meet Juju requirements: no usable subnets in zones: us-east-1a, us-east-1c. Please, create subnets with MapPublicIPOnLaunch attribute set in those zones, or specify "vpc-id" of a compatible VPC: vpc-yyyyzzzz, vpc-ttttvvvvv).
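
A rough Python sketch of how such a message could be assembled is below. It is illustrative only: it assumes every discovered VPC has already been checked, however that was done, and that the result is a mapping from VPC id to the list of zones with no usable subnet (an empty list meaning the VPC looks compatible). The function name describe_vpc_problem is hypothetical.

```python
def describe_vpc_problem(vpc_id, missing_zones_by_vpc):
    """Build a user-facing message for a VPC that fails the subnet checks."""
    missing = missing_zones_by_vpc[vpc_id]
    if not missing:
        return 'VPC "%s" meets Juju requirements.' % vpc_id
    message = (
        'VPC "%s" does not meet Juju requirements: '
        "no usable subnets in zones: %s. "
        "Please create subnets with MapPublicIPOnLaunch attribute set "
        "in those zones" % (vpc_id, ", ".join(missing))
    )
    # Suggest any other discovered VPC that passed all checks.
    compatible = sorted(
        other for other, zones in missing_zones_by_vpc.items() if not zones
    )
    if compatible:
        message += ', or specify "vpc-id" of a compatible VPC: '
        message += ", ".join(compatible)
    return message + "."


results = {
    "vpc-xxxxxxxx": ["us-east-1a", "us-east-1c"],
    "vpc-yyyyzzzz": [],
}
print(describe_vpc_problem("vpc-xxxxxxxx", results))
```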

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Some updates. I've managed to successfully bootstrap a juju environment on a non-default VPC while manually testing my fix. I've discovered that the following additional requirements must be met for a non-default VPC (in addition to having at least one subnet per AZ with MapPublicIPOnLaunch set): the VPC needs an Internet Gateway (IGW) attached, and the routing table for the VPC must use the IGW as a default route.
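
The two extra requirements (an attached IGW and a default route through it) can be illustrated with the Python sketch below. It is not the goamz change, just a sketch under assumptions: the gateways and route tables are plain dicts using the field names of the EC2 DescribeInternetGateways and DescribeRouteTables responses, and the function names are hypothetical. Accepting any route table of the VPC is a simplification; a stricter check would look only at the main table or at the table associated with the chosen subnet.

```python
def internet_gateway_ids(gateways, vpc_id):
    """IDs of the internet gateways attached to the given VPC."""
    return {
        gw["InternetGatewayId"]
        for gw in gateways
        if any(a.get("VpcId") == vpc_id for a in gw.get("Attachments", []))
    }


def has_default_route_via_igw(route_table, igw_ids):
    """True if the table routes 0.0.0.0/0 through one of the given gateways."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId") in igw_ids
        for route in route_table.get("Routes", [])
    )


def vpc_reachable_from_internet(vpc_id, gateways, route_tables):
    """True if the VPC has an attached IGW used as a default route."""
    igw_ids = internet_gateway_ids(gateways, vpc_id)
    tables = [t for t in route_tables if t.get("VpcId") == vpc_id]
    return bool(igw_ids) and any(
        has_default_route_via_igw(t, igw_ids) for t in tables
    )


gateways = [
    {"InternetGatewayId": "igw-11111111", "Attachments": [{"VpcId": "vpc-a1b2c3d4"}]},
]
route_tables = [
    {
        "VpcId": "vpc-a1b2c3d4",
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-11111111"},
        ],
    },
]
print(vpc_reachable_from_internet("vpc-a1b2c3d4", gateways, route_tables))
```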

I'm in the process of updating goamz to support discovery of IGWs and routes for a VPC, after which I'll do some more testing and propose the fix.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25-alpha1 → 1.25-beta1
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I'm putting this on hold for now, but I'll revisit it soon as part of fixing network model MVP in 1.25.0 release.

Changed in juju-core:
status: In Progress → Won't Fix
status: Won't Fix → Triaged
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Please don't require public internet access on subnets. That sort of kills the ability to use this in typical VPC environments that segment network activity, i.e. DB and app server subnet tiers do not have public connectivity, but front-end web servers do.

Changed in juju-core:
milestone: 1.25-beta1 → 1.25-beta2
Revision history for this message
Cheryl Jennings (cherylj) wrote :

Dimiter, was this fixed as part of other work for 1.25.0 as mentioned in comment #11?

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

No, unfortunately. However, we do plan to fix this in one of the next releases (between Jan and Apr, 2016).

Changed in juju-core:
milestone: 1.25-beta2 → 1.25.1
Changed in juju-core:
milestone: 1.25.1 → 1.26-alpha1
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.26-alpha1 → 1.26-alpha2
Changed in juju-core:
milestone: 1.26-alpha2 → 1.26-beta1
Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha2
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.0-rc1
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta5 → 2.0-rc1
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta6 → 2.0-beta7
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Some updates on the approach and scope of this fix. Currently Juju just assumes a default VPC exists, even though we still verify whether it's there in order to decide whether networking is supported or not. The end goal is for Juju to allow more complex networking setups on AWS, with a customized VPC and subnets, including support for "private" subnets (where instances start without an automatic public IP and are not accessible from the internet). In order to get there, we first need to ensure we can be explicit about which VPC ID should be used throughout provider/ec2.

To provide comparable UX on a non-default VPC, Juju needs to verify that the VPC is configured to at least allow access to it from the internet, so the juju client can connect to the controller. If that's not the case, Juju can't really work out of the box, as it won't even be able to complete bootstrapping. To ensure the VPC can be accessed from the outside, at least one of its subnets' route tables (or the main VPC route table) must have an Internet Gateway linked to it. Another requirement, obviously, is that the VPC has at least one usable subnet. Juju can verify both of these before trying to bootstrap and give the user a meaningful error message on how to rectify the situation.

So the fix I'm working on will allow:
1. with no user-specified vpc-id, detecting the default VPC ID and using it explicitly where needed to make AWS API calls
2. once a vpc-id is chosen at bootstrap, it cannot be changed for the lifetime of the model
3. with a user-specified vpc-id (e.g. --config vpc-id=vpc-a1b2c3d4 passed at bootstrap), at minimum validate the following:
3.1. the vpc with that ID exists (for the user - i.e. it might not be visible due to lack of permissions)
3.2. the vpc has an internet gateway attached to it
3.3. the vpc has at least one subnet, which can be accessed from the internet (and use that subnet for the bootstrap node)
4. provide a way for the user to force Juju to use a vpc which otherwise will have failed checks 3.2 or 3.3. above (e.g. --config force-vpc-id ?) Unless forced, juju will error out and refuse to bootstrap
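
The decision flow in points 3 and 4 can be sketched like this. It is a minimal Python illustration under assumptions, not the code in the fix: the results of the individual checks (VPC exists, IGW attached, list of internet-accessible subnets) are passed in as plain values, the force argument stands in for whatever option ends up being used to force a VPC, and the function and exception names are hypothetical.

```python
class VPCNotFoundError(Exception):
    """Check 3.1 failed; forcing cannot override this."""


class VPCNotUsableError(Exception):
    """Check 3.2 or 3.3 failed and the VPC was not forced."""


def check_vpc_for_bootstrap(vpc_id, exists, has_igw, public_subnet_ids, force=False):
    """Apply checks 3.1-3.3 and return the subnet for the bootstrap node.

    Returns None when the VPC is forced but has no internet-accessible subnet.
    """
    if not exists:
        raise VPCNotFoundError('VPC "%s" not found (or not visible)' % vpc_id)

    problems = []
    if not has_igw:
        problems.append("no internet gateway attached")
    if not public_subnet_ids:
        problems.append("no subnet accessible from the internet")

    if problems and not force:
        raise VPCNotUsableError(
            'VPC "%s" cannot be used: %s' % (vpc_id, "; ".join(problems))
        )
    for problem in problems:
        # Forced: proceed anyway, but tell the user what was skipped.
        print('WARNING: forcing use of VPC "%s": %s' % (vpc_id, problem))

    return public_subnet_ids[0] if public_subnet_ids else None


print(check_vpc_for_bootstrap("vpc-a1b2c3d4", True, True, ["subnet-1"]))
```

Keeping "not found" as a separate error reflects point 4: only checks 3.2 and 3.3 can be overridden, while a VPC that does not exist for the user is always fatal.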

The above should enable a lot more complex VPC networking scenarios, while preserving backwards-compatible UX as much as possible, and allow a power user to configure the VPC how they want. It won't guarantee Juju will work with a non-default VPC *exactly the same way* as with a default VPC, and that perhaps deserves a WARNING during bootstrap.

Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

It's surprisingly hard to properly test the changes, but I have most of the basic stuff working (live tested only, WIP):

https://github.com/juju/juju/pull/5309/files

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I think the fix above is ready for review now.

Changed in juju-core:
importance: High → Critical
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Fix is ready to land and needs approval, after a couple of reviews, lots of live testing and a ton of unit testing added.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

The proposed fix has landed on master. However, after a discussion with rogpeppe and jwren today, I'll propose a follow-up that allows changing the vpc-id for hosted models.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

To clarify, the approach in the fix is correct as far as the controller is concerned, but it does not allow using a different VPC ID for hosted models, hence the needed follow-up.

Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta7 → none
milestone: none → 2.0-beta7