spaces and subnet constraints do not work on AWS with bundles

Bug #1659639 reported by Samuel Cozannet on 2017-01-26
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju
High
Witold Krecicki
2.2
High
John A Meinel
2.3
High
Witold Krecicki

Bug Description

Context:

* Deploying in AWS in a VPC that has 2 public subnets, 2 private subnets

Sequence:
juju bootstrap \
 --config "vpc-id=vpc-56416e32" \
 --constraints "instance-type=m3.medium root-disk=128G" \
 --bootstrap-series=xenial \
 --credential default \
 --config vpc-id-force=true \
 aws/eu-west-1 \
 k8s-aws
juju add-space public
added space "public" with no subnets
juju add-space private
added space "private" with no subnets
juju add-subnet subnet-ca4f7cbc private
added subnet with ProviderId "subnet-ca4f7cbc" in space "private"
juju add-subnet subnet-8fb68aeb private
added subnet with ProviderId "subnet-8fb68aeb" in space "private"
juju add-subnet subnet-c56d5eb3 public
added subnet with ProviderId "subnet-c56d5eb3" in space "public"
juju add-subnet subnet-9cd2eef8 public
added subnet with ProviderId "subnet-9cd2eef8" in space "public"

Using bundle:

series: xenial
services:
  easyrsa:
    charm: cs:~containers/easyrsa-6
    num_units: 1
    contraints: "instance-type=t2.medium spaces=private"
  etcd:
    charm: cs:~containers/etcd-23
    num_units: 3
    contraints: "instance-type=m3.medium spaces=private"
  flannel:
    charm: cs:~containers/flannel-10
  kubeapi-load-balancer:
    charm: cs:~containers/kubeapi-load-balancer-6
    expose: true
    num_units: 1
    contraints: "instance-type=t2.medium spaces=public"
  kubernetes-master:
    charm: cs:~containers/kubernetes-master-11
    num_units: 1
    contraints: "instance-type=m3.medium spaces=private"
  kubernetes-worker:
    charm: cs:~containers/kubernetes-worker-13
    expose: true
    num_units: 3
    contraints: "instance-type=m4.large root-disk=64G spaces=public"

(removed unnecessary stuff)

Output:

juju status --color
...

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.0.251.199                    Certificate Authority connected.
etcd/0*                   active    idle   1        10.0.252.10     2379/tcp        Healthy with 3 known peers.
etcd/1                    active    idle   2        10.0.251.202    2379/tcp        Healthy with 3 known peers.
etcd/2                    active    idle   3        10.0.251.164    2379/tcp        Healthy with 3 known peers.
kubeapi-load-balancer/0*  active    idle   4        34.248.132.128  443/tcp         Loadbalancer ready.
kubernetes-master/0*      active    idle   5        10.0.251.41     6443/tcp        Kubernetes master running.
  flannel/0*              active    idle            10.0.251.41                     Flannel subnet 10.1.103.1/24
kubernetes-worker/0*      active    idle   6        10.0.252.198    80/tcp,443/tcp  Kubernetes worker running.
  flannel/1               active    idle            10.0.252.198                    Flannel subnet 10.1.72.1/24
kubernetes-worker/1       active    idle   7        34.249.72.81    80/tcp,443/tcp  Kubernetes worker running.
  flannel/2               active    idle            34.249.72.81                    Flannel subnet 10.1.93.1/24
kubernetes-worker/2       active    idle   8        10.0.251.46     80/tcp,443/tcp  Kubernetes worker running.
  flannel/3               active    idle            10.0.251.46                     Flannel subnet 10.1.29.1/24

Machine  State    DNS             Inst id              Series  AZ
0        started  10.0.251.199    i-0a04399664027aac4  xenial  eu-west-1a
1        started  10.0.252.10     i-0cf6b3b1c445702fb  xenial  eu-west-1b
2        started  10.0.251.202    i-0d73bc0747307383f  xenial  eu-west-1a
3        started  10.0.251.164    i-03260cd5bfca2ec36  xenial  eu-west-1a
4        started  34.248.132.128  i-02900c2dac8cebd70  xenial  eu-west-1b
5        started  10.0.251.41     i-00daea7e56b303982  xenial  eu-west-1a
6        started  10.0.252.198    i-0f4dcd06804f1260b  xenial  eu-west-1b
7        started  34.249.72.81    i-0f20a12712404c4bc  xenial  eu-west-1a
8        started  10.0.251.46     i-0d556ffdaaf7a3e27  xenial  eu-west-1a

So machines 6 to 8 should be in the public space, but only one of them is. Also, all machines were deployed as m3.medium, ignoring the instance-type constraints.

Adding units has the same consequence: they get allocated to random subnets.

Expected behavior:
* Units are assigned to the proper subnet(s) in the proper space.

Note: I have had the same behavior with both 2.0.2 and 2.1 beta4.

OK I think I identified the rogue section of code in this.

provider/ec2/environ.go, in the StartInstance section, line 464 and down.

Here there is a reverse selection of subnets depending on the AZ, to map to spaces. The comment section states "a subnet in EC2 can span a single AZ, so here we build the reverse map zonesToSubnet (...)".

That a subnet can span only a single AZ does not mean that an AZ contains only a single subnet. For example, a private and a public subnet can co-exist in the same AZ.

Therefore, when there is a constraint on network spaces, it is not enough to list the subnets of the AZ and pick a random one, as the code does for now.
What is required is to find all subnets that are both in the AZ and in the requested space.

line 502:
  } else if args.Constraints.HaveSpaces() {
   subnetIDsForZone, subnetErr = findSubnetIDsForAvailabilityZone(zone, args.SubnetsToZones)
  }

needs to be changed to reflect that intersection of space and AZ.
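
A minimal sketch of the intersection described above, assuming a map of subnet ID to zones and a set of subnet IDs that belong to the requested space. The function name subnetsInZoneAndSpace, both parameter shapes, and the subnet and zone names are hypothetical; this is not the code in provider/ec2/environ.go.

```go
package main

import (
	"fmt"
	"sort"
)

// subnetsInZoneAndSpace returns the IDs of subnets that are located in the
// given availability zone AND are members of the requested space.
// subnetsToZones maps a subnet ID to the zones it lives in; spaceSubnets is
// the set of subnet IDs in the space. Both shapes are assumptions made for
// this sketch.
func subnetsInZoneAndSpace(zone string, subnetsToZones map[string][]string, spaceSubnets map[string]bool) []string {
	var matches []string
	for subnetID, zones := range subnetsToZones {
		if !spaceSubnets[subnetID] {
			continue // subnet exists but is not part of the requested space
		}
		for _, z := range zones {
			if z == zone {
				matches = append(matches, subnetID)
				break
			}
		}
	}
	sort.Strings(matches) // map iteration order is random; sort for stable results
	return matches
}

func main() {
	// Hypothetical layout: one private and one public subnet in each zone.
	subnetsToZones := map[string][]string{
		"subnet-private-a": {"zone-a"},
		"subnet-public-a":  {"zone-a"},
		"subnet-private-b": {"zone-b"},
		"subnet-public-b":  {"zone-b"},
	}
	publicSpace := map[string]bool{
		"subnet-public-a": true,
		"subnet-public-b": true,
	}
	// Only the public subnet in zone-a qualifies, even though zone-a also
	// holds a private subnet.
	fmt.Println(subnetsInZoneAndSpace("zone-a", subnetsToZones, publicSpace)) // prints [subnet-public-a]
}
```

Picking a subnet from this filtered list, rather than from every subnet in the zone, would keep a machine constrained to "public" out of a private subnet that happens to share its AZ.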

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.0
Marco Ceppi (marcoceppi) on 2017-01-31
tags: added: adoption cdk kubernetes
John A Meinel (jameinel) wrote :

SubnetsToZones should only contain a mapping of subnets that are in the requested space, mapping to Zones that have those subnets.

It should *not* contain all subnets in the zone. So by the time we get to that code we should only have Private (or Public) subnets listed.

It is supposedly built up in:
apiserver/provisioner/provisioninginfo.go line 216 in "machineSubnetsAndZones".

Where it first asks "give me the Space" and then subnets := space.Subnets(), which it then walks to find subnet.AvailabilityZone() later on.

Given the symptoms it sounds like we have a bug, but probably not right there.
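
To illustrate the invariant described above, here is a rough sketch of building a subnet-to-zones map that only ever contains subnets of the requested space. The subnetInfo type and the subnetsToZonesForSpace function are hypothetical stand-ins, not the actual types or code in apiserver/provisioner/provisioninginfo.go.

```go
package main

import "fmt"

// subnetInfo is a hypothetical, simplified record of a subnet known to the
// model: its provider ID, the space it was added to, and its zone.
type subnetInfo struct {
	ProviderID string
	Space      string
	Zone       string
}

// subnetsToZonesForSpace walks the subnets and keeps only those in the
// requested space, mapping each subnet ID to the zone(s) it is in.
// Subnets from other spaces never enter the result.
func subnetsToZonesForSpace(subnets []subnetInfo, space string) map[string][]string {
	result := make(map[string][]string)
	for _, s := range subnets {
		if s.Space != space {
			continue
		}
		result[s.ProviderID] = append(result[s.ProviderID], s.Zone)
	}
	return result
}

func main() {
	// Hypothetical data: two spaces sharing the same two zones.
	subnets := []subnetInfo{
		{ProviderID: "subnet-private-a", Space: "private", Zone: "zone-a"},
		{ProviderID: "subnet-private-b", Space: "private", Zone: "zone-b"},
		{ProviderID: "subnet-public-a", Space: "public", Zone: "zone-a"},
		{ProviderID: "subnet-public-b", Space: "public", Zone: "zone-b"},
	}
	fmt.Println(subnetsToZonesForSpace(subnets, "private"))
}
```

If the map handed to the provider is built this way, a lookup by zone alone cannot return a subnet from the wrong space, which is why the symptoms point to a bug elsewhere in the chain.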

John A Meinel (jameinel) on 2017-02-07
summary: - spaces and subnet constraints do not work on AWS
+ spaces and subnet constraints do not work on AWS with bundles
John A Meinel (jameinel) on 2017-02-20
Changed in juju:
milestone: 2.2.0 → 2.1.1
Anastasia (anastasia-macmood) wrote :

This work will be addressed at a later date than originally intended.
I am triaging this into 2.3 at this stage.

Changed in juju:
milestone: 2.1.1 → 2.3.0
Anastasia (anastasia-macmood) wrote :

Re-triaging to 2.2 :)

Changed in juju:
milestone: 2.3.0 → 2.2.0

Hello,

Is this still confirmed in 2.2? It's really an annoying issue, I have to
manually step every deploy out of bundles, and we're starting to productize
PoCs for k8s, which will heavily rely on this.

Thanks,
Best,
Sam


John A Meinel (jameinel) wrote :

I just confirmed it locally with:
series: xenial
services:
  ul:
    charm: "cs:~jameinel/ubuntu-lite-4"
    num_units: 2
    to:
      - "0"
      - "lxd:0"
    bindings:
      "": dbc

2 things surfaced:

1) It does know about the binding, because the container fails to provision because it can't join the 'dbc' space.
2) But it doesn't provision the host machine in the right space. If I do:
juju deploy cs:~jameinel/ubuntu-lite-4 --bind dbc

Then it does the right thing and grabs a machine from my 172.30.100/24 subnet, but via the bundle the machine got 172.30.0/24.

Changed in juju:
assignee: nobody → John A Meinel (jameinel)
status: Triaged → In Progress
John A Meinel (jameinel) wrote :

You can work around this by passing a constraint for the machine:
series: xenial
services:
  ul-b:
    charm: "cs:~jameinel/ubuntu-lite-4"
    num_units: 2
    to:
      - "0"
      - "lxd:0"
    bindings:
      "": dbc
machines:
  "0":
    constraints: "arch=amd64 spaces=dbc"

This correctly provisions into a correct subnet.

I'm seeing weird behavior where the machine doesn't end up with a proper configuration for cloud-init. It comes up but never tries to install Juju.

Also, after having done "juju deploy ubuntu-lite --bind space" if I then do "juju add-unit" the added unit is *not* coming up in the right subnets. Hopefully the 'deploy-from-a-bundle' is the same bug as the "add-unit" bug.

John A Meinel (jameinel) wrote :

This is weird... It feels like we're trying to use a space, but end up using the wrong one for some reason.

$ juju spaces
Space Subnets
db
dbb
dbc 172.30.100.0/24
       172.30.101.0/24
       172.30.102.0/24
pub 172.30.2.0/24
pub-2 172.30.200.0/24
       172.30.201.0/24
       172.30.202.0/24

(dbc is the 100 range, pub-2 is the 200 range)

$ juju deploy --bind pub-2
$ juju show-machine 5
machines:
  "5":
...
    network-interfaces:
      eth0:
        ip-addresses:
        - 172.30.1.209
        mac-address: 02:59:e4:10:5f:11
        is-up: true
$ juju deploy --bind dbc
$ juju show-machine 6
machines:
  "6":
...
    network-interfaces:
      eth0:
        ip-addresses:
        - 172.30.200.111
        mac-address: 06:bd:24:3c:ac:67
        space: pub-2

So the instance that was supposed to be in "dbc" ended up in pub-2 and the machine that was supposed to be in 'pub-2' ended up in "".


I found a way to overcome this, but it's really a workaround.

If you use a --to zone=us-east-1a AND a constraint on the space, then
machines go to the right subnet.

Hopefully it will help.
++
Sam


Curtis Hovey (sinzui) on 2017-03-24
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui) on 2017-03-30
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3

Hello,

Can we put this on high priority? It is a mandatory fix to get a very
important prospect on board. We have a demo with them on 10th of April, I
would prefer to have it fixed for peace of mind by then.

Thx
Sam


John A Meinel (jameinel) wrote :

(it is on high, as indicated by the 'In Progress' being on my next-important-bug list.)


Thx, much appreciated :)


Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Changed in juju:
milestone: 2.2-rc1 → 2.2.0
Ian Booth (wallyworld) on 2017-06-06
Changed in juju:
milestone: 2.2.0 → 2.3-alpha1
John A Meinel (jameinel) wrote :

Removing the milestone (avoid kick-the-can), but leave it as a potential fix for the 2.2 series.

Changed in juju:
milestone: 2.3-beta1 → 2.3-beta2
Changed in juju:
milestone: 2.3-beta2 → none
Anastasia (anastasia-macmood) wrote :

@John A Meinel (jameinel),

Are you really working on the fix? If not, this report should probably not say "In Progress"... Also could you please fix the series? I do not think that we are likely to fix this for 2.2.x

John A Meinel (jameinel) wrote :

I'm not sure which milestone this was specifically completed in. But Witold already created a fix where the bindings defined in a bundle should become machine constraints.

Changed in juju:
status: In Progress → Fix Released
assignee: John A Meinel (jameinel) → Witold Krecicki (wpk)
milestone: none → 2.4-beta1