vSphere does not support container addressability

Bug #1592811 reported by Larry Michel
This bug affects 1 person
Affects:     Canonical Juju
Status:      Triaged
Importance:  Wishlist
Assigned to: John A Meinel

Bug Description

I am using vsphere as the provider and I am using placement directives to co-locate units onto machines so I can condense the deployment. What ends up happening is that networking on these units is set up on a 10.0.4.0 network which I cannot access. For one of my services, I forgot to add a placement directive; that unit got a DHCP address from my DHCP server, and it was the only unit I could access.

This is from the bundle:
series: trusty
services:
  ceilometer:
    charm: cs:trusty/ceilometer
    num_units: 1
    to:
    - lxd:3
  ceilometer-agent:
    charm: cs:trusty/ceilometer-agent
  cinder:
    charm: cs:trusty/cinder
    num_units: 1
    hwreqs:
      storage: None
    options:
      block-device: None
      ceph-osd-replication-count: 1
      glance-api-version: 2
      overwrite: 'true'
      remove-missing-force: true
    to:
    - 0
machines:
  0:
    series: trusty
  1:
    series: trusty
  2:
    series: trusty
  3:
    series: trusty

After all the placement statements were removed in the second bundle, the deployment needed 15 machines (compared with 5 for the dense bundle) and it ran out of resources on the last 2, which was expected. However, I could access all the units, since they got DHCP addresses from the DHCP server on my network.

I don't know if this issue affects all dense deployments or just deployments with vsphere as the provider, so I am keeping this bug specific to vsphere.

I have included both bundles and the status output in the attachment.

This is the dense deployment:

model: default
machines:
  "0":
    juju-status:
      current: started
      since: 15 Jun 2016 06:37:11Z
      version: 2.0-beta8
    dns-name: 10.0.4.23
    instance-id: juju-72bde5-0
    machine-status:
      current: pending
      since: 15 Jun 2016 06:34:40Z
    series: trusty
    containers:
      0/lxd/0:
        juju-status:
          current: started
          since: 15 Jun 2016 06:42:22Z
          version: 2.0-beta8
        dns-name: 10.0.4.23
        instance-id: juju-72bde5-0-lxd-0
        machine-status:
          current: running
          message: Container started
          since: 15 Jun 2016 06:38:28Z
        series: trusty
      0/lxd/1:
        juju-status:
          current: started
          since: 15 Jun 2016 06:48:20Z
          version: 2.0-beta8
        dns-name: 10.0.4.181
        instance-id: juju-72bde5-0-lxd-1
        machine-status:
          current: running
          message: Container started
          since: 15 Jun 2016 06:39:16Z
        series: trusty

and this is without machine placement:

model: default
machines:
  "0":
    juju-status:
      current: started
      since: 15 Jun 2016 12:15:28Z
      version: 2.0-beta8
    dns-name: 10.245.42.253
    instance-id: juju-493f7f-0
    machine-status:
      current: pending
      since: 15 Jun 2016 12:12:46Z
    series: trusty
    hardware: arch=amd64 cpu-cores=2 cpu-power=2000 mem=2000M root-disk=8192M
  "1":
    juju-status:
      current: started
      since: 15 Jun 2016 12:16:43Z
      version: 2.0-beta8
    dns-name: 10.245.42.8
    instance-id: juju-493f7f-1
    machine-status:
      current: pending
      since: 15 Jun 2016 12:12:54Z
    series: trusty
    hardware: arch=amd64 cpu-cores=2 cpu-power=2000 mem=2000M root-disk=8192M
  "2":
    juju-status:
      current: started
      since: 15 Jun 2016 12:17:42Z
      version: 2.0-beta8
    dns-name: 10.245.44.30
    instance-id: juju-493f7f-2
    machine-status:
      current: pending
      since: 15 Jun 2016 12:13:01Z
    series: trusty
    hardware: arch=amd64 cpu-cores=2 cpu-power=2000 mem=2000M root-disk=8192M
...
  "13":
    juju-status:
      current: error
      message: 'Can''t create instance in any of availability zones, last error: The
        amount of CPU resource available in the parent resource pool is insufficient
        for the operation.'
      since: 15 Jun 2016 12:25:47Z
    instance-id: pending
    machine-status:
      current: pending
      since: 15 Jun 2016 12:14:24Z
    series: trusty
  "14":
    juju-status:
      current: error
      message: 'Can''t create instance in any of availability zones, last error: The
        amount of CPU resource available in the parent resource pool is insufficient
        for the operation.'
      since: 15 Jun 2016 12:26:27Z
    instance-id: pending
    machine-status:
      current: pending
      since: 15 Jun 2016 12:14:33Z
    series: trusty

Versions:

jenkins@s9-lmic-trusty:~/vmware$ dpkg -l *juju*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                 Version         Architecture    Description
+++-====================-===============-===============-==============================================
ii  juju                 2.0-beta8-0ubun all             next generation service orchestration system
ii  juju-2.0             2.0-beta8-0ubun amd64           Juju is devops distilled - client
un  juju-core            <none>          <none>          (no description available)
rc  juju-core2           2.0-beta1-0ubun amd64           Juju is devops distilled - client
ii  juju-deployer        0.8.0~bzr185~55 amd64           A tool to deploy complex stacks of services us
un  juju2                <none>          <none>          (no description available)
ii  python-jujuclient    0.50.3-1~ubuntu amd64           Python API client for juju-core
un  python2.7-jujuclient <none>          <none>          (no description available)
jenkins@s9-lmic-trusty:~/vmware$ uname -a
Linux s9-lmic-trusty 3.13.0-87-generic #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Larry Michel (lmic) wrote :
Larry Michel (lmic)
summary: - 2.0 beta8: networking broken for dense deployment and vsphere as provider
         + 2.0 beta8: networking issues for dense deployment and vsphere as provider
Revision history for this message
Cheryl Jennings (cherylj) wrote : Re: 2.0 beta8: networking issues for dense deployment and vsphere as provider

It almost looks like the lxd addresses are not getting filtered out from the machines hosting lxd containers. If you can get to them, it would be helpful to see the following from the hosts that are getting the 10.0.4.* addresses:

1 - /var/log/cloud-init-output.log
2 - the contents of /etc/network/interfaces
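
For reference, a stock trusty host with LXD 2.0 would typically have an /etc/network/interfaces along these lines (a sketch, not the reporter's actual file); lxdbr0 itself is configured separately by the lxd-bridge service on a private NAT subnet such as 10.0.4.0/24, which matches the unreachable addresses in the status output above:

# Sketch of a typical host /etc/network/interfaces with stock LXD.
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

# lxdbr0 does not appear in this file: the lxd-bridge service creates it
# at boot, and containers behind it get NAT'd 10.x.y.z addresses that
# the outside network cannot reach.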

Changed in juju-core:
status: New → Incomplete
Revision history for this message
Larry Michel (lmic) wrote :

Cheryl, I cannot get to these nodes. They are also being assigned 10.0.4.* addresses:

jenkins@s9-lmic-trusty:~/vmware$ juju status
[Services]
NAME                   STATUS       EXPOSED  CHARM
ceilometer             unknown      false    cs:trusty/ceilometer-236
ceilometer-agent                    false    cs:trusty/ceilometer-agent-232
cinder                 maintenance  false    cs:trusty/cinder-253
glance                 maintenance  false    cs:trusty/glance-250
heat                   maintenance  false    cs:trusty/heat-233
keystone               maintenance  false    cs:trusty/keystone-255
mongodb                unknown      false    cs:trusty/mongodb-37
mysql                  unknown      false    cs:trusty/mysql-38
neutron-api            maintenance  false    cs:trusty/neutron-api-242
neutron-gateway        maintenance  false    cs:trusty/neutron-gateway-228
neutron-openvswitch                 false    cs:trusty/neutron-openvswitch-235
nova-cloud-controller  maintenance  false    cs:trusty/nova-cloud-controller-285
nova-vmware            maintenance  false    local:trusty/nova-compute-vmware-132
nsx-transport-node                  false    local:trusty/nsx-transport-node-2
openstack-dashboard    maintenance  false    cs:trusty/openstack-dashboard-240
rabbitmq-server        unknown      false    cs:trusty/rabbitmq-server-47
swift-proxy            maintenance  false    cs:trusty/swift-proxy-53
swift-storage          unknown      false    cs:trusty/swift-storage-229

[Relations]
SERVICE1    SERVICE2               RELATION               TYPE
ceilometer  ceilometer             cluster                peer
ceilometer  ceilometer-agent       ceilometer-service     regular
ceilometer  keystone               identity-service       regular
ceilometer  mongodb                database               regular
ceilometer  rabbitmq-server        amqp                   regular
cinder      cinder                 cluster                peer
cinder      glance                 image-service          regular
cinder      keystone               identity-service       regular
cinder      mysql                  shared-db              regular
cinder      nova-cloud-controller  cinder-volume-service  regular
cinder      rabbitmq-server        amqp                   regular
glance      glance                 cluster                peer
glance      keystone               identity-service       regular
glance      mysql                  shared-db              regular
glance      nova-cloud-controller  image-service          regular
glance      nova-vmware            image-service          regular
glance      swift-proxy            object-store           regular
heat        heat                   ...

Changed in juju-core:
status: Incomplete → New
Revision history for this message
Larry Michel (lmic) wrote :

This was from trying to ssh to machines 0 and 1. Machine 1 showed the address from status, so I am confused about the 10.0.3.1 address used for machine 0:

jenkins@s9-lmic-trusty:~/vmware$ juju ssh 0
ssh: connect to host 10.0.3.1 port 22: Connection timed out
jenkins@s9-lmic-trusty:~/vmware$ juju ssh 1
ssh: connect to host 10.0.4.187 port 22: Connection timed out
jenkins@s9-lmic-trusty:~/vmware$

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Is there any way to get the IP through the provider and ssh to it directly?

We may also be able to look at the machine-0.log to find what addresses were set for the machine.

There's also an old PR to filter lxd addresses that I pinged John about (since it looks like it had been forgotten). It would help if this is indeed the problem: https://github.com/juju/juju/pull/5035
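
For context, the idea behind such filtering is roughly the following (a minimal sketch in Go, not the actual code from that PR): when reporting a machine's addresses, drop anything that falls inside a subnet owned by a local container bridge such as lxdbr0.

package main

import (
	"fmt"
	"net"
)

// filterContainerAddrs drops addresses that fall inside subnets owned by
// local container bridges, so they are not reported as host addresses.
func filterContainerAddrs(addrs, bridgeCIDRs []string) []string {
	var nets []*net.IPNet
	for _, cidr := range bridgeCIDRs {
		if _, n, err := net.ParseCIDR(cidr); err == nil {
			nets = append(nets, n)
		}
	}
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		keep := ip != nil
		for _, n := range nets {
			if n.Contains(ip) {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// Addresses like those in this report: the real vNIC lease plus the
	// local bridge addresses.
	addrs := []string{"10.245.61.59", "10.0.4.1", "10.0.3.1"}
	fmt.Println(filterContainerAddrs(addrs, []string{"10.0.4.0/24", "10.0.3.0/24"}))
	// Prints: [10.245.61.59]
}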

Revision history for this message
Larry Michel (lmic) wrote : Re: [Bug 1592811] Re: 2.0 beta8: networking issues for dense deployment and vsphere as provider

I was able to get the IP addresses through the provider and ssh into the
host, and from the host I was able to get to the lxd containers. Please see
the attached logs, Cheryl.

Btw, I did see that / was out of space on that first node, and I opened bug
1594865 for it.

Revision history for this message
Cheryl Jennings (cherylj) wrote : Re: 2.0 beta8: networking issues for dense deployment and vsphere as provider

I wonder if the provider is actually returning the bridge addresses as provider addresses, based on the console screenshot showing that the machine has the addresses 10.245.35.33, 10.0.4.1 and 10.0.3.1.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Yes, I see in a simplified log that the provider is giving juju the wrong addresses. What's really interesting is that it seems to flip back and forth between the IP of a hosted container and the address of every device on the machine:

http://paste.ubuntu.com/17661459/

We'll need to dig into the govmomi code to figure out what's going on.
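
For anyone digging in: a plausible place to look is the guest networking info that VMware Tools publishes and that the provider reads via govmomi, since it lists every address configured inside the guest, container bridges included. A minimal govmomi sketch (the vCenter URL, credentials and VM name are placeholders):

package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/property"
	"github.com/vmware/govmomi/vim25/mo"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint; substitute real vCenter credentials.
	u, err := url.Parse("https://user:pass@vcenter.example.com/sdk")
	if err != nil {
		panic(err)
	}
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	// Instance id taken from the status output earlier in this report.
	vm, err := finder.VirtualMachine(ctx, "juju-72bde5-0")
	if err != nil {
		panic(err)
	}

	// guest.net is populated by VMware Tools with every address configured
	// inside the guest -- including addresses on bridges such as lxdbr0,
	// not just the vNIC's DHCP lease.
	var o mo.VirtualMachine
	pc := property.DefaultCollector(c.Client)
	if err := pc.RetrieveOne(ctx, vm.Reference(), []string{"guest.net"}, &o); err != nil {
		panic(err)
	}
	if o.Guest == nil {
		fmt.Println("VMware Tools not running; no guest info")
		return
	}
	for _, nic := range o.Guest.Net {
		fmt.Println(nic.Network, nic.IpAddress)
	}
}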

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0.0
Revision history for this message
Larry Michel (lmic) wrote :

Cheryl,

I've increased the root-disk to 30GB using the constraint root-disk=30G, so we shouldn't run out of space with the logs. I also verified that the VM disk size was set to 30GB in the latest deployment.

https://pastebin.canonical.com/159351/

I'll leave in this state for you to access.

tags: added: oil-2.0
Revision history for this message
Anastasia (anastasia-macmood) wrote :

OIL can't deploy to a cluster - marking as Critical.

Changed in juju-core:
importance: High → Critical
milestone: 2.0.0 → 2.0-beta15
Changed in juju-core:
assignee: nobody → Richard Harding (rharding)
Revision history for this message
Anastasia (anastasia-macmood) wrote :

There is a similar bug for openstack - https://bugs.launchpad.net/juju-core/+bug/1555808

Changed in juju-core:
milestone: 2.0-beta15 → 2.0-beta16
David Britton (dpb)
tags: added: landscape
Revision history for this message
Andrew McDermott (frobware) wrote :

Larry, is it possible to get access to a vsphere setup to try and reproduce this?

Changed in juju-core:
assignee: Richard Harding (rharding) → Andrew McDermott (frobware)
Changed in juju-core:
status: Triaged → In Progress
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Changed in juju:
status: In Progress → Confirmed
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta16 → 2.0-beta17
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta17 → 2.0-beta18
Changed in juju:
milestone: 2.0-beta18 → 2.0-rc1
Revision history for this message
Larry Michel (lmic) wrote :

I spoke to Ryan Harper about this issue. These addresses are from LXD, and per my observation and my conversation with Ryan, it looks like juju is not bridging the lxd containers with the base network adapter.

I will attach in the next comment /etc/network/interfaces* from the host and from one of the LXD containers that were started.

Here's ifconfig -a output from host:

ubuntu@ubuntuguest:~$ ifconfig -a
br-int    Link encap:Ethernet HWaddr 2a:f6:42:ab:29:4c
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet HWaddr 00:50:56:87:7f:00
          inet addr:10.245.61.59 Bcast:10.245.63.255 Mask:255.255.192.0
          inet6 addr: fe80::250:56ff:fe87:7f00/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:905460 errors:0 dropped:0 overruns:0 frame:0
          TX packets:575091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:494370660 (494.3 MB) TX bytes:161798787 (161.7 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:23855 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23855 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2082356 (2.0 MB) TX bytes:2082356 (2.0 MB)

lxdbr0    Link encap:Ethernet HWaddr fe:68:f0:04:3b:3e
          inet addr:10.0.0.1 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::8876:28ff:fe0c:38de/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:171738 errors:0 dropped:0 overruns:0 frame:0
          TX packets:147601 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:42889954 (42.8 MB) TX bytes:123654711 (123.6 MB)

ovs-system Link encap:Ethernet HWaddr 06:dc:bb:54:d2:db
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

veth2EFDMX Link encap:Ethernet HWaddr fe:68:f0:04:3b:3e
          inet6 addr: fe80::fc68:f0ff:fe04:3b3e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:171738 errors:0 dropped:0 overruns:0 frame:0
          TX packets:147602 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45294286 (45.2 MB) TX bytes:123654789 (123.6 MB)

and from container:

ubuntu@ubuntuguest:~$ ssh 10.0.0.137
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-93-generic x86_64)

 * Documentation: https://help.ubuntu.com/

 System information disabled due to load higher than 2.0

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

19 p...

Revision history for this message
Larry Michel (lmic) wrote :

/etc/network/interfaces* tarballs for both the VM and the LXD container:

ubuntu@ubuntuguest:~$ sudo lxc list
sudo: unable to resolve host ubuntuguest
+---------------------+---------+-------------------+------+------------+-----------+
| NAME                | STATE   | IPV4              | IPV6 | TYPE       | SNAPSHOTS |
+---------------------+---------+-------------------+------+------------+-----------+
| juju-facfb5-1-lxd-0 | RUNNING | 10.0.0.137 (eth0) |      | PERSISTENT | 0         |
+---------------------+---------+-------------------+------+------------+-----------+

Revision history for this message
Larry Michel (lmic) wrote :

Reattaching; the previous attachment was from the container only.

Changed in juju:
milestone: 2.0-rc1 → 2.0.0
Curtis Hovey (sinzui)
Changed in juju:
status: Confirmed → Triaged
Changed in juju:
milestone: 2.0.0 → 2.1.0
Revision history for this message
Andrew McDermott (frobware) wrote :

I removed myself as the assignee as I'm not working on this bug at the moment.

Changed in juju:
assignee: Andrew McDermott (frobware) → nobody
Changed in juju:
assignee: nobody → Richard Harding (rharding)
Changed in juju:
milestone: 2.1.0 → 2.2.0
Larry Michel (lmic)
tags: added: cdo-qa-blocker
Revision history for this message
John A Meinel (jameinel) wrote :

So is the root comment just this:
  I spoke to Ryan Harper about this issue. These addresses are from LXD, and per my observation and my conversation with Ryan, it looks like juju is not bridging the lxd containers with the base network adapter.

Some open questions about how things "should" work:

1) There are many cases of providers where this is just not allowed (most of the big clouds don't like extra MAC addresses in their network that don't belong to explicitly provisioned machines).
2) Where would the IP addresses get assigned from if we did bridge to the host adapter? Just DHCP-provided addresses? Is this stable, standard practice with vSphere?
3) Because of (1) we *don't* default to bridging to the host adapter. We have had a couple of requests to do so (another request for it has to do with manually provisioned machines).

There is quite a bit of work to be done to make a reasoned, deterministic choice of which host device to bridge: deciding what to do when there is more than one device, and so on. We don't currently track "spaces" in vSphere. If we implement that, and then have a better feel for whether there is an explicit source for IP addresses (2), it would be reasonable to bridge when we have an expectation that it will work.
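
Concretely, bridging to the host adapter on a trusty guest would mean an /etc/network/interfaces layout roughly like the following (a sketch assuming a single eth0; juju arranges a similar br-ethX setup on MAAS), so that containers attached to the bridge take DHCP leases directly from the physical network:

auto lo
iface lo inet loopback

# eth0 carries no address of its own; it is enslaved to the bridge.
auto eth0
iface eth0 inet manual

# Containers attach to br-eth0 and DHCP against the outside network.
auto br-eth0
iface br-eth0 inet dhcp
    bridge_ports eth0

Even then, point (1) applies on vSphere: the vSwitch port group would generally need promiscuous mode and forged transmits allowed before the containers' extra MAC addresses could pass traffic.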

Changed in juju:
assignee: Richard Harding (rharding) → John A Meinel (jameinel)
Curtis Hovey (sinzui)
Changed in juju:
importance: Critical → Wishlist
summary: - 2.0 beta8: networking issues for dense deployment and vsphere as provider
         + vSphere does not support container addressability
Revision history for this message
Ryan Beisner (1chb1n) wrote :
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I agree. Marking it as a duplicate of bug #1614364.
