cloud-init vSphere cloud provider DHCP unique hostname issue

Bug #1746455 reported by Calvin Hartwell on 2018-01-31
This bug affects 5 people
Affects: cloud-init
Importance: High
Assigned to: Chad Smith

Bug Description

We've had an issue recently when provisioning workloads on vSphere. Here is what happens:

1) Set vSphere as the cloud provider in juju (bootstrap)

2) Deploy workload on vSphere, in our case, Canonical Kubernetes (CDK)

3) Machines are deployed using VMDK, but when they boot the initial hostname is 'ubuntu' (for all machines) until juju changes it during deployment.

4) DHCP gives each machine a lease, but the DNS entries are all messed up because of identical hostnames.

5) If the machines are rebooted they will receive the correct DNS entries. Note that juju will set the correct hostname eventually, but not when the initial DHCP lease is given to the host.

Basically cloud-init is not giving newly provisioned machines on vSphere their correct hostname before the DHCP lease is given, meaning that every instance during DHCP has the hostname 'ubuntu' which messes up the entries on the DNS server.

Related Bugs:

 * https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1600766
 * https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/480
 * https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/171


tags: added: cloud-init juju k8s kubernetes
tags: added: dhcp vsphere
removed: kubernetes
description: updated
Chris Gregan (cgregan) on 2018-02-01
tags: added: cdo-qa-blocker
Chris Gregan (cgregan) on 2018-02-01
tags: removed: cdo-qa-blocker
tags: added: cdo-qa-blocker
Chad Smith (chad.smith) on 2018-02-01
Changed in cloud-init:
status: New → In Progress
assignee: nobody → Chad Smith (chad.smith)
importance: Undecided → High
David Britton (davidpbritton) wrote :

In trying to reproduce this issue, I'm currently stuck on: https://bugs.launchpad.net/juju/+bug/1747048

David Britton (davidpbritton) wrote :

Hi Calvin --

First of all, it's very weird that dhcp is allowing clients to set their hostnames. This is not the way typical clouds work, in fact, the issue that is similar with LXD triggered a change in LXD to no longer allow this. It creates a very flaky situation on the cloud itself if instances can hijack DHCP/DNS. That said, perhaps this configuration is common in the vsphere world. I'm not sure.

Can you give setup details of the vsphere that is being used and answer the following questions?

1) What is providing DHCP and DNS?
2) What networks do you have configured/are used with conjure-up & juju?
3) What version of vsphere is being used?
4) What version of Juju/conjure-up is being used?
5) What is the specific failure when you deploy CDK with conjure-up? Like, I choose all the options, click deploy with default instance sizes, then what? What fails that should be working?

Thanks.

Changed in cloud-init:
status: In Progress → Incomplete
Calvin Hartwell (calvinh) wrote :

Hi David,

The cloud world is a totally different story from the VMware world in this
case, but this bug should be really easy to reproduce. It sounds similar to
your LXD issue. Sorry for the delayed response, I was travelling back home
from the customer site for the last 5 hours.

1) DNS and DHCP are provided by a Windows DNS server on the same subnet as
the rest of the hosts. This configuration could be easily replicated using
a regular Ubuntu setup as a DHCP and DNS server.

The workflow is like this:

    - Juju is configured to use a VMware cloud provider. We add a new unit
      to the model, let's say for regular Ubuntu.
    - This causes a new VM to be provisioned on VMware. Juju
      creates/uploads/deploys a vmdk file which is used to deploy the new VM.
      This VMDK appears to be cached and is not generated every time a host
      is provisioned, for speed, so it must contain a static hostname.
    - Once the new VM boots, DHCP gives the machine a lease for an IP
      address but also creates/updates the entry in DNS for that machine.

   The third step is where this breaks, because for some reason every VM on
boot has the hostname ubuntu, even though juju is passing the correct
hostname to cloud-init during boot, so the DNS server has a ton of IP
address entries for 'ubuntu' all relating to different hosts.

   The CDK workload will deploy, but when it comes to actually resolving
stuff, it won't function at all. K8s is heavily reliant on correct DNS
entries, so this doesn't work too well.

   This means that the customer has a few options, none of which are good:
      - reboot the machines to update the DNS entries by forcing DHCP to
        update leases for the new hostnames,
      - pre-create ALL MAC address entries in DHCP (infeasible),
      - wait for DHCP leases to expire, or set a short DHCP lease expiration
        time so the entries are corrected (similar to the first option; it
        does not require a reboot but has other downsides).

2) Conjure-up is not used at all. The jump host VM (running Ubuntu with the
juju client), the juju controller and all the machines managed by that
controller are all on the same subnet. Very flat network, everything can
contact everything else, etc.

3) The latest version of vSphere, I believe version 6.5

4) Conjure-up is not being used. I believe the latest version of Juju for
16.04 LTS is being used, but I can check again on Monday. I think it's 2.2.X
or 2.3.X, going from memory.

5) Again, conjure-up is not used at all for the deployment (I am not going
to debate that in this topic). Please deploy the actual CDK bundle with
Juju instead: https://jujucharms.com/canonical-kubernetes/

Trying this with conjure-up just adds more mess to the mix, plus the
customer is not using it.

In fact, the CDK bundle should deploy correctly. The issue is not related to
the deployment of CDK itself but rather to a timing issue between juju and
cloud-init, where the newly provisioned VM does not have its hostname
correctly set before it receives its DHCP lease, causing the entries in DNS
to be incorrect.

Why does this matter? When you actually use CDK, it is highly reliant on
correct DNS entries, which don't exist until the machines are rebooted with
the correct hostnames, which causes DHCP to updat...

Tim Penhey (thumper) wrote :

Is there a config value in cloud-init to say 'reboot once cloud-init done' ?

John A Meinel (jameinel) wrote :

There is
http://cloudinit.readthedocs.io/en/latest/topics/examples.html#reboot-poweroff-when-finished

Which mentions you can do something similar via "runcmd", which would mean
that with Heather's changes to pass through cloud-init you could add
"reboot", or possibly set something about power_state.

John
=:->
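
The power_state option mentioned above could be expressed roughly as follows. This is a minimal sketch, assuming the documented cloud-init `power_state` keys (`mode`, `message`, `timeout`, `condition`); the helper name `render_reboot_cloud_config` and the default message are hypothetical, and how Juju would pass the result through to cloud-init is not shown here.

```python
import json


def render_reboot_cloud_config(message: str = "Rebooting after first boot",
                               timeout: int = 30) -> str:
    """Render a cloud-config document asking cloud-init to reboot when done.

    Hypothetical helper for illustration only. The message is emitted as a
    JSON string, which is also a valid YAML double-quoted scalar.
    """
    lines = [
        "#cloud-config",
        "power_state:",
        "  mode: reboot",
        f"  message: {json.dumps(message)}",
        f"  timeout: {timeout}",
        "  condition: true",
    ]
    return "\n".join(lines) + "\n"


print(render_reboot_cloud_config())
```

A reboot only works around the symptom: the first DHCP request still goes out with the image's default hostname, and the DNS entry is corrected once the machine comes back up with its final hostname.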

Scott Moser (smoser) wrote :

> - This causes a new VM to be provisioned on VMware. Juju
> creates/uploads/deploys a vmdk file which is used to deploy the new
> VM. This VMDK appears to be cached and is not generated everytime a
> host is provisioned for speed, so it must contain a static hostname.

The OVF disk contains a hostname... this is very much the platform
telling cloud-init what the hostname should be.

Let's call this the 'platform-provided hostname'.

> The third step is where this breaks, because for some reason every VM on
> boot has the hostname ubuntu, even though juju is passing the correct
> hostname to cloud-init during boot, so the DNS server has a ton of IP
> address entries for 'ubuntu' all relating to different hosts.

I assume that juju is sending this in cloud-config inside user-data,
the key being essentially:
  #cloud-config
  hostname: foo

Let's call this the 'user-provided hostname', as it comes in user-data.

Every VM has the hostname 'ubuntu' in /etc/hostname. Cloud-init does
not change this until after that initial DHCP request is done. As you've
found, a reboot (or "bounce" of the network device) would have DHCP
re-read the hostname and update the DNS records.

The user-provided hostname should (and does) override the
platform-provided hostname.

We have basically 3 possible sources of hostname information:
a.) platform-provided (meta-data)
b.) direct user-provided user-data
c.) indirect user-provided hostname.
I assume juju doesn't do this, but it could provide user-data that
consisted of:
  #include http://some.resource/some/path
  #include http://some.other.resource/some/path

where one of the URLs there contains:
   #cloud-config
   hostname: foo

cloud-init doesn't currently process *any* user-data until after
networking is configured.

We have a fairly straightforward plan to fix 'a' with some re-factoring.
Fixing 'b' requires more re-factoring, essentially "peeking" at user-data
before all network resources are up.
Fixing 'c' is more work, and would probably involve cloud-init somehow
bouncing interfaces, or some other mechanism for convincing dhcp to
update its lease information.
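
The precedence and timing described in this comment can be sketched as below. This is an illustrative model only, not cloud-init's code: the function names, the `read_before_network` parameter, and the sample hostnames are all hypothetical. It assumes just what the comment states: user-provided overrides platform-provided, and a source only affects the first DHCP request if it is applied before networking comes up.

```python
from typing import FrozenSet, Optional

IMAGE_DEFAULT = "ubuntu"  # hostname baked into /etc/hostname in the image


def final_hostname(platform: Optional[str], user: Optional[str]) -> str:
    """User-provided overrides platform-provided; else the image default."""
    return user or platform or IMAGE_DEFAULT


def hostname_at_first_dhcp(platform: Optional[str],
                           user: Optional[str],
                           read_before_network: FrozenSet[str] = frozenset()) -> str:
    """Hostname the DHCP client would send on its first request.

    read_before_network names the sources ("platform", "user") that are
    applied before networking is configured; other sources arrive too late.
    """
    early_user = user if "user" in read_before_network else None
    early_platform = platform if "platform" in read_before_network else None
    return early_user or early_platform or IMAGE_DEFAULT


# Behaviour described in the bug: nothing is applied before networking.
print(hostname_at_first_dhcp("juju-machine-1", None))
# Fixing case 'a': the platform-provided hostname is applied early.
print(hostname_at_first_dhcp("juju-machine-1", None, frozenset({"platform"})))
```

Case 'c' does not fit this model at all, since an `#include` URL cannot be fetched before the network is up; that is why it would need an interface bounce or a similar mechanism.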

Calvin Hartwell (calvinh) wrote :

Excellent breakdown of the bug scenario Scott, do you have any timeline on
the fixes for A, B or C?

I can update the customer based on your findings. Sorry for the lack of
responses all, I am still on-site working on other things.

Cheers,

- Calvin

David Britton (davidpbritton) wrote :

Hi Calvin --

We don't typically give ETAs on bug reports. Please ping me directly if you want more information.

This issue is currently incomplete because we cannot reproduce due to an upstream issue in juju. We are working with that team to identify what the problem is to get a reproduction.

Thanks.

Changed in cloud-init:
status: Incomplete → In Progress
assignee: Chad Smith (chad.smith) → David Britton (davidpbritton)
Calvin Hartwell (calvinh) wrote :

Hi Dave,

No problem - if you need help reproducing the issue I am available mid next
week, I can reproduce this in the CPE lab if required.

Cheers

Tim Penhey (thumper) wrote :

Which version of juju are you using? In 2.3.1, a model config option was added to add custom cloud-init sections for advanced usage (and to allow us to work around situations like this).

Some config could be added to the model that would get cloud-init to reboot once it has finished processing, that would allow the machine to come up with the right hostname.

David Britton (davidpbritton) wrote :

On Tue, Feb 06, 2018 at 12:05:27AM -0000, Calvin Hartwell wrote:
> No problem - if you need help reproducing the issue I am available mid
> next week, I can reproduce this in the CPE lab if required.

Hi Calvin -- if you have access to a vsphere that repeats this, by all
means, I would love a reproduction environment.

Currently the vsphere I have access to is blocked on a bootstrap error
that I'm working on with the juju team separately.

--
David Britton <email address hidden>

Calvin Hartwell (calvinh) wrote :

Hi David,

I am on-site giving a workshop to a customer this week, but I will put this
into my schedule for next week or the week after when I have spare cycles.

It should be quite easy to replicate in our lab, I just need to set up the
DHCP server and install a test cluster with vSphere.

I'll message you on IRC when I start to work on this again.

Cheers,

- Calvin

Changed in cloud-init:
status: In Progress → Incomplete
Ashley Lai (alai) wrote :

I just ran the CDK on vsphere and can't reproduce the issue.

juju version: 2.3.3-xenial-amd64
VMware-ESXi-6.0.0-build-3620759

@Calvin what version of ESXi are you running?

Calvin Hartwell (calvinh) wrote :

Hi Ashley,

Version 6.5.X was used for this deployment. What are you using to provide
DHCP and DNS for your guest host?

Cheers,

- Calvin

Ashley Lai (alai) wrote :

Looks like DHCP is using dynamic allocation. Let me know if you need other info and how I can find it, as I was not the one who set up the network. On the vsphere client, there is nothing listed under "Network Protocol Profiles".

ubuntu@juju-4225b0-5:~$ sudo dhclient -d -nw ens192
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/ens192/00:50:56:86:83:5f
Sending on LPF/ens192/00:50:56:86:83:5f
Sending on Socket/fallback
DHCPREQUEST of 10.245.57.30 on ens192 to 255.255.255.255 port 67 (xid=0x28e0feea)
DHCPACK of 10.245.57.30 from 10.245.31.3
RTNETLINK answers: File exists
bound to 10.245.57.30 -- renewal in 252 seconds.
DHCPREQUEST of 10.245.57.30 on ens192 to 10.245.31.3 port 67 (xid=0x28e0feea)
DHCPACK of 10.245.57.30 from 10.245.31.3
bound to 10.245.57.30 -- renewal in 254 seconds.
DHCPREQUEST of 10.245.57.30 on ens192 to 10.245.31.3 port 67 (xid=0x28e0feea)
DHCPACK of 10.245.57.30 from 10.245.31.3
bound to 10.245.57.30 -- renewal in 246 seconds.
DHCPREQUEST of 10.245.57.30 on ens192 to 10.245.31.3 port 67 (xid=0x28e0feea)
DHCPACK of 10.245.57.30 from 10.245.31.3
bound to 10.245.57.30 -- renewal in 231 seconds.
Timeout, server 10.245.57.30 not responding.
ubuntu@vsphere96683fcc-8542-485c-aa54-55b6028acad1:~$

Merlijn Sebrechts (merlijn-sebrechts) wrote :

Note that this is a regression: this used to work in the past, it stopped working a few days/weeks ago.

We're also hitting this issue. Hostname via DHCP is currently the only way to get resolvable hostnames with Juju on vsphere. Because of this, we're unable to deploy the Canonical Distribution of Kubernetes.

Is there a workaround for this?

Changed in cloud-init:
assignee: David Britton (davidpbritton) → Chad Smith (chad.smith)
Changed in cloud-init:
status: Incomplete → In Progress
Calvin Hartwell (calvinh) wrote :

Hi Merlijn,

Are you able to describe your setup and test scenario in more detail? It
will help for testing purposes.

Thanks

Jason Hobbs (jason-hobbs) wrote :

Calvin, Merlijn, anyone else affected by this; the server team has been unable to reproduce this in house.

Please provide more information on your test setup, as requested by David in comment #4.

Changed in cloud-init:
status: In Progress → Incomplete
Chad Smith (chad.smith) wrote :

While the additional details from Merlijn will be helpful in diagnosing this issue to make sure we nail it, I have a branch in progress that will address the symptoms from cloud-init's side of the equation.

We've been able to track, in syslog on the DHCP server in our lab, the DHCPREQUESTs coming from newly deployed ubuntu units. The initial DHCP request from a juju-deployed ubuntu unit asks for "ubuntu" as the hostname, which will muck with DDNS setups. We can reorder when cloud-init sets the hostname so that it happens before the network ever comes up, provided the datasource supplies the metadata in the init-local timeframe.

With the cloud-init branch linked below, the only DHCP requests going out carry the hostname that matches the OVF seed ISO's juju-preferred hostname. No more 'ubuntu' DHCP hostname requests. I plan on linking this branch to this bug Monday when I wrap up a couple of unit tests.

https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/set-hostname-before-network

Sorry for the delay, we had a big outage last week which gobbled up most of my time.

I think the important part of the isc-dhcp-server config is the lease time. Making the lease time longer increases the impact of the bug. The default lease time (10 minutes) causes the VM to renew the lease after 5 minutes, now with the correct hostname. A long lease time (1 week) means that the DNS will be incorrect for multiple days.

So as a workaround, we've set the lease time to 5 minutes, which causes the clients to renew the lease after 2.5 minutes. This means that the DNS is fixed before Kubernetes is installed.
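
The lease arithmetic above can be checked with a short sketch. It assumes the common DHCP client behaviour of first attempting renewal at half the lease time (the default T1 in RFC 2131), which matches the figures quoted in this comment; the function name `first_renewal` is hypothetical, and a server can override T1.

```python
from datetime import timedelta


def first_renewal(lease: timedelta, t1_fraction: float = 0.5) -> timedelta:
    """Time after which a client first tries to renew its lease.

    Assumes the default T1 of half the lease duration.
    """
    return lease * t1_fraction


for lease in (timedelta(minutes=10), timedelta(minutes=5), timedelta(weeks=2)):
    print(f"lease {lease} -> first renewal after {first_renewal(lease)}")
```

Under this assumption the wrong DNS entry persists until the first renewal, so the length of the lease directly bounds how long the 'ubuntu' record stays in place.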

Chris Gregan (cgregan) on 2018-03-02
Changed in cloud-init:
status: Incomplete → In Progress
Changed in cloud-init:
status: In Progress → Incomplete
David Britton (davidpbritton) wrote :

Even though we cannot reproduce this, we think fixing the initial DHCP request as Chad mentioned in #23 is at least an improvement and should not be risky. Hopefully it will fix the issue.

Changed in cloud-init:
status: Incomplete → In Progress

I don't understand why you still cannot reproduce this? If the initial DHCP request is incorrect and the DHCP lease time is 2 weeks, then the DHCP will be incorrect for 1 week. I'm not sure what extra information I can provide?

* then the DNS will be incorrect for 1 week

Christian Reis (kiko) wrote :

Merlijn, thanks for the input so far -- we're pushing to get this resolved from our end as well so we shouldn't be too far off. If you have the cycles and can try out the patch Chad has posted we'd like to know whether it makes your problem go away entirely or partially.

Christian

I have some time to test the patch, but I have no idea how I can tell Juju to use it. Any ideas?

Ashley Lai (alai) wrote :

@Merlijn - you can use the following steps to test the patch.

1. Create vsphere_cloud.yaml file
$ cat vsphere_cloud.yaml
clouds:
  vsphere_cloud:
    type: vsphere
    auth-types: [userpass]
    endpoint: <your_vsphere_client_ip>
    regions:
      dc0: {}

2. Create vsphere_creds.yaml file
$ cat vsphere_creds.yaml
credentials:
  vsphere_cloud:
    vsphere_cloud:
      auth-type: userpass
      password: <your_password>
      user: <your_username>

3. Run the commands below:
juju add-cloud vsphere_cloud vsphere_cloud.yaml
juju add-credential vsphere_cloud -f vsphere_creds.yaml
juju --debug bootstrap vsphere_cloud k8s-vsphere
juju add-model k8s vsphere_cloud
juju deploy -m k8s-vsphere:k8s canonical-kubernetes

Ashley Lai (alai) wrote :

I forgot to mention that in vsphere_cloud.yaml file dc0 is our data center, you should replace it with yours. Thanks !!

Chad Smith (chad.smith) on 2018-03-15
Changed in cloud-init:
status: In Progress → Fix Committed
Jason Hobbs (jason-hobbs) wrote :

Now that the fix is landed, will it be SRU'd to xenial?

Jason Hobbs (jason-hobbs) wrote :

dpb let me know on IRC that this will be SRU'd as part of the next release, 18.2

This bug is believed to be fixed in cloud-init in 18.2. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Chad Smith (chad.smith) on 2018-04-16
information type: Public → Public Security
hatifnatt (m-hatifnatt) wrote :

Will the hostname be updated in the local stage only if it is present in meta-data? Is there an example config in NoCloud format that guarantees the hostname will be set before network init?
I can't get it working; the stub hostname always leaks.
Here are the config and the local stage log: https://paste.ubuntu.com/p/P9qKRcjWmv/
This test was done on Debian 9 with a custom-built package, version 18.2-14, but I get exactly the same result with this image: https://cloud-images.ubuntu.com/bionic/20180518/

I don't want to reopen this issue until I am sure there is nothing wrong on my side.

Scott Moser (smoser) wrote :

@hatifnatt,

Currently, providing the hostname in user-data will have no effect. You'd need to provide the hostname in meta-data.

Your paste does not provide meta-data at all, but what you want to do is:

$ cat meta-data
local-hostname: your-hostname
instance-id: some-random-uuid
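
A seed directory with that meta-data could be generated along these lines. This is a minimal sketch, assuming a NoCloud seed made of `meta-data` and `user-data` files; the helper name `write_nocloud_seed` and its arguments are hypothetical, and the hostname is written unquoted on the assumption that it contains only characters valid in a hostname.

```python
import uuid
from pathlib import Path
from typing import Optional


def write_nocloud_seed(seed_dir: Path, hostname: str,
                       instance_id: Optional[str] = None) -> Path:
    """Write minimal NoCloud seed files and return the meta-data path.

    Hypothetical helper: the hostname goes into meta-data under the
    'local-hostname' key, as suggested in the comment above.
    """
    seed_dir.mkdir(parents=True, exist_ok=True)
    instance_id = instance_id or str(uuid.uuid4())
    meta_data = seed_dir / "meta-data"
    meta_data.write_text(
        f"instance-id: {instance_id}\n"
        f"local-hostname: {hostname}\n"
    )
    # An empty cloud-config; the hostname deliberately does not live here.
    (seed_dir / "user-data").write_text("#cloud-config\n")
    return meta_data
```

Keeping the hostname out of user-data reflects the statement above that, at the time of this comment, a hostname supplied in user-data has no effect on the initial DHCP request.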

hatifnatt (m-hatifnatt) wrote :

@smoser thanks for the answer!
I did not provide meta-data because it only contains the uuid.
After I created that paste I also tested with meta-data like
$ cat meta-data
instance-id: some-random-uuid
hostname: your-hostname

and it did not work. Now that you have pointed out that 'local-hostname' must be used, it works. But it's not clear from the docs what the difference is between 'hostname' and 'local-hostname' in meta-data. Maybe I missed that part; if you can point me to the right part of the docs, that would be great.
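
The mistake described here is easy to catch before booting with a small check on the parsed meta-data. The sketch below relies only on what this thread reports (that `hostname` in NoCloud meta-data had no effect while `local-hostname` did); the function name `check_nocloud_meta_data` and the wording of its messages are hypothetical.

```python
from typing import Dict, List


def check_nocloud_meta_data(meta: Dict[str, str]) -> List[str]:
    """Return a list of likely problems in a parsed NoCloud meta-data dict."""
    problems = []
    if "instance-id" not in meta:
        problems.append("missing 'instance-id'")
    if "local-hostname" not in meta:
        if "hostname" in meta:
            problems.append(
                "found 'hostname' but not 'local-hostname'; in this thread "
                "only 'local-hostname' took effect before networking"
            )
        else:
            problems.append("missing 'local-hostname'")
    return problems


print(check_nocloud_meta_data({"instance-id": "some-random-uuid",
                               "hostname": "your-hostname"}))
```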
