Juju bootstrap vmware vsphere not working with vsan datastore

Bug #1800940 reported by Kent Williams
This bug affects 6 people
Affects                          Status         Importance   Assigned to    Milestone
Canonical Juju                   Fix Released   High         Tim McNamara
Kubernetes Control Plane Charm   Invalid        Undecided    Unassigned

Bug Description

I am unable to bootstrap a Juju controller on VMware vSphere when vSAN is used as the datastore. Bootstrapping works fine if vSAN is not used as the datastore.

I came across another person who encountered the exact same issue here: https://askubuntu.com/questions/1020107/juju-bootstrap-not-working-with-vsan?newreg=d3c6766987f74e15a59866e2621f6c5f

When the VM attempts to boot, its network interfaces attempt a PXE boot, but the VM is never able to boot from the disk file.

-Kent

Kent Williams (k3nt)
summary: - Juju bootstrap not working with vsan
+ Juju bootstrap vmware vsphere not working with vsan datastore
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.5.0
status: New → Triaged
importance: Undecided → High
Kent Williams (k3nt)
description: updated
Kent Williams (k3nt) wrote :

Is there any more information I can provide to assist in this bug resolution?

Changed in juju:
milestone: 2.5.0 → 2.5-beta2
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.5-beta2 → 2.5-beta3
Changed in juju:
milestone: 2.5-beta3 → 2.5-rc1
Watteel (piwi3910) wrote :

same issue here, you are not alone

Calvin Hartwell (calvinh) wrote :

Blocking some customer deployments (https://bugs.launchpad.net/juju/+bug/1807953).

Changed in juju:
milestone: 2.5-rc1 → 2.5.1
Kris Applegate (kris-applegate) wrote :

I'd love to assist with resolving this bug. I'm relatively new to juju, but this is blocking some of the work we're doing in the Dell EMC Customer Solution Centers. I have access to environments to assist in troubleshooting, but I'll need help. Please let me know what I can do.

Kent Williams (k3nt) wrote :

Is there any indication that this will actually make it into the 2.5.1 release?

Ian Booth (wallyworld)
tags: added: vsphere-provider
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.5.1 → 2.5.2
Changed in juju:
milestone: 2.5.2 → 2.5.3
Daniel Bidwell (bidwell) wrote :

juju bootstrap downloads a *.vmdk file. The CD-ROM points to an ovfenv-juju-*.iso, but the VM fails to boot from it and drops down to the VMware PXE boot, which of course times out and fails.

Could the fact that it is using vmware hardware version 10 instead of version 14 have anything to do with it?

Daniel Bidwell (bidwell) wrote :

Upgrading the VMware hardware version to 11, 13, or 14 does not make any difference to its behaviour.

Changed in juju:
milestone: 2.5.3 → 2.5.4
Changed in juju:
milestone: 2.5.4 → 2.5.5
Changed in juju:
milestone: 2.5.6 → 2.5.8
Changed in juju:
milestone: 2.5.8 → 2.5.9
Tim McNamara (tim-clicks) wrote :

Adding kubernetes-master-charm as it is marked as affecting a duplicate bug report.

George Kraft (cynerva) wrote :

I'm marking this as invalid for kubernetes-master, since the issue occurs during Juju bootstrap or machine deployment, before kubernetes-master ever runs.

Changed in charm-kubernetes-master:
status: New → Incomplete
status: Incomplete → Invalid
Kent Williams (k3nt) wrote :

Agreed, don't see the kubernetes-master charm being relevant for this bug.

Frederik Nordahl Jul Sabroe (frederikns) wrote :

Thanks for pointing me to the duplicate issue.

Is there any way I can get the definitions for the VMs that Juju tries to start? Then I could possibly help with debugging the issue from the VMware end.

Is there anything else I can do to help out with debugging this?

Changed in juju:
status: Triaged → In Progress
Tim McNamara (tim-clicks) wrote :

Thanks for the offer of assistance Frederik. I missed the email notification when it came through and apologise for not updating this ticket sooner.

I have been able to identify the change that introduced this problem[0]. As context, Juju does quite a complicated dance when creating a virtual machine. We want to make sure that we're using an Ubuntu certified cloud image[1], but we also want to apply some settings that may be specified through configuration. So we download the image, tweak some parameters in its metadata file, then upload the disk along with the new metadata to vSAN.

It looks as though the metadata file that Juju generates is incorrect and the VMDK uploaded into the vSAN does not get imported correctly. I'll update the ticket when I've discovered where the fault lies.

[0] https://github.com/juju/juju/pull/7856
[1] https://cloud-images.ubuntu.com/

Frederik Nordahl Jul Sabroe (frederikns) wrote :

Brilliant, let me know if there's anything I can assist you with, testing maybe?

Tim McNamara (tim-clicks) wrote :

Good news. I was able to make some progress late last week and was able to bootstrap Juju onto a vSphere/ESXi instance backed by vSAN.

I'm cautiously optimistic that we'll be able to release this fix in the next few weeks. There is still some clean up work to do before it is merged.

Thanks for the offer to test Frederik. If you're still interested, you can build my development branch[0] and experiment. Things should just work.

Here were the two commands that successfully ran for me:

    juju bootstrap vsphere
    juju deploy ubuntu

[0] https://github.com/timClicks/juju/tree/develop-vsphere--vmdk-vsan

Changed in juju:
assignee: nobody → Tim McNamara (tim-clicks)
milestone: 2.5.9 → 2.7-beta1
Daniel Bidwell (bidwell) wrote :

This is wonderful news. Is there any chance that they can update the
vsphere images from hardware version 10 to 14?

--
Daniel R. Bidwell | <email address hidden>
Sr. Systems Architect | chief Information Security Officer
Andrews University | Information Technology Services
If two always agree, one of them is unnecessary.
Karma is getting what you deserve,
mercy is not getting what you deserve
grace is getting what you do not deserve.
In theory, theory and practice are the same.
In practice, they are not.

Frederik Nordahl Jul Sabroe (frederikns) wrote :

Fantastic! I'll give it a try tomorrow!

Thank you so much for looking into this problem.

Frederik Nordahl Jul Sabroe (frederikns) wrote :

Unfortunately our VMware trial license has expired, so I have to get our IT department to help me, before I can test out the changes. :-(

I have built juju from source, so I'm crossing my fingers that IT will be able to help out soon.

John A Meinel (jameinel) wrote :

We tried updating the hardware version in the past, but the Ubuntu images
then failed to boot (kernel failure midway through the boot process). We
might have better success with the new code path, so it is probably worth
testing again.


Frederik Nordahl Jul Sabroe (frederikns) wrote :

I finally got our vSphere back up and running, and gave this a try, however I ran into some issues with it.

I documented my problems here on the github pull request: https://github.com/juju/juju/pull/10461#issuecomment-516336205

Tim McNamara (tim-clicks) wrote :

Good news. We've been able to land a change that enabled Juju to support vSAN volumes. I've added testing instructions for the curious in our forum: https://discourse.jujucharms.com/t/1929

Changed in juju:
status: In Progress → Fix Committed
Daniel Bidwell (bidwell) wrote :

I have found a solution. The juju vsphere images load with
/etc/default/grub containing the following line, which is causing the
problem:

    GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"

If I change it to:

    GRUB_CMDLINE_LINUX_DEFAULT="maybe-ubiquity"

I can update the VMware hardware version to 13 or 14 with no trouble and
they boot normally.

I am not certain what maybe-ubiquity does, but I am certain that the
problem lies with "console=tty1 console=ttyS0".


Daniel Bidwell (bidwell) wrote :

Further verification of my solution shows that "maybe-ubiquity" is not
needed at all. I just removed "console=tty1 console=ttyS0" and it works
fine.
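
The edit described above could be scripted roughly as follows. This is only a sketch, assuming GNU sed (as shipped on Ubuntu); the function name strip_console_args and the .bak backup suffix are made up for illustration and are not part of Juju or the cloud image.

```shell
# Sketch only: remove the two console= arguments from the default kernel
# command line in a grub defaults file. The function name and the ".bak"
# backup suffix are hypothetical.
strip_console_args() {
    grub_defaults="${1:-/etc/default/grub}"

    # Keep a copy of the original file before editing it in place.
    cp "$grub_defaults" "$grub_defaults.bak" || return 1

    # Only touch the GRUB_CMDLINE_LINUX_DEFAULT line: drop both console=
    # arguments, then tidy up the spaces they leave behind.
    sed -i -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ {
        s/console=tty1//
        s/console=ttyS0//
        s/" +/"/
        s/ +"/"/
        s/  +/ /g
    }' "$grub_defaults"
}

# Example (not run here; needs root on the affected machine):
#   strip_console_args /etc/default/grub
#   update-grub
```

On Ubuntu, changes to /etc/default/grub only take effect after update-grub regenerates the boot configuration and the machine is rebooted.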


Tim McNamara (tim-clicks) wrote :

Fantastic, I knew it would be an easy configuration setting somewhere. We did look!

Our patch also landed in the 2.6.8 release, rather than waiting for 2.7.0 to arrive (I'll update the ticket). So it's great to know that there is an easy option for people who are hesitant to upgrade as well.

Changed in juju:
milestone: 2.7-beta1 → 2.6.7
status: Fix Committed → Fix Released
Daniel Bidwell (bidwell) wrote :

I am now testing bootstrapping a controller on VMware with a
vsanDatastore and am getting unusual behavior that is causing trouble.

The controller VM is being generated with 2 network interfaces on the
same network with the exact same MAC address. This causes intermittent
access to the controller VM.

Where do I find the template that it is using to generate the VM so I
can tell it to only generate one NIC?


Christian Muirhead (2-xtian) wrote :

Hi Daniel - I don't see multiple NICs on the controller machine (or other machines) when I bootstrap. Is it possible you're using a different stream that has a customised OVA with multiple NICs?
The vSphere provider isn't space-aware, so it seems that in that situation we'll only generate a MAC for the first NIC, and the second NIC might be left with whatever the default was.

You can find the templates that are used for all the VMs the controller creates in the folder "juju-vmdks/<controller-uuid>/<series>/" - can you see the extra NIC on that template?

Thanks!

Erik Lönroth (erik-lonroth) wrote :

I get the same behaviour as Daniel Bidwell above.

* 2 NICs get created with the same MAC.
* In the generated template in vSphere, I don't see 2 NICs (I will double check this, but I'm fairly sure they're not there).
* When deploying machines through the controller, they also get 2 NICs with the same MACs as the controller node (!).
* 4 NICs get created if I supply the bootstrap parameters "primary-network" + "external-network" as proposed in the documentation.

I wrote up a detailed description of my observations in the juju discourse forum here: https://discourse.jujucharms.com/t/vsphere-bootstrap-creates-2-nics-with-same-mac/2044
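
For anyone wanting to confirm the duplicate-MAC symptom from inside an affected machine, a check along these lines may help. It is only a sketch for a Linux guest; the function names are made up for illustration, and interfaces that legitimately share an all-zero address (such as loopback and some tunnel devices) can show up as false positives.

```shell
# Sketch only: report MAC addresses that appear on more than one interface.
# Both function names are hypothetical.

# Reads "interface mac" pairs on stdin and prints each MAC that occurs
# more than once.
duplicate_macs() {
    awk '{ print $2 }' | sort | uniq -d
}

# Prints one "interface mac" pair per network device, read from sysfs.
list_iface_macs() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}

# Example (not run here):
#   list_iface_macs | duplicate_macs
```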

Erik Lönroth (erik-lonroth) wrote :

UPDATE (on the issue of bootstrap creates 2 nics):

I reverted to version 2.6.6 of the juju snap:

    sudo snap revert --revision=8594 juju    # juju 2.6.6

Then I ran the exact same bootstrap command as I did with 2.6.8. The resulting controller now correctly gets a single NIC, and subsequent deploys also result in working machines.

The workaround seems to be reverting to 2.6.6.

Erik Lönroth (erik-lonroth) wrote :

Another observation here is that the directory /tmp/juju-juju-vsphere-bionic, created on the client side of Juju as part of a bootstrap command, is left in place.

The consequence is that other users trying to bootstrap against the same cloud substrate will fail. We hit that problem; it was non-trivial to fix and would also require root privileges to remove the directory or change its permissions.

Suggestion: remove the /tmp/juju-juju-vsphere-bionic directory as part of cleaning up after the bootstrap command, to prevent this from happening.

Christian Muirhead (2-xtian) wrote :

Hi Erik and Daniel - I've created a different bug to track work on this multiple-clashing-nics issue: https://bugs.launchpad.net/juju/+bug/1844125

Ian Booth (wallyworld) wrote :

The directory "/tmp/juju-juju-vsphere-bionic" is used to guard against multiple simultaneous bootstraps on a client trying to access the image cache at the same time, i.e. I could have 2 client terminals open and run bootstrap in each.

We can't delete it at the end of a bootstrap because there could be another bootstrap running. But we do change the ownership of the directory to the current user rather than leaving it as root.

It is created in the (OS-specific) temp dir, so one option is to have a "bootstrap" script which sets the TMPDIR env var to something other than /tmp, e.g. a subdir under /tmp.
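
A wrapper of that kind might look like the sketch below. It assumes the Juju client picks its temp dir from TMPDIR, as described above; the function name and the per-user directory naming are made up for illustration.

```shell
# Sketch only: run a command with TMPDIR pointing at a per-user directory,
# so that each user gets their own guard directory. The function name and
# the "juju-tmp-<uid>" naming are hypothetical.
with_private_tmpdir() {
    base="${TMPDIR:-/tmp}"
    private="${base%/}/juju-tmp-$(id -u)"

    # Create the directory if needed and keep it private to this user.
    mkdir -p "$private" || return 1
    chmod 700 "$private" || return 1

    # TMPDIR is overridden only for the wrapped command.
    TMPDIR="$private" "$@"
}

# Example (not run here):
#   with_private_tmpdir juju bootstrap vsphere
```

Keying the directory on the numeric user ID keeps simultaneous bootstraps by the same user sharing one guard directory, while different users no longer collide.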

Erik Lönroth (erik-lonroth) wrote :

Although this is likely a rare use-case, in a multiuser setup this will cause problems. So, having a user-unique location for temporary data is a must.

Anastasia (anastasia-macmood) wrote :

2.6.7 release never went out. This fix was instead released in 2.6.8 - re-targeting this report.

Changed in juju:
milestone: 2.6.7 → 2.6.8