VMware Guest OS Customization will fail for Ubuntu 18.04 Server LiveCD

Bug #1793715 reported by vmware-gos-Yuhua on 2018-09-21
This bug affects 2 people

Affects                  Status  Importance  Assigned to  Milestone
cloud-init (Ubuntu)              Undecided   Unassigned
open-vm-tools (Ubuntu)           Undecided   Unassigned

Bug Description

This issue occurs when:
1) There is a conflict between cloud-init and VMware guest OS customization.
   Customization fails if cloud-init is present on the Ubuntu 18.04 Server LiveCD edition. This affects only the LiveCD server edition, not the server/desktop editions installed to disk.

   In addition, the guest customization package is cleaned up after the customized virtual machine is powered on, so the customization process can NOT complete correctly.

2) Sometimes the order in which systemd services execute while the virtual machine boots causes the failure.

vmware-gos-Yuhua (yhzou) wrote :

Currently there is a workaround for it.

Workaround:

1. Set either cloud-init or the perl scripts as the customization engine.
   1) To use cloud-init as the customization engine:
      Set "disable_vmware_customization: false" in "/etc/cloud/cloud.cfg".

   2) To use the perl scripts as the customization engine, disable or remove
      cloud-init.

      Disable the cloud-init service by running this command:
      sudo touch /etc/cloud/cloud-init.disabled

      Or remove the cloud-init package and purge its config files by running this command:
      sudo apt-get purge cloud-init

2. Open the /usr/lib/tmpfiles.d/tmp.conf file.
   Go to line 11 and prefix it with #.

   For example:
   #D /tmp 1777 root root -

3. If you have open-vm-tools installed, open the /lib/systemd/system/open-vm-tools.service file.
   Add “After=dbus.service” under [Unit].
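The three file edits above can also be scripted. What follows is a minimal Python sketch, not a VMware-provided tool: the function names are made up for illustration, it works on file contents as strings, and it assumes the stock file layouts. In particular it assumes the tmp.conf entry to disable reads "D /tmp 1777 root root -" and matches it by content rather than by line number, since line 11 may move between package versions. On a real system you would read and write the files as root and run "sudo systemctl daemon-reload" after changing the unit file.

```python
import re


def enable_vmware_customization(cloud_cfg: str) -> str:
    """Step 1 (cloud-init engine): ensure 'disable_vmware_customization: false'."""
    line = "disable_vmware_customization: false"
    pattern = re.compile(r"^disable_vmware_customization:.*$", re.MULTILINE)
    if pattern.search(cloud_cfg):
        # Overwrite whatever value is currently set.
        return pattern.sub(line, cloud_cfg)
    # Key not present yet: append it as a top-level setting.
    return cloud_cfg.rstrip("\n") + "\n" + line + "\n"


def comment_out_tmp_cleanup(tmp_conf: str) -> str:
    """Step 2: prefix the 'D /tmp ...' entry of tmp.conf with '#'."""
    out = []
    for line in tmp_conf.splitlines():
        # Match on the entry type and path, not on a fixed line number.
        if line.split()[:2] == ["D", "/tmp"]:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


def add_unit_after(unit_text: str, dependency: str = "dbus.service") -> str:
    """Step 3: add 'After=<dependency>' directly under the [Unit] header."""
    directive = f"After={dependency}"
    lines = unit_text.splitlines()
    if directive in lines:
        return unit_text  # already applied; keep the edit idempotent
    idx = lines.index("[Unit]")  # raises ValueError if there is no [Unit] section
    lines.insert(idx + 1, directive)
    return "\n".join(lines) + "\n"
```

The helpers are idempotent on purpose, so running them twice on the same template does not stack duplicate "After=" lines or double "#" prefixes on the directive.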

Joshua Powers (powersj) wrote :

Can you provide additional details as to what caused the need for step 3 of the workaround (i.e. modifying the open-vm-tools service file)?

What issue was seen?
Any logs that show the issue?

asfak ali (asfak2) on 2018-09-22
Changed in open-vm-tools (Ubuntu):
status: New → Fix Released

I did not feel the discussion was over; why did you close that, asfak?
Is the KB article updated? Is there anything that can be done in Ubuntu (see Josh's questions in comment #2)?

Changed in open-vm-tools (Ubuntu):
status: Fix Released → Confirmed
Pengpeng Sun (pengpengs) wrote :

As to Joshua's question:
Hostname customization uses "hostnamectl set-hostname theNewHostName", which requires 'dbus.service' to be running. We found that dbus.service might not be running yet when perl/cloud-init executes the hostnamectl command on Ubuntu 18.04 (open-vm-tools v10.2.0).
As a workaround, we published this KB to tell customers about this issue. Here is a thread in which a customer hit and worked around this issue: https://github.com/vmware/open-vm-tools/issues/240

Joshua Powers (powersj) wrote :

Per the GitHub issue [1] and the published workaround [2], does open-vm-tools in Bionic and newer releases need to add the following to the open-vm-tools.service file?

"After=dbus.service"

[1] https://github.com/vmware/open-vm-tools/issues/240
[2] https://kb.vmware.com/s/article/56409

Pengpeng Sun (pengpengs) wrote :

Hi Joshua,

Sorry for the late response.
Yes, adding "After=dbus.service" to open-vm-tools.service file is the workaround.

Thanks for the info, Pengpeng.
The problem with that is that it will make cloud-init rather late.

This is the current ordering (the latest state after many fixups for exactly these too-early/too-late reasons):

Note: Disambiguation of service names used below:

vgauth - open-vm-tools.vgauth.service
vmtoolsd - open-vm-tools.service
fs - systemd-remount-fs.service
tmp - systemd-tmpfiles-setup.service
cloud-init-l - cloud-init-local.service
cloud-init - cloud-init.service

 fs ---------> vmtoolsd -> cloud-init-l
                  ^
 tmp -------------|
 vgauth ----------|
 apparmor --------|

The problem is that all those services are meant to be super early and therefore have DefaultDependencies=No.

D-Bus, on the other hand, is not that early, and adding
  After=dbus.service
would move open-vm-tools way back in the initialization order, which AFAIK you also don't want (bug 1667831, to have open-vm-tools before cloud-init-local, came from Sankar at VMware).

To be sure about it, let me ask a few questions:
- are your scripts triggered by cloud-init?
- if so, in which config module/phase? (I'm sure I have done cloud-init-driven hostname changes.)
- OR are your scripts triggered by open-vm-tools directly?

@cloud-init Team:
- if we end up making open-vm-tools start after dbus.service, that would move cloud-init-local.service much later. Could you outline the problems with that OR, if you are OK with it, say so?

Furthermore, let's talk in today's call to make sure we are on the same page about this.

I discussed it with Ryan and, as expected, just adding "After=dbus.service" to open-vm-tools.service will certainly trigger random behavior.
This will create an unresolvable dependency because:
- dbus has DefaultDependencies, which orders it after sysinit.target
- cloud-init-local needs to be early to make changes required before
  certain targets start, e.g. Before=sysinit.target
- but open-vm-tools is before cloud-init-local
That is a conflicting requirement, and at best systemd will kick one unit out of the loop.
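The conflict above can be checked mechanically by treating each ordering relation as a graph edge and looking for a cycle. This is a minimal sketch, assuming Python 3.9+ (for the standard "graphlib" module) and modelling only the units named in this comment; it is a simplification of what systemd computes, not a reproduction of it, and "check_ordering" is an illustrative name.

```python
from graphlib import CycleError, TopologicalSorter


def check_ordering(after):
    """Return (start_order, None) if the constraints are satisfiable,
    otherwise (None, cycle).

    `after` maps each unit to the set of units it is ordered After=.
    """
    try:
        return list(TopologicalSorter(after).static_order()), None
    except CycleError as err:
        # err.args[1] lists the units of one detected cycle; first == last.
        return None, err.args[1]


# Simplified model of the ordering described above.
current = {
    "cloud-init-local.service": {"open-vm-tools.service"},  # open-vm-tools before cloud-init-local
    "sysinit.target": {"cloud-init-local.service"},         # cloud-init-local has Before=sysinit.target
    "dbus.service": {"sysinit.target"},                      # DefaultDependencies imply After=sysinit.target
}

# The workaround: open-vm-tools.service additionally gets After=dbus.service.
proposed = {**current, "open-vm-tools.service": {"dbus.service"}}

# Perl engine with cloud-init disabled or purged: cloud-init-local drops out.
perl_only = {
    "dbus.service": {"sysinit.target"},
    "open-vm-tools.service": {"dbus.service"},
}

for name, graph in (("current", current), ("proposed", proposed), ("perl_only", perl_only)):
    order, cycle = check_ordering(graph)
    print(name, "order:" if order else "cycle:", order or cycle)
```

In this model only "proposed" yields a cycle (open-vm-tools -> cloud-init-local -> sysinit.target -> dbus.service -> open-vm-tools), which matches the argument above; dropping cloud-init-local.service, as in the perl-only case, removes the loop.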

We would be interested if you could share how to set up VMware to make use of that hostnamectl, so that we can see and debug the case on our own as well, and furthermore how it interacts with cloud-init.

It might be worth trying to make open-vm-tools "After=dbus.socket", which would provide the socket that hostnamectl needs without requiring dbus.service to fully complete.
If it does not work, let us know and we can give up that approach.
If it works for your current needs, we still need to ensure it does not inherit the same dependencies. After a boot with that modification, please check the journal for any message like:
  "Breaking ordering cycle by deleting"
Example from another case:
 "Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with var-lib-mysql.mount/start"

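Checking the journal for those messages can be scripted. This is a minimal Python sketch, assuming it runs on the customized VM after the modified boot with permission to read the journal; it matches the two message shapes quoted in this comment, and the helper names are made up for illustration.

```python
import subprocess

# Substrings of the two message shapes quoted in this thread.
CYCLE_MARKERS = (
    "Breaking ordering cycle by deleting",
    "to break ordering cycle",
)


def cycle_warnings(journal_lines):
    """Return the journal lines that report systemd breaking an ordering cycle."""
    return [line for line in journal_lines if any(m in line for m in CYCLE_MARKERS)]


def current_boot_journal():
    """Read the journal of the current boot via journalctl (needs read permission)."""
    result = subprocess.run(
        ["journalctl", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

On the VM, printing cycle_warnings(current_boot_journal()) lists any matching lines; an empty list means none of these messages appeared during that boot.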
In general, vmtoolsd seems to do many things.
It provides services (e.g. through vmware-rpctool) that some tools need to use early, so it needs to be early.
And it tries to do configuration (e.g. hostnamectl), which needs to be late.
Cloud-init was split into stages for the same reason, and the current issue of vmtoolsd being torn between having to be early and late at the same time appears to stem from a similar problem.
You might consider (surely a longer effort) splitting things up as well, to provide early services early and later functions late without those conflicts.

Pengpeng Sun (pengpengs) wrote :

Thanks a lot for the info, Christian.
Let me answer your questions first:
- are your scripts triggered by cloud-init?
NO.
- if so, in which config module/phase? (I'm sure I have done cloud-init-driven hostname changes.)
- OR are your scripts triggered by open-vm-tools directly?
VMware Guest OS Customization works with either pure 'perl' scripts or cloud-init as the customization engine.
For 'perl' script customization, the scripts are triggered by open-vm-tools.service, and 'hostnamectl' is the command the scripts use to set the hostname. 'hostnamectl' depends on 'dbus.service'.
For 'cloud-init' customization, open-vm-tools.service only copies 2 config files to a folder, and then cloud-init-local.service reads the config files from that folder to do the customization. This is the reason for "bug 1667831 to have open-vm-tools before cloud-init-local was by Sankar from VMWare".
And AFAIK, cloud-init does not use 'hostnamectl' to set the hostname on Ubuntu, so 'cloud-init' customization on Ubuntu has no dependency on 'dbus.service'.

And yes, I also noticed that adding 'After=dbus.service' to 'open-vm-tools.service' introduces the dependency cycle when the cloud-init*.services are also enabled (https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1796875/comments/8).
So 'After=dbus.service' should only be added when using the perl scripts as the customization engine.

See details in the workaround KB for Ubuntu 18.04 Live Server: https://kb.vmware.com/s/article/54986
To set cloud-init as the customization engine:
   there is no need to add 'After=dbus.service' to open-vm-tools.service
To set the perl scripts as the customization engine:
   add 'After=dbus.service' to open-vm-tools.service
   and also disable or remove cloud-init
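The per-engine rule above can be written down as a small consistency check. This is a minimal Python sketch; "GuestSetup" and "check_setup" are hypothetical names, and the three fields have to be determined by hand (for example, whether /etc/cloud/cloud-init.disabled exists or the package was purged, and whether the unit file carries the "After=" line).

```python
from dataclasses import dataclass


@dataclass
class GuestSetup:
    engine: str               # "cloud-init" or "perl"
    cloud_init_enabled: bool  # False if cloud-init is disabled or purged
    vmtools_after_dbus: bool  # open-vm-tools.service has After=dbus.service (or dbus.socket)


def check_setup(setup: GuestSetup) -> list:
    """Return a list of problems with a template; empty means it follows the rule above."""
    problems = []
    if setup.engine == "cloud-init":
        if not setup.cloud_init_enabled:
            problems.append("cloud-init engine selected but cloud-init is disabled")
        if setup.vmtools_after_dbus:
            problems.append("remove After=dbus.*: it creates an ordering cycle with cloud-init-local.service")
    elif setup.engine == "perl":
        if setup.cloud_init_enabled:
            problems.append("disable or purge cloud-init when using the perl engine")
        if not setup.vmtools_after_dbus:
            problems.append("add After=dbus.service so hostnamectl can reach D-Bus")
    else:
        raise ValueError(f"unknown engine: {setup.engine!r}")
    return problems
```

For example, check_setup(GuestSetup("cloud-init", True, True)) reports exactly one problem: the "After=" line that has to go.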

Thanks for the confirmations!

The service can only have one set of dependencies, since it is the same package no matter which of these some component in your deployment chain chooses:
 A) Set cloud-init as the customization engine
 B) Set perl scripts as the customization engine

IMHO all paths should use (A) through cloud-init, but I'm sure you had reasons to also have (B).
Therefore, since you likely want/need to keep (B), I think the right solution for that is (as suggested before) to split the functionality of vmtoolsd/open-vm-tools.service.

Split it into:
- one that runs early and provides services, e.g. RPC used by other components (essentially as we have it today)
- one that executes the (B) customization scripts late, as you'd need in order to make the guarantees for (B)
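The effect of such a split on boot ordering can be sketched with a small dependency model. This is a minimal Python sketch (Python 3.9+ for "graphlib"), assuming a hypothetical second unit named vmtools-customize-late.service that holds the (B) scripts; neither that name nor the exact constraints come from open-vm-tools. It only illustrates that a late unit can wait for dbus.service while open-vm-tools.service keeps its early position.

```python
from graphlib import TopologicalSorter

# Keys are units; values are the units each key is ordered After=.
# "vmtools-customize-late.service" is a hypothetical unit for the split-out
# (B) customization step; it does not exist in open-vm-tools.
split_layout = {
    # Early part: unchanged, still ahead of cloud-init-local.
    "cloud-init-local.service": {"open-vm-tools.service"},
    "sysinit.target": {"cloud-init-local.service"},
    "dbus.service": {"sysinit.target"},
    # Late part: free to wait for D-Bus without dragging the early part along.
    "vmtools-customize-late.service": {"dbus.service", "open-vm-tools.service"},
}

# static_order() raises graphlib.CycleError if the constraints conflict.
order = list(TopologicalSorter(split_layout).static_order())
print(order)
```

The printed order places open-vm-tools.service before cloud-init-local.service and the hypothetical late unit after dbus.service, without any CycleError.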

As I mentioned before, a lesson learned from cloud-init that you might use is that some changes have to happen early (they cannot be changed later) and some late (they need some components up).
Therefore an even better split is to break (B) into early and late stages, e.g. treating all of your legacy (B) as late, while leaving room to add early scripts in the future.

Or, to reiterate: just use the already existing path through cloud-init if you can.

Pengpeng Sun (pengpengs) wrote :

Thanks for the suggestion!
Actually, we are working on using (A) only, and by default. Until that happens, (B) needs to be kept.
Instead of splitting the vmtoolsd/open-vm-tools services, we might fix this issue in the scripts by making sure dbus.service is running when the 'hostnamectl' command executes.

Sounds good.
In the meantime, have you tried:
  After=dbus.socket
Would that be enough for your needs, while at the same time not triggering the "Breaking ordering cycle" warning?

Pengpeng Sun (pengpengs) wrote :

Yes, I just tried 'After=dbus.socket' on my Ubuntu 18.10, and it works.
We need to do some verification and then update our KB if it works on the other Ubuntu versions.

Ok, if 'After=dbus.socket' works for you that is great.
If this does NOT add the ordering cycle issue, we could maybe even make that part of the package.
We'd need to do some testing, but first could you check and confirm: with 'After=dbus.socket', do you still not see the "Breaking ordering cycle" warning?

vmware-gos-Yuhua (yhzou) wrote :

GOS Customization doesn't work with the cloud-init engine when "After=dbus.socket" is added to open-vm-tools.service.
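A possible reason why dbus.socket does not help on the cloud-init path: assuming dbus.socket keeps the default dependencies systemd gives socket units (Requires= and After= on sysinit.target), swapping the socket in for the service leaves the same loop through cloud-init-local.service. This minimal Python sketch (Python 3.9+ for "graphlib") models only that assumption, in the same simplified way as the ordering described earlier in the thread; it does not prove what happened on the reporter's VM.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(after):
    """True if the After= constraints in `after` cannot all be satisfied."""
    try:
        list(TopologicalSorter(after).static_order())
        return False
    except CycleError:
        return True


with_cloud_init = {
    "cloud-init-local.service": {"open-vm-tools.service"},
    "sysinit.target": {"cloud-init-local.service"},
    # Assumption: dbus.socket has default dependencies, i.e. After=sysinit.target.
    "dbus.socket": {"sysinit.target"},
    "open-vm-tools.service": {"dbus.socket"},
}

# Perl engine with cloud-init disabled: cloud-init-local.service drops out.
without_cloud_init = {
    "dbus.socket": {"sysinit.target"},
    "open-vm-tools.service": {"dbus.socket"},
}

print(has_cycle(with_cloud_init), has_cycle(without_cloud_init))  # True False
```

Under this assumption the socket variant is only cycle-free when cloud-init-local.service is out of the picture, which is consistent with it working for the perl engine but not for the cloud-init engine.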

Pengpeng Sun (pengpengs) wrote :

Hi Christian,

According to #15, please do NOT make "After=dbus.socket" part of the open-vm-tools package.
We will, however, update the VMware KBs to change from "After=dbus.service" to "After=dbus.socket" for (B).
 A) Set cloud-init as the customization engine
 B) Set perl scripts as the customization engine

Ok,
but in the mid term I'd love to see you split open-vm-tools into early and late stages, which would allow the VMware KB to be removed and things to just work for everyone.

OTOH, you might consider deprecating and then removing (B) in favor of only (A): a solution to the same problem that needs less development effort.

To summarize the discussion and close this for now:

- as outlined in comment #16 there are two paths in vmtools
  (A) being the new code using cloud-init is NOT affected
  (B) being the old code path being affected with own customization code
  The problem is that for some customizations this has to run early, and for some other things
  it has to run late; that is the dilemma here.

- Path (A) does not have a problem and is the way to go in the future
  Cloud-init provides all that is needed to configure the system

- (B) is there for compatibility with older setups

- To fix (B) there are two options:
  - modify the config (that is what the KB article is about)
    That is the least nice option, but the benefit is that it costs no effort
  - open-vm-tools could split vmtools customization code into two
    (or more) pieces like cloud-init already does
    Those pieces could then run early/late as needed
    This would be a major dev effort for a code path that is considered
    "the old one" and probably not worth the effort.

So from an Ubuntu perspective, all is discussed and done.
I'm marking the bug tasks as Invalid.

From a VMware perspective the options are:
- keep the KB article up (we can bikeshed about wording)
- spend dev time on (B) to be split
- put time on further deprecating and dropping path (B) (recommended)

Changed in open-vm-tools (Ubuntu):
status: Confirmed → Opinion
Changed in cloud-init (Ubuntu):
status: New → Invalid
Maher AlAsfar (malasfar) wrote :

What's the update on this? It continues to be an issue for customers using Ubuntu and cloud-init on VMware. Have we reached a solution, in terms of outlining the steps needed for this to work?

Thank you

Maher AlAsfar (malasfar) wrote :

I have been spending many cycles on this. Here are the testing tasks around Ubuntu 16.04 customization with cloud-init
and the expected behaviour for vSphere provisioning.

Using After=dbus.socket doesn't work, so stick to the KB I mentioned below.

If you want to do it yourself, it's Test 2 for DHCP and Test 4 for static IP.
Please let me know if you were successful or have any comments, questions or suggestions.
-----
- Create an Ubuntu 16.04 Virtual Machine on vSphere 6.7 U2.
- Apply the latest updates and upgrades -> sudo apt-get update && sudo apt-get upgrade

Environment Facts
- DHCP is available on the network when preparing the template and later for provisioning.
- Cloud Assembly Static IP range also available for vSphere deployments.

Test 1
=======
Behavior for deploying VMs using DHCP from CAS before installing cloud-init

Tasks Invoked by vCenter

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
  During boot, the network state is Disconnected
- Customization starts and succeeds, listed in the VM Events
- VM reboots
  During boot, the network state is Connected
- IP provided from DHCP
- Hostname updated

Note: No Customization Specification used here.

Test 2
======
Behavior for deploying VMs using DHCP from CAS after installing cloud-init

Tasks Invoked by vCenter

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
  During boot, the network state is Connected
- Customization doesn't start at all, so it's not listed in the VM Events
- IP provided from DHCP
- Hostname is not updated
- Cloud config code executes successfully

Note: No Customization Specification used here.

Test 3
======
Behavior for deploying VMs using Static IP from CAS before installing cloud-init

Tasks Invoked by vCenter

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
  During boot, the network state is disconnected
- Customization starts and succeeds, listed in the VM Events
- VM reboots and the network shows as connected
- IP provided from Static IP Range
- Hostname is updated

Note: No Customization Specification used here.

Test 4
======
Behavior for deploying VMs using Static IP from CAS after installing cloud-init,
following KB https://kb.vmware.com/s/article/59687

Tasks Invoked by vCenter

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
  During boot, the network state is disconnected
- Customization starts and succeeds, listed in the VM Events
- VM reboots and the network shows as connected
- IP provided from Static IP Range
- Hostname is updated
- Cloud config executes fine

Note: No Customization Specification used here.

Pengpeng Sun (pengpengs) wrote :

@Maher AlAsfar (malasfar)
What's the cloud-init version on the Ubuntu 16.04 VM? The version needs to be 18.2 or later to make customization work.
What's the open-vm-tools version on the Ubuntu 16.04 VM?

If you customize Ubuntu with cloud-init, please do NOT add any "After=dbus.xxxxx" to the open-vm-tools.service file, since it will create a dependency cycle during boot.

And if possible, I suggest using Ubuntu 18.04, not Ubuntu 16.04, with cloud-init.
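The first question above boils down to comparing the installed cloud-init version against 18.2. This is a minimal Python sketch, assuming you already have the package version string (for example from "dpkg-query -W cloud-init"); the function name is hypothetical, it compares only the leading MAJOR.MINOR, and the version strings in the example are illustrative, not taken from the reporter's VMs.

```python
import re


def cloud_init_at_least(version, minimum=(18, 2)):
    """Return True if a cloud-init version string is >= minimum (MAJOR, MINOR).

    Only the leading "MAJOR.MINOR" is compared, so Debian/Ubuntu package
    suffixes such as "-0ubuntu1~16.04.1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised cloud-init version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


# Illustrative version strings only.
for v in ("17.2-35-0ubuntu1~16.04.1", "18.2", "19.1-1-0ubuntu1~18.04.1"):
    print(v, cloud_init_at_least(v))
```

Comparing integer tuples rather than raw strings avoids the classic mistake of "18.10" sorting before "18.2".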

Maher AlAsfar (malasfar) wrote :

Hi @Pengpeng Sun (pengpengs)

This is where I am with Ubuntu 16.04:

Behaviour for deploying Ubuntu 16.04 VMs using Static IP from Automation tool After installing cloud-init

Template Tasks Taken
====================
Deploy Ubuntu 16.04 from ISO

sudo apt-get update && sudo apt-get upgrade

sudo apt-get install cloud-init

sudo dpkg-reconfigure cloud-init -> selecting only OVF and None as data sources, since the automation tool maps an ISO image to pass the user data to cloud-init

Following KB https://kb.vmware.com/s/article/59687
But using After=dbus.socket instead of After=dbus.service

Using open-vm-tools 10304 (10.2.0), with VM hardware compatibility "ESXi 6.7 Update 2 and later (VM version 15)"

Tasks Invoked by vCenter when provisioning from Automation tool
===============================================================
Clone Virtual Machine from Template

Reconfigure Virtual Machine

Customize Virtual Machine Guest OS

Power on Virtual Machine. During boot, the network state is disconnected

Customization starts and succeeds, listed in the VM Events

VM reboots and the network shows as connected

IP provided from Static IP Range

Hostname is updated

Cloud config fails to execute in the right order (before the network settings are applied) most of the time, but magically it sometimes does execute fine after the network is up

When it doesn't work, the cloud-init logs show that cloud-init starts way too early, before the network is set up/started.

I'm looking for a way to make cloud-init execute after VMware customization. I thought the KB I mentioned above would do that, but it only helps the customization complete successfully. That still matters: if you don't add After=dbus.socket to open-vm-tools.service, everything else fails, which is way worse, since the network will never be connected and the customization will error out in the VM Events, with the logs showing the exact error mentioned in the KB above.

Of course, this has completely different dynamics when you're testing with Ubuntu 18.04, which I will get into once I figure Ubuntu 16.04 out.

Maher AlAsfar (malasfar) wrote :

@Pengpeng Sun (pengpengs)

Just tested Ubuntu 18.04 (same issue); logs attached.

Behaviour for deploying Ubuntu 18 VMs using Static IP from Automation tool
The automation tool maps an ISO image to pass the cloud-init user data.

Template work done
--------------

Deploy Ubuntu 18.04 from iso
sudo apt-get update && sudo apt-get upgrade

Cloud-init installed by default

following KB https://kb.vmware.com/s/article/56409
But used After=dbus.socket instead of After=dbus.service

sudo cloud-init clean --log

Open-VM-Tools 10346 (10.3.10)
Compatibility: ESXi 6.7 Update 2 and later (VM version 15) for the VM Hardware

Tasks Invoked by vCenter when provisioning from CAS

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
  During boot, the network state is disconnected

- Customization starts and succeeds, listed in the VM Events
- VM reboots and the network is now connected
- IP provided from Static IP Range
- Hostname is updated
- Cloud config code fails to execute in the right order

I have attached the logs.

Maher AlAsfar (malasfar) wrote :

Can anyone look at the logs I provided? Here we have a successful customization where the VM gets an IP and the hostname is updated after the customization reboot. The problem now is that cloud-init executes, I think, before the reboot, when there is no network.

Please, if you have the time, take a look at the logs provided or recommend a workaround that forces cloud-init to execute after the customization. Thanks again.
