VMware Guest OS Customization Fails for Ubuntu 16.04 with Cloud-init 19.1

Bug #1833623 reported by Maher AlAsfar
This bug affects 1 person
Affects: cloud-init (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

Hello,

I have yet to find an answer. I know many bugs have been reported in the past around this issue, and I really need to know what we need to do to get this to work, so here it is:

Situation

An automation tool is used to provision Ubuntu 16 VMs to vSphere/ESXi 6.7 U2

Steps I followed to set up my Ubuntu 16.04 template

- Install a fresh Ubuntu 16.04 Server
- Run sudo apt update && sudo apt upgrade
- Install cloud-init
- Run dpkg-reconfigure cloud-init and select only OVF, NoCloud and None as datasources, to speed up boot time, since the template is only targeted at vSphere and not at any public cloud
- Use DHCP

Result

The server is deployed from the template to vSphere and ends up with no customization and no hostname update. The network is connected via DHCP, and cloud-init successfully executes the cloud-config code that installs an application.

Of course this is not the desired state: we want customization for security reasons, and we want the hostname to be updated.

Problem

Configuring the automation tool to use a Static IP Range and a vCenter Customization Specification instead prevents the VMs from acquiring an IP from the Static IP Range. The VM network ends up in a disconnected state, which results in cloud-init failing to execute the cloud-config code.

Following https://kb.vmware.com/s/article/56409 fixes the customization part: the VM gets the IP it needs from the Static IP Range, an updated hostname and a connected network, but cloud-init fails to execute the cloud-config code.

We also tried the suggestion mentioned in a previous bug here: https://bugs.launchpad.net/ubuntu/+source/open-vm-tools/+bug/1793715

That suggestion is to use After=dbus.socket instead of After=dbus.service, but the outcome was the same: it works on the customization side, but cloud-init fails to execute the cloud-config code.

I am looking for the best approach here, where VM customization is handled by VMware and cloud-config execution is handled by cloud-init without a conflict, so everyone is happy.

Thank you.

Maher

Maher AlAsfar (malasfar)
tags: added: vmware
tags: added: cloud-init
tags: added: customization
Ryan Harper (raharper) wrote :

Hi,

Could you run:

cloud-init collect-logs

and attach the tarball it creates?

Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Maher AlAsfar (malasfar) wrote :

Sure thing! Attached.

Thank you for looking into this.

This is a two-machine deployment of WordPress from an Ubuntu 16.04 template: Web and DB VMs. The tarball is from the DB machine, using a Static IP Range for connectivity and the settings mentioned in https://kb.vmware.com/s/article/56409, without a Customization Specification, given the fact that a Static IP forces customization on the vSphere side.

Result

1. When the server is deployed and boots, it waits 5 minutes on "A start job is running for Raise network interfaces".
2. After that it reports "Failed to start Raise network interfaces" and continues to boot.
3. The machine reboots and we see a successful customization in the VM Events. The hostname gets updated, the machine gets its IP from the Static IP Range, and it boots really fast the second time.
4. The cloud-init cloud-config code below never runs. I am guessing this is because the machine rebooted and had no connectivity before that for cloud-init to execute, which may explain why it works with DHCP without customization.

Cloud Config Code in question

#cloud-config
        repo_update: true
        repo_upgrade: all

        packages:
         - mysql-server

        runcmd:
         - sed -e '/bind-address/ s/^#*/#/' -i /etc/mysql/mysql.conf.d/mysqld.cnf
         - service mysql restart
         - mysql -e "GRANT ALL PRIVILEGES ON *.* TO '${input.username}'@'%' IDENTIFIED BY '${input.userpassword}';"
         - mysql -e "FLUSH PRIVILEGES;"

Maher AlAsfar (malasfar) wrote :

Can anyone else help with this? We are trying to make cloud-init work when provisioning Ubuntu 16 on vSphere.

Ryan Harper (raharper) wrote :

Hrm, something looks a bit odd. In the collect logs, I can see that cloud-init found an OVF attached to the instance, however in the cloud-init.log there appears to be an existing boot of cloud-init already present:

2019-06-21 15:33:41,226 - main.py[DEBUG]: Execution continuing, no previous run detected that would allow us to stop early.
2019-06-21 15:33:41,226 - handlers.py[DEBUG]: start: init-network/check-cache: attempting to read from cache [trust]
2019-06-21 15:33:41,227 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2019-06-21 15:33:41,246 - util.py[DEBUG]: Read 7696 bytes from /var/lib/cloud/instance/obj.pkl
2019-06-21 15:33:41,273 - stages.py[DEBUG]: restored from cache: DataSourceNone

Which looks wrong to me. I suspect this is some template image? And if so, it doesn't appear to have been cleaned up.

Also, the journal is reporting errors:

Jun 21 11:34:27.836406 mysql-mcm547807-109524397596 systemd[1]: Set hostname to <mysql-mcm547807-109524397596>.
Jun 21 11:34:27.836424 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start
Jun 21 11:34:27.836509 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on cloud-init.service/start
Jun 21 11:34:27.836536 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on cloud-init-local.service/start
Jun 21 11:34:27.836555 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on open-vm-tools.service/start
Jun 21 11:34:27.836574 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on dbus.socket/start
Jun 21 11:34:27.836591 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on sysinit.target/start
Jun 21 11:34:27.836617 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Breaking ordering cycle by deleting job cloud-init.service/start
Jun 21 11:34:27.836636 mysql-mcm547807-109524397596 systemd[1]: cloud-init.service: Job cloud-init.service/start deleted to break ordering cycle starting with sysinit.target/start
Jun 21 11:34:27.836655 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start
Jun 21 11:34:27.836673 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on cloud-init-local.service/start
Jun 21 11:34:27.836696 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on open-vm-tools.service/start
Jun 21 11:34:27.836714 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on dbus.socket/start
Jun 21 11:34:27.836732 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Found dependency on sysinit.target/start
Jun 21 11:34:27.836751 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Breaking ordering cycle by deleting job cloud-init-local.service/start
Jun 21 11:34:27.836773 mysql-mcm547807-109524397596 systemd[1]: cloud-init-local.service: Job cloud-init-local.service/start deleted to break ordering cycle starting with sysinit.target/start
Jun 21 11:34:27.836791 mysql-mcm547807-109524397596 systemd[1]: Created slice User and Session Slice.

Which means cloud-init isn't quite working due to...
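
One way to list the units whose start jobs systemd dropped is to filter the journal for the "Breaking ordering cycle" messages. The following is only a minimal sketch, assuming the message format shown in the excerpt above; the function name find_cycle_victims is made up for illustration and is not part of systemd or cloud-init.

```shell
#!/bin/sh
# find_cycle_victims is a hypothetical helper name, not part of systemd or cloud-init.
# It reads journal text on stdin and prints each unit whose start job was
# deleted to break an ordering cycle, once, in order of first appearance.
find_cycle_victims() {
    sed -n 's|.*Breaking ordering cycle by deleting job \(.*\)/start$|\1|p' |
        awk '!seen[$0]++'
}

# Demo on two lines copied from the journal excerpt above.
find_cycle_victims <<'EOF'
Jun 21 11:34:27.836617 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Breaking ordering cycle by deleting job cloud-init.service/start
Jun 21 11:34:27.836751 mysql-mcm547807-109524397596 systemd[1]: sysinit.target: Breaking ordering cycle by deleting job cloud-init-local.service/start
EOF
```

On a live system the same filter could be fed from the current boot's journal, for example with journalctl -b | find_cycle_victims. If cloud-init.service or cloud-init-local.service shows up in the output, cloud-init did not get to run its early stages on that boot.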

Maher AlAsfar (malasfar) wrote :

Hi Ryan, thank you for taking the time.

This is a VMware Ubuntu 16.04 VM that is later used as a template, with cloud-init installed on top. As we prepare the template we do reboot the VM afterwards. I normally delete the cloud-init* logs from /var/log and maybe I didn't here, unless there is a cleanup process that needs to be followed after installing cloud-init that I didn't do.

Once the template is ready, users can deploy VMs from this standard template. When they do, machine customization happens on first boot, and I am guessing that's when cloud-init starts executing, but due to the customization the machine reboots. Would this explain what you're seeing in the logs?

Is there a way we can delay cloud-init from executing until, for example, the next reboot, or somehow build some kind of dependency so that it executes after the VM customization process?

Maher AlAsfar (malasfar) wrote :

I am guessing, based on https://bugs.launchpad.net/ubuntu/+source/open-vm-tools/+bug/1793715,
that I can't really disable cloud-init, because I need to execute the cloud-config code that I'm pushing when provisioning the VM. This leaves me needing to set cloud-init to do the customization here. Correct?

Ryan Harper (raharper) wrote :

I think you'll want to run:

cloud-init clean --logs

after making your changes to the image; this will remove most cloud-init state.

--
cloud-init runs on every boot in which it detects that it has data to process. So on VMware, if an OVF ISO is provided, or the vmdata is set in the DMI tables, then cloud-init runs.

Each datasource reads the instance-id, which is just a unique string to identify the instance and cloud-init remembers this value such that on reboots we don't duplicate "first boot" operations. However, if the image is captured and booted somewhere else (where it would get a different instance id) then it would do the firstboot things again.
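
The instance-id behaviour described above can be pictured with a small model. This is a simplified sketch only, not cloud-init's actual implementation: STATE_DIR, the cached file name and the function boot_kind are all hypothetical, and the instance ids are made-up examples.

```shell
#!/bin/sh
# Simplified model of per-instance "first boot" gating; NOT cloud-init's code.
# STATE_DIR and boot_kind are hypothetical names used only for illustration.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

boot_kind() {
    current_id="$1"
    cached="$STATE_DIR/instance-id"
    if [ -f "$cached" ] && [ "$(cat "$cached")" = "$current_id" ]; then
        # Same id as the remembered one: treat this as a plain reboot.
        echo "same instance: skip first-boot work"
    else
        # No remembered id, or a different one: remember it and run first-boot work.
        printf '%s\n' "$current_id" > "$cached"
        echo "new instance: run first-boot work"
    fi
}

boot_kind "iid-template"   # first boot of the template VM
boot_kind "iid-template"   # reboot of the same VM
boot_kind "iid-clone-01"   # clone booted with a different instance id
```

In this model a template that is captured with its remembered state intact only repeats first-boot work when the clone presents a different instance id, which is one reason cleaning the state before converting to a template matters.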

--

You shouldn't need to disable cloud-init; but you likely need to clean-up after your first boot sequence.

Maher AlAsfar (malasfar) wrote :

Hi Ryan

With the automation tool we are using, we pass the cloud-config user data via an ISO attached to the VM CD-ROM. Should using the dpkg-reconfigure cloud-init command to select only OVF as a datasource be enough in this case, since we only provision to VMware vSphere? Or do I need to select all available datasources and not really touch that?

Maher AlAsfar (malasfar) wrote :

Any luck on the 2nd log collection, based on what I followed in Step #9?

Maher AlAsfar (malasfar) wrote :

Another approach I followed is https://kb.vmware.com/s/article/59557, where I am disabling guest customization with cloud-init.

- Ubuntu 16.04 provisioned
- Network connected
- IP provided from DHCP
- Hostname was updated this time
- Cloud-config code wasn't deployed <--- Problem

Logs attached.

Maher AlAsfar (malasfar) wrote :

Test 14 Using DHCP

1- Install a fresh Ubuntu 16.04 Server
2- Run sudo apt update && sudo apt upgrade
3- Install cloud-init
4- If there are any, delete /var/log/cloud-init* from the VM before shutting it down and converting it to a template
5- Run dpkg-reconfigure cloud-init and select OVF only
6- Run cloud-init clean --logs
7- Shut down the VM and convert it to a template
8- Provision a VM using DHCP from the template
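
The template preparation steps above could be collected into one script. The following is a dry-run sketch under stated assumptions: the run wrapper, the DRY_RUN variable and the prepare_template function are hypothetical names, the -y flags are added so apt does not prompt, and step 4 is folded into cloud-init clean --logs, which also removes the cloud-init logs. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the template preparation steps listed above.
# run, DRY_RUN and prepare_template are hypothetical names for illustration.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"        # only show what would be executed
    else
        "$@"               # execute the command for real
    fi
}

prepare_template() {
    run sudo apt update
    run sudo apt -y upgrade
    run sudo apt -y install cloud-init
    run sudo dpkg-reconfigure cloud-init   # interactive: select OVF only
    run sudo cloud-init clean --logs       # clear cloud-init state and logs last
    run sudo shutdown -h now               # then convert the VM to a template
}

prepare_template
```

Setting DRY_RUN=0 would execute the commands instead of printing them. Running cloud-init clean --logs as the last step before shutdown matters, because any later boot of the template VM would create fresh cloud-init state again.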

Result

- Ubuntu 16.04 provisioned
- Network connected
- IP provided from DHCP
- Hostname is not updated as Customization isn't triggered <-- Problem 1
- Cloud config Code was deployed successfully

Maher AlAsfar (malasfar) wrote :

Is there anyone I can work with live on this via a WebEx/Zoom session, for a dedicated couple of hours? Please email me at <email address hidden> and I'll send you an invite. I just think this would be a much faster way to troubleshoot this and get to a resolution.

Maher AlAsfar (malasfar) wrote :

Hi Ryan

Can you please check the logs for this? This is a rising issue that happens quite often with customers I have dealt with.

Configuration
===============
VMs to be provisioned on vSphere 6.7 U2
template is Ubuntu 16.04.6 LTS with Cloud-init 19.1 and Open-vm-tools 10304 (10.2.0)

Ran sudo dpkg-reconfigure cloud-init and selected OVF and None
/lib/systemd/system/open-vm-tools.service contains:

[Unit]
Description=Service for virtual machines hosted on VMware
Documentation=http://open-vm-tools.sourceforge.net/about.php
ConditionVirtualization=vmware
DefaultDependencies=no
Before=cloud-init-local.service
After=local-fs.target
After=dbus.socket

[Service]
ExecStart=/usr/bin/vmtoolsd
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
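
To see at a glance which directives in a unit like this can take part in an ordering cycle such as the one in the journal earlier in this report, the ordering lines can be filtered out of the unit text. This is a minimal sketch; the function name show_ordering is hypothetical, and the demo input is the [Unit] section copied from above.

```shell
#!/bin/sh
# show_ordering is a hypothetical helper name for illustration.
# It reads unit file text on stdin and prints only the ordering-related lines.
show_ordering() {
    grep -E '^(After|Before|DefaultDependencies)='
}

# Demo on the [Unit] section shown above.
show_ordering <<'EOF'
[Unit]
Description=Service for virtual machines hosted on VMware
Documentation=http://open-vm-tools.sourceforge.net/about.php
ConditionVirtualization=vmware
DefaultDependencies=no
Before=cloud-init-local.service
After=local-fs.target
After=dbus.socket
EOF
```

On an installed system the same filter could be applied to the unit as systemd sees it, for example with systemctl cat open-vm-tools.service | show_ordering. Here the unit is ordered after dbus.socket and before cloud-init-local.service, which are two of the links in the chain the journal reported (sysinit.target, cloud-init-local.service, open-vm-tools.service, dbus.socket, sysinit.target).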

The cloud-config code contains:
      cloudConfig: |
        #cloud-config

        repo_update: true
        repo_upgrade: all

        packages:
         - mysql-server

Ran sudo cloud-init clean --logs before converting to a template.

Result
=======

Tasks invoked by vCenter when provisioning by the automation tool from the prepared template

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
 During boot the network state is disconnected
 Waits 5 min on "A start job is running for Raise network interfaces" during POST, then continues to boot
- VMware customization starts and succeeds, as listed in the VM Events in vCenter
- VM reboots and the network now shows as connected
- IP provided from Static IP Range
- Hostname is updated
- Cloud-config fails to execute in the right order: it seems it ran when there was no network connectivity and before the VMware customization finished.

Note: No vCenter Customization Specification used here.

Logs attached. Can you provide any recommendation on how to fix this, or confirm whether it's a bug today?

I am really trying here and I just need a little support.

Thank you

Maher AlAsfar (malasfar) wrote :

Folks, it looks like, until someone says otherwise, using a static IP when provisioning a VM on vSphere from a template that includes cloud-init just doesn't work. Is anyone open to a WebEx/Zoom session to look into why this is still an issue?

I have tried open-vm-tools vs. VMware Tools, and After=dbus.service vs. After=dbus.socket. This is just around Ubuntu 16.04, because with 18.04 and above it's a whole different experience of it not working either, even on other distros.

Thanks.

Maher AlAsfar (malasfar) wrote :

Just tested Ubuntu 18.04 (same issue), logs attached.

Behaviour when deploying Ubuntu 18 VMs using a Static IP from the automation tool
The automation tool maps an ISO image to pass the cloud-init user data

Template work done
--------------

Deploy Ubuntu 18.04 from iso
sudo apt-get update && sudo apt-get upgrade

Cloud-init installed by default

following KB https://kb.vmware.com/s/article/56409
But used After=dbus.socket instead of After=dbus.service

sudo cloud-init clean --logs

Open-VM-Tools 10346 (10.3.10)
Compatibility: ESXi 6.7 Update 2 and later (VM version 15) for the VM Hardware

Tasks invoked by vCenter when provisioning from CAS

- Clone Virtual Machine from Template
- Reconfigure Virtual Machine
- Customize Virtual Machine Guest OS
- Power on Virtual Machine
 During boot the network state is disconnected

- Customization starts and succeeds, as listed in the VM Events
- VM reboots and the network is now connected
- IP provided from Static IP Range
- Hostname is updated
- Cloud-config code fails to execute

I have attached the logs. Not sure if this is ever going to work.

Joshua Powers (powersj) wrote :

Maher,

Thanks for the various attempts and options that you have tried. At this time it is clear that trying to do both VMware and cloud-init customization is broken. We worked with VMware to produce a KB [1] that states this and gives the option of using one or the other. In the meantime we continue to work with VMware on how to get both options working.

As this is essentially a feature request to get both customization options working I am not sure we have any further input as this is known to not be working.

Is there another scenario that I might have missed that was not covered in your initial bug report? Thanks!

[1] https://kb.vmware.com/s/article/59557?lang=en_US

Maher AlAsfar (malasfar) wrote :

Thanks Joshua, that's pretty much it, yes.

So if I need to use cloud-config / user data to execute scripts, I must also use cloud-init to do the customization instead of using Perl (VMware Customization), based on the provided KB [1]. The problem is that when I do that I have no network, no IP, no updated hostname and of course no connectivity for anything to work.

What I also noticed is that any time I use a Static IP during provisioning from the automation tool, it triggers a VMware Customization. I don't even have to specify a Customization Specification in vCenter for it to kick that off.

[1] https://kb.vmware.com/s/article/59557?lang=en_US

Maher AlAsfar (malasfar) wrote :

I have set up cloud-init as the customization engine based on this KB [2] instead, which is similar to the above-mentioned KB [1] but with two extra steps, items 3 and 4. I also made sure I removed the After=dbus.service line.

[2] https://kb.vmware.com/s/article/54986

The result is that I see the customization event happening and succeeding in the VM Events (so the VM still rebooted; is that expected even though we configured cloud-init to be the customization engine?). The network is connected, a Static IP is provided and the hostname is updated, but none of the user-data cloud-config made it through, for example installing the mysql-server package.

At this point I am confident there is no solution here for cloud-init to work on VMware, unless someone tells me otherwise. I will focus on introducing open source Ansible to deliver packages, etc., and remove cloud-init completely from the template.

Launchpad Janitor (janitor) wrote :

[Expired for cloud-init (Ubuntu) because there has been no activity for 60 days.]

Changed in cloud-init (Ubuntu):
status: Incomplete → Expired