cloud-init and cloud-init-local services boot sequence does not obey dependency settings

Bug #1796875 reported by Pengpeng Sun
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

I observed that the boot sequence of cloud-init.service and cloud-init-local.service does NOT obey the dependency settings in their .service files.

According to the Before/After dependency settings in these service files, cloud-init.service should start after cloud-init-local.service.
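
For readers who want to check what ordering a unit file actually declares, here is a minimal sketch that collects the After= and Before= entries from the [Unit] section. The function name `ordering_deps` and the sample unit text are illustrative assumptions, not the unit file shipped by cloud-init, and the parser ignores systemd details such as an empty assignment resetting the list.

```python
def ordering_deps(unit_text):
    """Collect After= and Before= entries from the [Unit] section of a unit file."""
    deps = {"After": [], "Before": []}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in deps:
            # The key may repeat, and each value may list several units.
            deps[key].extend(value.split())
    return deps


# Hypothetical unit text, for illustration only.
SAMPLE = """\
[Unit]
Description=Illustrative unit (not the shipped cloud-init.service)
After=cloud-init-local.service
After=systemd-networkd-wait-online.service
Before=network-online.target

[Service]
Type=oneshot
"""

deps = ordering_deps(SAMPLE)
print("After: ", deps["After"])
print("Before:", deps["Before"])
```

On a real system the text could come from reading the installed unit file, so that the declared ordering can be compared with what was observed at boot.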

I tested the following combinations:
Ubuntu 18.10 daily build + cloud-init 18.4-0: FAIL
Ubuntu 18.04.1 + cloud-init 18.4-0: FAIL
Ubuntu 18.04.1 + cloud-init 18.3-9: FAIL
Ubuntu 18.04 GA + cloud-init 18.3-9: PASS

This could be a regression introduced in Ubuntu 18.04.1; please help check it.

In the PASS test, the sequence is:
cloud-init.service +2.317s
└─systemd-networkd-wait-online.service @17.308s +1.706s
  └─systemd-networkd.service @17.223s +83ms
    └─network-pre.target @17.219s
      └─cloud-init-local.service @2.791s +14.426s
        └─systemd-remount-fs.service @1.792s +994ms
          └─systemd-journald.socket @1.719s
            └─system.slice @1.712s
              └─-.slice @1.698s

In the FAIL test, the sequence is:
cloud-init-local.service
└─open-vm-tools.service @2min 6.601s
  └─dbus.service @2min 6.593s
    └─basic.target @2min 6.576s
      └─paths.target @2min 6.566s
        └─acpid.path @2min 6.557s
          └─sysinit.target @2min 6.556s
            └─cloud-init.service @2min 2.386s +4.169s
              └─systemd-journald.socket @1.012s
                └─system.slice @998ms
                  └─-.slice @998ms
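
The two trees above can also be compared mechanically. The sketch below assumes the output is in the tree format shown here (as printed by a tool like `systemd-analyze critical-chain`, which the report does not name explicitly), where each deeper line is a unit the line above it waited for. The helper names `chain_units` and `starts_before` are hypothetical, and the sample strings are shortened copies of the chains in this report.

```python
import re

# A unit name is the first token on a line, after optional indentation and "└─".
LINE = re.compile(r"^\s*(?:└─)?(\S+)")


def chain_units(text):
    """Return the unit names in a chain, outermost (latest) first."""
    units = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            units.append(match.group(1))
    return units


def starts_before(units, earlier, later):
    """True if `earlier` sits deeper in the chain than `later`."""
    return units.index(earlier) > units.index(later)


PASS_CHAIN = """\
cloud-init.service +2.317s
└─systemd-networkd-wait-online.service @17.308s +1.706s
  └─network-pre.target @17.219s
    └─cloud-init-local.service @2.791s +14.426s
"""

FAIL_CHAIN = """\
cloud-init-local.service
└─open-vm-tools.service @2min 6.601s
  └─sysinit.target @2min 6.556s
    └─cloud-init.service @2min 2.386s +4.169s
"""

for name, chain in (("PASS", PASS_CHAIN), ("FAIL", FAIL_CHAIN)):
    ok = starts_before(chain_units(chain), "cloud-init-local.service", "cloud-init.service")
    print(name, "local before init:", ok)
```

In the PASS chain cloud-init-local.service sits below cloud-init.service, while in the FAIL chain the two are inverted, which is the symptom described in this report.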

Pengpeng Sun (pengpengs) wrote :

Linked to VMware bug: 2212195

Scott Moser (smoser) wrote :

Hi,
Can you please attach the .tar.gz file created by
cloud-init collect-logs

in both the success and failure cases?
Also, please include the open-vm-tools version in both.

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Scott Moser (smoser) wrote :

When you've provided that input, please set the status back to 'New'.

Pengpeng Sun (pengpengs) wrote :

Attached cloud-init logs from the Ubuntu 18.04 GA PASS test.

Pengpeng Sun (pengpengs) wrote :

Attached cloud-init logs from the Ubuntu 18.10 daily build FAIL test.

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Pengpeng Sun (pengpengs) wrote :

@Scott, thanks for your input. Please find the cloud-init logs for both the PASS and FAIL tests attached.

Pengpeng Sun (pengpengs) wrote :

VMTools version from UBUNTU18.10 daily build FAIL test
$ vmtoolsd -v
VMware Tools daemon, version 10.3.0.5330 (build-8931395)

VMTools version from UBUNTU18.04.1 FAIL test
$ vmtoolsd -v
VMware Tools daemon, version 10.2.0.1608 (build-7253323)

VMTools version from UBUNTU18.04GA PASS test
$ vmtoolsd -v
VMware Tools daemon, version 10.2.0.1608 (build-7253323)

Pengpeng Sun (pengpengs) wrote :

@Scott, I found that the dependency error was caused by a workaround of ours, so it might not be the root cause of the failure on Ubuntu 18.04.1 and Ubuntu 18.10.

After removing the workaround, I collected the cloud-init logs again. Please check the new attachment cloud-init-18.04.1-FAIL.tar.gz; there are some tracebacks in it.

Scott Moser (smoser) wrote :

@Pengpeng,

Looking at the logs, the core of the issue seems to be the error in journal.txt and
cloud-init-output.log:
 IsADirectoryError: [Errno 21] Is a directory: '/var/lib/cloud/instance'

I'm not really sure how that got that way.
I had seen a bug like this before but have never been able to reproduce or understand where it comes from.

/var/lib/cloud/instance should always be a symbolic link.
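
A quick way to see which state a machine is in is sketched below. The helper name `classify_path` is an assumption for illustration and is not part of cloud-init. The symlink test runs first because `os.path.isdir` follows symbolic links and would otherwise report a link to a directory as a directory.

```python
import os


def classify_path(path):
    """Report whether a path is a symlink, a real directory, something else, or missing."""
    if os.path.islink(path):
        return "symlink"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        return "other"
    return "missing"


print(classify_path("/var/lib/cloud/instance"))
```

A result of "directory" would match the state that leads to the IsADirectoryError above.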

Also, I think the logs show two boots: the 'modules:config' and 'modules:final' stages of one boot, and then the 'init-local' and 'init' stages of a second boot. It seems the clock was set backwards after the first. The best evidence of that is status.json. It is collected from /run, which is a tmpfs filesystem, so it will only ever contain data from a single boot. There we see:

init started at 982.966233
init-local started at 861.9958138
modules-config started at 985.3238215
modules-final started at 985.8770518

For readability I cut off the '1539336' prefix from each of the times above.
I.e., init logged that it started at '1539336982.966233'.
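
To make the trimmed values easier to work with, the sketch below restores the prefix and sorts the stages by start time. The numbers are the ones quoted above; the names `PREFIX`, `STARTS`, and `by_start` are illustrative assumptions, and the layout of status.json itself is not modelled here.

```python
PREFIX = 1539336000  # the leading '1539336' that was trimmed from each value

# Start times as quoted in the comment, without the prefix.
STARTS = {
    "init": 982.966233,
    "init-local": 861.9958138,
    "modules-config": 985.3238215,
    "modules-final": 985.8770518,
}


def by_start(times):
    """Return stage names ordered by their recorded start time."""
    return sorted(times, key=times.get)


full = {stage: PREFIX + t for stage, t in STARTS.items()}
for stage in by_start(full):
    print(f"{stage:15s} {full[stage]:.4f}")

gap = STARTS["init"] - STARTS["init-local"]
print(f"init started {gap:.2f}s after init-local")
```

Sorting gives the order in which the stages recorded their start in that single status.json, which can then be compared with the order the log files suggest.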

I suspect that if you run 'cloud-init clean' the problem would resolve itself, but I'm not sure how /var/lib/cloud/instance got to be a directory. Any ideas?

Pengpeng Sun (pengpengs) wrote :

@Scott, thanks a lot for your investigation.
/var/lib/cloud/instance is a directory on Ubuntu 18.04.1 and the Ubuntu 18.10 daily build, while it is a symbolic link on Ubuntu 18.04 GA. Could this be a regression in the Ubuntu release, or was it changed for some reason starting with Ubuntu 18.04.1? I will file a Launchpad bug against Ubuntu for this.

Pengpeng Sun (pengpengs) wrote :

Confirmed on the Ubuntu 18.10 GA build: '/var/lib/cloud/instance' is a symbolic link again.

Joshua Powers (powersj) wrote :

I believe this issue is fixed and I am marking it as invalid as nothing from cloud-init was changed.

If you disagree or have additional information please feel free to follow up here and put the bug back into the 'new' state.

Changed in cloud-init (Ubuntu):
status: New → Invalid