cloud-init disables user on azure at second reboot

Bug #1803173 reported by Adrian Vladu
This bug affects 2 people
Affects                Status     Importance  Assigned to  Milestone
cloud-init             Invalid    Medium      Unassigned
linux (Ubuntu)         Triaged    Undecided   Unassigned
walinuxagent (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Hello,

Environment:

platform: Azure
arm image: Canonical UbuntuServer 16.04-DAILY-LTS latest

Steps:

1. Deploy VM with user/pass authentication
2. Install latest linux-next-upstream kernel (for example 4.19.0-4db9d11bcbef, where 4db9d11bcbef is the git tag from the linux-next latest tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/)
3. reboot (all good)
4. reboot again
5. cloud-init disables the username password authentication
I checked the cloud-init logs and found:

2018-11-01 16:45:28,566 - init.py[INFO]: User already exists, skipping.
2018-11-01 16:45:28,570 - util.py[DEBUG]: Running command ['passwd', '-l', ''] with allowed return codes [0] (shell=False, capture=True)
2018-11-01 16:45:28,793 - util.py[DEBUG]: Reading from /etc/sudoers (quiet=False)
2018-11-01 16:45:28,795 - util.py[DEBUG]: Read 781 bytes from /etc/sudoers
2018-11-01 16:45:28,796 - util.py[DEBUG]: Writing to /etc/sudoers.d/90-cloud-init-users - ab: [None] 51 bytes
2018-11-01 16:45:28,797 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups ran successfully

This is a serious issue, as it can render your VM inaccessible on Azure.
I think the problem is triggered by the new kernel installation.

Initial bug report:
https://github.com/Azure/WALinuxAgent/issues/1386

Revision history for this message
Scott Moser (smoser) wrote :

Hi,
Please attach the tarball created by running 'cloud-init collect-logs'.
Then set the status back to New.

As it is, there is not enough information available here to see what happened.

Thanks for filing a bug.
Scott

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Adrian Vladu (avladu) wrote :

Hello,

I know that cloud-init on the Azure platform uses /sys/devices/virtual/dmi/id/product_uuid to uniquely identify the instance.
In this case, after the upstream kernel is installed, that UUID value changes from upper case to lower case.

Example:
before kernel install: B016A6CF-B90D-B743-8379-902BBF5E0037
after upstream kernel install: b016a6cf-b90d-b743-8379-902bbf5e0037

I checked the linux kernel tree and I found the commit responsible for this change:
https://github.com/torvalds/linux/commit/712ff25450bd01366301eef81c33e865d901e7b7#diff-f2bd14bc67b5e2da67116bca971bbd0b

It seems that the current 16.04/18.04 Azure kernels do not have this patch, so when I install the upstream kernel, the UUID changes and cloud-init reprovisions the instance.
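
To illustrate why the case change matters, here is a minimal sketch using the two UUID values from the example above. The helper name `same_instance_id` is hypothetical and is not taken from cloud-init; it only shows that a plain string comparison treats the two values as different instances, while a case-insensitive comparison does not.

```python
def same_instance_id(previous: str, current: str) -> bool:
    """Compare two DMI product UUID strings, ignoring case and surrounding whitespace.

    Hypothetical helper for illustration only; not cloud-init's actual code.
    """
    return previous.strip().lower() == current.strip().lower()


# Values reported above, before and after the upstream kernel install.
before = "B016A6CF-B90D-B743-8379-902BBF5E0037"
after = "b016a6cf-b90d-b743-8379-902bbf5e0037"

if __name__ == "__main__":
    # An exact comparison sees two different IDs.
    print("exact match:", before == after)
    # A case-insensitive comparison sees the same ID.
    print("case-insensitive match:", same_instance_id(before, after))
```

Any consumer that compares the raw string exactly would see a "new" instance after the kernel change, which matches the reprovisioning behaviour described here.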

This issue is similar to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1551419.

Thank you,
Adrian Vladu

Revision history for this message
Chad Smith (chad.smith) wrote :

Marking this New so we can take another look at it; case-insensitive UUID matching feels like something we really should be handling.

Changed in cloud-init:
status: Incomplete → New
Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

Hi.
I added a 'linux' package task here. It seems that in bug 1551419 patches were added
to the Ubuntu kernel to do the right thing. Possibly those patches were inadvertently
dropped? Either way, it seems like the [Ubuntu] kernel should promise this consistently
so that multiple consumers do not need to all know about this issue and handle it
themselves.

That said, we will fix cloud-init to handle it.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1803173

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Scott Moser (smoser) wrote :

Hi Adrian,

In a system where this recreates, could you please run:
  cloud-init collect-logs
and attach the output?

Thank you.

Revision history for this message
Scott Moser (smoser) wrote :

Hi,
I recreated the issue using the Ubuntu upstream kernel builds at
  http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

After launching an 18.04 instance, and then installing those kernels
and rebooting (generic_4.20.0-999.201811252100) I saw the issue.

I noticed that cloud-init's /var/lib/cloud data was getting messed up,
which didn't make any sense. The problem that I noticed was
/var/lib/cloud/instance was a directory rather than a symlink.
It turns out that walinuxagent was deleting /var/lib/cloud during boot.
The deletion happened sometime before the cloud-init modules ran,
and it was wreaking havoc on cloud-init.
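
For anyone wanting to check for the same symptom, a minimal sketch follows. The function name `instance_link_state` is hypothetical and this is not part of cloud-init or walinuxagent; it only reports whether /var/lib/cloud/instance is a symlink (the expected state described above) or a real directory (the broken state I saw).

```python
import os


def instance_link_state(cloud_dir: str = "/var/lib/cloud") -> str:
    """Report what kind of filesystem entry <cloud_dir>/instance is.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(cloud_dir, "instance")
    # islink must be checked first: isdir follows symlinks.
    if os.path.islink(path):
        return "symlink"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        return "other"
    return "missing"


if __name__ == "__main__":
    print("/var/lib/cloud/instance is:", instance_link_state())
```

A result of "directory" would correspond to the state observed on the affected instance.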

It looks like this is at least identified as not the best idea at:
 https://github.com/Azure/WALinuxAgent/commit/f42d2e75617bb54

I verified that cloud-init was working properly by itself with:
  systemctl disable walinuxagent
before the reboot into the new kernel. All was well.
On reboot, cloud-init still used the azure datasource and had a single
entry in /var/lib/cloud/instances/

So, I'm marking this 'Invalid' for cloud-init. The fix needs to be
to have walinuxagent stop deleting state from other programs.

Changed in cloud-init:
status: Confirmed → Invalid
Changed in walinuxagent (Ubuntu):
status: New → Confirmed
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Revision history for this message
Stephen A. Zarkos (stevez) wrote :

Thanks All.

A newer Azure Agent which no longer touches cloud-init data is already in the pipeline.

Revision history for this message
James Falcon (falcojr) wrote :