[linux-azure] Delay during boot of some instance sizes

Bug #1830740 reported by Joseph Salisbury
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Undecided
Unassigned

Bug Description

We are seeing a long delay, about 100s, when booting new E32s_v3 VMs.

We don’t see this delay with smaller sized Ubuntu VMs(D2s and NC6 sizes).

I attached a screen shot of the boot delay. You can see between 7.022911 and 101.325958 seconds there is no activity.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :
Revision history for this message
Terry Rudd (terrykrudd) wrote :

We will be investigation this situation ASAP but it is pending submission for SRU for other azure work.

Revision history for this message
Marcelo Cerri (mhcerri) wrote :

Hi, Joe. Do you think you can provide the output of "systemd-analyse blame" on this system?

Revision history for this message
Francis Ginther (fginther) wrote :

Joe, can you also attach the gz file generated by `cloud-init collect-logs` and the output from `cloud-init analyze show`?

Revision history for this message
Chris Newcomer (cnewcomer) wrote :

I created a test instance and was able to produce most of the data requested. Attachments incoming.

Revision history for this message
Chris Newcomer (cnewcomer) wrote :
Revision history for this message
Chris Newcomer (cnewcomer) wrote :
Revision history for this message
Chris Newcomer (cnewcomer) wrote :

This sosreport contains the systemd analyze blame output. It is from a test instance that I was able to reproduce this issue on.

Revision history for this message
Chris Newcomer (cnewcomer) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

This looks hypervisor/kernel related.

Some observations:

The cloud-init.log in the collect-logs shows cloud-init running twice. The first time, run-time is expected, approx 17s of cloud-init time, the second boot took much longer, but the bulk if the time is in udev

2019-06-20 18:09:18,951 - util.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2019-06-20 18:10:48,209 - util.py[DEBUG]: Waiting for udev events to settle took 89.258 seconds

Some kernel logs around this time show some strangeness:

Jun 20 18:09:18.252780 cnewcomer-1561053730 systemd[1]: Reached target System Time Synchronized.
Jun 20 18:10:05.561513 cnewcomer-1561053730 kernel: hv_balloon: Max. dynamic memory size: 262144 MB
Jun 20 18:10:17.907484 cnewcomer-1561053730 systemd-udevd[770]: seq 9849 '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/c204def2-185f-44d7-9540-5836f397c2de/pci0001:00/0001:00:02.0/net/eth1' is taking a long time
Jun 20 18:10:48.179475 cnewcomer-1561053730 systemd-udevd[868]: error changing net interface name 'eth1' to 'eth0': File exists
Jun 20 18:10:48.179510 cnewcomer-1561053730 systemd-udevd[868]: could not rename interface '3' from 'eth1' to 'eth0': File exists
Jun 20 18:10:48.240895 cnewcomer-1561053730 kernel: mlx4_en: rename3: Steering Mode 2
Jun 20 18:10:48.336884 cnewcomer-1561053730 kernel: hv_netvsc 000d3a7a-59f0-000d-3a7a-59f0000d3a7a eth0: Data path switched to VF: rename3
Jun 20 18:10:48.337179 cnewcomer-1561053730 kernel: mlx4_en: rename3: Link Up

Revision history for this message
Chad Smith (chad.smith) wrote :

Cloud-init logs on an instance with accelerated networking (bionic) exhibiting this delay across reboot

Revision history for this message
James Falcon (falcojr) wrote :
Marcelo Cerri (mhcerri)
no longer affects: linux-azure (Ubuntu)
Revision history for this message
Paride Legovini (paride) wrote :

This is Fix Released by cloud-init 20.3-0ubuntu1. The bug was not auto-closed by the upload because the cloud-init task has been added only after the upload.

Changed in cloud-init (Ubuntu):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers