[Hyper-V] Advanced networking support on Azure

Bug #1690430 reported by Joshua R. Poulson on 2017-05-12
This bug affects 1 person
Affects              Status        Importance  Assigned to
cloud-init           Fix Released  High        Unassigned
cloud-init (Ubuntu)  Fix Released  Medium      Unassigned
  Xenial             Fix Released  Medium      Unassigned
  Yakkety            Won't Fix     Medium      Unassigned
  Zesty              Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
A new feature in Azure lets instances use SR-IOV networking.
Currently, Ubuntu images will fail to boot there.

[Test Case]
Testing this comes in the following parts:
a.) check that no regressions have been introduced outside of Azure.
b.) existing instance without an SR-IOV device: upgrade cloud-init and reboot.
c.) fresh instance with SR-IOV and updated cloud-init: reboot.
d.) fresh instance without SR-IOV and updated cloud-init: reboot.

The cases above generally verify that users have not been exposed
to unexpected changes in behavior, and that the fix is correctly
applied.

After each boot, the user should collect logs and look around for
evidence of failure. One tool that can be used to collect these
logs is 'save-old-data' at [1]. It checks for many common issues with
systemd boot.

[1] https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/save-old-data

[Regression Potential]
The majority of the changes have been limited to the Azure code path.
Regressions then are likely limited to Azure users, and would most
likely present themselves as network configuration failures on reboot
or first boot.

[Other Info]
Upstream commit at
  https://git.launchpad.net/cloud-init/commit/?id=ebc9ecbc8a

=== End SRU Template ===

We are in the process of rolling out SR-IOV in Azure (available as a preview now; contact me offline and we can work out getting your subscription added if you want to try it). In general, our normal synthetic interface appears as eth0 and the VF comes in as eth1. We intend to bond these interfaces together so that if the VF goes down, or the VM is migrated to where no VF is present, eth0 remains the valid default interface.

At the moment we are handling the bonding via this script:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/plain/tools/hv/bondvf.sh

We were looking at ways to either integrate the behavior into cloud-init or invoke the script from cloud-init and do the right thing.

After https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1669860 we observed that the VF interface was renamed to something like "enP1p0s2", and more recently to "rename2".

Is it possible that 1669860 needs to be expanded to cover our case or is there something we should be doing to make sure that change is working properly for SR-IOV in Azure?

Joshua R. Poulson (jrp) wrote :

Cloud-init is warning because the VF and synthetic (Hyper-V) NIC have the same MAC address. The following patch will suppress that warning in that situation.
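
To see the condition described above on a running instance, the following is a minimal sketch (not the attached patch) that lists MAC addresses shared by more than one interface. The function name find_duplicate_macs and its optional directory argument are hypothetical and only there to make the sketch testable; it assumes the usual sysfs layout where each interface has an "address" file under /sys/class/net.

```shell
#!/bin/sh
# Sketch: print each MAC address that is used by more than one interface,
# followed by the comma-separated interface names that share it.
# The optional argument (hypothetical, for testing) overrides the sysfs root.
find_duplicate_macs() {
    net_root="${1:-/sys/class/net}"
    for dev in "$net_root"/*; do
        # Skip entries that do not expose an address file.
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "$(cat "$dev/address")" "$(basename "$dev")"
    done | sort | awk '
        {
            if ($1 in names) names[$1] = names[$1] "," $2
            else names[$1] = $2
            count[$1]++
        }
        END { for (mac in count) if (count[mac] > 1) print mac, names[mac] }
    ' | sort
}
```

On an SR-IOV instance, running find_duplicate_macs with no argument would be expected to print one line naming both the synthetic NIC and the VF; on an instance without a VF it should print nothing.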

The attachment "cloudinit-skip-hv-vf.diff" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
David Britton (davidpbritton) wrote :

This is not cloud-init doing the renaming. After confirming with the submitter, we agree this affects udev.

affects: cloud-init (Ubuntu) → udev (Ubuntu)
Changed in udev (Ubuntu):
importance: Undecided → High
status: New → Triaged
status: Triaged → Confirmed
Joshua R. Poulson (jrp) wrote :

Our proposed solution is in private bug https://bugs.launchpad.net/lansing/+bug/1695119

Do you want me to bring that over to here?

David Britton (davidpbritton) wrote :

There is an SRU in progress

[ubuntu/xenial-proposed] cloud-init 0.7.9-153-g16a7302f-0ubuntu1~16.04.2 (Waiting for approval)
[ubuntu/yakkety-proposed] cloud-init 0.7.9-153-g16a7302f-0ubuntu1~16.10.2 (Waiting for approval)
[ubuntu/zesty-proposed] cloud-init 0.7.9-153-g16a7302f-0ubuntu1~17.04.2 (Waiting for approval)

That will address the blocking portion of this bug. The interfaces will not be bonded, but the instance will boot with basic networking and allow anyone to bond the interfaces as a post-boot step however they would like.

A more advanced solution will be proposed.

no longer affects: udev (Ubuntu)
Changed in cloud-init:
importance: Undecided → High
summary: - [Hyper-V] VF renaming on Azure
+ [Hyper-V] Advanced networking support on Azure
Changed in cloud-init:
status: New → Fix Committed
Scott Moser (smoser) on 2017-06-29
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
status: New → Fix Released
Changed in cloud-init (Ubuntu Zesty):
status: New → Confirmed
importance: Undecided → Medium
Scott Moser (smoser) on 2017-06-29
description: updated

Hello Joshua, or anyone else affected,

Accepted cloud-init into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~17.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Zesty):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-zesty
Steve Langasek (vorlon) wrote :

I've accepted this SRU because I know there is some urgency involved, but the test case does not describe what testing will be done, and where, to verify that there are no regressions outside of Azure. This will need review still before the SRU is released.

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
tags: added: verification-needed-yakkety
Steve Langasek (vorlon) wrote :

Hello Joshua, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.10.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed-xenial
Steve Langasek (vorlon) wrote :

Hello Joshua, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Scott Moser (smoser) wrote :

To verify that this change has not caused regressions, Josh has done a set of runs on different clouds. Basically he did:
  - launch instance
  - enable proposed
  - apt-get update && apt-get install cloud-init
  - reboot
  - verify no WARN in /var/log/cloud-init.log or other obvious failures (networking is sniff tested by the fact that ssh worked)
  - rm -Rf /var/lib/cloud /var/log/cloud-init*
  - reboot
  - ssh in again and look around.
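
The WARN check in the steps above could be scripted roughly as follows. This is a sketch, not what was actually run: the function name check_cloud_init_log is hypothetical, and the bracketed level pattern is an assumption based on the log line format quoted elsewhere in this bug (for example "handlers.py[DEBUG]").

```shell
#!/bin/sh
# Sketch: print WARN/ERROR-level lines from a cloud-init log.
# Returns 0 if none are found, 1 if any are found, 2 if the log is unreadable.
check_cloud_init_log() {
    log_file="${1:-/var/log/cloud-init.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # Assumed format: the level appears in brackets after the module name.
    if grep -E '\[(WARNING|WARN|ERROR|CRITICAL)\]' "$log_file"; then
        return 1
    fi
    return 0
}
```

Calling check_cloud_init_log after each reboot and inspecting its exit status gives a quick pass/fail signal; it does not replace looking around the instance by hand.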

Here are the logs he's collected. The 'rm -Rf' is to somewhat verify first boot.

-- nocloud (via uvt-kvm) --
Install proposed + reboot
 - http://paste.ubuntu.com/24982779/
 - http://paste.ubuntu.com/24982780/
Clean /var/lib/cloud /var/log/cloud-init* + reboot
 - http://paste.ubuntu.com/24982793/
 - http://paste.ubuntu.com/24982794/

-- LXC --
Install proposed + reboot
 - http://paste.ubuntu.com/24982693/
 - http://paste.ubuntu.com/24982694/
Clean /var/lib/cloud /var/log/cloud-init* + reboot
 - http://paste.ubuntu.com/24982699/
 - http://paste.ubuntu.com/24982700/

-- Digital Ocean --
Install proposed + reboot
 - http://paste.ubuntu.com/24982605/
 - http://paste.ubuntu.com/24982606/
Clean /var/lib/cloud /var/log/cloud-init* + reboot
 - http://paste.ubuntu.com/24982612/
 - http://paste.ubuntu.com/24982613/

-- AWS --
Install proposed + reboot
 - http://paste.ubuntu.com/24982550/
 - http://paste.ubuntu.com/24982551/
Clean /var/lib/cloud /var/log/cloud-init* + reboot
 - http://paste.ubuntu.com/24982584/
 - http://paste.ubuntu.com/24982585/

-- GCE --
First reboot + proposed
 - http://paste.ubuntu.com/24982248/
 - http://paste.ubuntu.com/24982250/
Clean /var/lib/cloud /var/log/cloud-init* + reboot
 - http://paste.ubuntu.com/24982420/
 - http://paste.ubuntu.com/24982422/

Ryan Harper (raharper) wrote :

In addition to the above covering (a) of the SRU tests, we've tested the following on Azure and observed no failures.

(b) Azure (upgrade instance with no sriov devices and reboot)
    1. ssh in, add proposed, install cloud-init, reboot
    2. ssh in and check for errors

(c1) Azure (launch sriov instance, upgrade cloud-init, reboot)
    1. ssh in
    2. confirm sriov by presence of second nic (mlx4_core driver)
    3. add xenial-proposed, apt update and apt install cloud-init, reboot
    4. ssh in and check for errors

(c2) Azure (launch sriov instance with captured image using upgraded cloud-init)
    1. Use c1, remove /var/lib/cloud /var/lib/waagent /var/log/cloud-init*
       /etc/network/interfaces.d/50-cloud-init.cfg
       /etc/udev/rules.d/70-persistent-net.rules
       /etc/systemd/network/50-cloud-init.link
    2. capture image in portal
    3. create new image from template for fresh boot of cloud-init from proposed
    4. ssh into instance
    5. confirm sriov by presence of second nic (mlx4_core driver)

(d) Azure (launch non-sriov instance with captured image using upgraded cloud-init)
    1. Use b, remove /var/lib/cloud /var/lib/waagent /var/log/cloud-init*
       /etc/network/interfaces.d/50-cloud-init.cfg
       /etc/udev/rules.d/70-persistent-net.rules
       /etc/systemd/network/50-cloud-init.link
    2. capture image in portal
    3. create new image from template for fresh boot of cloud-init from proposed
    4. ssh into instance
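
The "confirm sriov" step in (c1) and (c2) can be approximated with a small helper like the one below. It is a sketch under assumptions: the function name nics_with_driver and its optional directory argument are hypothetical, and it relies on the sysfs convention that /sys/class/net/<iface>/device/driver is a symlink to the bound driver.

```shell
#!/bin/sh
# Sketch: print the names of network interfaces bound to a given driver.
# Usage: nics_with_driver mlx4_core
# The optional second argument (hypothetical, for testing) overrides the
# sysfs root, which defaults to /sys/class/net.
nics_with_driver() {
    driver="$1"
    net_root="${2:-/sys/class/net}"
    for dev in "$net_root"/*; do
        link="$dev/device/driver"
        # Virtual interfaces such as lo have no device/driver symlink.
        [ -L "$link" ] || continue
        if [ "$(basename "$(readlink "$link")")" = "$driver" ]; then
            basename "$dev"
        fi
    done
}
```

An empty result for mlx4_core would suggest the instance has no VF attached, which is the expected outcome for cases (b) and (d).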

David Britton (davidpbritton) wrote :

Yakkety and Zesty need verification, but testing *with* sr-iov on those instances is not supported by Azure. Testing there should be limited to regression.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~16.04.2

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/patches/ds-identify-behavior-xenial.patch: refresh patch.
  * cherry-pick 5fb49bac: azure: identify platform by well known value
    in chassis asset (LP: #1693939)
  * cherry-pick 003c6678: net: remove systemd link file writing from eni
    renderer
  * cherry-pick 1cd4323b: azure: remove accidental duplicate line in
    merge.
  * cherry-pick ebc9ecbc: Azure: Add network-config, Refactor net layer
    to handle duplicate macs. (LP: #1690430)
  * cherry-pick 11121fe4: systemd: make cloud-final.service run before
    apt daily (LP: #1693361)

 -- Scott Moser <email address hidden> Wed, 28 Jun 2017 17:17:18 -0400

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Simon Xiao (sixiao) wrote :

Thanks. Can you please share the right steps to upgrade cloud-init and test it?

David Britton (davidpbritton) wrote :

Hi Simon --

http://cloud-images.ubuntu.com/daily/server/xenial/20170630/

The images as of this date for xenial have the fix.

Patricia Gaughen (gaughen) wrote :

The daily images have been published into Azure:

data: b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu_DAILY_BUILD-artful-17_10-amd64-server-20170630-en-us-30GB Public Linux Canonical
data: b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu_DAILY_BUILD-zesty-17_04-amd64-server-20170630-en-us-30GB Public Linux Canonical
data: b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu_DAILY_BUILD-xenial-16_04-LTS-amd64-server-20170630-en-us-30GB

Simon Xiao (sixiao) wrote :

Thanks David and Patricia.

I uploaded this vhd image (http://cloud-images.ubuntu.com/daily/server/xenial/20170630/xenial-server-cloudimg-amd64-disk1.vhd.zip) to Azure and from that I successfully created Azure VMs with persistent VF names (vf1, vf2, ... vf8) guided by the newly generated udev rule (70-persistent-net.rules).

But one thing (sorry for jumping in so late): why not name the VFs starting from vf0?
Having VF naming start from 0 would make the three names consistent: eth0, vf0, bond0.

David Britton (davidpbritton) wrote :

Hi @Sixiao -- the name comes from the PCI domain number from the script and name suggested earlier (in the sample instance, and bug #1695119 if you have access to it). It's really of no special interest to cloud-init which it is. Renaming to vf0 would be a separate change/SRU, please file that as a separate bug if that is how you would like it to behave on Azure.

Thanks!

Scott Moser (smoser) wrote :

I launched a vm of zesty with:

 $ az group create --location=westus2 smgroup1
 $ az vm create -n smz1 -g smgroup1 --image=Canonical:UbuntuServer:17.04-DAILY:latest \
     --ssh-key-value=$HOME/.ssh/id_rsa.pub --admin-username=smoser

Then ssh'd in and
$ echo "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-proposed main" | sudo tee /etc/apt/sources.list.d/proposed.list
deb http://archive.ubuntu.com/ubuntu zesty-proposed main

$ sudo apt-get update -q
...
Fetched 10.9 MB in 2s (4,872 kB/s)
Reading package lists...

$ sudo apt-get install cloud-init
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
  cloud-init
1 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.
Need to get 311 kB of archives.
After this operation, 8,192 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu zesty-proposed/main amd64 cloud-init all 0.7.9-153-g16a7302f-0ubuntu1~17.04.2 [311 kB]
Fetched 311 kB in 0s (362 kB/s)
Preconfiguring packages ...
(Reading database ... 64722 files and directories currently installed.)
Preparing to unpack .../cloud-init_0.7.9-153-g16a7302f-0ubuntu1~17.04.2_all.deb ...
Unpacking cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.2) over (0.7.9-153-g16a7302f-0ubuntu1~17.04.1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.2) ...
Leaving 'diversion of /etc/init/ureadahead.conf to /etc/init/ureadahead.conf.disabled by cloud-init'

$ dpkg-query --show cloud-init
cloud-init 0.7.9-153-g16a7302f-0ubuntu1~17.04.2

$ mkdir old
$ sudo mv /var/log/cloud-init* old
$ sudo mv /run/cloud-init/ old/
$ sudo mv /var/lib/cloud/ old/
$ sudo reboot

After reboot, I ran the sru-info collect script mentioned in the bug description and collected the attached output.

All went well.

tags: added: verification-done-zesty
removed: verification-needed-zesty
tags: removed: verification-needed verification-needed-yakkety
Scott Moser (smoser) wrote :

For the zesty SRU, we have done less than we did for xenial. That is for a few reasons.
a.) this exact code has been in xenial for a couple weeks now.
b.) xenial is an LTS, zesty is not (I realize that is not justification for a regression, but hopefully for lesser testing).

Chad Smith (chad.smith) wrote :

Validated zesty: non-SRIOV with updated cloud-init reboot succeeds without error
ubuntu@Z1:~$ grep CODENAME /etc/os-release
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty
ubuntu@Z1:~$ grep 'found local' /var/log/cloud-init*
/var/log/cloud-init.log:2017-07-26 18:04:10,381 - handlers.py[DEBUG]: finish: init-local/search-Azure: SUCCESS: found local data from DataSourceAzure
ubuntu@Z1:~$ grep error /var/log/cloud-init* /var/lib/cloud/data/*json
/var/lib/cloud/data/result.json: "errors": []
/var/lib/cloud/data/status.json: "errors": [],
/var/lib/cloud/data/status.json: "errors": [],
/var/lib/cloud/data/status.json: "errors": [],
/var/lib/cloud/data/status.json: "errors": [],
ubuntu@Z1:~$ dpkg-query --show cloud-init
cloud-init 0.7.9-153-g16a7302f-0ubuntu1~17.04.2
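
The manual grep of result.json and status.json above could be wrapped in a check like the following. This is a sketch only: the function name cloud_init_errors_empty is hypothetical, and it assumes the files are pretty-printed so that an empty list appears as "errors": [] on a single line, as in the output quoted above.

```shell
#!/bin/sh
# Sketch: return 0 when every "errors" list in the given JSON files is empty,
# 1 when any list is non-empty, 2 when a file cannot be read.
# Usage: cloud_init_errors_empty /var/lib/cloud/data/result.json \
#            /var/lib/cloud/data/status.json
cloud_init_errors_empty() {
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
        # Any "errors": line that is not followed by [] opens a non-empty list.
        if grep '"errors":' "$f" | grep -qv '"errors": \[\]'; then
            echo "non-empty errors list in $f" >&2
            return 1
        fi
    done
    return 0
}
```

A line-based check like this is fragile compared with a real JSON parser, so it is best treated as a quick smoke test.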

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~17.04.2

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.2) zesty-proposed; urgency=medium

  * cherry-pick 5fb49bac: azure: identify platform by well known value
    in chassis asset (LP: #1693939)
  * cherry-pick 003c6678: net: remove systemd link file writing from eni
    renderer
  * cherry-pick 1cd4323b: azure: remove accidental duplicate line in
    merge.
  * cherry-pick ebc9ecbc: Azure: Add network-config, Refactor net layer
    to handle duplicate macs. (LP: #1690430)
  * cherry-pick 11121fe4: systemd: make cloud-final.service run before
    apt daily (LP: #1693361)

 -- Scott Moser <email address hidden> Wed, 28 Jun 2017 17:20:51 -0400

Changed in cloud-init (Ubuntu Zesty):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) on 2017-07-26
Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Won't Fix
Scott Moser (smoser) wrote :

This is believed to be fixed in cloud-init 17.1

Changed in cloud-init:
status: Fix Committed → Fix Released