grub 2.04 net image does not work reliably for deployments with KVM MAAS pods

Bug #1915288 reported by Dimitri John Ledkov
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
grub2 (Ubuntu)   Fix Released  Undecided   Unassigned
  Focal          Fix Released  Undecided   Unassigned
  Groovy         Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * MAAS KVM LXD pods intermittently fail to deploy, even on very beefy servers.

 1) Cherry-pick the upstream fix for a crash upon completion of HTTP networking.

 2) Revert the patches that add support for TCP window scaling and non-Ethernet cards.

With the above changes, one can deploy 100 out of 100 MAAS KVM LXD pods using the patched grubnetx64.efi.

[Test Case]

 * Configure MAAS

 * Configure networking for LXD KVM pods

 * Deploy Ubuntu Focal on a node

 * Manually initialize LXD, allow networking, and add a remote password

 * Add the LXD KVM host on the KVM page of MAAS

 * Disable image syncing in MAAS

 * Replace /var/snap/maas/common/maas/boot-resources/current/bootloader/uefi/amd64/grubx64.efi with the signed grubnetx64.efi.signed from the grub-efi-amd64-signed package

 * Compose and commission a hundred nodes with the CLI API:

for i in `seq 100`; do maas ps5 vmhost compose 8; done

(where 8 is the ID of the KVM host, as shown in the URL of that KVM host's page in MAAS, and ps5 is the MAAS CLI profile name)
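The replacement step in the list above can be scripted. The following is a minimal POSIX shell sketch, not part of MAAS or of the original report: sideload_grubnet is a hypothetical helper name, and the default source path assumes grub-efi-amd64-signed installs its images under /usr/lib/grub/x86_64-efi-signed/ (verify with dpkg -L grub-efi-amd64-signed). It keeps a one-time backup of the image MAAS provided so the swap can be undone after testing.

```shell
# ASSUMPTION: default install location of the signed network image from
# grub-efi-amd64-signed; override SIGNED_GRUBNET if yours differs.
SIGNED_GRUBNET=${SIGNED_GRUBNET:-/usr/lib/grub/x86_64-efi-signed/grubnetx64.efi.signed}
# Target path taken from the test case in this bug.
MAAS_GRUB=${MAAS_GRUB:-/var/snap/maas/common/maas/boot-resources/current/bootloader/uefi/amd64/grubx64.efi}

# sideload_grubnet SRC DEST (hypothetical helper, not a MAAS command):
# back up DEST once, then overwrite it with SRC.
sideload_grubnet() {
    src=$1
    dest=$2
    if [ ! -f "$src" ]; then
        echo "sideload_grubnet: $src not found" >&2
        return 1
    fi
    # Keep the first image MAAS provided; never overwrite an existing
    # backup on repeated runs, so it always holds the original.
    if [ -e "$dest" ] && [ ! -e "$dest.orig" ]; then
        cp "$dest" "$dest.orig" || return 1
    fi
    cp "$src" "$dest"
}
```

Run it as root, since the MAAS snap data directory is typically root-owned, for example after sourcing the script: sideload_grubnet "$SIGNED_GRUBNET" "$MAAS_GRUB". Copying grubx64.efi.orig back restores the image MAAS shipped.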

Without these patches, the failure rate is up to 30%. With these patches, 100 out of 100 pods deploy fine.

Thus, testing with a smaller number of pods should also be sufficient.

[Where problems could occur]

 * These patches mean that we are using a smaller TCP window (the same one we used in Bionic), so for very large kernels/initrds we may hit HTTP timeouts on the server. Deployment throughput is also lower again.

 * However, it is better to deploy reliably wherever we used to deploy before than to fail to deploy small things on small networks.
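To gauge the throughput cost mentioned above: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is bounded by W/RTT. Without the window scale option (RFC 7323), the advertised window cannot exceed 65,535 bytes. The 10 ms RTT and 100 MB initrd below are illustrative assumptions only, and the window GRUB actually advertises may be smaller than this maximum, which would lower the bound further.

```latex
\[
  \text{throughput} \le \frac{W}{\mathrm{RTT}},
  \qquad W_{\max} = 2^{16} - 1 = 65\,535\ \text{bytes}
\]
\[
  \mathrm{RTT} = 10\ \text{ms}:\quad
  \frac{65\,535\ \text{B}}{0.01\ \text{s}} \approx 6.55\ \text{MB/s},
  \qquad
  t_{100\ \text{MB}} \ge \frac{10^{8}\ \text{B}}{6.5535 \times 10^{6}\ \text{B/s}} \approx 15.3\ \text{s}
\]
```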

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

especially LXD KVM pods

description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Dimitri, or anyone else affected,

Accepted grub2 into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu35.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Groovy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-groovy
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Dimitri, or anyone else affected,

Accepted grub2 into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu26.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (grub2/2.04-1ubuntu35.4)

All autopkgtests for the newly accepted grub2 (2.04-1ubuntu35.4) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

ubiquity/unknown (amd64)
grml2usb/unknown (amd64)
zsys/unknown (amd64)
grubzfs-testsuite/unknown (amd64)
ubuntu-image/unknown (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#grub2

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Changed in grub2 (Ubuntu):
status: New → Fix Committed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Sideloaded http://archive.ubuntu.com/ubuntu/dists/groovy-proposed/main/uefi/grub2-amd64/2.04-1ubuntu35.4/grubnetx64.efi.signed onto the MAAS deployment machine.

Composed 10 machines, with a 60-second delay between compositions due to races in EDK2 DHCP lease acquisition.

All ten machines reached the steady READY state and powered off without a hitch.
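The paced composition described in this comment can be sketched as a variant of the loop from the test case. compose_with_delay is a hypothetical helper, not a MAAS command; it reuses the maas PROFILE vmhost compose ID invocation from the bug description, sleeps between requests rather than after the last one, and, unlike the original one-liner, stops at the first failed request.

```shell
# compose_with_delay PROFILE VMHOST_ID COUNT DELAY (hypothetical helper)
# Compose COUNT VMs on one MAAS VM host, pausing DELAY seconds between
# requests to give each VM's EDK2 DHCP lease time to settle.
compose_with_delay() {
    profile=$1
    vmhost_id=$2
    count=$3
    delay=$4
    i=1
    while [ "$i" -le "$count" ]; do
        maas "$profile" vmhost compose "$vmhost_id" || return 1
        # Sleep only between requests, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}
```

For example, compose_with_delay ps5 8 10 60 would issue 10 compositions with 60-second gaps, assuming the same CLI profile and VM host ID as in the bug description.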

tags: added: verification-done-groovy
removed: verification-needed-groovy
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Sideloaded http://archive.ubuntu.com/ubuntu/dists/focal-proposed/main/uefi/grub2-amd64/2.04-1ubuntu26.9/grubnetx64.efi.signed onto the MAAS deployment machine.

Composed a further 10 machines, and they all reached READY state and powered off without a hitch.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Sideloaded http://archive.ubuntu.com/ubuntu/dists/hirsute-proposed/main/uefi/grub2-amd64/2.04-1ubuntu40/grubnetx64.efi.signed onto the MAAS deployment machine.

Composed a further 10 machines, and they all reached READY state and powered off without a hitch.

That makes 30 deployments in a row without issues, a clear improvement over the previous failure rate of roughly 1 in 3 machines.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu35.4

---------------
grub2 (2.04-1ubuntu35.4) groovy; urgency=medium

  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

grub2 (2.04-1ubuntu35.3) groovy; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288
  * Cherrypick fix crash in http LP: #1915288

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 22:11:53 +0000

Changed in grub2 (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for grub2 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu26.9

---------------
grub2 (2.04-1ubuntu26.9) focal; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288
  * Cherrypick fix crash in http LP: #1915288
  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 22:03:32 +0000

Changed in grub2 (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu40

---------------
grub2 (2.04-1ubuntu40) hirsute; urgency=medium

  * Revert: rhboot-f34-tcp-add-window-scaling-support.patch,
    rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch: these break MAAS
    LXD KVM pod deployments. LP: #1915288

grub2 (2.04-1ubuntu39) hirsute; urgency=medium

  * Cherrypick a bunch of patches:
    - fix crash in http LP: #1915288
    - add bootp6 documentation
    - add support for UEFI boot protocols
    - use UEFI protocols for http & https networking
    - make netboot search for by-mac/by-uuid/by-ip for grub.cfg
    - update documentation for netboot search paths of grub.cfg
  * Make prebuilt netboot image look for MAAS grub.cfg
  * Fix grub-initrd-fallback.service thanks to JawnSmith LP: #1910815

grub2 (2.04-1ubuntu38) hirsute; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix warnings during grub menu generation. Thanks wdoekes for the patch
    (LP: #1898177)
    - Fix warnings when bpool doesn't exist.
    - Fix warnings when snapshot name contains dashes.
  * Do not fail to generate grub menu when name of the snapshot contains
    spaces. (LP: #1903524)

 -- Dimitri John Ledkov <email address hidden> Fri, 12 Feb 2021 20:29:16 +0000

Changed in grub2 (Ubuntu):
status: Fix Committed → Fix Released