maas uses 3.13 (hwe-t) kernel which does not work on non-virtual IBM power

Bug #1508565 reported by Mike Rushton on 2015-10-21
This bug affects 2 people
Affects          Importance   Assigned to
MAAS             Undecided    Unassigned
maas-images      Medium       Unassigned
maas (Ubuntu)    Undecided    Unassigned

Bug Description

IBM Power8 (Tuleta) systems kernel-panic during enlistment and commissioning because the ephemeral image runs 3.13.0-52-generic.

If we temporarily replace /var/lib/maas/boot-resources/current/ubuntu/ppc64el/generic with /var/lib/maas/boot-resources/current/ubuntu/ppc64el/hwe-u, the machines boot fine running 3.16.
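The temporary workaround above can be sketched as a pair of shell functions. This is a minimal sketch, assuming the directory layout given in this report and that nothing else (for example a later MAAS image import) rewrites the 'generic' directory in the meantime; swap_in_hwe_u and restore_generic are hypothetical helper names, not MAAS tools.

```shell
# Hypothetical helpers, not part of MAAS. $1 is the ppc64el image
# directory, e.g. /var/lib/maas/boot-resources/current/ubuntu/ppc64el
# ASSUMPTION: a later image import may rewrite 'generic' and undo this.

swap_in_hwe_u() {
    dir=$1
    if [ ! -d "$dir/hwe-u" ]; then
        echo "no hwe-u image under $dir" >&2
        return 1
    fi
    if [ -e "$dir/generic.orig" ]; then
        echo "generic already swapped out" >&2
        return 1
    fi
    # Keep the 3.13 image so the change can be reverted.
    mv "$dir/generic" "$dir/generic.orig" || return 1
    # Copy rather than move, so the hwe-u product itself stays intact.
    cp -a "$dir/hwe-u" "$dir/generic"
}

restore_generic() {
    dir=$1
    if [ ! -d "$dir/generic.orig" ]; then
        echo "nothing to restore under $dir" >&2
        return 1
    fi
    rm -rf "$dir/generic"
    mv "$dir/generic.orig" "$dir/generic"
}

# Usage (as root):
#   swap_in_hwe_u /var/lib/maas/boot-resources/current/ubuntu/ppc64el
#   ... enlist and commission the Power8 machines ...
#   restore_generic /var/lib/maas/boot-resources/current/ubuntu/ppc64el
```

Copying instead of renaming hwe-u keeps that product available for machines explicitly set to it while the swap is in place.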

This has been tested and confirmed on MAAS 1.7.5, 1.7.6 and 1.8, using fresh installs as well as images from both the releases and daily streams.

Attached is the output of the enlistment process and failure.

Mike Rushton (leftyfb) wrote :
tags: added: blocks-hwcert-server
Scott Moser (smoser) on 2015-10-21
summary: - simplestream uses 3.13 kernel which does not work on IBM power
+ maas uses 3.13 (hwe-t) kernel which does not work on IBM power
no longer affects: simplestreams

Some data: http://paste.ubuntu.com/12887493/ (same data is below, but for formatting the paste is nicer).

The problem, as stated (and generally known), is that hwe-t kernels do not reliably boot on Power8 in powernv ("bare metal") mode.

There are at least 2 possible solutions to this problem:
 a.) fix MAAS so that it can set a subarch ('hwe-u') for enlistment and commissioning.
 b.) remove the 'hwe-t' stream data from existence and hope that MAAS then uses the 'hwe-u' product in its place.

'a' has reportedly been fixed for 1.9, which is great, but that does not help users in the short term.

Option 'b' is still less than ideal, for several reasons:
 1.) hwe-t (3.13) kernels work fine in ppc64 KVM VMs.
     If I were a MAAS user driving ppc64 KVM VMs (PowerKVM or Ubuntu), I'd be quite happy with 3.13/hwe-t, which is supported until 2019.04. I'd much prefer staying on that supported kernel over the "interim kernel upgrade path" [1]. I.e., if I installed a hwe-u kernel, that kernel is only supported for 10 more months (until 2016.08). If we remove the hwe-t product, we'll no longer have a way of installing 14.04+hwe-t.
 2.) If we remove the 'hwe-t' stream, we're essentially deleting a product on the server side and hoping that MAAS systems will follow that deletion. I'd hope that MAAS doesn't do that, as it would mean an error on the server side could completely wipe out installable systems on the target. The code I've written for mirroring streams does not delete products locally unless specifically told to. So an existing user likely *still* has to do something manually to fix this problem.
 3.) It's not clear whether this would work. A small amount of testing would answer that, but we don't know yet, and I personally didn't understand how it happens in code.

[1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack#Kernel.2BAC8-Support.A14.04.x_Ubuntu_Kernel_Support

-- data in line --
$ fmt="%(product_name)s %(kflavor)s %(kpackage)s %(subarch)s %(subarches)s"
$ daily=http://maas.ubuntu.com/images/ephemeral-v2/daily/streams/v1/com.ubuntu.maas:daily:v2:download.json
$ releases=http://maas.ubuntu.com/images/ephemeral-v2/releases/streams/v1/com.ubuntu.maas:v2:download.json

##
## From releases, we have trusty builds for only hwe-t, hwe-u
##
$ sstream-query --max=1 --output-format="$fmt" $releases ftype=boot-kernel release=trusty arch=ppc64el | column -t
com.ubuntu.maas:v2:boot:14.04:ppc64el:hwe-t generic linux-generic hwe-t generic,hwe-p,hwe-q,hwe-r,hwe-s,hwe-t
com.ubuntu.maas:v2:boot:14.04:ppc64el:hwe-u generic linux-generic-lts-utopic hwe-u generic,hwe-p,hwe-q,hwe-r,hwe-s,hwe-t,hwe-u

##
## From daily, we have trusty builds for each of hwe-t, hwe-u, hwe-v
##
$ sstream-query --max=1 --output-format="$fmt" $daily ftype=boot-kernel release=trusty arch=ppc64el | column -t
com.ubuntu.maas.daily:v2:boot:14.04:ppc64el:hwe-t generic linux-generic hwe-t generic,hwe-p,hwe-q,hwe-r,hwe-s,hwe-t
com.ubuntu.maas.daily:v2:boot:14.04:ppc64el:hwe-u generic linux-generic-lts-utopic hwe-u generic,hwe-p,hwe-q,hwe-r,hwe-s,hwe-t,hwe-u
com.ubuntu.maas.daily:v2:boot:14.04:ppc64el:hwe-v generic linux-generic-lts-vivid ...
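To see at a glance which kernel package backs each subarch, query output like the above can be reduced to 'subarch kpackage' pairs. A small sketch, assuming the output follows the $fmt column order used above (product_name kflavor kpackage subarch subarches); summarize_kernels is a hypothetical helper name.

```shell
# summarize_kernels (hypothetical helper): read sstream-query output whose
# columns follow "$fmt" above (product_name kflavor kpackage subarch
# subarches) and print one "subarch kpackage" pair per complete line.
# Lines with fewer than 5 fields (e.g. truncated ones) are skipped.
summarize_kernels() {
    awk 'NF >= 5 { print $4, $3 }'
}

# Usage, reusing $fmt and $releases from above:
#   sstream-query --max=1 --output-format="$fmt" $releases \
#       ftype=boot-kernel release=trusty arch=ppc64el | summarize_kernels
```

awk splits on runs of whitespace, so the aligned output of 'column -t' works as input too.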


Scott Moser (smoser) wrote :

The one option that has been floated that I didn't list above is:
 c.) lie in the stream data and make the 'hwe-t' product contain the hwe-u (or hwe-v) kernel.
     This is less than ideal because it seems almost certain that at some point someone will, quite reasonably, expect that they're using hwe-t when they're not.

Changed in maas:
milestone: none → 1.8.4
Scott Moser (smoser) wrote :

I did some investigation into option 'b' above. Attached is documentation of how I did that.

In summary, we can exploit bug 1508975 to make all MAAS ppc64el users start getting hwe-v instead of hwe-t by default, by deleting the 'com.ubuntu.maas.daily:v2:boot:14.04:ppc64el:hwe-t' product from existence. The next time 'import' runs in MAAS, the 'hwe-v' kernel would be used for enlistment and commissioning.

That would have some interesting consequences, though:
i. Users will get systems installed with hwe-v.
  Curtin assumes booting with hwe-N means installation of hwe-N is desired.

ii. When hwe-w arrives, it will automatically be used.
  I have not verified this, but I suspect that when 'hwe-w' arrives in the stream data, boot and enlistment would automatically move to hwe-w, and then again to hwe-x. This assumption is based on the fact that my test deletion of hwe-t resulted in the use of hwe-v rather than hwe-u (newest rather than oldest).
  That movement probably ends with hwe-x, and thus after 2016.05, as hwe-x is the last hwe kernel for trusty.

iii. Users who were happy with hwe-t (KVM guests) will need to take manual action to install hwe-t.
  As a user, the "hwe-kernel ride" above seems a necessary evil at best. However, it's not necessary for those on virtual hardware: hwe-t works well there and is therefore probably the recommended path for a user in a virtual machine. Those users would now have to manually 'apt-get install linux-generic && reboot' after installation to get that kernel.
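The selection behavior described in i and ii can be modelled in a few lines. This is only a model of what was observed (hwe-t preferred when present; otherwise the newest hwe-* product, since deleting hwe-t led to hwe-v rather than hwe-u), not MAAS's actual selection code; pick_boot_subarch is a hypothetical name.

```shell
# pick_boot_subarch (hypothetical): a model of the observed selection,
# NOT MAAS's real code. Arguments are the subarches present in the
# stream data. hwe-t wins if present; otherwise the newest hwe-* is used.
pick_boot_subarch() {
    for s in "$@"; do
        if [ "$s" = hwe-t ]; then
            echo hwe-t
            return 0
        fi
    done
    # hwe-<letter> names sort in release order (t < u < v < w < x),
    # so the last one after sorting is the newest.
    printf '%s\n' "$@" | grep '^hwe-' | sort | tail -n 1
}
```

With hwe-t deleted, 'pick_boot_subarch hwe-u hwe-v' prints hwe-v, matching the test deletion; adding hwe-w to the arguments moves the result to hwe-w, which is the suspected "ride" in ii.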

summary: - maas uses 3.13 (hwe-t) kernel which does not work on IBM power
+ maas uses 3.13 (hwe-t) kernel which does not work on non virtual IBM
+ power
summary: - maas uses 3.13 (hwe-t) kernel which does not work on non virtual IBM
+ maas uses 3.13 (hwe-t) kernel which does not work on non-virtual IBM
power
Jon Grimm (jgrimm) wrote :

WRT option c (lie about LTS kernel): https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1508565/comments/3

Can a MaaS customer still get the hwe-t kernel through manual steps if desired?

What are the consequences for support on hwe-u? Are they now forced up to 16.04 sooner (since hwe-u does not have the lifetime of hwe-t)?

WRT option b (delete LTS kernel in stream): https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1508565/comments/4

>> iii. Users who were happy with hwe-t (kvm guests) will need manual action

Or anyone wanting to recreate a specific certification needs to know what the point-in-time kernel was...
That being said, it seems like this is a fairly short window. hwe-w is here, so we'll only move from hwe-w to hwe-x and then back to normal. And it only affects P8, where the intersection of MaaS + KVM/P8 likely means a low number of affected users.

Scott Moser (smoser) wrote :

> Can a MaaS customer still get the hwe-t kernel through manual steps if desired?

They could apt-get install it after the fact (same as for option 'b').

> What are the consequences for support on hwe-u? Are they now forced up to 16.04 sooner (since hwe-u does not have the lifetime of hwe-t)?

Well, no. See [1], specifically the picture at [2]. An installed system would get updates on the installed 'linux-generic-lts-<release>' kernel until August 2016 (the release of 14.04.5). At that point, they no longer get kernel updates, but get a message saying 'You should upgrade to linux-generic-lts-xenial'. Also note that your question really should say hwe-v: my testing showed that hwe-v would be selected if available, and we should make it available in the released stream (as it is officially released).
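For reference while following this support discussion, the trusty hwe-* subarches map onto kernel series and meta-packages as below. The end-of-support months for hwe-t (2019.04) and for the interim linux-generic-lts-<release> kernels (August 2016, the 14.04.5 release) follow the dates given in this thread; the hwe-x date is an assumption not stated here and should be checked against [1]. trusty_hwe_kernel is a hypothetical helper.

```shell
# trusty_hwe_kernel (hypothetical): print "<kernel series> <meta-package>
# <end of support>" for a 14.04 hwe-* subarch.
# ASSUMPTION: the hwe-x date (2019-04) is not stated in this thread;
# check it against the LTS enablement stack page.
trusty_hwe_kernel() {
    case "$1" in
        hwe-t) echo "3.13 linux-generic 2019-04" ;;
        hwe-u) echo "3.16 linux-generic-lts-utopic 2016-08" ;;
        hwe-v) echo "3.19 linux-generic-lts-vivid 2016-08" ;;
        hwe-w) echo "4.2 linux-generic-lts-wily 2016-08" ;;
        hwe-x) echo "4.4 linux-generic-lts-xenial 2019-04" ;;
        *) echo "unknown trusty subarch: $1" >&2; return 1 ;;
    esac
}

# Usage: the package an hwe-v install is eventually told to move to:
#   trusty_hwe_kernel hwe-x | cut -d' ' -f2    # linux-generic-lts-xenial
```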

> Or anyone wanting to recreate a specific certification needs to know what the point-in-time kernel was...

Cert has this general issue. We have no 'snapshot.ubuntu.com' (see http://snapshot.debian.org/ for reference), so there is actually no way to reproduce a point in time of Ubuntu other than GA (because you can just disable -security and -updates to get GA only). I.e., once you pull from the archive, you can only get the latest updates.

  [1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack
  [2] https://wiki.ubuntu.com/Kernel/LTSEnablementStack#Kernel.2BAC8-Support.A14.04.x_Ubuntu_Kernel_Support

Scott Moser (smoser) wrote :

My suggested plan forward would be:
 a.) promote the 14.04:ppc64el:hwe-v product to the released stream for ppc64el
     I will need someone to test that and confirm it's reasonable.
     That can be done by using the daily stream and doing an install to a
     system marked as 'generic/hwe-v'.
 b.) remove the 14.04:ppc64el:hwe-t product from both the daily and released streams.
     This exploits MAAS bug http://pad.lv/1508975, but should result in
     even existing users simply getting hwe-v kernels for
     enlistment, commissioning, and default install.
 c.) document this ppc64el-specific behavior in some external-facing place.
     This should:
  i. describe why hwe-t/3.13 is not available for ppc64el.
  ii. describe how to use linux-generic (3.13/hwe-t) in a maas
      installed VM if desired. That looks something like:
      sudo apt-get install linux-generic # probably not necessary
      sudo apt-get --purge remove "linux-(image|headers)-(3.16.0|3.19.0|4.2.0)-.*"
      sudo apt-get autoremove
      sudo reboot
  iii. inform the user about hwe kernels and what their support path
      will be using their 'hwe-v', and subsequently hwe-w and then
      hwe-x kernels.
      This should probably point to [1].
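After the reboot in step ii, a quick check that the machine is really running the 3.13 (hwe-t) series can be sketched like this, assuming only the usual 'uname -r' release format such as 3.13.0-52-generic; kernel_series and is_hwe_t_kernel are hypothetical helper names.

```shell
# kernel_series (hypothetical): reduce a 'uname -r' string such as
# 3.13.0-52-generic to its major.minor series, e.g. 3.13.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# is_hwe_t_kernel (hypothetical): succeed only for the 3.13 series that
# linux-generic (hwe-t) provides on 14.04.
is_hwe_t_kernel() {
    [ "$(kernel_series "$1")" = 3.13 ]
}

# Usage after the reboot in step ii:
#   if is_hwe_t_kernel "$(uname -r)"; then echo "running hwe-t"; else echo "still on an hwe kernel"; fi
```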

I can do 'a' and 'b' above, but I suggest that 'c' should be done first.
I'm not sure what the appropriate place is for this documentation, and I'd be looking for Mike or Michael or Jeff to write it.

[1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack#Kernel.2BAC8-Support.A14.04.x_Ubuntu_Kernel_Support

Scott Moser (smoser) on 2015-10-29
Changed in maas-images:
status: New → Confirmed
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status: New → Confirmed
Jeff Lane (bladernr) wrote :

Will this be fixed in the 1.9 release?

Gavin Panella (allenap) on 2016-01-25
Changed in maas:
status: New → Invalid
Changed in maas (Ubuntu):
status: Confirmed → Invalid
Jeff Lane (bladernr) wrote :

Unblocking cert on this since we can now deploy using HWE-V or later via MAAS

tags: removed: blocks-hwcert-server