system freeze when swapping to encrypted swap partition

Bug #1647400 reported by bugproxy on 2016-12-05
This bug affects 1 person
Affects                        Importance   Assigned to
Ubuntu on IBM z Systems        Critical     Unassigned
linux (Ubuntu)                 Undecided    Thadeu Lima de Souza Cascardo
  Xenial                       Undecided    Unassigned

Bug Description

== Comment: #0 - Bernd-Rainer Bresser <email address hidden> - 2016-12-05 04:27:00 ==
+++ This bug was initially created as a clone of Bug #147836 +++

---Problem Description---
When the system is installed with an encrypted swap partition, any attempt to swap ends in a system freeze. There is no error and no dump; the system needs to be reloaded.

Contact Information = Bernd-Rainer Bresser/Germany/IBM, Wen Yi AG Gao/China/IBM

---uname output---
Linux s38lp65 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:05:09 UTC 2016 s390x s390x s390x GNU/Linux

Machine Type = z13 LPAR

---System Hang---
 System is frozen. No error, no dump, the system needs to be reloaded.

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 1. Install a system with an encrypted swap partition. This can be done using the installer or via manual updates to /etc/fstab and /etc/crypttab.

2. Verify encrypted swap partition exists
 cryptsetup status kvmibm-swap
/dev/mapper/kvmibm-swap is active and is in use.
  type: PLAIN
  cipher: aes-xts-plain64:sha256
  keysize: 256 bits
  device: /dev/dasda2
  offset: 0 sectors
  size: 8388672 sectors
  mode: read/write

3. Force swapping
This can be done with a command like the following (on a system with 8 GB of memory):
dd if=/dev/zero of=/dev/null ibs=16k obs=8G count=2MB
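
To confirm that step 3 really pushes the system into swap before it freezes, swap usage can be watched from a second session. The following is only a minimal sketch and not part of the original report: `swap_used_kib` is a hypothetical helper name, and it assumes the usual `SwapTotal` and `SwapFree` lines in /proc/meminfo.

```shell
#!/bin/sh
# Print the amount of swap currently in use, in KiB.
# The optional file argument exists only so the function can be tried
# against a saved copy; it defaults to /proc/meminfo.
swap_used_kib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

Running `while sleep 1; do swap_used_kib; done` in a second terminal while the dd command runs would show the last value reached before the hang.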

Stack trace output:
 no

Oops output:
 no

System Dump Location:

*Additional Instructions for Bernd-Rainer Bresser/Germany/IBM, Wen Yi AG Gao/China/IBM:
-Attach sysctl -a output to the bug.

== Comment: #1 - Bernd-Rainer Bresser <email address hidden> - 2016-12-05 04:33:24 ==
- the problem was originally found on KVM for IBM z 1.1.3
- test on Ubuntu 16.04.1 showed the same issue here
- on KVM for IBM z 1.1.3 the problem could be solved/circumvented by reverting
       commit 564e81a57f9788b1475127012e0fd44e9049e342
      Author: Tetsuo Handa <email address hidden>
      Date: Fri Feb 5 15:36:30 2016 -0800
            mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress

bugproxy (bugproxy) on 2016-12-05
tags: added: architecture-s39064 bugnameltc-149525 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) on 2016-12-06
tags: removed: bugnameltc-149525 severity-critical
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-z-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)

Hi, Mr. Bresser.

Can you try Ubuntu 16.10 and let us know if it still has the same problem?

Thank you.
Cascardo.

Hi, Mr. Bresser.

Can you try the kernel at http://people.canonical.com/~cascardo/lp1647400/ besides kernel 4.8?

Thank you.
Cascardo.

Changed in ubuntu-z-systems:
status: New → In Progress
bugproxy (bugproxy) on 2016-12-12
tags: added: bugnameltc-149525 severity-critical
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel (canonical-kernel)
Changed in ubuntu-z-systems:
importance: Undecided → Critical
Changed in linux (Ubuntu):
assignee: Canonical Kernel (canonical-kernel) → Thadeu Lima de Souza Cascardo (cascardo)
Changed in ubuntu-z-systems:
assignee: Thadeu Lima de Souza Cascardo (cascardo) → nobody
Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-z-systems:
status: In Progress → New

------- Comment From <email address hidden> 2016-12-12 10:25 EDT-------
With Ubuntu 16.10 the problem is gone !

If you still want me to try the kernel at http://people.canonical.com/~cascardo/lp1647400/
please give me some information on how to use the files. Sorry, I have no experience with this.

Hi, Mr. Bresser.

Thank you for your response.

Your testing of the kernel at http://people.canonical.com/~cascardo/lp1647400/linux-image-4.4.0-53-generic_4.4.0-53.74+lp1647400_s390x.deb would be very much welcome.

In order to test it, you should install 16.04, then:

wget http://people.canonical.com/~cascardo/lp1647400/linux-image-4.4.0-53-generic_4.4.0-53.74+lp1647400_s390x.deb
dpkg -i linux-image-4.4.0-53-generic_4.4.0-53.74+lp1647400_s390x.deb

After that, reboot, verify you are running the new kernel:

uname -a | grep 1647400

Then, run your tests.

Thank you very much.
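
If the check needs to go into a test script rather than be read by eye, it could be wrapped roughly as below. This is a sketch under assumptions: `kernel_has_tag` is a hypothetical helper, and the tag `lp1647400` is taken from the package name above.

```shell
#!/bin/sh
# Succeed (exit status 0) if the kernel version string in $1
# contains the tag in $2, and fail otherwise.
kernel_has_tag() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use on the system under test:
#   kernel_has_tag "$(uname -a)" lp1647400 && echo "test kernel is running"
```

Using a shell pattern match instead of grep keeps the check free of regular-expression surprises if the tag ever contains characters such as `+` or `.`.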

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-12 11:21 EDT-------
Thanks for the lesson, Mr. Cascardo !

install of the new kernel went well
uname -a | grep 1647400
Linux p23lp37 4.4.0-53-generic #74+lp1647400 SMP Fri Dec 9 17:14:12 UTC 2016 s390x s390x s390x GNU/Linux

and....
....with this kernel the problem is also gone.
I double-checked with a new 16.04.1 installation afterwards. There I see the problem immediately.

Changed in ubuntu-z-systems:
status: New → In Progress
Luis Henriques (henrix) on 2016-12-19
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-22 05:52 EDT-------
Can someone tell me please, how I can EnableProposed on my s390 system ?
Can I use the "Software & Updates" program here as well ? how ?

Frank Heimes (frank-heimes) wrote :

The easiest way (imho) to enable proposed is:

enable proposed:
$ sudo sh -c "echo 'deb http://us.ports.ubuntu.com/ubuntu-ports $(lsb_release -cs)-proposed restricted main multiverse universe' >> /etc/apt/sources.list.d/proposed-repositories.list"
$ sudo apt update

and for completeness to disable proposed:
$ sudo rm /etc/apt/sources.list.d/proposed-repositories.list
$ sudo apt update

Before installing, you can check whether the package you are looking at really comes from proposed:
$ apt-cache policy <package>

But you can also use packages from proposed w/o this, just by using:

install package from a special (-t = target) repository, like from proposed:
$ sudo apt-get -t $(lsb_release -cs)-proposed install <package>
or:
$ sudo apt-get install <package>/$(lsb_release -cs)-proposed
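
The source line used for enabling proposed can also be generated by a small function, so that it is written once rather than appended again on every run. This is only a sketch: `proposed_line` is a hypothetical helper, and the default mirror http://ports.ubuntu.com/ubuntu-ports is an assumption (any ports mirror that carries s390x should do).

```shell
#!/bin/sh
# Print the apt source line for the -proposed pocket of a release.
# $1: release codename, e.g. the output of `lsb_release -cs`
# $2: optional mirror URL (assumed default: the ports mirror)
proposed_line() {
    codename=$1
    mirror=${2:-http://ports.ubuntu.com/ubuntu-ports}
    printf 'deb %s %s-proposed main restricted universe multiverse\n' \
        "$mirror" "$codename"
}

# Example use (overwrites the file instead of appending to it):
#   proposed_line "$(lsb_release -cs)" | \
#       sudo tee /etc/apt/sources.list.d/proposed-repositories.list
#   sudo apt update
```

Writing through `tee` rather than `>>` means that repeating the command does not leave duplicate entries in the list file.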

On Thu, Dec 22, 2016 at 10:59:27AM -0000, bugproxy wrote:
> ------- Comment From <email address hidden> 2016-12-22 05:52 EDT-------
> Can someone tell me please, how I can EnableProposed on my s390 system ?
> Can I use the "Software & Updates" program here as well ? how ?

An easy way to do that would be to edit /etc/apt/sources.list and add
something like:

deb http://ports.ubuntu.com/ubuntu-ports xenial-proposed main

------- Comment From <email address hidden> 2016-12-22 07:36 EDT-------
Thanks for the advice so far. But sorry, I need more.

I am asked to verify that "the kernel in -proposed solves the problem".
I know a command to install a package from -proposed now :
sudo apt-get -t $(lsb_release -cs)-proposed install <package>
Question: what package do I need for the kernel ?

Frank Heimes (frank-heimes) wrote :

Hi, according to the apt-cache policy output for the kernel package 'linux-generic':
$ apt-cache policy linux-generic
linux-generic:
  Installed: 4.4.0.57.60
  Candidate: 4.4.0.58.61
  Version table:
     4.4.0.58.61 500
        500 http://us.ports.ubuntu.com/ubuntu-ports xenial-proposed/main s390x Packages
 *** 4.4.0.57.60 500
        500 http://us.ports.ubuntu.com/ubuntu-ports xenial-updates/main s390x Packages
        500 http://ports.ubuntu.com/ubuntu-ports xenial-security/main s390x Packages
        100 /var/lib/dpkg/status
     4.4.0.21.22 500
        500 http://us.ports.ubuntu.com/ubuntu-ports xenial/main s390x Packages

It looks like version 4.4.0.58.61 of the package 'linux-generic' is the one from proposed.

You can install it by just doing an upgrade:
$ sudo apt upgrade
or by explicitly installing that package:
$ sudo apt install linux-generic
(both with the proposed repository enabled and package index updated)
or with:
$ sudo apt-get -t $(lsb_release -cs)-proposed install linux-generic
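
Reading the policy output by eye works, but the same check can be scripted. The sketch below prints the suite/component of every source that offers the candidate version. `candidate_suites` is a hypothetical helper, and it assumes the `apt-cache policy` layout shown above (a "Candidate:" line followed by a "Version table:" section).

```shell
#!/bin/sh
# Read `apt-cache policy <package>` output on stdin and print the
# suite/component (third column) of each source offering the candidate.
candidate_suites() {
    awk '
        $1 == "Candidate:"                { cand = $2 }
        $1 == "Version" && $2 == "table:" { in_table = 1; next }
        !in_table                         { next }
        # Version lines: "*** VERSION PRIO" (installed) or "VERSION PRIO".
        $1 == "***"                       { ver = $2; next }
        NF == 2 && $2 ~ /^[0-9]+$/        { ver = $1; next }
        # Source lines: "PRIO URL SUITE/COMPONENT ARCH Packages".
        # Appending "" forces a string comparison of the versions.
        (ver "") == (cand "") && NF >= 3  { print $3 }
    '
}

# Example use:
#   apt-cache policy linux-generic | candidate_suites | grep -q -- '-proposed/' \
#       && echo "candidate comes from proposed"
```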

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-22 10:04 EDT-------
Thanks for all your help !

I updated the system with the kernel from -proposed.
uname -a output after the update :
Linux s38lp65 4.4.0-58-generic #79-Ubuntu SMP Tue Dec 20 12:14:19 UTC 2016 s390x s390x s390x GNU/Linux

Then I ran my test and found, that the problem is gone.
Thank you.

tags: added: verification-done-xenial
removed: verification-needed-xenial

Cherry-picking these two commits should also be accompanied by cherry-picking https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=6b4e3181d7bd5ca5ab6f45929e4a5ffa7ab4ab7f ; otherwise users will see a lot of premature OOM kills.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-59.80

---------------
linux (4.4.0-59.80) xenial; urgency=low

  [ John Donnelly ]

  * Release Tracking Bug
    - LP: #1654282

  * [2.1.1] MAAS has nvme0n1 set as boot disk, curtin fails (LP: #1651602)
    - (fix) nvme: only require 1 interrupt vector, not 2+

linux (4.4.0-58.79) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1651402

  * Support ACPI probe for IIO sensor drivers from ST Micro (LP: #1650123)
    - SAUCE: iio: st_sensors: match sensors using ACPI handle
    - SAUCE: iio: st_accel: Support sensor i2c probe using acpi
    - SAUCE: iio: st_pressure: Support i2c probe using acpi
    - [Config] CONFIG_HTS221=m, CONFIG_HTS221_I2C=m, CONFIG_HTS221_SPI=m

  * Fix channel data parsing in ST Micro sensor IIO drivers (LP: #1650189)
    - SAUCE: iio: common: st_sensors: fix channel data parsing

  * ST Micro lng2dm 3-axis "femto" accelerometer support (LP: #1650112)
    - SAUCE: iio: st-accel: add support for lis2dh12
    - SAUCE: iio: st_sensors: support active-low interrupts
    - SAUCE: iio: accel: Add support for the h3lis331dl accelerometer
    - SAUCE: iio: st_sensors: verify interrupt event to status
    - SAUCE: iio: st_sensors: support open drain mode
    - SAUCE: iio:st_sensors: fix power regulator usage
    - SAUCE: iio: st_sensors: switch to a threaded interrupt
    - SAUCE: iio: accel: st_accel: Add lis3l02dq support
    - SAUCE: iio: st_sensors: fix scale configuration for h3lis331dl
    - SAUCE: iio: accel: st_accel: add support to lng2dm
    - SAUCE: iio: accel: st_accel: inline per-sensor data
    - SAUCE: Documentation: dt: iio: accel: add lng2dm sensor device binding

  * ST Micro hts221 relative humidity sensor support (LP: #1650116)
    - SAUCE: iio: humidity: add support to hts221 rh/temp combo device
    - SAUCE: Documentation: dt: iio: humidity: add hts221 sensor device binding
    - SAUCE: iio: humidity: remove
    - SAUCE: iio: humidity: Support acpi probe for hts211

  * crypto : tolerate new crypto hardware for z Systems (LP: #1644557)
    - s390/zcrypt: Introduce CEX6 toleration

  * Acer, Inc ID 5986:055a is useless after 14.04.2 installed. (LP: #1433906)
    - uvcvideo: uvc_scan_fallback() for webcams with broken chain

  * vmxnet3 driver could causes kernel panic with v4.4 if LRO enabled.
    (LP: #1650635)
    - vmxnet3: segCnt can be 1 for LRO packets

  * system freeze when swapping to encrypted swap partition (LP: #1647400)
    - mm, oom: rework oom detection
    - mm: throttle on IO only when there are too many dirty and writeback pages

  * Kernel Fixes to get TCMU File Backed Optical to work (LP: #1646204)
    - target/user: Use sense_reason_t in tcmu_queue_cmd_ring
    - target/user: Return an error if cmd data size is too large
    - target/user: Fix comments to not refer to data ring
    - SAUCE: (no-up) target/user: Fix use-after-free of tcmu_cmds if they are
      expired

  * CVE-2016-9756
    - KVM: x86: drop error recovery in em_jmp_far and em_ret_far

  * Dell Precision 5520 & 3520 freezes at login screent (LP: #1650054)
    - ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520

  * CVE-2016-979...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) on 2017-01-11
tags: added: targetmilestone-inin16041
removed: targetmilestone-inin---

Hi, Mr. Grünbichler.

Do you have any specific workload where you have observed such OOMs and that represent a regression?

Can you open a new bug for such regressions?

As far as I was able to investigate, bringing in this commit would imply bringing in a whole set of commits that introduce and change should_compact_retry. It was originally introduced to prevent other OOMs as well, but I would rather see such a problem reproduced with the updated Xenial kernel before we bring in a whole bunch of updates.

Thank you very much.
Thadeu Cascardo.

alex (arwineap) wrote :

I can confirm premature OOM issues running 4.4.0-59; a downgrade back to 4.4.0-57 completely fixes the problem.

The specific workload in this case was a build server (http://concourse.ci) which makes heavy use of btrfs and runc containers.

I will try to spend some time this week to make a more generic reproduction case.

Changed in linux (Ubuntu):
status: In Progress → Fix Released