Cryptoswap not working in Bionic

Bug #1762468 reported by Martin D. Weinberg on 2018-04-09
This bug affects 2 people
Affects / Status / Importance / Assigned to / Milestone
linux (Ubuntu): importance High, unassigned
Bionic: importance High, unassigned

Bug Description

I have cryptoswap set up with both the default 2 GB swap file and an 8 GB swap partition, on a machine with 8 GB of RAM. However, the system does not use this swap when it should. For example, if I run a small test program that attempts to allocate 10 GB of memory in 1 GB chunks, the system hangs with no obvious use of swap; top shows no swap in use before the hang. If I replace cryptoswap with standard swap, it works as expected: the program allocates the requested memory and exits, using swap. My disk is slow, so this takes some time, but it does work.

I originally noticed this problem when compiling a very large application. The system would hang on ld. I checked the hardware with memtest and stress and found no problems. This took me days to track down, but I'm now 99% certain that the problem is with cryptoswap. Compiling with cryptoswap hangs; compiling with standard swap does not.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-13-generic 4.15.0-13.14
ProcVersionSignature: Ubuntu 4.15.0-13.14-generic 4.15.10
Uname: Linux 4.15.0-13-generic x86_64
ApportVersion: 2.20.9-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: weinberg 2425 F.... pulseaudio
 /dev/snd/controlC0: weinberg 2425 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 9 11:57:23 2018
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=8d33a538-4ec4-418d-b277-be1a9a1c1113
InstallationDate: Installed on 2018-03-12 (28 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180114)
MachineType: LENOVO 20ARA0S100
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-13-generic root=UUID=b6683bc1-7dd5-4afd-8e70-f69253403b71 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-13-generic N/A
 linux-backports-modules-4.15.0-13-generic N/A
 linux-firmware 1.173
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/19/2014
dmi.bios.vendor: LENOVO
dmi.bios.version: GJET74WW (2.24 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20ARA0S100
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrGJET74WW(2.24):bd03/19/2014:svnLENOVO:pn20ARA0S100:pvrThinkPadT440s:rvnLENOVO:rn20ARA0S100:rvrNotDefined:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad T440s
dmi.product.name: 20ARA0S100
dmi.product.version: ThinkPad T440s
dmi.sys.vendor: LENOVO

The code that I used to force failure, memtest.c, is attached to the bug report.
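
For readers without access to the attachment, a test of this kind can be approximated as follows. This is a minimal sketch and not the attached memtest.c: the function name touch_chunks and the fill pattern are assumptions made for illustration. It allocates memory in fixed-size chunks and writes to every byte so that the pages must actually be backed by RAM or swap.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from the attached memtest.c.
 * Allocate up to n_chunks blocks of chunk_bytes each and write to every
 * byte so the kernel must back them with real pages (or swap).
 * Returns the number of chunks successfully allocated and touched.
 * All blocks are freed before returning. */
size_t touch_chunks(size_t n_chunks, size_t chunk_bytes)
{
    char **blocks = calloc(n_chunks, sizeof *blocks);
    size_t done = 0;

    if (blocks == NULL)
        return 0;

    for (size_t i = 0; i < n_chunks; i++) {
        blocks[i] = malloc(chunk_bytes);
        if (blocks[i] == NULL)
            break;                             /* allocation refused: stop */
        memset(blocks[i], 0xA5, chunk_bytes);  /* touch every page */
        done++;
        fprintf(stderr, "touched chunk %zu of %zu\n", done, n_chunks);
    }

    for (size_t i = 0; i < done; i++)
        free(blocks[i]);
    free(blocks);
    return done;
}
```

Calling touch_chunks(10, (size_t)1 << 30) from a main function would request 10 GB in 1 GB chunks, as in the description above. On an affected system that may hang the machine, so smaller values are safer for a first try.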

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

It seems that cryptswap is working correctly on 17.10 with the kernel 4.13.0-36-generic on a different machine using the same test as described above. I tried the same kernel (4.13.0-36) on 18.04 (machine used in original post) and it still fails. Of course, I checked that the swap appears in /proc/meminfo. Hope this provides a clue.
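
The /proc/meminfo check mentioned above can also be done programmatically. The sketch below is only an illustration: the function names meminfo_kb and swap_used_kb are hypothetical, and it assumes the usual "Key:   value kB" line layout with SwapTotal and SwapFree entries.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB for `key` (e.g. "SwapTotal") from a
 * meminfo-style file, or -1 if the file or key cannot be read. */
long meminfo_kb(const char *path, const char *key)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    long value = -1;
    size_t klen = strlen(key);

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Lines are assumed to look like "SwapTotal:  8388604 kB". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(fp);
    return value;
}

/* Swap currently in use, in kB, or -1 on error. */
long swap_used_kb(const char *path)
{
    long total = meminfo_kb(path, "SwapTotal");
    long free_kb = meminfo_kb(path, "SwapFree");

    return (total < 0 || free_kb < 0) ? -1 : total - free_kb;
}
```

Polling swap_used_kb("/proc/meminfo") while the allocation test runs would show whether swap use ever rises before the hang.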

[FYI, I'm using 4.13.0-36 rather than 4.13.0-38 because of a bad firmware interaction with 4.13.0-37 and 4.13.0-38 that results in CPU lockups]

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-key
Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream

Yes, the problem does occur in mainline 4.16 and I have added the requested tag.

If anything, 4.16.1 seems slower than the 4.15 release kernel, even without cryptswap. It could be that my disk is too slow for cryptswap, but it did work in 17.10.

The obvious workaround is not to use cryptswap, of course.

dino99 (9d9) wrote :

Looks like the same issue as reported on lp:#1736072

Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v4.14 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14/
v4.15-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc1/
v4.15-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc4/
v4.15 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/

You don't have to test every kernel, just up until the kernel that first has this bug.
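
The search for the first bad kernel can be sketched as a binary search over the ordered list. This is an illustration only, under the assumption that every kernel after the first bad one is also bad. The names first_bad, is_bad_fn and bad_from_threshold are hypothetical, and in practice the "is it bad" step is a manual boot and test rather than a function call.

```c
#include <stddef.h>

/* Callback: returns nonzero if the kernel at index i exhibits the bug. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Binary search for the first bad entry among n ordered kernels.
 * Assumes every kernel after the first bad one is also bad.
 * Returns n if no kernel is bad. */
size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t lo = 0, hi = n;          /* answer lies in [lo, hi] */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (is_bad(mid, ctx))
            hi = mid;               /* first bad is at mid or earlier */
        else
            lo = mid + 1;           /* first bad is after mid */
    }
    return lo;
}

/* Stand-in callback for dry runs: kernels at or beyond the index
 * pointed to by ctx are treated as bad. */
int bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(size_t *)ctx;
}
```

With the four kernels listed above indexed 0 to 3, a result of 0 would mean v4.14 Final already shows the bug, and a result of 4 would mean none of them does.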

Thanks in advance!

tags: added: performing-bisect

The first kernel on that list that exhibits the bug is the first one:
4.14 Final.  Recall: the bug results in a complete hang.  The disk stops
swapping (it's audible).   I can still reboot with SysRq S-U-B, however.

I then tested 4.13.0-36 from the Ubuntu 17.10 repository, followed by
4.11.12-041112-generic from mainline.  Both had the bug.

I also tested each of these without cryptswap (i.e. normal swap) and
they did not hang.  The system ground to a halt, but I could hear the
disk swapping and the process eventually completed.

I then booted a 16.04 live USB, which has kernel 4.10.  I had to make a
cryptswap using cryptsetup on the swap partition of the hard disk to
test it.   But this one worked! The behavior was a bit different from
more recent kernels: after allocating 6 GB of the 10 GB that I
requested, the system invoked the oom-killer on the process.  However,
it did not hang.

I then grabbed 4.10.1-041001-generic from mainline and tested it.  This
worked the same as the live usb.  So 4.10 does not have the bug,
although the oom-killer is invoked to kill the process grabbing the memory.

I have an older machine with an SSD.  This does not exhibit the bug on
Ubuntu 17.10 running 4.13.0-36.  However, this disk is _fast_ (Samsung
EVO), so I'm guessing that there is some tuning problem, maybe, in how
the kernel handles swap speed vs memory allocation requests???  I really
don't know.

--M

BTW, I _do_ realize that you want a starting point for bisection; I'm
sorry this is turning out so messy.

In response to #7, I don't think this is the same issue. The cryptswap initiates with no difficulties, timeouts, etc. on boot. When I first installed Bionic, ubiquity did not correctly install the swap, and I filed #1759253 on that issue.

Joseph Salisbury (jsalisbury) wrote :

Thanks for finding that mainline 4.10 does not have the bug and 4.11 final does. Can you also test v4.11-rc1, so we can find which release candidate introduced the bug?

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc1

Some good and bad news: I have been testing this in a running Gnome 3 environment. I decided to start testing on console. On console, all kernels seem to fail using crypt swap. This suggests to me that failure must depend on current memory usage, dirty ratio, or some such runtime condition.

The good news, however, is that I now have kernel messages on the console. I'm attaching a screenshot, since none of them get written to syslog. A typical message is: task dm_write:xxx blocked for more than 120 seconds
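
If console output can be captured as text, lines of this kind can be picked out mechanically. The sketch below is an assumption-laden illustration: parse_hung_task is a hypothetical name, and it assumes the message has the shape quoted above, optionally preceded by a prefix such as "INFO: ".

```c
#include <stdio.h>
#include <string.h>

/* Parse a hung-task warning such as
 *   "INFO: task dm_write:123 blocked for more than 120 seconds."
 * On success, copy the task field into `task` (buffer of task_len bytes,
 * task_len >= 1), store the timeout in *secs, and return 1.
 * Return 0 if the line does not match. */
int parse_hung_task(const char *line, char *task, size_t task_len, int *secs)
{
    const char *p = strstr(line, "task ");
    char name[64];
    int s;

    if (p == NULL)
        return 0;
    if (sscanf(p, "task %63s blocked for more than %d seconds", name, &s) != 2)
        return 0;

    snprintf(task, task_len, "%s", name);   /* truncates if task is small */
    *secs = s;
    return 1;
}
```

Feeding each captured console line through parse_hung_task would list which tasks were reported as blocked and for how long.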

There are no problems when not using dm_crypt for swap, i.e. 'normal' swap.

Redsandro (redsandro) wrote :

At the risk of saying something completely unrelated: any chance this is a regression of #1310058, where the UUID got overwritten? It came back a few times around 2015, leading me to believe that the devs don't actually use this method and might not test for this scenario, making a regression possible.

tags: added: kernel-da-key
removed: kernel-key