Comment 14 for bug 1880032

Dexuan Cui (decui) wrote :

Detailed steps to repro the issue on Azure:
1. Create a VM with the image "Ubuntu Server 20.04 LTS - Gen1". Any VM size should be fine. Here I use "Standard E4-2ds_v4 (2 vcpus, 32 GiB memory)".

2. Add an extra disk of 64GB to the VM via Azure portal.

3. Log in to the VM via ssh and check the kernel version; here I get 5.4.0-1022-azure.

4. In the VM, the 64GB disk may show up as sdc. Create a swap partition on it, i.e. sdc1.

5. mkswap /dev/sdc1
    root@decui-tmp-2004:~# mkswap /dev/sdc1
    Setting up swapspace version 1, size = 64 GiB (68718424064 bytes)
    no label, UUID=544831e4-72ab-4d2c-81aa-6dac3a8e20ad

6. Add the swap partition info into /etc/fstab:
    UUID=544831e4-72ab-4d2c-81aa-6dac3a8e20ad none swap sw 0 0

7. Use "swapon -a; swapon -s" to confirm that the swap partition works.

8. Add the kernel parameter resume= into /etc/default/grub.d/50-cloudimg-settings.cfg:
     GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 earlyprintk=ttyS0 resume=UUID=544831e4-72ab-4d2c-81aa-6dac3a8e20ad ignore_loglevel no_console_suspend"

   Note: here I also add "ignore_loglevel no_console_suspend", which are *required* to see the error messages during hibernation.

9. Comment out the only line in /etc/default/grub.d/40-force-partuuid.cfg:
     ####GRUB_FORCE_PARTUUID=bf00dea3-136e-49cb-a640-0df7ce49d6db
   Note: this step is required; otherwise the generated grub.cfg doesn't contain the "initrd ..." line, which is needed for resume to work.

10. Run "update-grub2; reboot".
     Note: this 'reboot' may be required, because the initramfs has to be regenerated while the running kernel has the resume= parameter.

11. Log in to the VM again and run "update-initramfs -u".

12. Run "echo disk > /sys/power/state". Note: it is best to run this command from the Azure serial console (set a password for root and use it to log in via the serial console) so we can easily watch what happens.

root@decui-tmp-2004:~# echo disk > /sys/power/state
[ 67.838749] PM: hibernation entry
[ 68.266627] Filesystems sync: 0.041 seconds
[ 68.271740] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 68.281528] OOM killer disabled.
[ 68.286475] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[ 68.293459] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
[ 68.300306] PM: Marking nosave pages: [mem 0x3fff0000-0xffffffff]
[ 68.308250] PM: Basic memory bitmaps created
[ 68.313082] PM: Preallocating image memory... done (allocated 298659 pages)
[ 69.303864] PM: Allocated 1194636 kbytes in 0.98 seconds (1219.01 MB/s)
[ 69.311605] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 69.322486] serial 00:04: disabled
[ 69.345193] ------------[ cut here ]------------
[ 69.345199] WARNING: CPU: 1 PID: 1495 at kernel/workqueue.c:3040 __flush_work+0x1b5/0x1d0
...
[ 70.047238] CPU1 is up
[ 70.054474] hv_utils: KVP IC version 4.0
[ 70.056763] hv_utils: Shutdown IC version 3.2
[ 70.061009] hv_balloon: Using Dynamic Memory protocol version 2.0

It looks like the kernel hangs here forever. Normally the VM is expected to save its state to disk and power off; later, when we start the VM from the portal, it is expected to resume right after the 'echo' command on the serial console.

If I build a kernel from the same source code but with 0a14dbaa0736 reverted, the above hibernation and resume work fine.
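
For anyone repeating the steps, a mismatch between the UUID in /etc/fstab (step 6) and the resume= parameter (step 8) is an easy mistake to make. Below is a minimal sketch of a pre-flight check to run before step 12. The function names are hypothetical and not part of any existing tool, and the sketch assumes resume= is given in the UUID= form used above.

```shell
#!/bin/sh
# Hypothetical helpers (not from any existing package) that sanity-check
# the hibernation setup before running "echo disk > /sys/power/state".

# Print the UUID named by resume=UUID=... in the kernel command line text
# passed as $1. Returns non-zero if no such parameter is present.
resume_uuid_from_cmdline() {
    uuid=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^resume=UUID=//p' | head -n 1)
    [ -n "$uuid" ] && printf '%s\n' "$uuid"
}

# Succeed if the fstab text in $1 has a swap entry for the UUID in $2.
fstab_has_swap_uuid() {
    printf '%s\n' "$1" | awk -v want="UUID=$2" '
        $1 == want && $3 == "swap" { found = 1 }
        END { exit !found }'
}

# Example use on the VM (kept as a comment so the helpers can be sourced):
#   uuid=$(resume_uuid_from_cmdline "$(cat /proc/cmdline)") &&
#       fstab_has_swap_uuid "$(cat /etc/fstab)" "$uuid" &&
#       echo "resume= matches a swap entry in /etc/fstab"
```

If the example prints nothing, either the running kernel was booted without resume=UUID=... (step 10 not done yet) or the UUID does not match the swap entry in /etc/fstab.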