Activity log for bug #1894895

Date Who What changed Old value New value Message
2020-09-08 19:49:58 Dexuan Cui bug added bug
2020-10-23 19:38:16 Marcelo Cerri nominated for series Ubuntu Focal
2020-10-23 19:38:16 Marcelo Cerri bug task added linux-azure (Ubuntu Focal)
2020-10-23 19:39:59 Marcelo Cerri linux-azure (Ubuntu Focal): status New In Progress
2020-10-23 19:45:15 Marcelo Cerri description
Old value:
Description of problem:
On Azure, if the VM is Stopped(deallocated) and later Started, the VF NIC's VMBus Instance GUID may change, and as a result hibernation/resume can hang forever. This happens to the latest stable release of the linux-azure 5.4.0-1023.23 kernel and the latest mainline linux kernel.
How reproducible:
100%
Steps to Reproduce:
1. Start a VM in Azure that supports Accelerated Networking, and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
2. Do hibernation from serial console
# systemctl hibernate
4. After the VM state changes to "Stopped", click "Stop" button from Azure portal to change the VM state to Stopped(deallocated)
5. Wait for some time (e.g. 10 minutes? 1 hour?), and click the "Start" button to start the VM, and then check the boot-up process from the serial console.
Actual results:
Can not boot up. VM hangs after resume.
Starting Resume from hibernation us…6c7-2c0c-491e-adcf-b625d69faf76...
[ 19.822747] PM: resume from hibernation
[ 19.836693] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 19.846968] OOM killer disabled.
[ 19.850236] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 20.542934] PM: Using 1 thread(s) for decompression
[ 20.548250] PM: Loading and decompressing image data (559580 pages)...
[ 22.844964] PM: Image loading progress: 0%
[ 28.131327] PM: Image loading progress: 10%
[ 32.346480] PM: Image loading progress: 20%
[ 37.453971] PM: Image loading progress: 30%
[ 40.834525] PM: Image loading progress: 40%
[ 42.980629] PM: Image loading progress: 50%
[ 44.342959] PM: Image loading progress: 60%
[ 45.506197] PM: Image loading progress: 70%
[ 46.800445] PM: Image loading progress: 80%
[ 48.010185] PM: Image loading progress: 90%
[ 49.045671] PM: Image loading done
[ 49.050419] PM: Read 2238320 kbytes in 28.48 seconds (78.59 MB/s)
[ 49.074198] printk: Suspending console(s) (use no_console_suspend to debug)
(The VM hangs here forever)
BUG FIX:
A workaround patch is available and is being reviewed: https://lkml.org/lkml/2020/9/4/1270
New value:
[Impact]
Description of problem:
On Azure, if the VM is Stopped(deallocated) and later Started, the VF NIC's VMBus Instance GUID may change, and as a result hibernation/resume can hang forever. This happens to the latest stable release of the linux-azure 5.4.0-1023.23 kernel and the latest mainline linux kernel.
[Test Case]
How reproducible:
100%
Steps to Reproduce:
1. Start a VM in Azure that supports Accelerated Networking, and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
2. Do hibernation from serial console
# systemctl hibernate
4. After the VM state changes to "Stopped", click "Stop" button from Azure portal to change the VM state to Stopped(deallocated)
5. Wait for some time (e.g. 10 minutes? 1 hour?), and click the "Start" button to start the VM, and then check the boot-up process from the serial console.
Actual results:
Can not boot up. VM hangs after resume.
Starting Resume from hibernation us…6c7-2c0c-491e-adcf-b625d69faf76...
[ 19.822747] PM: resume from hibernation
[ 19.836693] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 19.846968] OOM killer disabled.
[ 19.850236] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 20.542934] PM: Using 1 thread(s) for decompression
[ 20.548250] PM: Loading and decompressing image data (559580 pages)...
[ 22.844964] PM: Image loading progress: 0%
[ 28.131327] PM: Image loading progress: 10%
[ 32.346480] PM: Image loading progress: 20%
[ 37.453971] PM: Image loading progress: 30%
[ 40.834525] PM: Image loading progress: 40%
[ 42.980629] PM: Image loading progress: 50%
[ 44.342959] PM: Image loading progress: 60%
[ 45.506197] PM: Image loading progress: 70%
[ 46.800445] PM: Image loading progress: 80%
[ 48.010185] PM: Image loading progress: 90%
[ 49.045671] PM: Image loading done
[ 49.050419] PM: Read 2238320 kbytes in 28.48 seconds (78.59 MB/s)
[ 49.074198] printk: Suspending console(s) (use no_console_suspend to debug)
(The VM hangs here forever)
[Regression Potential]
The fix touched the vmbus and can compromise the hyper-v guest drivers. However the change is simple and just adds an additional timeout.
[Other info]
BUG FIX:
A workaround patch is available and is being reviewed: https://lkml.org/lkml/2020/9/4/1270
Final fix: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19873eec7e13fda140a0ebc75d6664e57c00bfb1
2020-10-23 20:35:57 Marcelo Cerri description
Old value:
[Impact]
Description of problem:
On Azure, if the VM is Stopped(deallocated) and later Started, the VF NIC's VMBus Instance GUID may change, and as a result hibernation/resume can hang forever. This happens to the latest stable release of the linux-azure 5.4.0-1023.23 kernel and the latest mainline linux kernel.
[Test Case]
How reproducible:
100%
Steps to Reproduce:
1. Start a VM in Azure that supports Accelerated Networking, and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
2. Do hibernation from serial console
# systemctl hibernate
4. After the VM state changes to "Stopped", click "Stop" button from Azure portal to change the VM state to Stopped(deallocated)
5. Wait for some time (e.g. 10 minutes? 1 hour?), and click the "Start" button to start the VM, and then check the boot-up process from the serial console.
Actual results:
Can not boot up. VM hangs after resume.
Starting Resume from hibernation us…6c7-2c0c-491e-adcf-b625d69faf76...
[ 19.822747] PM: resume from hibernation
[ 19.836693] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 19.846968] OOM killer disabled.
[ 19.850236] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 20.542934] PM: Using 1 thread(s) for decompression
[ 20.548250] PM: Loading and decompressing image data (559580 pages)...
[ 22.844964] PM: Image loading progress: 0%
[ 28.131327] PM: Image loading progress: 10%
[ 32.346480] PM: Image loading progress: 20%
[ 37.453971] PM: Image loading progress: 30%
[ 40.834525] PM: Image loading progress: 40%
[ 42.980629] PM: Image loading progress: 50%
[ 44.342959] PM: Image loading progress: 60%
[ 45.506197] PM: Image loading progress: 70%
[ 46.800445] PM: Image loading progress: 80%
[ 48.010185] PM: Image loading progress: 90%
[ 49.045671] PM: Image loading done
[ 49.050419] PM: Read 2238320 kbytes in 28.48 seconds (78.59 MB/s)
[ 49.074198] printk: Suspending console(s) (use no_console_suspend to debug)
(The VM hangs here forever)
[Regression Potential]
The fix touched the vmbus and can compromise the hyper-v guest drivers. However the change is simple and just adds an additional timeout.
[Other info]
BUG FIX:
A workaround patch is available and is being reviewed: https://lkml.org/lkml/2020/9/4/1270
Final fix: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19873eec7e13fda140a0ebc75d6664e57c00bfb1
New value:
[Impact]
Description of problem:
On Azure, if the VM is Stopped(deallocated) and later Started, the VF NIC's VMBus Instance GUID may change, and as a result hibernation/resume can hang forever. This happens to the latest stable release of the linux-azure 5.4.0-1023.23 kernel and the latest mainline linux kernel.
[Test Case]
How reproducible:
100%
Steps to Reproduce:
1. Start a VM in Azure that supports Accelerated Networking, and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
2. Do hibernation from serial console
# systemctl hibernate
4. After the VM state changes to "Stopped", click "Stop" button from Azure portal to change the VM state to Stopped(deallocated)
5. Wait for some time (e.g. 10 minutes? 1 hour?), and click the "Start" button to start the VM, and then check the boot-up process from the serial console.
Actual results:
Can not boot up. VM hangs after resume.
Starting Resume from hibernation us…6c7-2c0c-491e-adcf-b625d69faf76...
[ 19.822747] PM: resume from hibernation
[ 19.836693] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 19.846968] OOM killer disabled.
[ 19.850236] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 20.542934] PM: Using 1 thread(s) for decompression
[ 20.548250] PM: Loading and decompressing image data (559580 pages)...
[ 22.844964] PM: Image loading progress: 0%
[ 28.131327] PM: Image loading progress: 10%
[ 32.346480] PM: Image loading progress: 20%
[ 37.453971] PM: Image loading progress: 30%
[ 40.834525] PM: Image loading progress: 40%
[ 42.980629] PM: Image loading progress: 50%
[ 44.342959] PM: Image loading progress: 60%
[ 45.506197] PM: Image loading progress: 70%
[ 46.800445] PM: Image loading progress: 80%
[ 48.010185] PM: Image loading progress: 90%
[ 49.045671] PM: Image loading done
[ 49.050419] PM: Read 2238320 kbytes in 28.48 seconds (78.59 MB/s)
[ 49.074198] printk: Suspending console(s) (use no_console_suspend to debug)
(The VM hangs here forever)
[Regression Potential]
The fix touches vmbus and can compromise the hyper-v guest drivers. However the change is simple and just adds an additional timeout.
[Other info]
BUG FIX:
A workaround patch is available and is being reviewed: https://lkml.org/lkml/2020/9/4/1270
Final fix: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19873eec7e13fda140a0ebc75d6664e57c00bfb1
2020-10-26 08:17:59 Stefan Bader linux-azure (Ubuntu Focal): importance Undecided Medium
2020-10-26 08:18:07 Stefan Bader linux-azure (Ubuntu): status New Invalid
2020-10-26 18:29:35 Ian May linux-azure (Ubuntu Focal): status In Progress Fix Committed
2020-10-27 12:20:33 Marcelo Cerri nominated for series Ubuntu Groovy
2020-10-27 12:20:33 Marcelo Cerri bug task added linux-azure (Ubuntu Groovy)
2020-10-27 12:21:10 Marcelo Cerri linux-azure (Ubuntu Groovy): status New Invalid
2020-10-27 12:21:23 Marcelo Cerri linux-azure (Ubuntu): assignee Marcelo Cerri (mhcerri)