With multiple swap partitions resume from hibernation fails

Bug #923326 reported by b3nmore
This bug affects 5 people
Affects            Status   Importance  Assigned to  Milestone
initramfs-tools    Invalid  Undecided   Unassigned
pm-utils (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

I have 3 swap partitions: sdX2, sdX5 and sdY2. Hibernation is set up via /etc/default/grub and /etc/initramfs-tools/conf.d/resume to use sdX5. When restarting after hibernation, the system performs a normal boot. /var/log/boot.log reports an invalid resume image on sdX2. So I guess that the resume image is written to sdX2 (where it should not be), but the kernel looks for it on sdX5 (where it should).
If I disable swap partition sdX2, using only sdX5 and sdY2, resume works perfectly.
If I use all 3 partitions, but set up hibernation to use sdX2, resume works again.

Xubuntu Oneiric i686, Kernel: 3.0.0-15-generic #26, pm-utils: 1.4.1-8ubuntu1

Tags: oneiric
b3nmore (b3nmore)
description: updated
A Dasgupta (adg) wrote :

Not sure about kernel 3.0.x, but I will describe some methods for the
2.6.30+ kernels that I use with initramfs-tools-0.98. You most likely
have to modify and adapt things to suit your versions. Read carefully
and proceed (at your own risk) only if you know what you are doing.

When multiple swaps are active and you hibernate using the kernel's
built-in method (in pm-utils terms, SLEEP_MODULE="kernel", not
"uswsusp" or "tuxonice"), the kernel will randomly choose one of your
active swap devices to store the snapshot image (AFAIK, there is no way
to force the kernel to use a specific swap device). So resume will only
work if that device happens to be the same one as the one configured
via the "resume=<swapdev>" initramfs parameter (which is specified
either via "RESUME=..." in /etc/initramfs-tools/conf.d/resume, or via
grub2 config /etc/default/grub, or manually at the grub boot screen,
with each of these directives overriding the preceding ones).
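
To check whether you are in this situation, you can compare the active
swap partitions with the configured resume device. The following is
only a minimal sketch, assuming a POSIX sh with awk, sed and tail
available; the function names are my own (hypothetical), and the
optional file arguments exist only so the functions can be tried on
sample data:

```shell
#!/bin/sh
# Hypothetical helpers, not part of initramfs-tools or pm-utils.

# Print the device names of all active swap partitions.
# Reads /proc/swaps by default (its first line is a header).
active_swap_partitions()
{
        awk 'NR > 1 && $2 == "partition" { print $1 }' "${1:-/proc/swaps}"
}

# Print the value of the last resume= parameter on a kernel command line.
# Reads /proc/cmdline by default.
configured_resume_dev()
{
        tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^resume=//p' | tail -n 1
}
```

If active_swap_partitions lists more than one entry, the image may end
up on a partition other than the one printed by configured_resume_dev
(which may also be given in UUID=... form).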

Unfortunately, under normal conditions the kernel does not notify you
which partition it uses for hibernation. Here are some things to consider.
I will assume only swap partitions are being used --- no swap files.

(A) Quick and dirty manual recovery to resume:

DANGER, WILL ROBINSON!! READ THE WARNINGS BELOW!

If you hibernated while multiple swap partitions were active and still
want to resume, a quick and dirty manual method is to boot into the
initramfs "premount" break-point shell (boot parameter break=premount),
and run: blkid -t TYPE=swsuspend (or run blkid | grep swsusp), which should
identify the unique partition where the hibernation image is stored.
[Note: Having multiple such partitions is a serious error condition
which should never happen; your only course of action then is
to do a normal boot and make sure that the suspend signatures on those
partitions are erased (done automatically if those partitions are used as swap).
Also, if the blkid command above does not return anything, you have to
do a normal boot; just exit out of the initramfs shell or reboot.]

Next, identify the major and minor numbers m:n for that partition from
/sys/class/block/sdXY/dev (cat that file).

Finally, echo the major:minor numbers (in the format m:n) onto
/sys/power/resume. If everything has been done right, it should
perform a proper resume.

Note again that these steps are valid only if the kernel's internal
hibernation method was used (when using userland uswsusp, the
command to manually resume is /sbin/resume -r <resume_dev>).
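
The last two steps for the kernel's internal method could be sketched
as a dry run, which only prints the resume command instead of running
it, so nothing is resumed by accident. This is a sketch under my own
assumptions, not a tested recovery tool: the function name is
hypothetical, and the second argument (an alternative sysfs root) is
there only for trying it out on sample data:

```shell
#!/bin/sh
# Dry-run sketch for the initramfs premount shell; the function name
# is hypothetical. It only PRINTS the resume command.

# $1: partition name as it appears under /sys/class/block, e.g. sdXY
# $2: sysfs root (defaults to /sys; overridable for testing)
print_resume_cmd()
{
        devfile="${2:-/sys}/class/block/$1/dev"
        if [ ! -r "$devfile" ] ; then
                echo "cannot read $devfile" >&2
                return 1
        fi
        # The file holds the device number as major:minor
        mm=$(cat "$devfile")
        echo "echo $mm > /sys/power/resume"
}
```

Call it only for the single partition that blkid identified, and type
the printed command yourself only if you are sure the image is not stale.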

[One can even automate the above manual procedure by adding
an initramfs boot script which runs at the end of premount phase,
after resume via kernel/uswsusp has been tried. Such a script can
detect the presence of a unique "swsusp" swap partition (_after_
the initramfs premount resume/uswsusp boot scripts have
attempted to resume but have returned) and give the user
the option of either attempting a resume from that "swsusp"
partition, or proceeding to a normal boot (echo "if unsure, do
normal boot"). But unless carefully written, such a script can
introduce additional security or other vulnerabilities, see the
warnings below.]

WARNING, USE AT YOUR OWN RI...


Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in pm-utils (Ubuntu):
status: New → Confirmed
A Dasgupta (adg) wrote :

Here is a basic script that warns the user that the system is
about to do a normal boot while a resume partition is present,
giving a chance for a "manual recovery resume" for experienced
users. (The script will not work for swap files or for tuxonice.)

#!/bin/sh
#
# Initramfs-tools boot script:
#
# Warn the user if partition(s) with a swsusp signature remain even after the
# resume/uswsusp boot scripts have run. This script is only for kernel swsusp
# and userland uswsusp (not for tuxonice, sorry).
#
# Put this script where the resume/uswsusp boot scripts are kept.
# USE AT YOUR OWN RISK.
#
# A Dasgupta, 2012
#

PREREQ="resume uswsusp"

prereqs()
{
        echo "$PREREQ"
}

case $1 in
# get pre-requisites
prereqs)
        prereqs
        exit 0
        ;;
esac

if [ -n "${noresume}" ] ; then
  exit 0
fi

# Get a list of devices with swsusp signature (blkid reports TYPE="swsuspend")
lst=$(blkid -t TYPE=swsuspend | while IFS=":" read x jnk ; do echo $x ; done)
# Count how many found
n=0
if [ -n "$lst" ] ; then
  for p in $lst ; do
    n=$((1+$n))
  done
fi

# Normally, this script should not run beyond this point
if [ -z "$lst" ] || [ "$n" = "0" ] ; then
  # Normal exit
  exit 0
fi

# Remaining swsusp partition(s) found -- warn user
sleep 2
echo ""
echo "Error: Suspend signature found on partition(s): ${lst}"

# Multiple partitions with swsusp signature!
if [ "$n" != "1" ] ; then
  echo ""
  echo "ERROR: You have multiple partitions with swsusp signature."
  echo "This should never happen. If you are not using these swap"
  echo "partitions, please clear their swsusp signatures by turning on"
  echo "swap (temporarily) on these: ${lst}"
  echo ""
  sleep 15
  exit
fi

# A unique partition with the swsusp signature remains.
# Notify user of possible actions.
echo ""
echo "Error: Resume script(s) ran but suspend signature found on ${lst}."
echo "This can happen if you hibernated but resume parameters are wrong."
echo "If so, you may want to reboot/poweroff now, and break into a premount"
echo "initramfs shell at the next boot to attempt resuming from ${lst}."
echo ""
echo "WARNING: Resuming after your filesystems were mounted or from a"
echo "stale/wrong hibernation image will cause severe filesystem"
echo "corruption!!! If unsure, wait for normal boot, and make sure to"
echo "clear any stale hibernation image signature."
echo ""
echo "If unsure, do not attempt to resume; proceed with normal boot."
echo ""
sleep 2
echo "Seconds remaining before normal boot:"
n=40
while [ $n -ge 0 ] ; do
  if [ $n -lt 10 ] || [ $(($n % 5)) -eq 0 ] ; then
    echo -n " $n"
  else
    echo -n "."
  fi
  sleep 1
  n=$(($n - 1))
done
echo ""
echo "Proceeding with normal boot"
echo ""
sleep 3
exit

A Dasgupta (adg) wrote :

Script (slightly improved) attached for download.

A Dasgupta (adg) wrote :

Some internals:

If you decide to resume manually, here is an additional low-level
(non-portable) method for detecting suspend signatures. This is
meant only for experienced users, and even then a small error
can destroy all your data.

Both the kernel internal swsusp and the userland uswsusp (suspend-utils)
codes store their suspend signature in the last 10 bytes of the first
4096-byte block of the swap partition used at hibernation.

For the kernel internal swsusp, that signature is "S1SUSPEND\0",
as defined in kernel/power/swap.c:

#define SWSUSP_SIG "S1SUSPEND"

and for userland suspend, it is "ULSUSPEND\0", as defined in
swsusp.h:

#define SWSUSP_SIG "ULSUSPEND"

So if in the initramfs premount break-point shell you find a swap partition
/dev/sdXY for which blkid reports TYPE=swsuspend, you can do the following
test for further confirmation (DANGER WARNING APPLIES):

suspsig=$(dd if=/dev/sdXY bs=1 skip=4086 count=9 2>/dev/null)

if [ "$suspsig" = "S1SUSPEND" ] ; then
  echo "kernel swsusp, echo major:minor onto /sys/power/resume to resume"
fi

if [ "$suspsig" = "ULSUSPEND" ] ; then
  echo "userland uswsusp, run /sbin/resume -r /dev/sdXY to resume"
fi
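
The same test can be wrapped in a small function, which makes it easy
to try on an ordinary file before pointing it at a real partition.
This is only a sketch: the function name is hypothetical, and like the
commands above it assumes a 4096-byte first block:

```shell
#!/bin/sh
# Hypothetical helper: classify the suspend signature of a swap
# partition (or of an image file, for testing). Read-only.

# $1: device or file to inspect
susp_kind()
{
        # Read 9 bytes at offset 4086 (the signature without its
        # trailing NUL); drop NUL bytes so the shell never sees them.
        sig=$(dd if="$1" bs=1 skip=4086 count=9 2>/dev/null | tr -d '\000')
        case "$sig" in
        S1SUSPEND) echo "kernel" ;;
        ULSUSPEND) echo "uswsusp" ;;
        *)         echo "none" ;;
        esac
}
```

For example, susp_kind /dev/sdXY prints "kernel", "uswsusp" or "none";
it only reads from the device and does not attempt to resume.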

I am posting these only for those who may want to experiment while
being fully aware of the risk of losing all data on disk. The warning script
for manual resume posted in #4 can be further automated with the
last commands (giving the user the option to run these low-level tests
and resume commands to "auto-resume"), but that would be neither
portable nor safe.

A Dasgupta (adg) wrote :

There was an error in my posted script above ("swswap" should be
"swsuspend").

Fixed script (tested) is attached with this post.

Please replace the old script by the attached fixed script file
warnresumefail.fixed

Here is the diff anyway:

--- warnresumefail 2012-04-17 03:56:16.945738809 -0400
+++ warnresumefail.fixed 2012-04-17 03:34:27.294976347 -0400
@@ -41,7 +41,7 @@
 fi

 # Get a list and count of devices with swsusp signature
-set x $($BLKID -o device -t TYPE=swswap)
+set x $($BLKID -o device -t TYPE=swsuspend)
 shift
 lst="$@"
 n="$#"

A Dasgupta (adg) wrote :

Final version ready for testing (it only improves the error
message from the last fixed version). If you have a spare
computer where data does not matter, put this script in

   /etc/initramfs-tools/scripts/local-premount/

run update-initramfs -u ; update-grub, and then
deliberately misconfigure the "resume=" parameter
to test if you can still recover.
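
The installation step could look roughly like the sketch below. The
helper function and its optional second argument (a root directory
prefix, so it can be tried in a scratch directory first) are my own
illustration, and the script file name in the comment is assumed from
the earlier post; initramfs-tools boot scripts need to be executable:

```shell
#!/bin/sh
# Hypothetical helper: copy a boot script into the local-premount
# directory and make it executable.

# $1: path to the script
# $2: root directory prefix (empty for the real system; set it to a
#     scratch directory to try the function out)
install_premount_script()
{
        dest="${2:-}/etc/initramfs-tools/scripts/local-premount"
        mkdir -p "$dest" &&
        cp "$1" "$dest/" &&
        chmod 755 "$dest/$(basename "$1")"
}

# On the spare test machine, as root (file name assumed):
#   install_premount_script ./warnresumefail.fixed
#   update-initramfs -u
#   update-grub
```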

papukaija (papukaija)
tags: added: oneiric
A Dasgupta (adg) wrote :

The warning message in the script is still a little cryptic.
To help the user to recover, a better message is needed.

dino99 (9d9) wrote :

This version has expired

Changed in pm-utils (Ubuntu):
status: Confirmed → Invalid
Changed in initramfs-tools:
status: New → Invalid