[reiserfs] Suspend/resume corrupts external data storage devices

Bug #883748 reported by Steve
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu will permit suspend/resume (have not tried hibernate) without attempting to unmount any external USB storage devices, even though processes may still have open files on these devices. After the system is resumed, the USB storage devices may not be present yet, because they have not been re-enumerated. This can and DOES result in data corruption on the filesystem on the external device, as evidenced by errors in dmesg and having to do journal replays. I am not sure that the dmesg here will have the whole story (dmesg -c was run as part of another bug report), but the missing parts basically boil down to reiserfs being very unhappy that the block device was yanked out from under it.

As a policy decision, it is WRONG to allow suspend to take place while externally attached USB block devices are still mounted. The system needs to attempt to unmount everything in /media prior to suspending, and needs to FAIL TO SUSPEND (throwing an error) if any device cannot be unmounted because its mount point is in use.
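The first step of this policy, finding what is currently mounted under /media, could be sketched in Python roughly as follows. This is an illustration only: it assumes the usual /proc/mounts layout (device, mount point, type, options, dump, pass, separated by whitespace, with spaces and other special characters in paths written as octal escapes such as \040), and the function names are hypothetical rather than part of any Ubuntu tool.

```python
import re

# /proc/mounts writes space, tab, newline and backslash in paths as
# three-digit octal escapes, for example "\040" for a space.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape_mount_field(field):
    """Decode the octal escapes used in /proc/mounts fields."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mounts_under(prefix, mounts_text):
    """Return (device, mount_point) pairs mounted strictly below prefix.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    prefix = prefix.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device = unescape_mount_field(fields[0])
        mount_point = unescape_mount_field(fields[1])
        # "/media/disk" matches, but "/media" itself and "/mediafoo" do not
        if mount_point.startswith(prefix):
            found.append((device, mount_point))
    return found
```

On a live system it would be called as mounts_under("/media", open("/proc/mounts").read()); each returned mount point would then have to be unmounted, with suspend refused if any of those unmounts fails.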

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-12-generic 3.0.0-12.20
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
Uname: Linux 3.0.0-12-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 1.23-0ubuntu3
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: AD198x Analog [AD198x Analog]
   Subdevices: 2/2
   Subdevice #0: subdevice #0
   Subdevice #1: subdevice #1
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: steve 24738 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfe220000 irq 49'
   Mixer name : 'Analog Devices AD1984'
   Components : 'HDA:11d41984,17aa20bb,00100400'
   Controls : 32
   Simple ctrls : 20
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 7KHT24WW-1.08'
   Mixer name : 'ThinkPad EC 7KHT24WW-1.08'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Sat Oct 29 23:43:31 2011
GvfsMonitorLog: Monitoring events. Press Ctrl+C to quit.
HibernationDevice: RESUME=UUID=31923f5c-ed0d-4c47-92e3-5ee28dfeafe4
HotplugNewDevices:

HotplugNewMounts:

InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
MachineType: LENOVO 646067U
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-12-generic root=UUID=d4fbb1dd-1060-4e10-b8b8-f7d06ae50146 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-12-generic N/A
 linux-backports-modules-3.0.0-12-generic N/A
 linux-firmware 1.60
SourcePackage: linux
Symptom: storage
UdevMonitorLog:
 monitor will print the received events for:
 UDEV - the event which udev sends out after rule processing
UdisksMonitorLog: Monitoring activity from the disks daemon. Press Ctrl+C to cancel.
UpgradeStatus: Upgraded to oneiric on 2011-10-17 (13 days ago)
dmi.bios.date: 03/18/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETC9WW (2.29 )
dmi.board.name: 646067U
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETC9WW(2.29):bd03/18/2011:svnLENOVO:pn646067U:pvrThinkPadT61p:rvnLENOVO:rn646067U:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 646067U
dmi.product.version: ThinkPad T61p
dmi.sys.vendor: LENOVO

Revision history for this message
Steve (stevenm86) wrote :
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 883748] [NEW] Suspend/resume corrupts external data storage devices

On Sun, Oct 30, 2011 at 4:04 PM, Launchpad Bug Tracker
<email address hidden> wrote:
> You have been subscribed to a public bug:
>
> Ubuntu will permit suspend/resume (have not tried hibernate) without
> attempting to unmount any external USB storage devices, even though
> processes may still have open files on these devices. After the system
> is resumed, the USB storage devices may not be present, due to not
> having been re-enumerated yet. This can and DOES result in data
> corruption on the filesystem on the external device, as evidenced by
> errors in dmesg and having to do journal replays. I am not sure that the

How can you conclude that this case results in data corruption on the
filesystem on the external device?

In fact, 'sync' is already called before system suspend to flush the data
of all filesystems.

thanks,
--
Ming Lei

Revision history for this message
Steve (stevenm86) wrote : Re: Suspend/resume corrupts external data storage devices

First, problems can occur because there may be open files on the device when the system is suspended. Due to delays in enumeration, the external device will not immediately be back when the system is resumed. Thus, the effect is that of essentially unplugging the device without unmounting it first. Any processes that have files open on the device will now have an invalid file handle and will not be able to write to those files. While this alone may not immediately result in corruption at the filesystem level, it may certainly result in corruption or inconsistency at the data level (e.g. if a process is writing related data to two files, one on the internal drive and one on an external drive, and the external drive suddenly goes away due to suspend).

Second, I am seeing logs in dmesg about how the filesystem was not unmounted cleanly (obviously) and also seeing journal replay. Removing the device after doing 'sync' should not really result in a journal replay unless disk inconsistency is happening, which is what I am seeing here.

Steve

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 883748] Re: Suspend/resume corrupts external data storage devices

Hi,

On Sun, Oct 30, 2011 at 11:31 PM, Steve <email address hidden> wrote:
> First, problems can occur because there may be open files on the device
> when the system is suspended. Due to delays in enumeration, the external
> device will not immediately be back when the system is resumed. Thus,

That is the case if reset_resume is used for the USB mass storage device
during resume.

> the effect is that of essentially unplugging the device without
> unmounting it first. Any processes that have files open on the device
> will now have an invalid file handle and will not be able to write to
> those files. While this alone may not immediately result in corruption
> at the filesystem level, it may certainly result in corruption or
> inconsistency at the data level (e.g. if a process is writing related
> data to two files, one on the internal drive and one on an external
> drive, and the external drive suddenly goes away due to suspend).

It sounds like writing to the filesystem on the external device was not
finished before suspend. After resume, the write continues but fails
because the external USB device has been reset.

The PM developers are discussing a suspend blocker mechanism on linux-pm;
it looks like it could prevent the problem from being triggered.

>
> Second, I am seeing logs in dmesg about how the filesystem was not
> unmounted cleanly (obviously) and also seeing journal replay. Removing
> the device after doing 'sync' should not really result in a journal
> replay unless disk inconsistency is happening, which is what I am seeing
> here.

A partial write to the filesystem on the external device should not be a
disaster; the filesystem should handle this case, e.g. by just marking the
write as failed without corrupting the whole filesystem.

Could you confirm whether the filesystem (REISERFS) on the external device
has actually been destroyed? If only a single file is affected, it should
not be a big deal.

thanks,
--
Ming Lei

Revision history for this message
Steve (stevenm86) wrote : Re: Suspend/resume corrupts external data storage devices

> Could you confirm whether the filesystem (REISERFS) on the external device
> has actually been destroyed? If only a single file is affected, it should
> not be a big deal.

I'm sorry, but this is unacceptable. It is not correct for even a single file to be affected by this problem. What if it happens to be a very important file? When it comes to filesystem integrity, it's either fully correct or it isn't - there is no "partial credit" here.

The workaround can be done entirely from userspace, without having to wait for the linux-pm guys to sort their stuff out. Back when I ran a Gentoo box, I had a suspend script with a simple five lines of shell code that tried to unmount external devices and did not initiate suspend unless this operation succeeded. Sure, the problem would still occur if you ran echo mem > /sys/power/state directly, but at least clicking 'Suspend' in the UI would try to do the right thing first, and fail to suspend if unmounting was unsuccessful.
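A userspace gate along these lines might look like the following Python sketch (the original shell script is not included in this report, so this is not a reconstruction of it). It assumes it runs as root, that the caller supplies the mount points to release, and that umount(8) exits with a nonzero status when a target is busy; the function names and the unmount and state_file parameters are hypothetical and exist only so the logic can be exercised without actually suspending.

```python
import subprocess


class SuspendRefused(Exception):
    """Raised when at least one mount point could not be unmounted."""


def run_umount(mount_point):
    """Try to unmount one mount point with umount(8); True on success."""
    # umount exits nonzero if the target is busy or cannot be unmounted.
    result = subprocess.run(["umount", mount_point],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def suspend_if_unmounted(mount_points, unmount=run_umount,
                         state_file="/sys/power/state"):
    """Unmount every mount point, then request suspend-to-RAM.

    Every mount point is attempted so that the error names all busy ones.
    If any unmount fails, SuspendRefused is raised and the state file is
    never written, so the system stays awake.
    """
    failed = [mp for mp in mount_points if not unmount(mp)]
    if failed:
        raise SuspendRefused("still mounted, not suspending: "
                             + ", ".join(failed))
    # Same request as: echo mem > /sys/power/state
    with open(state_file, "w") as f:
        f.write("mem")
```

Because the check lives only in the wrapper, anything that writes to /sys/power/state directly still bypasses it, which is the limitation noted above.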

Maybe I will just go back to that script on this system.

Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 883748] Re: Suspend/resume corrupts external data storage devices

On Tue, Nov 1, 2011 at 2:44 AM, Steve <email address hidden> wrote:
>> Could you confirm whether the filesystem (REISERFS) on the external device
>> has actually been destroyed? If only a single file is affected, it should
>> not be a big deal.
>
> I'm sorry, but this is unacceptable. It is not correct for even a single
> file to be affected by this problem. What if it happens to be a very
> important file? When it comes to filesystem integrity, it's either fully
> correct or it isn't - there is no "partial credit" here.

Yes, it should be treated as a critical issue.

What I mean is that, if so, the problem has been in the Linux kernel for a long time.

A suspend blocker mechanism may prevent the problem from being
triggered if it is applied properly, but it is still under discussion
upstream.
For now, system sleep can happen at any time. If it happens while a
file is being written, file or filesystem corruption may result, as in this report.

BTW, I am still not sure whether the partial write before suspend caused
the problem.

>
> The workaround can be done entirely from userspace, without having to
> wait for linux-pm guys to sort their stuff out. Back when I ran a gentoo
> box, I had a suspend script with a simple 5 lines of shell code which
> tried to unmount external devices, and did not initiate suspend unless
> this operation completed. Sure, the problem would still occur if you

You may not be able to unmount successfully if accesses to the filesystem
are pending or ongoing.

> directly echo mem > /sys/power/state, but at least clicking 'Suspend'
> from the UI would try to do the right thing first, and fail to suspend
> if unmounting was unsuccessful.
>
> Maybe I will just go back to that script on this system.

thanks,
--
Ming Lei

Revision history for this message
Steve (stevenm86) wrote : Re: Suspend/resume corrupts external data storage devices

> You may not be able to unmount successfully if accesses to the filesystem
> are pending or ongoing.

Yes, that is exactly the point. If you cannot unmount, you cannot and SHOULD NOT suspend. So, if the user clicks Suspend and everything in /media cannot be unmounted, you give an error instead of suspending.

Revision history for this message
Ming Lei (tom-leiming) wrote : Re: [Bug 883748] Re: Suspend/resume corrupts external data storage devices

On Tue, Nov 1, 2011 at 5:55 AM, Steve <email address hidden> wrote:
>> You may not be able to unmount successfully if accesses to the filesystem
>> are pending or ongoing.
>
> Yes, that is exactly the point. If you cannot unmount, you cannot and
> SHOULD NOT suspend. So, if the user clicks Suspend and everything in
> /media cannot be unmounted, you give an error instead of suspending.

It will confuse users if suspend simply fails because a USB mass storage
device is connected and some directories or files on it are open; that
looks a bit stupid, doesn't it?

Checking only /media is not enough, either; someone may mount external
block devices under other directories.

thanks,
--
Ming Lei

Revision history for this message
Steve (stevenm86) wrote : Re: Suspend/resume corrupts external data storage devices

> It will confuse users if suspend simply fails because a USB mass storage
> device is connected and some directories or files on it are open; that
> looks a bit stupid, doesn't it?

It looks a lot MORE stupid if you allow the suspend to happen, but then destroy the data on the disk.

> Checking only /media is not enough, either; someone may mount external
> block devices under other directories.

It doesn't have to be /media, but you can certainly figure out which devices are external USB storage devices. /proc/mounts would be a good place to start.
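One way to follow this suggestion is to take the device field of each /proc/mounts entry and ask sysfs whether that block device sits behind a USB bus. The Python sketch below assumes the usual sysfs layout, in which /sys/class/block/NAME is a symlink into the device tree and the resolved path of a USB-attached disk or partition contains a usbN component; it is a heuristic rather than a complete classification, and the function names are hypothetical.

```python
import os


def block_device_sysfs_path(device, sys_root="/sys"):
    """Map a device node such as /dev/sdc1 to its resolved sysfs path, or None."""
    if not device.startswith("/dev/"):
        return None  # pseudo filesystems such as proc or tmpfs have no block device
    # Resolve /dev/disk/by-uuid/... style symlinks to the kernel name (e.g. sdc1).
    name = os.path.basename(os.path.realpath(device))
    link = os.path.join(sys_root, "class", "block", name)
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


def is_usb_path(sysfs_path):
    """Heuristic: a USB-attached disk's sysfs path passes through a usbN bus node."""
    return any(part.startswith("usb") for part in sysfs_path.split(os.sep))


def is_usb_block_device(device, sys_root="/sys"):
    """True if device appears to be attached over USB, according to sysfs."""
    path = block_device_sysfs_path(device, sys_root)
    return path is not None and is_usb_path(path)
```

Combined with a /proc/mounts scan, this selects mounts by where the device is attached rather than by where it happens to be mounted, which addresses the objection that checking /media alone is not enough.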

Even in the current scheme, I've had Ubuntu totally screw up the external storage device stuff. I can have a terminal open to my external device, and then suspend/resume. The filesystem becomes inaccessible, and the current directory looks empty. The path given on the shell prompt now says "(unreachable)/whatever". Sometimes you have to cd out of the external device mount point and back into it, and sometimes the external device gets mounted under a different name (same UUID, but with a '_' appended to the name). In this situation, the old device has not gone away due to the filesystem still having a reference to it, and by this point the filesystem state is trashed.

Revision history for this message
Steve (stevenm86) wrote :

Experienced this once again. Given the severity of the problem, I cannot believe this is not getting immediate attention.

The device was not removed or even touched between suspend and resume. The mount point is filled with some half-assed combination of the original drive contents and unreadable files. The only way to restore sanity is to reset the disk and remount it, and hope it didn't get corrupted as a result of inconsistent writebacks. dmesg is filled with the following:
[236091.998664] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318400 0x0 SD]
[236091.998686] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318401 0x0 SD]
[236091.998695] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318401 0x0 SD]
[236091.998717] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318402 0x0 SD]
[236091.998726] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318402 0x0 SD]
[236091.998748] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318403 0x0 SD]
[236091.998757] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318403 0x0 SD]
[236092.002947] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318404 0x0 SD]
[236092.002967] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318404 0x0 SD]
[236092.002999] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318405 0x0 SD]
[236092.003019] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318405 0x0 SD]
[236092.003043] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318769 0x0 SD]
[236092.003052] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318769 0x0 SD]
[236092.003074] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318388 0x0 SD]
[236092.003084] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318388 0x0 SD]
[236092.003106] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318410 0x0 SD]
[236092.003115] REISERFS error (device sdc1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1318385 1318410 0x0 SD]
[236092.003137] REISERFS error (device sdc1): vs...


penalvch (penalvch)
tags: added: needs-upstream-testing resume suspend
Revision history for this message
Steve (stevenm86) wrote :

I have been working around the problem with a shell script hack that tries to unmount all my known media prior to suspending. Yes, sometimes it kicks me back to the unlock screen without suspending if I left a terminal open somewhere that points to external media, but that's kind of the point (and is certainly better than the alternative).
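A forgotten terminal blocks the unmount because its shell's current directory, like any open file descriptor, keeps the filesystem busy. A rough Python approximation of what fuser -m reports can be built by scanning /proc/PID/cwd and /proc/PID/fd; this sketch assumes Linux procfs, only sees processes the caller is allowed to inspect (root sees all of them), and uses hypothetical function names.

```python
import os


def is_under(path, mount_point):
    """True if path is mount_point itself or lies below it."""
    base = mount_point.rstrip("/")
    return path == (base or "/") or path.startswith(base + "/")


def processes_using(mount_point, proc_root="/proc"):
    """Return sorted (pid, path) pairs for working directories and open
    file descriptors that point into mount_point."""
    hits = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or its fd directory is not readable
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # permission denied, or the process went away
            if is_under(target, mount_point):
                hits.add((int(entry), target))
    return sorted(hits)
```

This only inspects working directories and open descriptors; memory-mapped files, process root directories and nested mounts can also keep a filesystem busy, so fuser -m or lsof remain the more complete tools.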

It would be nice to get a proper upstream fix...

Revision history for this message
Nathan Korth (computernut401) wrote :

This is still a problem; I've been having issues using an SD card with a netbook that has very little internal memory. After resume, the mount is still there but it doesn't work. In fact, one time it seems to have caused a kernel panic:

Mar 9 22:16:09 nkorth-mini kernel: [12594.477903] EXT4-fs (mmcblk0p1): mount failed
Mar 9 22:16:09 nkorth-mini kernel: [12594.484058] BUG: unable to handle kernel NULL pointer dereference at 00000054
Mar 9 22:16:09 nkorth-mini kernel: [12594.484328] IP: [<c1148663>] mount_fs+0x43/0x180
Mar 9 22:16:09 nkorth-mini kernel: [12594.484504] *pdpt = 00000000152bf001 *pde = 0000000000000000
Mar 9 22:16:09 nkorth-mini kernel: [12594.484709] Oops: 0000 [#1] SMP

Revision history for this message
Steve (stevenm86) wrote : Re: [Bug 883748] Re: Suspend/resume corrupts external data storage devices

I don't understand why distributions think it is okay to keep removable
devices mounted across suspend. This is just asking for data corruption.
I've written some sleep scripts that prevent suspend if unmounting cannot
be done, but these are fairly hacked up...
On Mar 11, 2013 6:45 AM, "Nathan Korth" <email address hidden> wrote:

> This is still a problem; I've been having issues using an SD card with a
> netbook that has very little internal memory. After resume, the mount is
> still there but it doesn't work. In fact, one time it seems to have
> caused a kernel panic:
>
> Mar 9 22:16:09 nkorth-mini kernel: [12594.477903] EXT4-fs (mmcblk0p1): mount failed
> Mar 9 22:16:09 nkorth-mini kernel: [12594.484058] BUG: unable to handle kernel NULL pointer dereference at 00000054
> Mar 9 22:16:09 nkorth-mini kernel: [12594.484328] IP: [<c1148663>] mount_fs+0x43/0x180
> Mar 9 22:16:09 nkorth-mini kernel: [12594.484504] *pdpt = 00000000152bf001 *pde = 0000000000000000
> Mar 9 22:16:09 nkorth-mini kernel: [12594.484709] Oops: 0000 [#1] SMP
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/883748
>
> Title:
> Suspend/resume corrupts external data storage devices
>
> Status in “linux” package in Ubuntu:
> Confirmed
>

penalvch (penalvch)
tags: added: bios-outdated-2.30 needs-suspend-logs regression-potential
summary: - Suspend/resume corrupts external data storage devices
+ [reiserfs] Suspend/resume corrupts external data storage devices
Revision history for this message
Fixitman Arizona (fixitmanarizona) wrote :

No idea of the status of this bug. I am running Xubuntu 16.04, and whatever workaround has been applied on my particular system (yes, years later) causes suspend to work but resume to fail if drives are UNMOUNTED while suspend is taking place. No data corruption is likely at this point if they are left mounted and not removed. I've had this problem with partitions on the same disk as the OS: if they are unmounted upon suspend, resume fails, a hard reboot becomes necessary, and the system then checks for file errors before booting.
External devices that show as bootable, even with no drive attached, cause the same failure, so I make sure to remove, say, a multimedia card reader plugged in via USB with no card in it, which shows up the same as an unmounted drive.
I had the same problem back with Xubuntu 12.04 on another machine (whose drive is now trashed after 10+ years of uptime).
Hope this helps someone searching for resume from suspend failure.
