lvm2 hangs when creating snapshot of live root and IO to the root filesystem is happening

Bug #1096520 reported by Peter Passchier
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

On Lucid LTS 10.04.4 with package linux-image-2.6.32-45-generic-pae 2.6.32-45.101 (2.6.32.45.52),
also happens on linux-image-2.6.35-32-generic-pae 2.6.35-32.68~lucid1 (2.6.35.14)
Root filesystem ext4 (e2fslibs 1.41.11-1ubuntu2.1) on lvm2 2.02.54-1ubuntu4.1,
vg resides on luks-encrypted partition (cryptsetup 2:1.1.0~rc2-1ubuntu13)

This might be a regression of bug 615911, bug 595489, bug 604807, bug 605551 - same symptoms:
after lvcreate -s -p r -L 3G -n oscopy /dev/secret/ubuntu the system becomes unresponsive as the root filesystem becomes inaccessible. The LV is never properly created, but on reboot there are some vestiges (/dev/secret/oscopy) that cannot be mounted.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: i386
ArecordDevices:
 XOpenDisplay() failed
 **** List of CAPTURE Hardware Devices ****
 card 0: PCH [HDA Intel PCH], device 0: ALC662 rev1 Analog [ALC662 rev1 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gdm 1869 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xfe700000 irq 49'
   Mixer name : 'Realtek ALC662 rev1'
   Components : 'HDA:10ec0662,103c2ac2,00100101'
   Controls : 21
   Simple ctrls : 13
Card1.Amixer.info:
 Card hw:1 'Generic'/'HD-Audio Generic at 0xfe640000 irq 50'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100200'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [off]
DistroRelease: Ubuntu 10.04
HibernationDevice: RESUME=UUID=03a0c90e-c2e3-4eaf-bab1-3160a73be6c1
InstallationMedia: Ubuntu-Server 10.04.3 LTS "Lucid Lynx" - Release i386 (20110719.2)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Hewlett-Packard p7-1022l
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.35-32-generic-pae root=/dev/mapper/secret-ubuntu ro quiet
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.35-32.68~lucid1-generic-pae 2.6.35.14
Regression: Yes
RelatedPackageVersions: linux-firmware 1.34.14
Reproducible: Yes
RfKill:

Tags: lucid regression-update needs-upstream-testing
Uname: Linux 2.6.35-32-generic-pae i686
UserGroups:

dmi.bios.date: 07/04/2011
dmi.bios.vendor: AMI
dmi.bios.version: 7.11
dmi.board.name: 2AC2
dmi.board.vendor: PEGATRON CORPORATION
dmi.board.version: 1.02A
dmi.chassis.asset.tag: 4CE13108SV
dmi.chassis.type: 3
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnAMI:bvr7.11:bd07/04/2011:svnHewlett-Packard:pnp7-1022l:pvr1.02A:rvnPEGATRONCORPORATION:rn2AC2:rvr1.02A:cvnHewlett-Packard:ct3:cvr:
dmi.product.name: p7-1022l
dmi.product.version: 1.02A
dmi.sys.vendor: Hewlett-Packard

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1096520

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: lucid
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Peter Passchier (peter-passchier) wrote : AlsaDevices.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Peter Passchier (peter-passchier) wrote : AplayDevices.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : BootDmesg.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : Card1.Codecs.codec.0.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : Lspci.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : Lsusb.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : PciMultimedia.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : ProcModules.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : UdevDb.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : UdevLog.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote : WifiSyslog.txt

apport information

Revision history for this message
Peter Passchier (peter-passchier) wrote :

This problem does not occur on linux-image-3.0.0-29-generic-pae 3.0.0-29.46~lucid1
I guess it's still a problem for people on stock Lucid 10.04.4 LTS for over 2 years...
At least I'm happy to have found something that works.

Revision history for this message
Peter Passchier (peter-passchier) wrote :

OK, it does hang when I do this:
lvcreate -s -p r -L 3G -n oscopy /dev/secret/ubuntu >>logfile

But it is fine when I leave off the redirection to the logfile (which I added for debug info...)
I guess that should still count as a bug even on linux-image-3.0.0-29-generic-pae 3.0.0-29.46~lucid1

Revision history for this message
Peter Passchier (peter-passchier) wrote :

(The logging ends with:
    Setting chunksize to 8 sectors.
    Setting logging type to disk
    Finding volume group "secret"
    Archiving volume group "secret" metadata (seqno 27).
    Creating logical volume oscopy
    Creating volume group backup "/etc/lvm/backup/secret" (seqno 28).
    Found volume group "secret"
    Creating secret-oscopy
    Loading secret-oscopy table (251:4)
    Resuming secret-oscopy (251:4)
    Clearing start of logical volume "oscopy"
    Creating logical volume snapshot0
    Found volume group "secret"
    Found volume group "secret"
    Creating secret-ubuntu-real
    Loading secret-ubuntu-real table (251:5)
    Resuming secret-ubuntu-real (251:5)
    Loading secret-ubuntu table (251:2)
    Creating secret-oscopy-cow
    Loading secret-oscopy-cow table (251:6)
    Resuming secret-oscopy-cow (251:6)
    Loading secret-oscopy table (251:4)
    Suspending secret-ubuntu (251:2) with filesystem sync with device flush
)

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.8 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc2-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Revision history for this message
Peter Passchier (peter-passchier) wrote :

Sorry, this is a production server, I cannot afford to do such testing while I am in a different location and I have to bother local staff when they have to reboot.

Revision history for this message
Peter Passchier (peter-passchier) wrote :

But I can test it in a clean virtual server at home if that's helpful... Just give me some time.

Revision history for this message
Peter Passchier (peter-passchier) wrote :

kernel-bug-exists-upstream

Changed in linux (Ubuntu):
status: Incomplete → Opinion
status: Opinion → Confirmed
Revision history for this message
Peter Passchier (peter-passchier) wrote :

Tested with:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc2-raring/linux-image-3.8.0-030800rc2-generic_3.8.0-030800rc2.201301022235_i386.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc2-raring/linux-image-extra-3.8.0-030800rc2-generic_3.8.0-030800rc2.201301022235_i386.deb

This still hangs:
lvcreate -s -n oscopy -v -l 100%FREE /dev/secret/root 2>/root/savefile

At some point there are messages like:
[ 600.234412] INFO: task lvcreate:928 blocked for more than 120 seconds.
[ 600.234836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Note that this redirect of stdout doesn't hang, but that's probably because there's only 1 line of output at a non-crucial time:
lvcreate -s -n oscopy -v -l 100%FREE /dev/secret/root >/root/savefile

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
summary: - lvm2 hangs when creating snapshot of live root
+ lvm2 hangs when creating snapshot of live root and IO to the root
+ filesystem is happening
description: updated
Revision history for this message
Peter Passchier (peter-passchier) wrote :

Response from Zdenek Kabelac:
----------------------------------------------------
> 1. One-line summary
> lvm2 hangs when creating snapshot of live root and IO to the root filesystem
> is happening
>
>
> 2. Full description
> Root filesystem ext4 on lvm2 extent in luks-encrypted
> partition with mainline kernel 3.8-rc2.
> This is causing a hang:
> lvcreate -s -n oscopy -v -l 100%FREE /dev/secret/root 2>/root/savefile
>
> See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1096520

Attach full -vvvv trace.

Hmm bugreport talks about 2.6.35 kernel.

My first suspect would be the 'Debian way' of handling udev rules - the handling differs from the upstream solution and must be tracked and fixed on Debian.

>
> 6. Script to replicate the problem
> lvcreate -s -n oscopy -v -l 100%FREE /dev/secret/root 2>/root/savefile

'-vvvv'

Attach also 'changes' to udev rules from the upstream solution.

Probably udevadm monitoring output would be also useful.

Any cookies left unfinished 'dmsetup udevcookies' ?

Zdenek

Revision history for this message
Peter Passchier (peter-passchier) wrote :

I sent this reply; I hope it doesn't have unintended negative consequences...
 ----------------------------------------------------
I think this happens everywhere. I just did a Fedora 18-beta install (no cryptsetup, just default lvm2 setup), and this hangs:
lvcreate -s -n oscopy -vvvv -l 100%FREE /dev/fedora/root 2>savefile

I don't know how to do all the things you say in your post, but is there a reference implementation where this can be proven to hang?? You could try it yourself -- I am happy to help, but I have been spending many hours filing this bug report, and I feel it's not getting anywhere. Should I demonstrate this on RHEL? Should I install the mainline kernel on Fedora? Which version??

Revision history for this message
Peter Passchier (peter-passchier) wrote :

This comment from Zdenek Kabelac 'closes' this bug report as far as I'm concerned:

Hmm you are doing snapshot of 'root' filesystem - and there could
be actually some problems with delivering of cookie since udev
might get frozen.

So folks, don't do: lvcreate -s ... &>/file/on/rootfs
But instead do: out=$(lvcreate -s ... 2>&1); echo "$out" >/file/on/rootfs
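
The workaround above can be wrapped in a small shell function. This is only a minimal sketch: `run_then_log` is a hypothetical helper name (not part of lvm2), and it assumes the whole output fits comfortably in a shell variable. It captures stdout and stderr in memory while the command runs and touches the log file only after the command has exited.

```shell
# run_then_log LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with stdout and stderr captured in
# memory, and append them to LOGFILE only after COMMAND has exited.
run_then_log() {
    local logfile=$1
    shift
    local out status
    # Nothing is written to the filesystem while the command is running.
    out=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$out" >>"$logfile"
    # Pass the command's own exit status back to the caller.
    return "$status"
}
```

With the command from this report it would be called as: run_then_log /root/savefile lvcreate -s -n oscopy -v -l 100%FREE /dev/secret/root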

Revision history for this message
penalvch (penalvch) wrote :

pepa65, this bug report is being closed due to your last comments:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1096520/comments/21
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1096520/comments/32

regarding this being fixed with an update. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

tags: added: bios-outdated-7.16
removed: linux live lvcreate lvm2 root
Changed in linux (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Peter Passchier (peter-passchier) wrote :

I don't think the status should be invalid. A workaround is provided, and there is an interaction with udev that causes it.

Revision history for this message
penalvch (penalvch) wrote :

pepa65, thank you for your comments. Could you please confirm this issue exists with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ . If the issue remains, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

If reproducible, could you also please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc6

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
Peter Passchier (peter-passchier) wrote :

Sorry, the iso won't install. It's stuck at 98% CPU for over an hour at 'cryptsetup', which manually only takes a second... I don't even know if the installer uses the proper setup, and I don't see a way to do it with "Something else".

Revision history for this message
penalvch (penalvch) wrote :

pepa65, thank you for attempting to test Trusty. Would it be possible to test an earlier release via http://releases.ubuntu.com/ ?

Revision history for this message
Peter Passchier (peter-passchier) wrote :

OK, I tested on a bog-standard lvm2 install (no LUKS encryption) of the latest Trusty Tahr.
Note: the original bug was already solved. The case where the output of the LV snapshot creation is redirected to a file on the volume being snapshotted is a mixed bag with the latest Trusty Tahr. The first time it went OK; the second time I used a single verbosity flag, and it hung:
lvcreate -v -s -p r -L 700M -n oscopy /dev/mapper/ubuntu--vg-root 2>/root/log
(Tried both a few times: without -v there is no hang; with at least one -v it hangs.)
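
Since the hang is tied to the log file living on the volume being snapshotted, a script could check for that before redirecting output. The following is only a sketch under assumptions: it relies on GNU coreutils `stat -c %d` (the device number of the filesystem holding a path), and both function names are hypothetical. If `stat` fails on either path, the guard errs on the side of refusing.

```shell
# on_same_filesystem PATH_A PATH_B
# Succeeds when both paths report the same device number, i.e. they
# live on the same filesystem. Assumes GNU coreutils stat.
on_same_filesystem() {
    [ "$(stat -c %d -- "$1")" = "$(stat -c %d -- "$2")" ]
}

# safe_log_dir ORIGIN_MOUNTPOINT LOG_DIR
# Hypothetical guard: fail if LOG_DIR is on the filesystem that is
# about to be snapshotted (mounted at ORIGIN_MOUNTPOINT).
safe_log_dir() {
    if on_same_filesystem "$1" "$2"; then
        echo "refusing: $2 is on the same filesystem as $1" >&2
        return 1
    fi
}
```

For the case above the check would be: safe_log_dir / /root, which fails when /root is part of the root filesystem, so the log should go elsewhere (for example a tmpfs) or be written after lvcreate returns.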

Revision history for this message
Peter Passchier (peter-passchier) wrote :

Additional note: on the console I get messages like:

[ 601.586905] INFO: task lvcreate:1999 blocked for more than 120 seconds.
[ 601.593195] Tainted: G W 3.12.0-7-generic #15-Ubuntu
[ 601.593195] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

(3 times every 2 minutes)

Revision history for this message
penalvch (penalvch) wrote :

pepa65, thank you for your comment. Given that as you noted in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1096520/comments/38 the original bug is solved, in order to eliminate scope creep, and focus on your problem quickly and efficiently, it would be best to close this report, and start fresh with a new bug report via a terminal, with the latest logs focused on the scoped problem:
ubuntu-bug PACKAGE

where PACKAGE is the affected Ubuntu package (may want to start with lvm2).

Thank you for reporting this bug and helping make Ubuntu better. Feel free to report any future bugs you may find.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in linux (Ubuntu):
status: Incomplete → Invalid