NFS export of LVM snapshot exports origin instead of snapshot

Bug #1071733 reported by p
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

we're using LVM snapshots on our storage server to provide data for our live and staging instances.
there are two mountpoints on the server, one for the origin volume and one for the snapshot, which are exported via NFS to our live/staging instances (webservers).

today i noticed that writes on our staging webserver don't go to the snapshot as expected, but hit the origin volume, messing up all of our live data.

i checked the NFS mounts and exports and they look perfectly valid (staging mounts the snapshot mountpoint, live mounts the origin mountpoint).
on the storage server itself the mountpoints look valid, too. i wrote some identifying test files into each mountpoint in order to track them. there is no problem as long as we look at them on the NFS server itself.

but looking at the mountpoints on an NFS client shows that the origin data was mounted.
i found no way to mount the snapshot again.
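
A minimal sketch of the marker-file check described above, assuming two throwaway directories stand in for the origin and snapshot mountpoints. The helper names `mark` and `visible_markers` and the marker file names are hypothetical and do not come from the report.

```shell
#!/bin/sh
# Drop an identifying marker file into a directory.
mark() {
    touch "$1/$2"
}

# Print the name of every known marker that is visible in a directory.
visible_markers() {
    for m in ORIGIN_MARKER SNAPSHOT_MARKER; do
        [ -e "$1/$m" ] && echo "$m"
    done
    return 0
}

# Temporary directories stand in for the real mountpoints (assumption).
origin_mnt=$(mktemp -d)
snapshot_mnt=$(mktemp -d)

mark "$origin_mnt" ORIGIN_MARKER
mark "$snapshot_mnt" SNAPSHOT_MARKER

echo "origin shows:   $(visible_markers "$origin_mnt")"
echo "snapshot shows: $(visible_markers "$snapshot_mnt")"

rm -rf "$origin_mnt" "$snapshot_mnt"
```

On the server, each mountpoint should list only its own marker. Running `visible_markers` against the NFS mount on a client affected by this bug would be expected to show the origin's marker on the snapshot mount.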

i'm not entirely sure which component causes the problem. since the mountpoints look valid on the server itself, i would rule out LVM. i would also rule out the NFS client, since it should not be able to see what the server hasn't exported. all clients see the problem (lucid, maverick, precise).

i've checked for recent software-upgrades on the NFS-server and found:

Thu, Oct 25 2012 11:07:16 +0000

[INSTALL] linux-image-3.2.0-32-generic:amd64
[UPGRADE] nfs-common:amd64 1:1.2.5-3ubuntu3 -> 1:1.2.5-3ubuntu3.1
[UPGRADE] nfs-kernel-server:amd64 1:1.2.5-3ubuntu3 -> 1:1.2.5-3ubuntu3.1

Fri, Oct 26 2012 09:29:34 +0000

[UPGRADE] linux-generic:amd64 3.2.0.31.34 -> 3.2.0.32.35
[UPGRADE] linux-image-generic:amd64 3.2.0.31.34 -> 3.2.0.32.35

since it was working as expected before thursday (2012-10-25), i downgraded these packages (and rebooted into the older kernel).
this didn't solve the problem.

i can reproduce this behavior using different filesystems on the NFS-server.

i'll attach version info in my next post. please comment on what info you might need to figure this out.
---
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Oct 26 10:56 seq
 crw-rw---T 1 root audio 116, 33 Oct 26 10:56 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu14
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=2c4f34e2-4803-4753-98ae-ac398223c2e2
InstallationMedia:

IwConfig: Error: [Errno 2] No such file or directory
MachineType: System manufacturer System Product Name
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-31-generic root=/dev/mapper/vg0-system ro nomodeset
ProcVersionSignature: Ubuntu 3.2.0-31.50-generic 3.2.28
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-31-generic N/A
 linux-backports-modules-3.2.0-31-generic N/A
 linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
StagingDrivers: mei
Tags: precise staging
Uname: Linux 3.2.0-31-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 07/16/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2106
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: P8B WS
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2106:bd07/16/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnP8BWS:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

p (p1) wrote :

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

this issue occurs with both nfs3 and nfs4.
i just noticed that our second storage server, running lucid, has the same problem.

any ideas how to catch this bug?

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1071733/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
p (p1) wrote :

same symptoms: https://bugzilla.kernel.org/show_bug.cgi?id=13011 (the fsid advice doesn't work, reordering has no effect)
that report mentions a fix in kernel 2.6.29, which should be in lucid by now, indicating this is a kernel issue.
however, one lucid server has been up for 213 days and only recently started showing these symptoms. as of two weeks ago, it was fine.
(running 2.6.38-13-server from backports)

i can reproduce this on different servers, even mounting localhost.

my /etc/exports looks like this:
/data/live_nfs 10.1.*.*(rw,sync,no_subtree_check)
/data-snapshots 10.1.*.*(rw,sync,no_subtree_check,crossmnt)
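
The fsid advice mentioned above refers to the `fsid=` export option, which replaces the filesystem identifier the server would otherwise derive. A small sketch of how the two export lines could look with explicit values follows. The helper `export_line` is hypothetical and the numbers 1 and 2 are arbitrary choices (anything unique and non-zero). As noted above, this did not help on the affected server.

```shell
#!/bin/sh
# Print one /etc/exports line with an explicit fsid= value appended.
# Arguments: export path, client pattern, existing options, fsid number.
export_line() {
    printf '%s %s(%s,fsid=%s)\n' "$1" "$2" "$3" "$4"
}

# Paths, client pattern and options are copied from the exports file above;
# only the fsid values are assumptions.
export_line /data/live_nfs  '10.1.*.*' rw,sync,no_subtree_check          1
export_line /data-snapshots '10.1.*.*' rw,sync,no_subtree_check,crossmnt 2
```

After editing `/etc/exports`, `exportfs -ra` re-reads the file.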

tags: added: precise
affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1071733

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.7 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. Please only remove that one tag and leave the other tags. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-rc2-raring/

tags: added: kernel-da-key
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

One additional question. Booting into the prior kernel, before the upgrade, does not make the issue go away?

p (p1)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
p (p1) wrote :

@Brad Figg: did that already, but it said there was nothing to add. probably because no package was referenced.

@Joseph Salisbury: i'll try that, but probably not until next week. currently we need more hardware to live without snapshots.
already tried booting the old 3.2.0.31.34 kernel, but without success. please note that this issue occurs on different servers using different (old) kernels.

p (p1) wrote : AcpiTables.txt

apport information

tags: added: apport-collected staging
description: updated
p (p1) wrote : BootDmesg.txt

apport information

p (p1) wrote : CurrentDmesg.txt

apport information

p (p1) wrote : Lspci.txt

apport information

p (p1) wrote : Lsusb.txt

apport information

p (p1) wrote : ProcCpuinfo.txt

apport information

p (p1) wrote : ProcInterrupts.txt

apport information

p (p1) wrote : ProcModules.txt

apport information

p (p1) wrote : UdevDb.txt

apport information

p (p1) wrote : UdevLog.txt

apport information

p (p1) wrote : WifiSyslog.txt

apport information

p (p1) wrote :

did another run of apport-collect, which was now able to collect something, probably because Brian Murray suggested that this is a kernel issue.

@Joseph Salisbury: the bug is present in the mainline kernel.
btw: somehow the snapshot is now exported multiple times and the origin isn't.

tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
J. Bruce Fields (bfields-fieldses) wrote :

On Thu, Nov 08, 2012 at 05:13:37PM +0100, pille wrote:
> hi,
>
> [1.] One line summary of the problem:
> NFS export of LVM snapshot exports origin instead of snapshot
>
>
> [2.] Full description of the problem/report:
> from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1071733
>
> we're using LVM snapshots on our storage server to provide data for our
> live and staging instances.
> there are two mountpoints on the server, one for the origin volume, and
> one for the snapshot that are exported using NFS to our live/staging
> instances (webserver)

The protocol references files on the filesystem using filehandles. The
linux NFS server generates a filehandle that has a part which identifies
the filesystem and a part which identifies the particular file (usually
an inode and generation number).

The filesystem part in your case is probably a uuid which is part of
what's copied when you snapshot (as are all the inode numbers). So
you're left with filehandles that are the same for the two filesystems.

You can tell it instead to use whatever integer you'd like using the
"fsid=" export option (just don't use fsid=0, which has a special
meaning).

Or, probably better--you should be able to modify the uuid...
Looking at the tune2fs man page, I think it should be:

 tune2fs -U random /dev/vg0/nfssnapshot

--b.
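
To check whether the two filesystems really share a UUID before changing anything, something like the following sketch could be used. It assumes the device names from the report (`/dev/vg0/nfsorigin`, `/dev/vg0/nfssnapshot`) and that `blkid` is available; the helper names `fs_uuid` and `uuids_clash` are hypothetical. It only prints the suggested `tune2fs` command and does not run it.

```shell
#!/bin/sh
# Read the filesystem UUID of a block device (usually needs root).
fs_uuid() {
    blkid -s UUID -o value "$1"
}

# Succeed when two non-empty UUID strings are identical.
uuids_clash() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

origin=/dev/vg0/nfsorigin       # device names as used in the report
snapshot=/dev/vg0/nfssnapshot

# Only inspect the devices if both actually exist on this machine.
if [ -e "$origin" ] && [ -e "$snapshot" ]; then
    if uuids_clash "$(fs_uuid "$origin")" "$(fs_uuid "$snapshot")"; then
        echo "origin and snapshot share a filesystem UUID"
        echo "suggested (with the snapshot unmounted): tune2fs -U random $snapshot"
    else
        echo "filesystem UUIDs differ"
    fi
fi
```

If the UUIDs match, NFS filehandles built from that UUID would be indistinguishable between the two exports, which is consistent with the explanation above.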

>
> today i noticed, that writes on our staging-webserver don't go to the
> snapshot as expected, but are hitting the origin volume, messing up all
> of our live data.
>
> i checked the NFS-mounts and -exports and they look perfectly valid
> (staging mounts snapshot-mountpoint, live mounts origin-mountpoint)
> on the storage-server itself looking at the mountpoints, everything
> looks valid, too. i wrote some identifying test-files in each mountpoint
> in order to track them. no problem as long, as we look on the NFS-server
> itself.
>
> but looking in the mountpoints of a NFS-client, it shows that the origin
> data was mounted.
> i found no way to mount the snapshot again.
>
> i'm not entirely sure, what component causes the problem. since the
> mountpoints look valid on the server itself, i would rule out LVM. i
> also would rule out the NFS-client, since it should not be able to see
> what the server hasn't exported. all clients can see the problem (lucid,
> maverick, precise).
>
>
> [3.] Keywords (i.e., modules, networking, kernel):
>
>
> [4.] Kernel version (from /proc/version):
> Linux version 3.7.0-030700rc4-generic (apw@gomeisa) (gcc version 4.6.3
> (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201211041435 SMP Sun Nov 4 19:35:50
> UTC 2012
>
>
> [5.] Output of Oops.. message (if applicable) with symbolic information
> resolved (see Documentation/oops-tracing.txt)
>
>
> [6.] A small shell script or example program which triggers the problem
> (if possible)
> # prepare
> if [ ! -e /dev/vg0/nfsorigin ]; then
> lvcreate --size=10M --name=nfsorigin vg0
> mkfs.ext3 /dev/vg0/nfsorigin
> mount /dev/vg0/nfsorigin /mnt/nfsorigin
> touch /mnt/nfsorigin/NFSORIGIN
> umount /mnt/nfsorigin
> fi
>
> if [ ! -e /dev/vg0/nfssnapshot ]; then
> lvcreate --snapshot --size=1M --name=nfssnapshot vg0/nfsori...

Dimitrenko (paviliong6)
Changed in linux (Ubuntu):
status: Triaged → Fix Released