tar -x sometimes fails on overlayfs

Bug #1728489 reported by Daniel Axtens
This bug affects 2 people
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Fix Released   Undecided    Daniel Axtens
  Xenial         Fix Released   Medium       Unassigned
  Zesty          Fix Released   Medium       Unassigned

Bug Description

[SRU Justification]

[Impact]
A user is seeing failures from extracting tar archives on overlay filesystems on the 4.4 kernel in constrained environments. The error presents as:

`tar: ./deps/0/bin: Directory renamed before its status could be extracted`

Following this thread (http://www.spinics.net/lists/linux-unionfs/msg00856.html), it appears that this occurs when entries in the kernel's inode cache are reclaimed, and subsequent lookups return new inode numbers.

Further testing showed that when `/proc/sys/vm/vfs_cache_pressure` is set to 0 (don't allow the kernel to reclaim inode cache entries due to memory pressure), the error does not recur, supporting the hypothesis that cache entries are being evicted. However, this setting may lead to a kernel OOM, so it is not a reasonable workaround, even temporarily.
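
For anyone wanting to repeat that diagnostic, here is a minimal sketch of a helper that changes the setting only for the duration of one command and then restores the previous value. The function name `with_cache_pressure` and the `PRESSURE_FILE` variable are made up for illustration; it needs root when pointed at the real /proc file, and it is a diagnostic aid only, not a workaround.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): run a command with
# vfs_cache_pressure temporarily changed, then restore the previous value.
# PRESSURE_FILE is an assumed override so the path can be swapped out.
PRESSURE_FILE="${PRESSURE_FILE:-/proc/sys/vm/vfs_cache_pressure}"

with_cache_pressure() {
    new_value="$1"
    shift
    # Remember the current setting so it can be put back afterwards.
    old_value=$(cat "$PRESSURE_FILE") || return 1
    echo "$new_value" > "$PRESSURE_FILE" || return 1
    # Run the command under test and keep its exit status.
    "$@"
    status=$?
    # Restore the original value regardless of how the command ended.
    echo "$old_value" > "$PRESSURE_FILE"
    return "$status"
}
```

Usage would look something like `with_cache_pressure 0 tar -xf archive.tar -C /mnt`, where the archive name and target directory are placeholders.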

The error cannot be reproduced on a 4.13 kernel, thanks to the series at https://www.spinics.net/lists/linux-fsdevel/msg110235.html. The key commit is b7a807dc2010334e62e0afd89d6f7a8913eb14ff, which needs a couple of dependencies.

[Fix]
For Zesty, backport the entire series.
For Xenial, where a full backport is not feasible, backport the key commit and the short list of dependencies.

[Testcase]

# Testing this bug

The testcase for this particular bug is simple - create an overlay filesystem with all layers on the same underlying file system, and then see if the inode of a directory is constant across dropping the caches:

mkdir -p /upper/upper /upper/work /lower
mount -t overlay none /mnt -o lowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work
cd /mnt
mkdir a
stat a # observe inode number
echo 2 > /proc/sys/vm/drop_caches
stat a # compare inode number

If the inode number is the same, the fix is successful.
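
The manual comparison can also be scripted. The following is a rough sketch, assuming GNU coreutils `stat`; the function names `inode_of` and `inode_stable` and the optional second argument are invented here for illustration. It needs root to write to the real drop_caches file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original testcase).

# inode_of PATH: print the inode number of PATH (GNU coreutils stat).
inode_of() {
    stat -c %i "$1"
}

# inode_stable PATH [DROP_FILE]
# Returns 0 if PATH has the same inode number before and after the caches
# are dropped, 1 if it changed, 2 if a step could not be carried out.
# DROP_FILE defaults to the real drop_caches file; the override is an
# assumption added so the function can be exercised without root.
inode_stable() {
    path="$1"
    drop_file="${2:-/proc/sys/vm/drop_caches}"
    before=$(inode_of "$path") || return 2
    # Writing 2 asks the kernel to free reclaimable dentries and inodes.
    echo 2 > "$drop_file" || return 2
    after=$(inode_of "$path") || return 2
    echo "before=$before after=$after"
    [ "$before" = "$after" ]
}
```

With the overlay mounted as above, `inode_stable /mnt/a && echo PASS || echo FAIL` would report the result.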

# Regression testing

I have run the unionmount test suite from http://git.infradead.org/users/dhowells/unionmount-testsuite.git in overlay mode (./run --ov), and verified that it still passes.

(The series cover letter mentions a fork of the test suite at https://github.com/amir73il/unionmount-testsuite/commits/overlayfs-devel. I have *not* attempted to get this running: it assumes a range of changes that are not present in our kernels.)

[Regression Potential]
As this changes overlayfs, there is potential for regression in the form of unexpected breakages to overlayfs behaviour.

I think this is adequately addressed by the regression testing.

One option to reduce the regression potential on Zesty is to reduce the set of patches applied - rather than including the whole series we could include just the patches to solve this bug, which are much easier to inspect for correctness.

Stefan Bader (smb)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
status: New → Fix Committed
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Fix Committed
Khaled El Mously (kmously) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-zesty
Khaled El Mously (kmously) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Gavin Guo (mimi0213kimo)
tags: added: verification-done-xenial verification-done-zesty
removed: verification-needed-xenial verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-42.46

---------------
linux (4.10.0-42.46) zesty; urgency=low

  * linux: 4.10.0-42.46 -proposed tracker (LP: #1736152)

  * CVE-2017-1000405
    - mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

  * CVE-2017-16939
    - ipsec: Fix aborted xfrm policy dump crash

linux (4.10.0-41.45) zesty; urgency=low

  * linux: 4.10.0-41.45 -proposed tracker (LP: #1733524)

  * tar -x sometimes fails on overlayfs (LP: #1728489)
    - ovl: check if all layers are on the same fs
    - ovl: persistent inode number for directories

  * CVE-2017-12146
    - driver core: platform: fix race condition with driver_override

  * NVMe timeout is too short (LP: #1729119)
    - nvme: update timeout module parameter type

  * Set PANIC_TIMEOUT=10 on Power Systems (LP: #1730660)
    - [Config]: Set PANIC_TIMEOUT=10 on ppc64el

  * Cannot pair BLE remote devices when using combo BT SoC (LP: #1731467)
    - Bluetooth: increase timeout for le auto connections

  * Plantronics P610 does not support sample rate reading (LP: #1719853)
    - ALSA: usb-audio: Add sample rate quirk for Plantronics P610

  * Invalid btree pointer causes the kernel NULL pointer dereference
    (LP: #1729256)
    - xfs: reinit btree pointer on attr tree inactivation walk

  * Samba mount/umount in docker container triggers kernel Oops (LP: #1729637)
    - ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
    - ipv6: fix NULL dereference in ip6_route_dev_notify()

  * Device hotplugging with MPT SAS cannot work for VMWare ESXi (LP: #1730852)
    - scsi: mptsas: Fixup device hotplug for VMWare ESXi

  * Boot/Installation crash of Ubuntu-16.04.3 HWE kernel on R940 (LP: #1719697)
    - Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"

 -- Stefan Bader <email address hidden> Mon, 04 Dec 2017 15:04:01 +0100

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-103.126

---------------
linux (4.4.0-103.126) xenial; urgency=low

  * linux: 4.4.0-103.126 -proposed tracker (LP: #1736181)

  * CVE-2017-1000405
    - mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

  * CVE-2017-16939
    - netlink: add a start callback for starting a netlink dump
    - ipsec: Fix aborted xfrm policy dump crash

linux (4.4.0-102.125) xenial; urgency=low

  * linux: 4.4.0-102.125 -proposed tracker (LP: #1733541)

  * tar -x sometimes fails on overlayfs (LP: #1728489)
    - ovl: check if all layers are on the same fs
    - ovl: persistent inode number for directories

  * NVMe timeout is too short (LP: #1729119)
    - nvme: update timeout module parameter type

  * Set PANIC_TIMEOUT=10 on Power Systems (LP: #1730660)
    - [Config]: Set PANIC_TIMEOUT=10 on ppc64el

  * Cannot pair BLE remote devices when using combo BT SoC (LP: #1731467)
    - Bluetooth: increase timeout for le auto connections

  * CIFS errors on 4.4.0-98, but not on 4.4.0-97 with same config (LP: #1729337)
    - SMB3: Validate negotiate request must always be signed

  * Plantronics P610 does not support sample rate reading (LP: #1719853)
    - ALSA: usb-audio: Add sample rate quirk for Plantronics P610

  * Invalid btree pointer causes the kernel NULL pointer dereference
    (LP: #1729256)
    - xfs: reinit btree pointer on attr tree inactivation walk

  * Samba mount/umount in docker container triggers kernel Oops (LP: #1729637)
    - ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
    - ipv6: fix NULL dereference in ip6_route_dev_notify()

  * [kernel] tty/hvc: Use opal irqchip interface if available (LP: #1728098)
    - tty/hvc: Use opal irqchip interface if available

  * Device hotplugging with MPT SAS cannot work for VMWare ESXi (LP: #1730852)
    - scsi: mptsas: Fixup device hotplug for VMWare ESXi

  * NMI watchdog: BUG: soft lockup on Guest upon boot (KVM) (LP: #1727331)
    - KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread

  * Attempt to map rbd image from ceph jewel/luminous hangs (LP: #1728739)
    - crush: ensure bucket id is valid before indexing buckets array
    - crush: ensure take bucket value is valid
    - crush: add chooseleaf_stable tunable
    - crush: decode and initialize chooseleaf_stable
    - libceph: advertise support for TUNABLES5
    - libceph: MOSDOpReply v7 encoding

  * Xenial update to 4.4.98 stable release (LP: #1732698)
    - adv7604: Initialize drive strength to default when using DT
    - video: fbdev: pmag-ba-fb: Remove bad `__init' annotation
    - PCI: mvebu: Handle changes to the bridge windows while enabled
    - xen/netback: set default upper limit of tx/rx queues to 8
    - drm: drm_minor_register(): Clean up debugfs on failure
    - KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
    - iommu/arm-smmu-v3: Clear prior settings when updating STEs
    - powerpc/corenet: explicitly disable the SDHC controller on kmcoge4
    - ARM: omap2plus_defconfig: Fix probe errors on UARTs 5 and 6
    - crypto: vmx - disable preemption to enable vsx in aes_ctr.c
    - iio: trigger: free trigger...


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Daniel Axtens (daxtens)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released