Comment 10 for bug 1873074

Mauricio Faria de Oliveira (mfo) wrote :

Problem Demonstration / Instrumentation:
---------------------------------------

This kprobe kernel module ("kmod-kprobe-fput.c") inserts a
probe on __fput() and prints the i_readcount value just
before it is decremented, when a specified filename is found.
(Usage/steps are in its header comments.)

    $ sudo insmod kmod-kprobe-fput.ko
    [ 315.625113] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 0)

The i_readcount value is only incremented on reads, not writes:

    $ touch test
    [ 308.193058] file: test, fs type: ext4, inode readcount: 0

    $ > test
    [ 310.293847] file: test, fs type: ext4, inode readcount: 0

    $ cat test
    [ 312.667149] file: test, fs type: ext4, inode readcount: 1

    $ cat test
    [ 317.312413] file: test, fs type: ext4, inode readcount: 1

    $ cat test
    [ 319.223841] file: test, fs type: ext4, inode readcount: 1

It is decremented only when the file is closed:

    $ tail -f test &
    $ tail -f test &
    $ tail -f test &

    $ cat test
    [ 365.042632] file: test, fs type: ext4, inode readcount: 4

    $ kill %%
    [ 372.241224] file: test, fs type: ext4, inode readcount: 3

    $ kill %%
    [ 376.151455] file: test, fs type: ext4, inode readcount: 2

    $ kill %%
    [ 378.802151] file: test, fs type: ext4, inode readcount: 1
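
The behavior in the transcripts above can be summarized with a
toy model. This is only a sketch, not kernel code: the class and
function names are hypothetical, and it assumes that a read-only
open increments the counter, a write open leaves it alone, and
the close decrements it after the probe has printed the value.

```python
class Inode:
    """Toy stand-in for an inode; it only tracks i_readcount."""

    def __init__(self):
        self.i_readcount = 0

    def open(self, read_only):
        # Only read-only opens are counted; write opens leave it alone.
        if read_only:
            self.i_readcount += 1

    def fput(self, read_only):
        # The probe prints the value *before* the decrement on close.
        seen = self.i_readcount
        if read_only:
            self.i_readcount -= 1
        return seen


def cat(inode):
    """Open read-only, then close; return what the probe would print."""
    inode.open(read_only=True)
    return inode.fput(read_only=True)


inode = Inode()
print(cat(inode), cat(inode), cat(inode))  # repeated cats: 1 1 1

for _ in range(3):                         # three 'tail -f' keep it open
    inode.open(read_only=True)
print(cat(inode))                          # cat on top of the tails: 4
```

Closing the three readers afterwards (three fput() calls) would
show 3, 2 and 1, matching the 'kill %%' lines above.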

With aufs, there are 2 files/inodes: one in the virtual/aufs
filesystem, another in the (underlying) real/ext4 filesystem.
aufs handles the open/read/write calls on the virtual file
and redirects them to the real one.

    $ mkdir dir mnt
    $ touch dir/test
    $ sudo mount -t aufs -o br=dir none mnt

    $ ls mnt
    test

The problem is observable upfront: i_readcount for the real
inode/filesystem is incremented one extra time on each
read-only open.

    $ cat mnt/test
    [ 453.819165] file: test, fs type: aufs, inode readcount: 1
    [ 453.819226] file: test, fs type: ext4, inode readcount: 2

    $ cat mnt/test
    [ 458.091550] file: test, fs type: aufs, inode readcount: 1
    [ 458.091599] file: test, fs type: ext4, inode readcount: 3

    $ cat mnt/test
    [ 463.165711] file: test, fs type: aufs, inode readcount: 1
    [ 463.165759] file: test, fs type: ext4, inode readcount: 4

Compare that with the non-aufs/ext4-only output above for
multiple cats ;-) - the inode's i_readcount on ext4 grows.
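
The aufs numbers above fit a simple accounting model: two
increments on the real inode per read-only open, but only one
decrement per close, so each open/close pair leaks one count.
The sketch below only reproduces the printed values; the names
are hypothetical and it does not claim to show where in aufs the
extra increment comes from.

```python
class RealInode:
    """Toy stand-in for the underlying ext4 inode."""

    def __init__(self):
        self.i_readcount = 0


def cat_via_aufs(real):
    """One read-only open + close through the aufs mount (toy model).

    Returns the two values the probe would print: (aufs, ext4).
    """
    aufs_readcount = 0
    aufs_readcount += 1        # open on the aufs inode
    real.i_readcount += 1      # expected increment on the real inode
    real.i_readcount += 1      # the extra increment being demonstrated

    seen = (aufs_readcount, real.i_readcount)  # printed before decrement

    aufs_readcount -= 1        # close: aufs inode is balanced
    real.i_readcount -= 1      # close: real inode gets only one decrement
    return seen


real = RealInode()
for _ in range(3):
    print(cat_via_aufs(real))  # (1, 2), then (1, 3), then (1, 4)
print(real.i_readcount)        # 3 counts leaked, with no file open
```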

...

That kprobe was enabled during the 'Exploit / Local' run.

The logs show the i_readcount value incrementing until it
overflowed to negative values and wrapped around to 0, at
which point the BUG_ON() was hit and the kernel panicked.

(When the 'multiple' parameter is set, the probe prints only
when i_readcount, taken as an unsigned type, is a multiple
of that value.)

    $ sudo insmod kmod-kprobe-fput.ko multiple=100000
    [ 1684.953480] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 100000)

    $ cd mnt && /tmp/exploit
    [ 1799.795277] file: test, fs type: ext4, inode readcount: 100000
    [ 1800.420418] file: test, fs type: ext4, inode readcount: 200000
    [ 1801.030687] file: test, fs type: ext4, inode readcount: 300000
    ...
    [ 2428.610831] file: test, fs type: ext4, inode readcount: 100000000
    ...
    [ 7909.385033] file: test, fs type: ext4, inode readcount: 1000000000
    ...
    [14191.533372] file: test, fs type: ext4, inode readcount: 2000000000
    ...
    [15156.688678] file: test, fs type: ext4, inode readcount: 2147400000
    [15157.432852] file: test, fs type: ext4, inode readcount: -2147451616
    ...
    [16123.045186] file: test, fs type: ext4, inode readcount: -2000051616
    ...
    [22655.214420] file: test, fs type: ext4, inode readcount: -1000051616
    ...
    [28517.303066] file: test, fs type: ext4, inode readcount: -100051616
    ...
    [29161.058111] file: test, fs type: ext4, inode readcount: -1051616
    [29161.702771] file: test, fs type: ext4, inode readcount: -951616
    [29162.337571] file: test, fs type: ext4, inode readcount: -851616
    [29162.980385] file: test, fs type: ext4, inode readcount: -751616
    [29163.614763] file: test, fs type: ext4, inode readcount: -651616
    [29164.253970] file: test, fs type: ext4, inode readcount: -551616
    [29164.890793] file: test, fs type: ext4, inode readcount: -451616
    [29165.566457] file: test, fs type: ext4, inode readcount: -351616
    [29166.224213] file: test, fs type: ext4, inode readcount: -251616
    [29166.879175] file: test, fs type: ext4, inode readcount: -151616
    [29167.528966] file: test, fs type: ext4, inode readcount: -51616
    [29167.862871] file: test, fs type: ext4, inode readcount: 0
    [29167.864633] ------------[ cut here ]------------
    [29167.866016] kernel BUG at include/linux/fs.h:2963!
    [29167.867423] invalid opcode: 0000 [#1] SMP PTI
    [29167.868584] CPU: 0 PID: 5314 Comm: exploit Tainted: G OE 5.4.0-21-generic #25-Ubuntu
    [29167.870751] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    [29167.873202] RIP: 0010:__fput+0x25d/0x260
    ...
    [29167.901583] Call Trace:
    [29167.902387] ____fput+0xe/0x10
    [29167.903344] task_work_run+0x8f/0xb0
    [29167.904420] exit_to_usermode_loop+0x131/0x160
    [29167.905749] do_syscall_64+0x163/0x190
    [29167.906929] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    ...
    [29167.967808] Kernel panic - not syncing: Fatal exception
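
The numbers in this log can be checked with some arithmetic.
The sketch below models the counter as a signed 32-bit value
and estimates the average leak rate from the first printed line
and the final wrap to 0. The helper names are hypothetical. The
print filter is an assumption inferred from the logged values
(they are consistent with the signed value being widened to a
64-bit unsigned type before the modulo); it is not taken from
the module source.

```python
import ctypes


def as_int32(n):
    """Value seen after storing n in a signed 32-bit counter."""
    return ctypes.c_int32(n).value


def probe_prints(value, multiple):
    """ASSUMED filter: widen to 64-bit unsigned, then test the modulo."""
    return (value % 2**64) % multiple == 0


def leaks_per_second(count_a, t_a, count_b, t_b):
    """Average net increments per second between two log lines."""
    return (count_b - count_a) / (t_b - t_a)


print(as_int32(2**31 - 1))  # 2147483647, largest positive value
print(as_int32(2**31))      # -2147483648, one more increment goes negative
print(as_int32(2**32))      # 0, the value printed right before the BUG

# The first negative line in the log passes the assumed 64-bit filter,
# but would not pass a 32-bit unsigned one.
print(probe_prints(-2147451616, 100000))    # True
print((-2147451616 % 2**32) % 100000 == 0)  # False

# Timestamps from the log: count 100000 at 1799.795277 s, and the
# wrap to 0 (2**32 net increments in total) at 29167.862871 s.
rate = leaks_per_second(100000, 1799.795277, 2**32, 29167.862871)
print(round(rate), "leaked counts per second on average")
```

With these timestamps, the run from the first printed line to
the panic takes about 27368 seconds, roughly 7.6 hours.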

Example of Web Server / nginx with Kubernetes:
----------------------------------------------

The same i_readcount increments are observed for the
'index.html' page served by nginx, as it is stored in
the container image and thus accessed via aufs.

(After deploying Kubernetes/Docker with aufs storage driver)

Start nginx pod/container:

    $ kubectl run web-server --image=nginx

Get its IP address:

    $ kubectl get pods -o wide
    NAME         READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES
    web-server   1/1     Running   0          48s   10.10.0.4   sf244755-focal   <none>           <none>

Test it:

    $ curl -s 10.10.0.4 | grep title
    <title>Welcome to nginx!</title>

    $ sudo insmod kmod-kprobe-fput.ko filename=index.html
    [ 3735.601633] kmod_kprobe_fput: kprobe registered (filename: index.html, multiple: 0)

    $ curl -s 10.10.0.4 >/dev/null
    [ 3757.368671] file: index.html, fs type: aufs, inode readcount: 1
    [ 3757.381055] file: index.html, fs type: ext4, inode readcount: 7

    $ curl -s 10.10.0.4 >/dev/null
    [ 3767.402218] file: index.html, fs type: aufs, inode readcount: 1
    [ 3767.407846] file: index.html, fs type: ext4, inode readcount: 8

    $ curl -s 10.10.0.4 >/dev/null
    [ 3771.856605] file: index.html, fs type: aufs, inode readcount: 1
    [ 3771.866484] file: index.html, fs type: ext4, inode readcount: 9

And the web server can be exposed/made available externally,
for example:

    $ kubectl expose pod web-server --port 80 --type NodePort

    $ kubectl get services web-server
    NAME         TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
    web-server   NodePort   10.100.0.69   <none>        80:32089/TCP   6s

    another-host$ curl -s 192.168.122.151:32089 | grep title
    <title>Welcome to nginx!</title>

    [ 4037.893050] file: index.html, fs type: aufs, inode readcount: 1
    [ 4037.909541] file: index.html, fs type: ext4, inode readcount: 10