Verification steps for Bionic:
First, I made sure I could reproduce the problem on 4.15.0-115-generic.
I made a fresh Bionic VM, and copied over the ksm_refcnt_overflow.sh and zero_page_refcount.c files.
I built the kernel module, and inserted it into the kernel.
From there, I checked the zero_page reference counter.
$ sudo insmod zero_page_refcount.ko
[sudo] password for ubuntu:
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
From there, in another terminal, I ran the script ksm_refcnt_overflow.sh, and
checked that the VMs were running:
$ virsh list
Id Name State
----------------------------------------------------
1 instance-0 running
2 instance-1 running
3 instance-2 running
4 instance-3 running
5 instance-4 running
From there, we can see the reference counter increment:
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1158 or 4440
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1622 or 5666
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x163a or 5690
I issued the set command to bring the counter close to overflow:
$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1FFFFFFFFF000
I then checked and saw it overflow:
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff27 or 2147483431
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff92 or 2147483538
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648
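The last reading, 0x80000000 printed as -2147483648, is what a 32-bit counter looks like when it passes 0x7fffffff and is read as a signed integer. A minimal sketch of that arithmetic, assuming the counter is a signed 32-bit value; the helper name to_signed32 is hypothetical and is not part of the test module:

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed interpretation is negative.
    return value - 0x100000000 if value & 0x80000000 else value


# Readings taken from the output above.
print(to_signed32(0x7fffff27))      # 2147483431
print(to_signed32(0x7fffff92))      # 2147483538
# One increment past the largest positive value wraps to the most negative one.
print(to_signed32(0x7fffffff + 1))  # -2147483648
```

This only illustrates why the printed decimal value turns negative; it does not model what the kernel does with the page once the counter has wrapped.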
The instances became paused, and virtualisation was broken:
$ virsh list
Id Name State
----------------------------------------------------
5 instance-4 paused
6 instance-5 paused
7 instance-6 paused
8 instance-7 paused
9 instance-0 paused
10 instance-1 paused
11 instance-2 paused
12 instance-3 paused
I rebooted, and enabled -proposed. I then installed the 4.15.0-116-generic kernel, and rebooted again.
I rebuilt the zero_page_refcount kernel module with the new headers, and inserted it into the running kernel.
$ uname -rv
4.15.0-116-generic #117-Ubuntu SMP Fri Aug 28 16:04:22 UTC 2020
$ sudo insmod zero_page_refcount.ko
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
From there, I started the script ksm_refcnt_overflow.sh in another terminal.
We can see that VMs are running:
$ virsh list
Id Name State
----------------------------------------------------
1 instance-1 running
2 instance-2 running
3 instance-3 running
4 instance-4 running
Checking the value of the zero_page reference counter:
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
We are still at 1. Now attempting to trigger overflow:
$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1FFFFFFFFF000
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
The reference counter is never incremented, and will not overflow.
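The comparison of repeated readings can also be done mechanically. The sketch below assumes the output format shown above ("Zero Page Refcount: 0x... or N"); the function names parse_refcount and is_stable are hypothetical helpers written for this illustration:

```python
import re

# Matches lines such as "Zero Page Refcount: 0x7fffff00 or 2147483392".
LINE_RE = re.compile(r"Zero Page Refcount: (0x[0-9a-fA-F]+) or (-?\d+)")


def parse_refcount(line: str) -> int:
    """Return the decimal refcount from one line of /proc/zero_page_refcount output."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return int(match.group(2))


def is_stable(readings: list[int]) -> bool:
    """True when every reading is identical, i.e. the counter did not move."""
    return len(set(readings)) == 1


# The three readings taken on 4.15.0-116-generic after the set command.
lines = [
    "Zero Page Refcount: 0x7fffff00 or 2147483392",
    "Zero Page Refcount: 0x7fffff00 or 2147483392",
    "Zero Page Refcount: 0x7fffff00 or 2147483392",
]
values = [parse_refcount(line) for line in lines]
print(is_stable(values))  # True
```

Fed the readings from the 4.15.0-115-generic run instead, is_stable returns False, since those values change between reads.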
The problem is solved, and I am happy to mark this bug as verified for bionic.
For reference, the usual call trace appeared in dmesg after the overflow on 4.15.0-115-generic:
https://paste.ubuntu.com/p/wpJkGCH3fJ/