Hi Hans,
Thanks for the pointer to the synthetic reproducer!
It gave accurate and consistent results for the kernel
versions reported to exhibit (and not to exhibit) the issue.
The Azure test kernel with the 3 patches [1] that address
it shows the same (good) results as the Azure kernel from
before the regression was introduced.
P.S.: the issue isn't caused by that patch alone, as it is
also included in later kernel versions without this issue
(e.g., 5.15); it is caused by having that patch in without
these other patches (which, e.g., 5.15 also has).
[1] https://lists.ubuntu.com/archives/kernel-team/2022-November/135069.html
...
Test Results from 4x VMs on Azure (2x 4vCPU/16G and 2x 8vCPU/32G)
Test steps follow below; essentially, run the curl for-loop
10 times and count how many times it doesn't finish, i.e., gets
stuck (the epoll wait never returns).
1) original/"good" kernel: 0% error rate
-- 5.4.0-1094-azure #100-Ubuntu SMP Mon Oct 17 03:14:36 UTC 2022
VM1: 0/10
VM2: 0/10
VM3: 0/10
VM4: 0/10
2) regression/"bad" kernel: 60%-80% error rate
-- 5.4.0-1095-azure #101-Ubuntu SMP Thu Oct 20 15:50:47 UTC 2022
VM1: 8/10
VM2: 7/10
VM3: 7/10
VM4: 6/10
3) candidate/"test" kernel: 0% error rate
-- 5.4.0-1098-azure #104-Ubuntu SMP Wed Nov 23 21:19:57 UTC 2022
VM1: 0/10
VM2: 0/10
VM3: 0/10
VM4: 0/10
...
Test Steps/Criteria on Focal:
Install Go 1.19:
$ sudo snap install --channel=1.19/stable --classic go
Create test programs:
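$ cat <<EOF >main.go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/procfs/sysfs"
)

func main() {
	// Handler/server wiring and the loop body are reconstructed; those
	// lines were garbled in the original paste.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fs, err := sysfs.NewFS("/sys")
		if err != nil {
			panic(err)
		}
		// List the entries under /sys/class/net.
		netDevices, err := fs.NetClassDevices()
		if err != nil {
			panic(err)
		}
		for _, device := range netDevices {
			fmt.Fprintln(w, device)
		}
	})
	// Port 9100 matches the curl loop below.
	log.Fatal(http.ListenAndServe(":9100", nil))
}
EOF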
$ cat <<EOF >go.mod
module app
go 1.19
require (
github.com/prometheus/procfs v0.8.0 // indirect
golang.org/x/sync v0.0.0-20220601150217-0de741cfad7f // indirect
)
EOF
Fetch test deps:
$ go mod download github.com/prometheus/procfs
$ go get <email address hidden>
Start test program:
$ go run main.go &
Exercise it:
$ for i in {0..10000} ; do curl localhost:9100/metrics ; done
Test Criteria:
PASS = for-loop finishes.
FAIL = for-loop doesn't finish.
# reference: https://github.com/prometheus/node_exporter/issues/2500#issuecomment-1304847221
Stack traces on FAIL / for-loop not finished:
azureuser@ktest-3:~$ sudo grep -l epoll /proc/$(pidof go)/task/*/stack | xargs sudo grep -H ^
/proc/33267/task/33287/stack:[<0>] ep_poll+0x3bb/0x410
/proc/33267/task/33287/stack:[<0>] do_epoll_wait+0xb8/0xd0
/proc/33267/task/33287/stack:[<0>] __x64_sys_epoll_pwait+0x4c/0xa0
/proc/33267/task/33287/stack:[<0>] do_syscall_64+0x5e/0x200
/proc/33267/task/33287/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
azureuser@ktest-3:~$ sudo grep -l epoll /proc/$(pidof go)/task/*/stack | xargs sudo grep -H ^
/proc/1193/task/1193/stack:[<0>] ep_poll+0x3bb/0x410
/proc/1193/task/1193/stack:[<0>] do_epoll_wait+0xb8/0xd0
/proc/1193/task/1193/stack:[<0>] __x64_sys_epoll_pwait+0x4c/0xa0
/proc/1193/task/1193/stack:[<0>] do_syscall_64+0x5e/0x200
/proc/1193/task/1193/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
azureuser@ktest-3:~$ sudo grep -l epoll /proc/$(pidof go)/task/*/stack | xargs sudo grep -H ^
/proc/1173/task/1193/stack:[<0>] ep_poll+0x3bb/0x410
/proc/1173/task/1193/stack:[<0>] do_epoll_wait+0xb8/0xd0
/proc/1173/task/1193/stack:[<0>] __x64_sys_epoll_pwait+0x4c/0xa0
/proc/1173/task/1193/stack:[<0>] do_syscall_64+0x5e/0x200
/proc/1173/task/1193/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1