snap install seems to be polling at 3000 nanoseconds in a pselect call
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
snapd |
Triaged
|
Medium
|
Unassigned |
Bug Description
I observed snap install was eating a lot of CPU across threads, attaching strace -f in a snap install I observed *lots* of polling pselect calls:
[pid 27055] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27055] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27055] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27055] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27049] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27049] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27049] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27049] <... pselect6 resumed> ) = 0 (Timeout)
Honestly, not much can happen in 3000 nanoseconds, light travels less that 900 meters in the kind of duration. Polling on pselect is generating a lot of work for the scheduler. Can this pselect timeout be increased to something sensible, like 10th of a second so we don't eat up so much CPU. Is this some kind of broken spin lock?
Changed in snapd: | |
importance: | Undecided → High |
Changed in snapd: | |
status: | New → Triaged |
Thanks for your bugreport.
The 300 nanoseconds apparently comes from a usleep(3) in GO runtime (runtime/ proc.go: runqgrab to be precise). I have not looked into the details why this is happening yet, the internet (https:/ /stackoverflow. com/questions/ 35155119/ how-to- optimize- golang- program- that-spends- most-time- in-runtime- osyield- and-runt) indicates that we might trigger too much GC for some reason. But it might also be a red-herring. The next step here is probably to run pprof to see what triggers the usleep(3).