snap install seems to be polling at 3000 nanoseconds in a pselect call

Bug #1792959 reported by Colin Ian King
Affects: snapd
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

I observed that snap install was eating a lot of CPU across threads. Attaching strace -f to a snap install, I observed *lots* of polling pselect calls:

[pid 27055] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27055] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27055] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27055] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27049] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27049] <... pselect6 resumed> ) = 0 (Timeout)
[pid 27049] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 27049] <... pselect6 resumed> ) = 0 (Timeout)

Honestly, not much can happen in 3000 nanoseconds; light travels less than 900 meters in that kind of duration. Polling on pselect generates a lot of work for the scheduler. Can this pselect timeout be increased to something sensible, like a tenth of a second, so we don't eat up so much CPU? Is this some kind of broken spin lock?
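For comparison, the same pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, ...) pattern appears under strace for a plain Go program that churns goroutines, which suggests the sleep comes from the Go runtime's scheduler rather than from snapd's own code. A minimal sketch (illustrative only, not snapd code); run it under strace -f -e trace=pselect6 to compare:

package main

import "sync"

func main() {
	// Constant goroutine churn keeps the scheduler's per-P run queues
	// in motion; its work-stealing path apparently sleeps ~3us between
	// steal attempts, and strace renders that sleep as pselect6 with
	// tv_nsec=3000 on this Go runtime.
	for {
		var wg sync.WaitGroup
		for i := 0; i < 64; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				_ = make([]byte, 64) // a tiny amount of work per goroutine
			}()
		}
		wg.Wait()
	}
}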

Michael Vogt (mvo)
Changed in snapd:
importance: Undecided → High
Revision history for this message
Michael Vogt (mvo) wrote :

Thanks for your bug report.

The 3000 nanoseconds apparently come from a usleep(3) in the Go runtime (runtime/proc.go:runqgrab, to be precise). I have not looked into the details of why this is happening yet; the internet (https://stackoverflow.com/questions/35155119/how-to-optimize-golang-program-that-spends-most-time-in-runtime-osyield-and-runt) indicates that we might be triggering too much GC for some reason. But it might also be a red herring. The next step here is probably to run pprof to see what triggers the usleep(3).
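For anyone picking this up: the quickest way to get that profile from a Go program is the stock net/http/pprof handler. A generic sketch (this is not snapd's actual debug wiring; snapd may need its own hook):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on localhost only.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... workload under investigation runs here ...
	select {}
}

With that in place, "go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30" collects 30 seconds of CPU samples; if most of the time lands in runtime.usleep/runtime.runqgrab, the spinning is in the scheduler rather than in user code.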

Changed in snapd:
status: New → Triaged
Revision history for this message
Michael Vogt (mvo) wrote :

We fixed an overly aggressive progress bar, which resulted in far fewer of these calls AFAICT. Moving the importance to Medium for this reason.
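For reference, the usual shape of such a fix is to rate-limit terminal redraws instead of repainting on every progress notification. A minimal sketch of the idea (names invented; this is not the actual snapd progress code):

package main

import (
	"fmt"
	"time"
)

// throttledBar redraws at most once per interval, however often the
// download loop reports new byte counts.
type throttledBar struct {
	interval time.Duration
	lastDraw time.Time
}

func (b *throttledBar) Set(current, total int64) {
	now := time.Now()
	if now.Sub(b.lastDraw) < b.interval {
		return // drop this update instead of redrawing
	}
	b.lastDraw = now
	fmt.Printf("\r%d/%d bytes", current, total)
}

func main() {
	bar := &throttledBar{interval: 100 * time.Millisecond}
	for i := int64(0); i <= 1000; i++ {
		bar.Set(i*1024, 1000*1024)
		time.Sleep(time.Millisecond) // stand-in for network reads
	}
	fmt.Println()
}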

Changed in snapd:
importance: High → Medium
Revision history for this message
Robert Collins (lifeless) wrote :

Seeing the same behaviour currently.
pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=20000}, NULL) = 0 (Timeout)

robertc@robertc-xps:~$ sudo snap debug stacktraces
goroutine 788 [running]:
github.com/snapcore/snapd/daemon.getStacktraces(0xc42042a1d0, 0xb)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/daemon/api_debug_stacktrace.go:29 +0x76
github.com/snapcore/snapd/daemon.postDebug(0x562917885b60, 0xc4206e4c00, 0x0, 0x0, 0x0)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/daemon/api_debug.go:344 +0x37a
github.com/snapcore/snapd/daemon.(*Command).ServeHTTP(0x562917885b60, 0x56291742f140, 0xc4205ee420, 0xc4206e4c00)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/daemon/daemon.go:167 +0x3fd
github.com/snapcore/snapd/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc42014e000, 0x56291742f140, 0xc4205ee420, 0xc4206e4900)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/vendor/github.com/gorilla/mux/mux.go:212 +0xcf
github.com/snapcore/snapd/daemon.logit.func1(0x56291742f980, 0xc4201536c0, 0xc4206e4900)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/daemon/daemon.go:214 +0xdf
net/http.HandlerFunc.ServeHTTP(0xc420594580, 0x56291742f980, 0xc4201536c0, 0xc4206e4900)
 /usr/lib/go-1.10/src/net/http/server.go:1947 +0x46
net/http.serverHandler.ServeHTTP(0xc420435930, 0x56291742f980, 0xc4201536c0, 0xc4206e4900)
 /usr/lib/go-1.10/src/net/http/server.go:2697 +0xbe
net/http.(*conn).serve(0xc4209ac320, 0x562917430880, 0xc4204046c0)
 /usr/lib/go-1.10/src/net/http/server.go:1830 +0x653
created by net/http.(*Server).Serve
 /usr/lib/go-1.10/src/net/http/server.go:2798 +0x27d

goroutine 1 [select, 317 minutes]:
main.run(0xc4202a96e0, 0x0, 0x0)
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/cmd/snapd/main.go:153 +0x501
main.main()
 /build/snapd/parts/snapd-deb/build/_build/src/github.com/snapcore/snapd/cmd/snapd/main.go:64 +0x10d

goroutine 5 [syscall, 317 minutes]:
os/signal.signal_recv(0x0)
 /usr/lib/go-1.10/src/runtime/sigqueue.go:139 +0xa8
os/signal.loop()
 /usr/lib/go-1.10/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
 /usr/lib/go-1.10/src/os/signal/signal_unix.go:28 +0x43

goroutine 10 [select, 317 minutes, locked to thread]:
runtime.gopark(0x56291741e780, 0x0, 0x562916d20415, 0x6, 0x18, 0x1)
 /usr/lib/go-1.10/src/runtime/proc.go:291 +0x120
runtime.selectgo(0xc42005a750, 0xc42039efc0)
 /usr/lib/go-1.10/src/runtime/select.go:392 +0xe56
runtime.ensureSigM.func1()
 /usr/lib/go-1.10/src/runtime/signal_unix.go:549 +0x1f6
runtime.goexit()
 /usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1

goroutine 53 [IO wait, 316 minutes]:
internal/poll.runtime_pollWait(0x7fa74c12bf00, 0x72, 0x0)
 /usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc420136318, 0x72, 0xc4202de000, 0x0, 0x0)
 /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9d
internal/poll.(*pollDesc).waitRead(0xc420136318, 0xffffffffffffff00, 0x0, 0x0)
 /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Accept(0xc420136300, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
 /u...

Revision history for this message
Paweł Stołowski (stolowski) wrote :

@Robert I'm not sure it's the same problem. The original one was related to 'snap install ...' activity (which involved I/O related to our progress bar, among other things, and we tuned that, as mvo said), while in your case snapd seems to be idle and not doing anything; is that right? On Twitter you mentioned that powertop reported lots of wakeups; did you collect 'snap debug stacktraces' while powertop was still reporting wakeups?

There is a similar problem for containerd (https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1826684), which might suggest a Go runtime issue.

Revision history for this message
Colin Ian King (colin-king) wrote :

One can also run health-check on the process and observe the activity, e.g.

sudo health-check -p snapd -d 60

This will profile the snapd process for 60 seconds to get an overall idea of the kinds of system calls that are producing excessive wakeups.
