Application SNAP is slower than DEB at first launch

Bug #1833004 reported by frenchy82
44
This bug affects 8 people
Affects Status Importance Assigned to Milestone
snapd
Confirmed
Wishlist
Unassigned

Bug Description

Gnome calculator for exemple take 7 times more time to launch than DEB version on a first launch

For exemple with gnome-calculator on ubuntu 19.04

Just after booting the computer
$ time snap run gnome-calculator

real 0m7,078s
user 0m0,465s
sys 0m2,635s

then install DEB of gnome calculator
$ time gnome-calculator

real 0m1,813s
user 0m0,361s
sys 0m0,037s

In a second launch in the same session
$ time snap run gnome-calculator

real 0m1,856s
user 0m0,447s
sys 0m0,285s

So the problem is for the first time in the session.

My computer

$ free
              total utilisé libre partagé tamp/cache disponible
Mem: 8146512 1496916 4372884 119200 2276712 6245588
Partition d'échange: 4095996 0 4095996

$ cat /proc/cpuinfo

vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-4100 Quad-Core Processor
stepping : 2
microcode : 0x600063e
cpu MHz : 2007.232
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7248.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

processor : 1
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-4100 Quad-Core Processor
stepping : 2
microcode : 0x600063e
cpu MHz : 2330.727
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7248.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

processor : 2
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-4100 Quad-Core Processor
stepping : 2
microcode : 0x600063e
cpu MHz : 1688.981
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 2
apicid : 2
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7248.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

processor : 3
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-4100 Quad-Core Processor
stepping : 2
microcode : 0x600063e
cpu MHz : 1648.517
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 3
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7248.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

$ df
Sys. de fichiers blocs de 1K Utilisé Disponible Uti% Monté sur
udev 4015848 0 4015848 0% /dev
tmpfs 814652 3148 811504 1% /run
/dev/sda2 50133544 8527028 39030132 18% /
tmpfs 4073256 132632 3940624 4% /dev/shm
tmpfs 5120 4 5116 1% /run/lock
tmpfs 4073256 0 4073256 0% /sys/fs/cgroup
/dev/loop1 55040 55040 0 100% /snap/core18/941
/dev/loop3 1024 1024 0 100% /snap/gnome-logs/61
/dev/loop2 3840 3840 0 100% /snap/gnome-system-monitor/87
/dev/loop4 271744 271744 0 100% /snap/kde-frameworks-5-core18/29
/dev/loop7 55040 55040 0 100% /snap/core18/970
/dev/loop9 90624 90624 0 100% /snap/core/6964
/dev/loop8 15104 15104 0 100% /snap/gnome-characters/284
/dev/loop6 91392 91392 0 100% /snap/core/6673
/dev/loop10 15104 15104 0 100% /snap/gnome-characters/280
/dev/loop5 91648 91648 0 100% /snap/core/6818
/dev/loop12 3840 3840 0 100% /snap/gnome-system-monitor/83
/dev/loop14 36224 36224 0 100% /snap/gtk-common-themes/1198
/dev/loop11 154880 154880 0 100% /snap/gnome-3-28-1804/55
/dev/loop13 154752 154752 0 100% /snap/gnome-3-28-1804/51
/dev/sda3 906873768 104670068 756114076 13% /home
tmpfs 814648 44 814604 1% /run/user/1000

Revision history for this message
frenchy82 (cartes) wrote :

more information

 snap run --trace-exec gnome-calculator
[sudo] Mot de passe de cartes :
Slowest 10 exec calls during snap run:
  0.596s snap-update-ns
  0.869s /snap/core/6964/usr/lib/snapd/snap-confine
  0.199s /usr/lib/snapd/snap-exec
  0.265s /snap/gnome-calculator/406/snap/command-chain/snapcraft-runner
  0.166s /usr/bin/getent
  0.091s /bin/mkdir
  0.162s /usr/bin/head
  0.097s /usr/bin/realpath
  0.080s /bin/rm
  1.736s /snap/gnome-calculator/406/bin/desktop-launch
Total time: 9.840s

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

While snaps will always have some non-zero overhead compared to non-containerized applications, we have greatly reduced the overhead of a large class of desktop applications that were caused by the fontconfig stack.

On a 4th gen mobile core i7 my timings are:

zyga@x240:~$ snap run --trace-exec gnome-calculator
Slowest 10 exec calls during snap run:
  0.146s /snap/core/7910/usr/lib/snapd/snap-confine
  0.148s /snap/gnome-calculator/501/gnome-platform/usr/bin/gtk-update-icon-cache
  0.159s /snap/gnome-calculator/501/gnome-platform/usr/sbin/update-icon-caches
  0.108s /snap/gnome-calculator/501/gnome-platform/usr/sbin/update-icon-caches
  0.777s /snap/gnome-calculator/501/gnome-platform/usr/lib/x86_64-linux-gnu/glib-2.0/gio-querymodules
  0.122s /snap/gnome-calculator/501/gnome-platform/usr/bin/gtk-update-icon-cache
  0.149s /snap/gnome-calculator/501/gnome-platform/usr/sbin/update-icon-caches
  0.992s /snap/gnome-calculator/501/gnome-platform/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/gdk-pixbuf-query-loaders
  0.275s /snap/gnome-calculator/501/gnome-platform/usr/lib/x86_64-linux-gnu/libgtk-3-0/gtk-query-immodules-3.0
  1.947s /snap/gnome-calculator/501/bin/desktop-launch

On second launch the timings are much better although I know the bug talks about first launch.

Slowest 10 exec calls during snap run:
  0.021s /snap/core/7910/usr/lib/snapd/snap-confine
  0.017s /bin/mkdir
  0.013s /bin/mkdir
  0.012s /usr/bin/head
  0.013s /usr/bin/head
  0.012s /usr/bin/realpath
  0.012s /usr/bin/realpath
  0.012s /usr/bin/realpath
  0.013s /bin/mkdir
  0.332s /snap/gnome-calculator/501/bin/desktop-launch

There are several other reasons for the extra overhead: the sandbox setup, the xz compression of squashfs, the fact that libraries from the host and the snap may not be the same and are loaded separately, etc.

Overall I'd like to look for precise things we can measure and optimize. A big bug that says things are slower is only useful to the point that we acknowledge the fact.

affects: snappy → snapd
Changed in snapd:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I'm marking this as a wishlist item as there's nothing concrete we are doing right now to address the broad issue. Separately there are several efforts to improve understanding of what is taking the time and how to make known pieces faster. One such idea is "snap run --explain" which was recently documented on the forum: https://forum.snapcraft.io/t/snap-run-explain/13427

Revision history for this message
Olivier Tilloy (osomon) wrote :

This topic is trendy again, with the advent of the firefox snap in the upcoming release of Ubuntu 22.04. See https://bugzilla.mozilla.org/show_bug.cgi?id=1748076 for a discussion.

Users are reporting a combined overhead for snap-update-ns and snap-confine of more than 2 seconds (I'm guessing this is hardware-dependent, as I'm not seeing these numbers) for cold starts of firefox, so I'm wondering whether optimizations on this front could be investigated?

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

The snap-confine and snap-update-ns numbers look a bit odd to me still. I suspect that actually by running it under strace we make the results worse than they really are. I've added some simple trace lines at specific places in the code, right when /usr/bin/snap start, next right before execing snap-confine, then one when snap-exec starts (note that this is already inside the mount ns) and last one right before it execs the command specified in snap.yaml. The results are different than what I get from strace. Actually what triggered me to do it was a perceivable difference in start times between a `SNAPD_DEBUG=1 snap run firefox` and `SNAPD_DEBUG=1 snap run --trace-exec firefox`.

From the runs:

Slowest 10 exec calls during snap run:
  1.082s snap-update-ns
  1.182s /usr/lib/snapd/snap-confine
  0.584s /snap/firefox/1025/snap/command-chain/desktop-launch
  0.162s /usr/bin/xdg-settings
  0.564s /snap/firefox/1025/usr/lib/firefox/firefox
  5.175s /snap/firefox/1025/usr/lib/firefox/firefox
  0.647s /snap/firefox/1025/usr/lib/firefox/firefox
  14.411s /snap/firefox/1025/usr/lib/firefox/firefox
  0.357s /snap/firefox/1025/usr/lib/firefox/pingsender
  0.407s /snap/firefox/1025/usr/lib/firefox/pingsender

and indeed the traces are there:
      3:2022/03/02 13:16:12.369011 main.go:445: DEBUG: TRACE snap start
     13:2022/03/02 13:16:12.396505 cmd_run.go:919: DEBUG: TRACE exec snap-confine under strace
    617:2022/03/02 13:16:13.615339 main.go:55: DEBUG: TRACE snap-exec start
    618:2022/03/02 13:16:13.617770 main.go:247: DEBUG: TRACE exec app

While without --trace-exec:
      3:2022/03/02 13:10:12.059278 main.go:445: DEBUG: TRACE snap start
     13:2022/03/02 13:10:12.083831 cmd_run.go:1244: DEBUG: TRACE exec snap-confine
    617:2022/03/02 13:10:12.238004 main.go:55: DEBUG: TRACE snap-exec start
    618:2022/03/02 13:10:12.239462 main.go:247: DEBUG: TRACE exec app

So that's like 1.28s with --trace-exec, vs 180ms without. Plus the timings in --trace-exec output are actually calculated by looking at timestamps from strace which diverge even more.

The patch is here: https://paste.ubuntu.com/p/9qftNMjYvP/

It's clear we have to come up with a better way of observing actual events. Perhaps investigate the overhead of perf trace.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.