Please enable CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU

Bug #1980861 reported by Michael Vogt
This bug affects 3 people
Affects                  Status        Importance  Assigned to          Milestone
snapd                    New           Undecided   Unassigned           -
linux (Ubuntu)           Fix Released  Undecided   Dimitri John Ledkov  -
linux (Ubuntu Kinetic)   Fix Released  Undecided   Dimitri John Ledkov  -

Bug Description

The snapd/desktop/advocacy teams are investigating the startup performance of running snaps.

While doing this and comparing various distros (Fedora, openSUSE, Ubuntu) on Igor's reference hardware, we noticed that Fedora is substantially faster at starting snaps.

After some research, it turns out that they use the CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y option, whereas we use CONFIG_SQUASHFS_DECOMP_SINGLE=y.
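
For reference, which decompressor strategy a given kernel was built with can be checked from the shipped config; a quick sketch (standard Ubuntu config path assumed):

  # Show which SQUASHFS decompressor strategy the running kernel uses
  grep SQUASHFS_DECOMP /boot/config-$(uname -r)
  # Current Ubuntu kernels: CONFIG_SQUASHFS_DECOMP_SINGLE=y
  # Fedora kernels:         CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y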

I built^Whacked a test kernel in my ppa:mvo/tmp PPA that just changes this one option (https://paste.ubuntu.com/p/NvS4GQjnSp/), and with that the first-run Firefox startup went from 15s to 6s (!) on Igor's reference hardware. Given the dramatic performance improvement, we would like to get this option switched.

However, we need to be careful and double-check that the results from https://bugs.launchpad.net/snappy/+bug/1636847 are no longer an issue - back in 2016 the kernel team switched to _DECOMP_SINGLE because just mounting a snap would use up 131MB of memory (!).

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1980861

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Monitoring for a recurrence of the mentioned bug report will indeed be needed.

Also note that squashfs XZ compression is memory hungry; a much better option would be to have snaps start using ZSTD compression - it is widespread enough, right?
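
For illustration only, recent squashfs-tools (built with zstd support) can already produce zstd-compressed images, so repacking an existing snap to compare behaviour is straightforward (the file names below are made up):

  # Unpack an xz-compressed snap and repack it with zstd (illustrative;
  # real snap packing goes through snapcraft / snap pack)
  unsquashfs -d work mysnap.snap
  mksquashfs work mysnap-zstd.snap -comp zstd -noappend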

Separately, I hope/fear that all our Ubuntu Core systems have multiple xz-compressed snaps, and with CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU they may blow the RAM budget. For example, snapd nested-vm spread tests might start failing (?! or are they single-core instances?).

It would be worthwhile to quickly rebuild the pi-kernel with this option turned on and double-check that multi-core Pi models with low RAM can still boot successfully.
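
A minimal sketch of flipping the option in a kernel tree before such a test rebuild (assuming the in-tree scripts/config helper and an existing .config):

  # From the kernel source tree, with the current .config in place
  ./scripts/config --disable SQUASHFS_DECOMP_SINGLE \
                   --enable  SQUASHFS_DECOMP_MULTI_PERCPU
  make olddefconfig
  # ...then rebuild and boot the test kernel as usual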

Revision history for this message
Dave Jones (waveform) wrote :

I'm hoping to look into the effects of MULTI_PERCPU on the smaller Pi platforms (in particular the Pi Zero 2 and 3A+, which each have 512MB of RAM and, with the arm64 arch, typically have ~250MB free at runtime). Unfortunately, building a local version of the linux-raspi package (just naively with sbuild) has thus far defeated me, but if I can get a version with the relevant config I'll run through some tests.

Revision history for this message
Dave Jones (waveform) wrote :

I've finished some research with this on one of the larger Pis (a 400) for performance measurements, and on a memory-limited Pi (Zero 2W) for an idea of the impact on memory usage.

First, the performance side of things: the good news is it doesn't make anything worse; the bad news is it doesn't make things *much* better either (unlike on the PC, where it apparently makes a substantial difference). The biggest difference on the Pi 400 was a drop from 16s to 13s in the cold startup time of the jammy release of Firefox. However, there was no difference at all in the warm startup time, and the difference in the cold startup time disappeared almost entirely when using the candidate version of Firefox with the lang-pack fix (the diff was 0.3s -- which I'd assume is essentially nothing, given I was doing manual timing and thus the times are subject to my reaction time of ~0.2s).

On the memory side of things I used a selection of 6 snaps installed on the Pi Zero 2W: mosquitto, node-red, micropython, node, ruby, and lxd (a combination of relatively common and fairly IoT specific snaps) which is probably about as much as anyone could reasonably expect to install and run on a half-gig system. I measured the system after a fresh boot running only default services, and otherwise idle on the Raspberry Pi jammy (22.04 LTS) arm64 server image. The arm64 architecture was selected partly for the wider availability of snaps (there are very few built for armhf) and partly because memory effects were more likely to be pronounced on this architecture. Over the course of 30 minutes, everything from /proc/meminfo was dumped to an SQLite database, first under the current release of the kernel (5.15.0-1011-raspi), then under a re-built version with CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU enabled (and CONFIG_SQUASHFS_DECOMP_SINGLE disabled).
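
For anyone wanting to repeat the measurement, a rough sketch of that kind of sampling loop is below (the database name, schema, and 10s interval are assumptions, not the exact script used):

  #!/bin/sh
  # Sample /proc/meminfo into an SQLite database every 10s for 30 minutes
  DB=meminfo.db
  sqlite3 "$DB" 'CREATE TABLE IF NOT EXISTS meminfo (ts INTEGER, key TEXT, kb INTEGER);'
  end=$(( $(date +%s) + 1800 ))
  while [ "$(date +%s)" -lt "$end" ]; do
      ts=$(date +%s)
      # Turn "MemAvailable:   123456 kB" lines into INSERT statements
      awk -v ts="$ts" -F'[: ]+' \
          '{printf "INSERT INTO meminfo VALUES (%s, '\''%s'\'', %s);\n", ts, $1, $2}' \
          /proc/meminfo | sqlite3 "$DB"
      sleep 10
  done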

MemAvailable stayed reasonably stable across each run, and showed a median ~12MB less on the MULTI_PERCPU kernel than on the SINGLE kernel, suggesting that each snap's mount occupied somewhere in the region of 2MB more memory under the MULTI_PERCPU kernel. Other statistics were less clear-cut: there was no significant difference in kernel stack, and MemFree actually showed the opposite (but this should probably be ignored, as it doesn't exclude evictable pages like the disk cache, and the kernel has a duty to minimize it, so it will typically fall predictably on a freshly booted system, meaning that differences in the start time of a measurement run will lead to an irrelevant delta). The kernel slab measure showed an interesting (stable, median) 6MB increase from SINGLE to MULTI_PERCPU, which may account for some of the extra memory being used.

Conclusion:

* There's some memory loss, but it's small enough that it shouldn't significantly impact even tightly constrained systems like the Zero 2W
* The performance gains are likely minimal under ARM. As such, were ARM the only set of architectures being considered, I'd probably recommend against this
* However, the performance gains on the x86 family are significant, so for me this comes down to a "lack of harm" judgment:

If this can be enabled on amd64 and disabled on armhf+arm64 then that would probably be the ideal si...


Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@waveform thanks for this.

Separately, we will need to evaluate the impact on small cloud instances. They may have multiple CPU cores, x86, low memory, and many snap revisions - i.e. 4-5 snaps, with 2-3 revisions of each.

I wonder whether it would make sense to set multi_percpu on x86/arm64/s390x/ppc64el (generic+cloud), but not on armhf/arm64 (pi) nor on riscv64 (generic/oem).
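
In Ubuntu kernel packaging terms, such a split would roughly be expressed per architecture in the config annotations, along these lines (an illustrative sketch only; the exact file and flavour layout differs per kernel):

  CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU    policy<{'amd64': 'y', 'arm64': 'y', 'ppc64el': 'y', 's390x': 'y', 'armhf': 'n', 'riscv64': 'n'}>
  CONFIG_SQUASHFS_DECOMP_SINGLE          policy<{'amd64': 'n', 'arm64': 'n', 'ppc64el': 'n', 's390x': 'n', 'armhf': 'y', 'riscv64': 'y'}>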

On a different tangent, it might be possible to improve the _MULTI strategy to implement dynamic decompressor allocation, but cap it based on available RAM, such that it behaves closer to _SINGLE when there isn't enough spare RAM.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

We have completed our analysis on cloud instances. We have not managed to reproduce the drastic "Firefox startup went from 15s to 6s (!) on Igor's reference hardware" result. Most of our results are within insignificant low single-digit percentage differences, with some improved results, and with some good improvements on very beefy machines.

What is "Igors reference hardware"?

Overall, nothing negative was observed (OOM, extreme memory usage, failure to boot small instances).

We can consider changing the config in the Kinetic, HWE, and OEM kernels. This will cover the majority of client/desktop/laptop/cloud instances.

Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Dimitri John Ledkov (xnox)
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.19.0-18.18

---------------
linux (5.19.0-18.18) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-18.18 -proposed tracker (LP: #1990366)

  * 5.19.0-17.17: kernel NULL pointer dereference, address: 0000000000000084
    (LP: #1990236)
    - Revert "UBUNTU: SAUCE: apparmor: Fix regression in stacking due to label
      flags"
    - Revert "UBUNTU: [Config] disable SECURITY_APPARMOR_RESTRICT_USERNS"
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - add an internal buffer""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't wait on cleanup""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't waste entropy""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - always add a pending
      request""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - unregister device before
      reset""
    - Revert "UBUNTU: SAUCE: Revert "virtio-rng: make device ready before making
      request""
    - Revert "UBUNTU: [Config] update configs after apply new apparmor patch set"
    - Revert "UBUNTU: SAUCE: apparmor: add user namespace creation mediation"
    - Revert "UBUNTU: SAUCE: selinux: Implement userns_create hook"
    - Revert "UBUNTU: SAUCE: bpf-lsm: Make bpf_lsm_userns_create() sleepable"
    - Revert "UBUNTU: SAUCE: security, lsm: Introduce security_create_user_ns()"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: AppArmor: Remove the exclusive
      flag"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add /proc attr entry for full
      LSM context"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Removed scaffolding function
      lsmcontext_init"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: netlabel: Use a struct lsmblob in
      audit data"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple
      object contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: audit: multiple subject lsm values
      for netlabel"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple task
      security contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Allow multiple records in an
      audit_buffer"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add a function to report
      multiple LSMs"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Create audit_stamp
      structure"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Keep multiple LSM data in
      audit_names"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx
      module selection"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: binder: Pass LSM identifier for
      confirmation"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: NET: Store LSM netlabel data in a
      lsmblob"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx in
      netlink netfilter"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_dentry_init_security"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_inode_getsecctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_secid_to_secctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM:...

Changed in linux (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-6.0/6.0.0-1005.5 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Michael Vogt (mvo)
affects: snappy → snapd