too many open files crashed snapd and reverted or deleted some snaps

Bug #2084730 reported by Seyeong Kim
This bug affects 1 person
Affects          Status          Importance   Assigned to   Milestone
snapd            Fix Committed   Undecided    Zeyad Gouda
snapd (Ubuntu)   Fix Released    Undecided    Unassigned
  Focal          Fix Committed   Undecided    Unassigned
  Jammy          Fix Committed   Undecided    Unassigned
  Noble          Fix Committed   Undecided    Unassigned
  Oracular       Fix Committed   Undecided    Unassigned

Bug Description

[SRU] 2.67: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/2089691

[ Impact ]

 * Under very specific circumstances, this bug can lead to snap installations being reverted and snaps being left in a broken state.
 * For the specific circumstances, see: https://github.com/canonical/snapd/pull/14671/files#diff-b52eba136e950ff61d364be90afd6ea41c8deeab1b33d36e49e0fc60e34640f1R6

[ Test Plan ]

Refer to the spread test for test steps: https://github.com/ZeyadYasser/snapd/blob/f162216e13a31d423960ec198ad0f586b1192830/tests/regression/lp-2084730/task.yaml

Test by running the above spread test/test steps with snapd from plucky-proposed: https://launchpad.net/ubuntu/+source/snapd/2.67+25.04

--- original ---

A customer ran into a snapd issue.

At some point snapd logged the following:

Oct 15 21:37:18 HOSTNAME snapd[3320691]: 2024/10/15 21:37:18 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:19 HOSTNAME snapd[3320691]: 2024/10/15 21:37:19 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:20 HOSTNAME snapd[3320691]: 2024/10/15 21:37:20 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:21 HOSTNAME snapd[3320691]: 2024/10/15 21:37:21 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:22 HOSTNAME snapd[3320691]: 2024/10/15 21:37:22 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s

...

Oct 15 21:41:01 HOSTNAME systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Oct 15 21:41:01 HOSTNAME systemd[1]: snapd.service: Killing process 3320691 (snapd) with signal SIGABRT.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: SIGABRT: abort
Oct 15 21:41:01 HOSTNAME snapd[3320691]: PC=0x557e1548daa1 m=0 sigcode=0
Oct 15 21:41:01 HOSTNAME snapd[3320691]: goroutine 0 [idle]:
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.futex()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/sys_linux_amd64.s:552 +0x21
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.futexsleep(0x7ffe28dbd4c0?, 0x15467453?, 0xc000046500?)
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/os_linux.go:66 +0x36
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.notesleep(0x557e169a3448)
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/lock_futex.go:159 +0x87
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.mPark()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/proc.go:1449 +0x25
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.stoplockedm()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/proc.go:2422 +0x65
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.schedule()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/proc.go:3119 +0x3d
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.park_m(0xc0002381a0?)
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/proc.go:3336 +0x14d
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.mcall()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/go-1.18/src/runtime/asm_amd64.s:425 +0x45
Oct 15 21:41:01 HOSTNAME snapd[3320691]: goroutine 1 [select, 97530 minutes]:
Oct 15 21:41:01 HOSTNAME snapd[3320691]: main.run(0xc00055a060)

... (rest of the long Go stack trace omitted)

Oct 15 21:41:02 HOSTNAME systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.
Oct 15 21:41:02 HOSTNAME systemd[1]: snapd.service: Consumed 31min 4.052s CPU time.
Oct 15 21:41:02 HOSTNAME snapd[2818402]: overlord.go:271: Acquiring state lock file
Oct 15 21:41:02 HOSTNAME snapd[2818402]: overlord.go:276: Acquired state lock file
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 1...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 2...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 3...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: daemon.go:247: started snapd/2.63 (series 16; classic) ubuntu/22.04 (amd64) linux/5.15.0-86-generic.
Oct 15 21:41:02 HOSTNAME snapd[2818402]: daemon.go:340: adjusting startup timeout by 2m5s (pessimistic estimate of 30s plus 5s per snap)
Oct 15 21:41:02 HOSTNAME snapd[2818402]: backends.go:58: AppArmor status: apparmor is enabled and all features are available (using snapd provided apparmor_parser)
Oct 15 21:41:05 HOSTNAME snapd[2818402]: services.go:1067: RemoveSnapServices - disabling snap.juju.fetch-oci.service

After this, they ran into problems with the juju and maas snaps.

1. They installed maas 3.5 from the 3.5/edge channel, but it was reverted to 3.4/stable.
- mount shows the snap file as (deleted):
- - /var/lib/snapd/snaps/maas_36363.snap (deleted) on /snap/maas/36363 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
- snap list does not show revision 36363:
- - maas 3.4.3-14366-g.1ae9d903a 36735 3.4/edge canonical** held
- - maas - 36280 3.4/edge canonical** disabled,broken,held
- The maas process is still running from 3.5 for now, so it keeps working even though mount shows the snap file as (deleted). I guess that once the process stops, the 3.5 revision will be gone completely.
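
A quick way to spot this state on other machines is to scan the mount output for squashfs sources marked (deleted). The following Python sketch is only an illustration: the regular expression is modelled on the single mount line quoted above, the core20 line in the sample is made up for contrast, and the function name is hypothetical.

```python
import re

# Pattern modelled on the mount line quoted in this report.
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+)(?P<deleted> \(deleted\))? on (?P<target>\S+) "
    r"type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)

SAMPLE_MOUNT_OUTPUT = """\
/var/lib/snapd/snaps/maas_36363.snap (deleted) on /snap/maas/36363 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
/var/lib/snapd/snaps/core20_2318.snap on /snap/core20/2318 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
"""  # second line is a hypothetical healthy mount, not taken from the report


def deleted_snap_mounts(mount_output):
    """Return (source, target) pairs for squashfs mounts whose backing file is gone."""
    found = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected layout
        if match.group("fstype") == "squashfs" and match.group("deleted"):
            found.append((match.group("source"), match.group("target")))
    return found


if __name__ == "__main__":
    for source, target in deleted_snap_mounts(SAMPLE_MOUNT_OUTPUT):
        print(f"{target} is backed by a deleted file: {source}")
```

On a live system the input would come from the real `mount` command instead of the sample string.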

2. They had two juju snap revisions, one of them broken.
- snap list
- - juju - 28362 2.9/stable canonical** broken
- - juju 2.9.50 28040 2.9/stable canonical** disabled,classic

Please let me know if you need any more information.

Seyeong Kim (seyeongkim) wrote :
tags: added: sts
Seyeong Kim (seyeongkim) wrote :

We should probably first check why there were too many open files on the socket.

Maciej Borzecki (maciek-borzecki) wrote :

Thanks for reporting this. We have identified a problem related to the refresh app awareness feature which, in a very specific scenario, can cause snapd to enter a deadlock state. This results in all API calls hanging, and if the API is invoked continuously (as seen in the logs), the snapd process may eventually reach its maximum file descriptor limit. This may affect starting snap applications, but snap services should not be affected. The problem was introduced in snapd 2.63.
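
For anyone who wants to watch for this condition, a minimal Python sketch of a descriptor check follows. It is not part of snapd: it relies only on the standard Linux /proc layout (/proc/<pid>/fd and the "Max open files" row of /proc/<pid>/limits), the function names are hypothetical, and the sample limits text is illustrative rather than taken from the affected machine. Reading another process's /proc entries generally needs root.

```python
import os
from typing import Optional

# Illustrative excerpt in the format of /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 524288               files
"""


def open_fd_count(pid: int) -> int:
    """Count the descriptors a process holds by listing /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_soft_nofile(limits_text: str) -> Optional[int]:
    """Return the soft 'Max open files' limit, or None if absent or unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max, open, files, soft, hard, units
            return None if soft == "unlimited" else int(soft)
    return None


def fd_headroom(open_fds: int, soft_limit: int) -> int:
    """Descriptors left before accept() starts failing with EMFILE."""
    return soft_limit - open_fds


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    pid = os.getpid()  # replace with the snapd PID when run as root
    with open(f"/proc/{pid}/limits") as handle:
        limit = parse_soft_nofile(handle.read())
    used = open_fd_count(pid)
    if limit is not None:
        print(f"pid {pid}: {used} open, {fd_headroom(used, limit)} left of {limit}")
```

A steadily shrinking headroom while API calls hang would match the behaviour described above.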

Changed in snapd:
status: New → Confirmed
assignee: nobody → Zeyad Gouda (zeyadgouda)
Shunde Zhang (shunde-zhang) wrote :

This has a more serious impact on the system: some snaps are marked as broken in the snap list output, and some snap files are removed from the /var/lib/snapd/snaps/ directory. The user had to reboot the machine to get everything back to normal. It would be good to have a way to fix this without rebooting.

Zeyad Gouda (zeyadgouda) wrote :

We are working on a fix that addresses the root cause of the issue. Once it lands, no reboots will be needed.

Ernest Lotter (ernestl) wrote :
Changed in snapd:
status: Confirmed → Fix Committed
Ernest Lotter (ernestl)
Changed in snapd:
milestone: none → 2.67
Ernest Lotter (ernestl)
description: updated
Ernest Lotter (ernestl)
description: updated
Ernest Lotter (ernestl) wrote :
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Seyeong, or anyone else affected,

Accepted snapd into oracular-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/snapd/2.67.1+24.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-oracular to verification-done-oracular. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-oracular. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed verification-needed-oracular
Changed in snapd (Ubuntu Noble):
status: New → Fix Committed
tags: added: verification-needed-noble
Andreas Hasenack (ahasenack) wrote :

Hello Seyeong, or anyone else affected,

Accepted snapd into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/snapd/2.67.1+24.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in snapd (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-needed-jammy
Andreas Hasenack (ahasenack) wrote :

Hello Seyeong, or anyone else affected,

Accepted snapd into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/snapd/2.67.1+22.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in snapd (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Andreas Hasenack (ahasenack) wrote :

Hello Seyeong, or anyone else affected,

Accepted snapd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/snapd/2.67.1+20.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in snapd (Ubuntu):
status: New → Fix Released
Changed in snapd (Ubuntu Oracular):
status: New → Fix Committed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (snapd/2.67.1+22.04)

All autopkgtests for the newly accepted snapd (2.67.1+22.04) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

docker.io-app/26.1.3-0ubuntu1~22.04.1 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#snapd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (snapd/2.67.1+20.04)

All autopkgtests for the newly accepted snapd (2.67.1+20.04) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

samba/2:4.15.13+dfsg-0ubuntu0.20.04.8 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#snapd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (snapd/2.67.1+24.04)

All autopkgtests for the newly accepted snapd (2.67.1+24.04) for noble have finished running.
The following regressions have been reported in tests triggered by the package:

samba/2:4.19.5+dfsg-4ubuntu9 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/noble/update_excuses.html#snapd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!
