too many open files crashed snapd and reverted or deleted some snaps
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
snapd | Fix Committed | Undecided | Zeyad Gouda | |
snapd (Ubuntu) | Fix Released | Undecided | Unassigned | |
Focal | Fix Committed | Undecided | Unassigned | |
Jammy | Fix Committed | Undecided | Unassigned | |
Noble | Fix Committed | Undecided | Unassigned | |
Oracular | Fix Committed | Undecided | Unassigned | |
Bug Description
[SRU] 2.67: https:/
[ Impact ]
* Under very specific circumstances, this bug can cause snap installations to be reverted and leave snaps broken
* For the specific circumstances, see: https:/
[ Test Plan ]
Refer to the spread test for test steps: https:/
Test by running the above spread test/test steps with snapd from plucky-proposed: https:/
--- original ---
A customer ran into a snapd issue.
At some point snapd logged the following:
Oct 15 21:37:18 HOSTNAME snapd[3320691]: 2024/10/15 21:37:18 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:19 HOSTNAME snapd[3320691]: 2024/10/15 21:37:19 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:20 HOSTNAME snapd[3320691]: 2024/10/15 21:37:20 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:21 HOSTNAME snapd[3320691]: 2024/10/15 21:37:21 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
Oct 15 21:37:22 HOSTNAME snapd[3320691]: 2024/10/15 21:37:22 http: Accept error: accept unix /run/snapd.socket: accept4: too many open files; retrying in 1s
...
Oct 15 21:41:01 HOSTNAME systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Oct 15 21:41:01 HOSTNAME systemd[1]: snapd.service: Killing process 3320691 (snapd) with signal SIGABRT.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: SIGABRT: abort
Oct 15 21:41:01 HOSTNAME snapd[3320691]: PC=0x557e1548daa1 m=0 sigcode=0
Oct 15 21:41:01 HOSTNAME snapd[3320691]: goroutine 0 [idle]:
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.futex()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.mPark()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.schedule()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: runtime.mcall()
Oct 15 21:41:01 HOSTNAME snapd[3320691]: /usr/lib/
Oct 15 21:41:01 HOSTNAME snapd[3320691]: goroutine 1 [select, 97530 minutes]:
Oct 15 21:41:01 HOSTNAME snapd[3320691]: main.run(
... (long Go stack trace omitted)
Oct 15 21:41:02 HOSTNAME systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.
Oct 15 21:41:02 HOSTNAME systemd[1]: snapd.service: Consumed 31min 4.052s CPU time.
Oct 15 21:41:02 HOSTNAME snapd[2818402]: overlord.go:271: Acquiring state lock file
Oct 15 21:41:02 HOSTNAME snapd[2818402]: overlord.go:276: Acquired state lock file
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 1...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 2...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: patch.go:64: Patching system state level 6 to sublevel 3...
Oct 15 21:41:02 HOSTNAME snapd[2818402]: daemon.go:247: started snapd/2.63 (series 16; classic) ubuntu/22.04 (amd64) linux/5.
Oct 15 21:41:02 HOSTNAME snapd[2818402]: daemon.go:340: adjusting startup timeout by 2m5s (pessimistic estimate of 30s plus 5s per snap)
Oct 15 21:41:02 HOSTNAME snapd[2818402]: backends.go:58: AppArmor status: apparmor is enabled and all features are available (using snapd provided apparmor_parser)
Oct 15 21:41:05 HOSTNAME snapd[2818402]: services.go:1067: RemoveSnapServices - disabling snap.juju.
After this, they ran into problems with the juju and maas snaps.
1. They installed maas 3.5 from the 3.5/edge channel, but it was reverted to 3.4/stable
- mount output shows (deleted)
- - /var/lib/
- snap list doesn't show revision 36363
- - maas 3.4.3-14366-
- - maas - 36280 3.4/edge canonical** disabled,
- The maas process is still running 3.5 for now, so even though mount shows (deleted), it keeps working with 3.5. I guess that once the process goes down, 3.5 will be gone completely.
2. They now have two juju snap entries
- snap list
- - juju - 28362 2.9/stable canonical** broken
- - juju 2.9.50 28040 2.9/stable canonical** disabled,classic
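To spot affected snaps on other machines, the Notes column of `snap list --all` can be scanned for the states seen above. This is only a rough sketch: it assumes the output has six whitespace-separated columns (Name, Version, Rev, Tracking, Publisher, Notes), and the helper name `flagged_snaps` is made up for illustration and is not part of snapd.

```python
def flagged_snaps(snap_list_output, flags=("broken", "disabled")):
    """Return (name, revision, notes) for rows whose Notes column has a flag.

    Assumes six whitespace-separated columns:
    Name Version Rev Tracking Publisher Notes
    """
    wanted = set(flags)
    flagged = []
    for line in snap_list_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that does not look like a full row.
        if len(fields) < 6 or fields[0] == "Name":
            continue
        name, rev, notes = fields[0], fields[2], fields[5]
        # Notes is a comma-separated list such as "disabled,classic".
        if wanted & {note for note in notes.split(",") if note}:
            flagged.append((name, rev, notes))
    return flagged


# The two juju rows quoted in this report.
sample = """\
juju - 28362 2.9/stable canonical** broken
juju 2.9.50 28040 2.9/stable canonical** disabled,classic
"""
for name, rev, notes in flagged_snaps(sample):
    print(name, rev, notes)
```

Note that a disabled row is normal for an older revision kept for rollback, so it is mainly the broken rows that need attention.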
Please let me know if you need any more information.
Changed in snapd:
milestone: none → 2.67
description: updated
We should first check why snapd hit "too many open files" on its socket.
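As a starting point for that check, the number of descriptors held by the running daemon can be compared against its limit, and the descriptors grouped by what they point at. A minimal sketch in Python, assuming a Linux /proc layout; the function names and the `proc_root` parameter are made up for illustration and are not part of snapd.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Optional


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Count entries in <proc_root>/<pid>/fd, one per open descriptor."""
    return len(os.listdir(Path(proc_root) / str(pid) / "fd"))


def soft_nofile_limit(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Return the soft "Max open files" limit from <proc_root>/<pid>/limits.

    Returns None if the limit is "unlimited" or the line is missing.
    """
    text = (Path(proc_root) / str(pid) / "limits").read_text()
    prefix = "Max open files"
    for line in text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: <soft> <hard> <units>
            soft = line[len(prefix):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def summarize_fd_targets(pid: int, proc_root: str = "/proc") -> Counter:
    """Tally descriptors by target; "socket:[123]" collapses to "socket"."""
    fd_dir = Path(proc_root) / str(pid) / "fd"
    counts = Counter()
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(fd_dir / entry)
        except OSError:
            continue  # descriptor was closed while we were scanning
        kind = target.split(":[", 1)[0] if ":[" in target else target
        counts[kind] += 1
    return counts
```

These would be called with the snapd PID (for example the value printed by `systemctl show -p MainPID --value snapd.service`) and need root to read another user's /proc entries. A count close to the soft limit, dominated by one kind of target, would point at where descriptors are leaking.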