celery workers sometimes end up cursed and produce OOPSes for all SnapStoreUploadJobs
Bug #1792920 reported by Colin Watson
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | Critical | Unassigned |
Bug Description
We've had two incidents today in which a celery worker got into a state where every SnapStoreUploadJob it ran failed with an SSLError (e.g. OOPS-5659f581f26c56b85511fa459f81a91d). Other jobs run by the same worker seem to be fine while it's in this state. I suspect we may have some bad connection pooling.
This is due to a single worker getting ENOMEM in response to all mmap syscalls even though there is no clear reason why that should be the case (plenty of memory and not an unreasonable number of current maps). We don't yet know why this is happening.
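For anyone trying to confirm the same symptom elsewhere, here is a minimal diagnostic sketch, not what was actually run on the affected workers. It assumes Linux and a reasonably recent Python 3; the helper names (`probe_anonymous_mmap`, `count_maps`, `max_map_count`) are made up for illustration. It attempts one anonymous mapping and reports the errno if that fails, then compares the process's current number of maps against the `vm.max_map_count` sysctl.

```python
import errno
import mmap


def probe_anonymous_mmap(size=mmap.PAGESIZE):
    """Try one anonymous mapping; return (ok, errno name or None)."""
    try:
        m = mmap.mmap(-1, size)  # -1 requests an anonymous mapping
    except OSError as exc:
        # A cursed process would be expected to report 'ENOMEM' here.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    m.close()
    return True, None


def count_maps(pid="self"):
    """Count mappings listed in /proc/<pid>/maps; None if unreadable."""
    try:
        with open(f"/proc/{pid}/maps") as f:
            return sum(1 for _ in f)
    except OSError:
        return None


def max_map_count():
    """Read the vm.max_map_count sysctl; None if unavailable."""
    try:
        with open("/proc/sys/vm/max_map_count") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("anonymous mmap:", probe_anonymous_mmap())
    print("current maps:", count_maps(), "limit:", max_map_count())
```

Note that `probe_anonymous_mmap` only tests the process it runs in, so it would have to be executed inside the affected worker to be meaningful. `count_maps` can be pointed at another process by passing its PID, given sufficient permissions.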