celery workers sometimes end up cursed and produce OOPSes for all SnapStoreUploadJobs

Bug #1792920 reported by Colin Watson on 2018-09-17

This bug affects 4 people

Affects	Status	Importance	Assigned to	Milestone
Launchpad itself	Fix Released	Critical	Unassigned

Bug Description

We've had two incidents today where a celery worker got into a state where all SnapStoreUploadJobs it ran failed with SSLErrors (e.g. OOPS-5659f581f26c56b85511fa459f81a91d). Other jobs run by the same worker seem to be fine when it's in this state. I suspect that maybe we have some bad connection pooling?


Colin Watson (cjwatson) wrote on 2018-10-12:

This is due to a single worker getting ENOMEM in response to every mmap syscall, even though there is no clear reason why that should happen: there is plenty of free memory and the number of current mappings is not unreasonable. We don't yet know why this is happening.
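On Linux, mmap(2) can fail with ENOMEM not only when memory is short but also when the process would exceed its maximum number of mappings (the `vm.max_map_count` sysctl). The following is a minimal sketch, assuming a Linux host with procfs, of how one might check a worker's mapping headroom by comparing the line count of `/proc/<pid>/maps` with that limit. The helper names `count_mappings` and `mapping_headroom` are hypothetical and are not how the investigation above was actually carried out.

```python
import os


def count_mappings(maps_text: str) -> int:
    """Count memory mappings in the text of a /proc/<pid>/maps file (one mapping per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(pid="self", proc_root="/proc"):
    """Return (current_mappings, max_map_count) for a process (Linux procfs assumed)."""
    with open(os.path.join(proc_root, str(pid), "maps")) as f:
        current = count_mappings(f.read())
    # System-wide per-process limit on the number of mappings.
    with open(os.path.join(proc_root, "sys", "vm", "max_map_count")) as f:
        limit = int(f.read().strip())
    return current, limit


if __name__ == "__main__":
    # Pass a worker's PID instead of "self" to inspect another process.
    current, limit = mapping_headroom()
    print(f"{current} mappings in use, limit {limit} ({limit - current} free)")
```

If the count is far below the limit, as reported here, the mapping limit is unlikely to explain the ENOMEM failures.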


Colin Watson (cjwatson) wrote on 2018-11-22:

I'm hoping that https://code.launchpad.net/~cjwatson/launchpad/optimise-git-ref-scan/+merge/359171 may improve the situation here. We're also considering no longer scanning refs/changes/*, since it's very large for some repositories and not super-useful to have in the LP database (as opposed to in git).
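Skipping `refs/changes/*` during the scan amounts to filtering ref names by prefix before recording them. A minimal sketch follows, assuming the scanner has a plain list of ref paths to work with; `refs_to_scan` and `EXCLUDED_PREFIXES` are hypothetical names, not Launchpad's actual code.

```python
from typing import Iterable, List, Sequence

# Hypothetical: ref prefixes that would no longer be copied into the database.
EXCLUDED_PREFIXES = ("refs/changes/",)


def refs_to_scan(ref_paths: Iterable[str], excluded: Sequence[str] = EXCLUDED_PREFIXES) -> List[str]:
    """Return the refs worth recording, dropping any under an excluded prefix."""
    prefixes = tuple(excluded)  # str.startswith accepts a tuple of prefixes
    return [ref for ref in ref_paths if not ref.startswith(prefixes)]


refs = [
    "refs/heads/master",
    "refs/tags/1.0",
    "refs/changes/01/1/1",
    "refs/changes/01/1/2",
]
print(refs_to_scan(refs))  # ['refs/heads/master', 'refs/tags/1.0']
```

The refs stay in the git repository itself; only the database copy would shrink.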


Colin Watson (cjwatson) wrote on 2021-04-12:

This hasn't been a problem for some time. I don't know whether it was the fixes mentioned in my previous comment or something else, but I'll take it.

Changed in launchpad:
status:	Triaged → Fix Released