Provide oom_adj option for ceph-mon/osd

Bug #1428572 reported by Peter Sabaini
This bug affects 2 people
Affects                        | Status    | Importance | Assigned to | Milestone
Ceph Monitor Charm             | Triaged   | Wishlist   | Unassigned  |
Ceph OSD Charm                 | Triaged   | Wishlist   | Unassigned  |
OpenStack Ceph Charm (Retired) | Won't Fix | Wishlist   | Unassigned  |
ceph (Juju Charms Collection)  | Invalid   | Wishlist   | Unassigned  |

Bug Description

In low-memory situations, a ceph-osd or ceph-mon process could be killed by the oom killer. However, this will trigger a ceph recovery, which will likely cause a memory usage spike and so worsen the original problem. It would be good to have an option to tune the oom_adj value of the ceph-osd and ceph-mon processes so that they are less likely to attract the attention of the oom killer.
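
For illustration, here is a minimal sketch of what tuning this value for a single process could look like. It is not taken from any charm. It assumes a kernel that exposes /proc/<pid>/oom_score_adj (range -1000 to 1000), the interface that replaced the deprecated oom_adj (range -17 to 15). The function names and the proc_root parameter are hypothetical, the latter added only so the code can be tried against a scratch directory.

```python
from pathlib import Path

# Valid range for oom_score_adj; -1000 exempts the process from the OOM killer.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write `value` to <proc_root>/<pid>/oom_score_adj.

    Lowering the value on a real system requires CAP_SYS_RESOURCE
    (in practice, root). `proc_root` is a hypothetical test hook.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")


def get_oom_score_adj(pid: int, proc_root: str = "/proc") -> int:
    """Read the current adjustment back for verification."""
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    return int(path.read_text())
```

A negative value makes the process a less attractive target. The value is inherited by child processes but is lost when the daemon is restarted, so a real option would have to reapply it or set it through the service manager.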

James Page (james-page)
Changed in ceph (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Wishlist
milestone: none → 15.10
Changed in ceph (Juju Charms Collection):
milestone: 15.10 → 16.04
Chris Holcombe (xfactor973) wrote :

If the oom killer comes out, chances are good that all bets are off. It can literally kill almost anything, leaving your system in an undefined state. I would suggest that, instead of trying to shield ceph from the oom killer, we just reboot the machine. I've seen the oom killer leave systems in all kinds of weird broken states that are hard to debug.

James Troup (elmo) wrote : Re: [Bug 1428572] Re: Provide oom_adj option for ceph-mon/osd

Chris Holcombe <email address hidden> writes:

> If the oom killer comes out chances are good that all bets are off. It
> can literally kill almost anything leaving your system in an undefined
> state. I would suggest that instead of trying to shield ceph from the
> oom killer that we just reboot the machine. I've seen the oom killer
> leave systems in all kinds of weird broken states that are hard to
> debug.

Sorry but this makes no sense. In the Canonical Cloud Reference
Architecture, ceph will often be running with a bunch of eminently
killable processes (e.g. user KVM instances) and it makes a ton of
sense to have the OOM killer go after them before ceph. Saying "just
never run out of memory" isn't realistic IMO.

--
James

Chris Holcombe (xfactor973) wrote :

elmo: Good point. I wasn't thinking about the hyper-converged case. In that case it definitely makes sense to kill off the KVM instances and other stuff first. For dedicated ceph hardware the story is a bit different. You're only running ceph, and if the oom killer comes out, ceph is pretty much the only target. I've seen cases in the past with dedicated hardware where the oom killer put the server into a very broken state that was hard to repair without rebooting.
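
As a rough sketch of the hyper-converged idea discussed here, the following looks up running processes by name and plans a lower OOM score adjustment for the ceph daemons, leaving everything else (KVM instances included) at its default. The policy values, the function names, and the proc_root parameter are all assumptions made for illustration; nothing here comes from the charms. It only reads /proc/<pid>/comm and returns a plan; it does not change any process.

```python
from pathlib import Path

# Hypothetical policy: more negative means less likely to be chosen.
# Processes not listed here (e.g. qemu) keep their default of 0.
HYPERCONVERGED_POLICY = {"ceph-mon": -900, "ceph-osd": -800}


def find_pids(comm_names, proc_root="/proc"):
    """Return {pid: comm} for processes whose comm is in `comm_names`."""
    matches = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is unreadable
        if comm in comm_names:
            matches[int(entry.name)] = comm
    return matches


def plan_adjustments(policy, proc_root="/proc"):
    """Map each matching pid to the oom_score_adj value the policy assigns."""
    found = find_pids(set(policy), proc_root)
    return {pid: policy[comm] for pid, comm in found.items()}
```

On dedicated ceph hardware the same plan would cover nearly every large process on the box, which is the situation described above where the adjustment buys little.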

James Page (james-page)
Changed in ceph (Juju Charms Collection):
milestone: 16.04 → 16.07
James Page (james-page)
Changed in ceph (Juju Charms Collection):
milestone: 16.07 → 16.10
James Page (james-page)
Changed in charm-ceph:
importance: Undecided → Wishlist
status: New → Triaged
Changed in ceph (Juju Charms Collection):
status: Triaged → Invalid
Chris MacNaughton (chris.macnaughton) wrote :

Marking the charm-ceph task as Won't Fix, since the ceph charm has been out of support for a while now.

Changed in charm-ceph:
status: Triaged → Won't Fix
Changed in charm-ceph-osd:
status: New → Triaged
Changed in charm-ceph-mon:
status: New → Triaged
Changed in charm-ceph-osd:
importance: Undecided → Wishlist
Changed in charm-ceph-mon:
importance: Undecided → Wishlist