oom_adj and -DLINUX_OOM_ADJ=0 should be used

Bug #854590 reported by Franck on 2011-09-20
This bug affects 1 person
Affects:
postgresql-9.1 (Ubuntu): Importance Low, Unassigned
postgresql-common (Ubuntu): Importance Low, assigned to Martin Pitt

Bug Description

I just had a bad experience with the OOM killer killing the postmaster, and I am wondering whether the Debian/Ubuntu package should use the oom_adj trick to avoid this.

As I understand it, this would imply:
1) set oom_adj to -17 in the init script (to make the postmaster unkillable)
2) compile postgresql with -DLINUX_OOM_ADJ=0 cflag (to allow the children processes to be killed)

Does this make sense, and could it be considered?
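For illustration, step 1 could be sketched roughly as below. This is only a minimal sketch, not the packaged init script: the helper name `set_oom_adj` and the `proc_root` parameter are hypothetical and exist so the function can be tried against a scratch directory. It assumes the usual oom_adj range of -17 to 15, where -17 exempts the process from the OOM killer, and lowering the value on a real system generally needs root.

```python
from pathlib import Path

# Assumed valid range for /proc/<pid>/oom_adj; -17 disables OOM killing.
OOM_ADJ_MIN = -17
OOM_ADJ_MAX = 15


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_adj and return the path written.

    `proc_root` is a hypothetical parameter for testing; on a real system
    it would stay at "/proc".
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_adj must be in [{OOM_ADJ_MIN}, {OOM_ADJ_MAX}], got {value}"
        )
    path = Path(proc_root) / str(pid) / "oom_adj"
    path.write_text(f"{value}\n")
    return path
```

An init script would call something like `set_oom_adj(postmaster_pid, -17)` right after starting the postmaster. Step 2 is what keeps the backends from inheriting that value.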

Franck (alci) wrote :

Here is a link to a discussion on pgsql-hackers(at)postgresql(dot)org about the subject, in case that could help:

http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php

description: updated
Franck (alci) wrote :

Also see 17.4.3. Linux Memory Overcommit at http://www.postgresql.org/docs/9.1/static/kernel-resources.html

Franck (alci) wrote :

Just a few more remarks:
- Setting oom_adj should probably be an option (maybe in /etc/postgresql/xx/start.conf), defaulting to -17; at least for the 9.x series, that is probably what the average user wants.
- For versions that cannot set the oom_adj value of their child processes, a default of 0 would probably be fine (the user could easily set another value).

Martin Pitt (pitti) wrote :

This is certainly an interesting way to give the OOM killer a hint. Adding a -common task, as the init script lives there.

Changed in postgresql-9.1 (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Changed in postgresql-common (Ubuntu):
importance: Undecided → Low
status: New → Triaged
Martin Pitt (pitti) wrote :

I committed the -DLINUX_OOM_ADJ=0 part to postgresql-9.1 packaging bzr. Will look at the init script counterpart soon.

Changed in postgresql-9.1 (Ubuntu):
status: Triaged → Fix Committed
Changed in postgresql-common (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) wrote :

Committed the pg_ctlcluster side and integration tests to p-common.

Changed in postgresql-common (Ubuntu):
status: Triaged → Fix Committed
Martin Pitt (pitti) wrote :

BTW, I set it to -16, so that it is still the "last against the wall". Making it entirely unkillable doesn't seem desirable to me: if there is a bug somewhere which keeps allocating memory in a loop, this could very easily lock you out of your server.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-9.1 - 9.1.1-3

---------------
postgresql-9.1 (9.1.1-3) unstable; urgency=low

  * debian/rules: Build with LINUX_OOM_ADJ=0 on Linux, to allow the OOM killer
    to slay the backends when the postmaster gets marked as unkillable.
    (LP: #854590)

 -- Martin Pitt <email address hidden> Wed, 19 Oct 2011 09:43:13 +0200

Changed in postgresql-9.1 (Ubuntu):
status: Fix Committed → Fix Released
Franck (alci) wrote :

Thanks, Martin, for your awesome work and for your responsiveness on this request. I love the PostgreSQL packaging on Debian/Ubuntu.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-common - 125

---------------
postgresql-common (125) unstable; urgency=low

  * Add debian/backport-ppa: Script to generate and upload backport packages
    to my Ubuntu PPA. Only for personal use.
  * Add t/160_alternate_confroot.t: Test creation, operation, upgrading, and
    removal of clusters as user nobody using $PG_CLUSTER_CONF_ROOT. This
    reproduces LP#835630 and other bugs.
  * PgCommon.pm: If $PG_CLUSTER_CONF_ROOT is set, untaint it.
  * pg_upgradecluster: Don't hardcode /etc/postgresql/, use
    $PgCommon::confroot to respect $PG_CLUSTER_CONF_ROOT. (LP: #835630)
  * pg_upgradecluster: Add --logfile option to specify a custom log file for
    the upgraded cluster. Necessary if you want to run this on
    per-user clusters and can't write into /var/log/postgresql/.
  * pg_ctlcluster: When starting as root for >= 9.1, adjust the OOM killer
    protection to -16, so that the postmaster does not get OOM-killed so
    easily (as it appears to claim all the shared memory). 9.1.1-3 and later
    resets oomadj of child processes to 0, so that the client backends can
    still get OOM-killed. Add tests to t/020_create_sql_remove.t.
    (LP: #854590)
  * debian/control: Add Breaks: to postgresql-9.1 versions before 9.1.1-3, as
    they do not reset oomadj for child processes. This is a precaution to
    avoid running all the client backends with -16 as well.
  * Add t/170_extensions.t: Check that all shipped extensions install and
    remove.
  * Add t/180_ecpg.t: Check that ecpg works. In t/001_packages.t, check that
    libecpg-dev is installed.

 -- Martin Pitt <email address hidden> Thu, 20 Oct 2011 12:17:30 +0200

Changed in postgresql-common (Ubuntu):
status: Fix Committed → Fix Released
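
For readers following along, the start-time decision described in the postgresql-common 125 changelog can be sketched as below. This is a hedged illustration only and not the actual pg_ctlcluster code (which is Perl): the names `parse_version` and `oom_adj_for_start` are hypothetical, and the sketch only encodes the two conditions the changelog states, namely starting as root and a server version of 9.1 or later.

```python
# Value chosen in this bug: protected, but still killable as a last resort.
POSTMASTER_OOM_ADJ = -16
# First major version for which the adjustment is applied, per the changelog.
MIN_VERSION = (9, 1)


def parse_version(version):
    """Turn a version string such as "9.1" into a comparable tuple (9, 1)."""
    return tuple(int(part) for part in version.split("."))


def oom_adj_for_start(version, euid):
    """Return the oom_adj value to apply at cluster start, or None for no change.

    The adjustment is only made when started as root (euid 0) and for
    versions >= 9.1, whose backends reset their own value to 0.
    """
    if euid != 0:
        return None
    if parse_version(version) < MIN_VERSION:
        return None
    return POSTMASTER_OOM_ADJ
```

For example, `oom_adj_for_start("9.1", 0)` gives -16, while an 8.4 cluster or a non-root start is left untouched, which matches the reason for the Breaks: on postgresql-9.1 versions before 9.1.1-3.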