PostgreSQL dies with fatal semctl error

Bug #1649877 reported by Tim Bishop
This bug affects 2 people
Affects                  Status     Importance  Assigned to  Milestone
postgresql-9.5 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

PostgreSQL is dying with the following error:

2016-12-14 11:21:02 UTC [4672-1] postgres@postgres FATAL: semctl(4685831, 3, SETVAL, 0) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-2] LOG: server process (PID 4672) exited with exit code 1
2016-12-14 11:21:02 UTC [3203-3] LOG: terminating any other active server processes
2016-12-14 11:21:02 UTC [3217-2] WARNING: terminating connection because of crash of another server process
2016-12-14 11:21:02 UTC [3217-3] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-12-14 11:21:02 UTC [3217-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2016-12-14 11:21:02 UTC [3203-4] LOG: all server processes terminated; reinitializing
2016-12-14 11:21:02 UTC [3203-5] LOG: could not remove shared memory segment "/PostgreSQL.1769907787": No such file or directory
2016-12-14 11:21:02 UTC [3203-6] LOG: semctl(4489217, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-7] LOG: semctl(4521986, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-8] LOG: semctl(4554755, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-9] LOG: semctl(4587524, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-10] LOG: semctl(4620293, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-11] LOG: semctl(4653062, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-12] LOG: semctl(4685831, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-13] LOG: semctl(4718600, 0, IPC_RMID, ...) failed: Invalid argument
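
The semaphore set IDs named in a log like the one above can be cross-checked against what the kernel still holds. The following is a minimal sketch, not part of the original report: the function names are illustrative, and it assumes the Linux `ipcs -s` listing, where the semid is the second column.

```shell
#!/bin/sh
# Print the distinct semaphore set IDs named in semctl()/semop() failure
# lines read from standard input, one ID per line, in numeric order.
failed_sem_ids() {
    sed -n \
        -e 's/.*semctl(\([0-9][0-9]*\),.*failed.*/\1/p' \
        -e 's/.*semop(id=\([0-9][0-9]*\)).*failed.*/\1/p' |
        sort -un
}

# For each ID on standard input, report whether "ipcs -s" still lists it.
# Assumption: the semid is in column 2, as in the util-linux output format.
report_sem_ids() {
    while read -r sem_id; do
        if ipcs -s | awk -v id="$sem_id" '$2 == id { found = 1 } END { exit !found }'; then
            echo "$sem_id: still present"
        else
            echo "$sem_id: not found"
        fi
    done
}
```

One possible use is `sudo cat /var/log/postgresql/*.log | failed_sem_ids | report_sem_ids`. If the sets are reported as not found while the server is still running, that would suggest they were removed from outside PostgreSQL.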

I have determined that this happens when the postgres user has a high-numbered UID, and that it only happens on xenial and yakkety (trusty is fine).

This is the procedure I used after spinning up a cloud image (but also verified on a normal install):

sudo apt-get update
sudo apt-get -y dist-upgrade
sudo shutdown -r now

sudo groupadd -g 99199 postgres
sudo useradd -u 99199 -c "PostgreSQL administrator,,," -d /var/lib/postgresql -g postgres -s /bin/bash postgres

sudo apt-get -y install postgresql autopostgresqlbackup

sudo crontab -e
06 11 * * * cd / && /etc/cron.daily/autopostgresqlbackup

sudo less /var/log/postgresql/*.log

Obviously set the time for the cron entry so it happens as soon as possible. The manual group and user creation is to demonstrate the problem. In reality I had adduser.conf configured to put system accounts in a higher range, and the account was created normally by package installation.

When I repeat exactly the same procedure but manually set the UID and GID to 199 instead, it works fine.

Versions on xenial:

Linux pgtest 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.5 9.5.5-0ubuntu0.16.04

And on yakkety:

Linux pgtest 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.5 9.5.5-0ubuntu0.16.10

And on trusty:

Linux pgtest 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.3 9.3.15-0ubuntu0.14.04

All patches applied as of today 2016-12-14.

I've tried to explore whether this is related to systemd (an obvious difference between trusty and the newer releases), or whether it's the PostgreSQL or kernel version, but haven't yet found any useful information.
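
One systemd setting that may be worth ruling out here is `RemoveIPC=` in logind.conf, which controls whether System V and POSIX IPC objects belonging to a user are removed when that user fully logs out, and which does not apply to system users. Whether it is responsible for this bug is an assumption, not something established in this report. A minimal sketch for reading the explicitly configured value (the function name is illustrative, and drop-in files under logind.conf.d are not considered):

```shell
#!/bin/sh
# Print the RemoveIPC value explicitly set in a logind.conf-style file,
# or "unset" when the file is missing or has no uncommented RemoveIPC= line.
# If the option appears more than once, the last occurrence is reported.
remove_ipc_setting() {
    conf_file="${1:-/etc/systemd/logind.conf}"
    value=$(sed -n \
        's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' \
        "$conf_file" 2>/dev/null | tail -n 1)
    echo "${value:-unset}"
}
```

Running `remove_ipc_setting` with no argument reads /etc/systemd/logind.conf. A result of "unset" means the built-in default applies, which this sketch does not try to determine.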

Happy to assist in debugging.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in postgresql-9.5 (Ubuntu):
status: New → Confirmed
Andrea (maruscya) wrote :

I have the same issue on several servers. All postgres UIDs and GIDs are set to 3003. After changing the UID and GID to 199, as Tim Bishop suggested, the errors went away.

I ran some tests with different UIDs and GIDs:

UID   GID   Status
999   999   Postgres works
999   1000  Postgres works
1000  999   Postgres crashes
1000  1000  Postgres crashes

The problem seems to be related to UID > 999.
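
The boundary in the table above coincides with the usual top of the system account range (SYS_UID_MAX in /etc/login.defs, commonly 999). The following is a minimal sketch for checking which side of that boundary an account falls on. It assumes SYS_UID_MAX is the relevant limit, which this thread does not confirm, and the function names are illustrative.

```shell
#!/bin/sh
# Read SYS_UID_MAX from a login.defs-style file; fall back to 999 when the
# file is missing or does not set it.
sys_uid_max() {
    defs_file="${1:-/etc/login.defs}"
    max=$(awk '$1 == "SYS_UID_MAX" { print $2; exit }' "$defs_file" 2>/dev/null) || max=""
    echo "${max:-999}"
}

# Succeed (exit 0) when UID $1 is at or below the limit $2.
is_system_uid() {
    [ "$1" -le "$2" ]
}

# Report where an account sits relative to the system UID limit.
check_account() {
    account="$1"
    uid=$(id -u "$account" 2>/dev/null) || { echo "$account: no such user"; return 2; }
    limit=$(sys_uid_max)
    if is_system_uid "$uid" "$limit"; then
        echo "$account: UID $uid is within the system range (<= $limit)"
    else
        echo "$account: UID $uid is above the system range (> $limit)"
    fi
}
```

For example, `check_account postgres` prints which range the postgres account's UID falls in on the current host.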

PostgreSQL compiled from source on the same server (xenial) works perfectly with UID and GID 3003.

Naresh G (kumargsvn) wrote :

I have the same issue on RHEL 7.2, where the postgres UID (id -u postgres) is 2011.

I have the following backups configured:
pg_basebackup --xlog -R -P --gzip --format=t
pg_backup_rotated.sh

The PostgreSQL (9.4.5) instance restarts with the following errors at backup times:

2017-03-19 19:00:21 EDT PANIC: semop(id=152698887) failed: Invalid argument
2017-03-19 19:00:21 EDT PANIC: semop(id=152698887) failed: Invalid argument
2017-03-19 19:00:26 EDT FATAL: semctl(152666118, 6, SETVAL, 0) failed: Invalid argument
2017-03-19 19:00:26 EDT LOG: server process (PID 13090) exited with exit code 1
2017-03-19 19:00:26 EDT LOG: terminating any other active server processes
2017-03-19 19:00:26 EDT WARNING: terminating connection because of crash of another server process
2017-03-19 19:00:26 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-03-19 19:00:26 EDT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-03-19 19:00:26 EDT WARNING: terminating connection because of crash of another server process
2017-03-19 19:00:26 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-03-19 19:00:26 EDT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-03-19 19:00:26 EDT WARNING: terminating connection because of crash of another server process
2017-03-19 19:00:26 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Can you tell me the exact root cause of this issue?
Is it related to the user's UID?

Tim Bishop (tdb) wrote :

An update, years later. Still broken on 16.04, but working on 18.04.

Linux pgtest1604 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.5 9.5.13-0ubuntu0.16.04

Linux pgtest1804 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
postgresql-10 10.4-0ubuntu0.18.04

Disappointing to see this issue received no attention on 16.04, but I suppose there is at least a way forward now.
