PostgreSQL dies with fatal semctl error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
postgresql-9.5 (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
PostgreSQL is dying with the following error:
2016-12-14 11:21:02 UTC [4672-1] postgres@postgres FATAL: semctl(4685831, 3, SETVAL, 0) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-2] LOG: server process (PID 4672) exited with exit code 1
2016-12-14 11:21:02 UTC [3203-3] LOG: terminating any other active server processes
2016-12-14 11:21:02 UTC [3217-2] WARNING: terminating connection because of crash of another server process
2016-12-14 11:21:02 UTC [3217-3] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-12-14 11:21:02 UTC [3217-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2016-12-14 11:21:02 UTC [3203-4] LOG: all server processes terminated; reinitializing
2016-12-14 11:21:02 UTC [3203-5] LOG: could not remove shared memory segment "/PostgreSQL.
2016-12-14 11:21:02 UTC [3203-6] LOG: semctl(4489217, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-7] LOG: semctl(4521986, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-8] LOG: semctl(4554755, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-9] LOG: semctl(4587524, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-10] LOG: semctl(4620293, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-11] LOG: semctl(4653062, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-12] LOG: semctl(4685831, 0, IPC_RMID, ...) failed: Invalid argument
2016-12-14 11:21:02 UTC [3203-13] LOG: semctl(4718600, 0, IPC_RMID, ...) failed: Invalid argument
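To help narrow this down, the semaphore set IDs named in the log can be compared with the sets the kernel currently holds. The following is only a minimal sketch and not part of the original report: the function name `semids_from_log` is hypothetical, and it assumes log lines in the format shown above.

```shell
#!/bin/sh
# Print the unique semaphore set IDs mentioned in semctl(...) log lines.
# Reads log text on stdin and writes one numeric ID per line, sorted.
semids_from_log() {
    # Keep only the "semctl(<id>" fragments, strip the prefix,
    # then sort numerically and drop duplicates.
    grep -o 'semctl([0-9][0-9]*' | sed 's/^semctl(//' | sort -un
}
```

Feeding the PostgreSQL log file to `semids_from_log` on stdin and comparing the result with the IDs listed by `ipcs -s` should show whether the sets PostgreSQL expects are still present.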
I have determined that this happens when the postgres user has a high-numbered UID, and that it happens only on xenial and yakkety (trusty is fine).
This is the procedure I used after spinning up a cloud image (but also verified on a normal install):
sudo apt-get update
sudo apt-get -y dist-upgrade
sudo shutdown -r now
sudo groupadd -g 99199 postgres
sudo useradd -u 99199 -c "PostgreSQL administrator,,," -d /var/lib/postgresql -g postgres -s /bin/bash postgres
sudo apt-get -y install postgresql autopostgresqlb
sudo crontab -e
06 11 * * * cd / && /etc/cron.
sudo less /var/log/
Set the time in the cron entry so that it runs as soon as possible. The manual group and user creation is only there to demonstrate the problem. In reality I had adduser.conf configured to put system accounts in a higher range, and the account was created normally by package installation.
When I repeat exactly the same procedure but manually set the UID and GID to 199 instead, it works fine.
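Since the only difference between the failing and working runs is the UID, one quick check is whether the postgres UID falls inside or outside the system account range. The sketch below is an illustration under assumptions: the function name `uid_class` is hypothetical, the threshold is read from `SYS_UID_MAX` in /etc/login.defs with 999 assumed as the fallback, and whether this boundary is what actually matters for the crash has not been established here.

```shell
#!/bin/sh
# Classify a numeric UID as "system" or "regular" relative to a threshold.
# $1 = UID to classify
# $2 = threshold (optional); defaults to SYS_UID_MAX from /etc/login.defs,
#      or to 999 (an assumed default) if no active setting is found.
uid_class() {
    uid=$1
    max=${2:-}
    if [ -z "$max" ] && [ -r /etc/login.defs ]; then
        # Commented-out entries such as "#SYS_UID_MAX" do not match.
        max=$(awk '$1 == "SYS_UID_MAX" { print $2; exit }' /etc/login.defs)
    fi
    max=${max:-999}
    if [ "$uid" -le "$max" ]; then
        echo system
    else
        echo regular
    fi
}
```

For example, `uid_class "$(id -u postgres)"` classifies the account on the machine under test, while `uid_class 199 999` and `uid_class 99199 999` classify the two UIDs used above.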
Versions on xenial:
Linux pgtest 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.5 9.5.5-0ubuntu0.
And on yakkety:
Linux pgtest 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.5 9.5.5-0ubuntu0.
And on trusty:
Linux pgtest 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
postgresql-9.3 9.3.15-
All patches applied as of today 2016-12-14.
I've tried to explore whether this is related to systemd (an obvious difference between trusty and the newer releases), or whether it's the PostgreSQL or kernel version, but haven't yet found any useful information.
Happy to assist in debugging.
Status changed to 'Confirmed' because the bug affects multiple users.